
Creating a Dummy Variable Using Groupby and Max Functions With Pandas

I am trying to create a dummy variable in pandas that equals 1 when a row has the largest democracy value (v2x_regime) in its year, and 0 otherwise. I have tried numerous iterations of code, but none of them do this. Below I provide some code and the error I am getting. For context, I am replicating an R (tidyverse) analysis in Python. Here is my R code, which generates the variable just fine:

merged_data <- merged_data %>%
  group_by(year) %>%
  mutate(biggest_democ = if_else(v2x_regime == max(v2x_regime), 1, 0)) %>%
  ungroup()

This works in R, but I cannot replicate it in Python. Here is one of the lines I am having trouble with:

merged_data = merged_data.assign(biggest_democ = np.where(merged_data['v2x_regime'].max(), 1, 0).groupby('year'))

This raises the error:

AttributeError: 'numpy.ndarray' object has no attribute 'groupby'

I have tried other iterations as well but they result in the same error.

I would appreciate any and all help!


Answer

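Here’s one approach using groupby, transform, and a lambda function. Your attempt fails because np.where returns a NumPy array, which has no groupby method; the grouping has to happen first, and the comparison against the maximum is then done within each year. I'm not sure whether the example data I made matches your situation.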

import pandas as pd

merged_data = pd.DataFrame({
    'country':['A','B','C','A','B','C'],
    'v2x_regime':[10,20,30,70,40,50],
    'year':[2010,2010,2010,2020,2020,2020],
})

merged_data['biggest_democ'] = (
    merged_data
    .groupby('year')['v2x_regime']
    .transform(
        lambda v: v.eq(v.max())
    )
    .astype(int)
)

merged_data

Output

  country  v2x_regime  year  biggest_democ
0       A          10  2010              0
1       B          20  2010              0
2       C          30  2010              1
3       A          70  2020              1
4       B          40  2020              0
5       C          50  2020              0
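
As a possible alternative, here is a minimal sketch that avoids the lambda and stays closer to the R if_else call. It reuses the toy data from the answer above (not the asker's real dataset), and the name yearly_max is only an illustrative helper. It assumes v2x_regime has no missing values; pandas skips NaN when computing the group maximum, whereas R's max returns NA by default, so results may differ if the column contains missing data.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer above (an assumption, not the real dataset).
merged_data = pd.DataFrame({
    'country': ['A', 'B', 'C', 'A', 'B', 'C'],
    'v2x_regime': [10, 20, 30, 70, 40, 50],
    'year': [2010, 2010, 2010, 2020, 2020, 2020],
})

# Broadcast each year's maximum back onto every row of that year.
# yearly_max is a hypothetical helper name used only for this sketch.
yearly_max = merged_data.groupby('year')['v2x_regime'].transform('max')

# Mirror R's if_else(v2x_regime == max(v2x_regime), 1, 0):
# rows equal to their year's maximum get 1, all others get 0.
merged_data['biggest_democ'] = np.where(
    merged_data['v2x_regime'] == yearly_max, 1, 0
)
```

As in the R version, every row that ties for the maximum within a year gets a 1.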

User contributions licensed under: CC BY-SA