I am trying to create a dummy variable in pandas that equals 1 for the observation with the largest democracy value (v2x_regime) in a given year and 0 otherwise. I have tried several versions of the code, but none of them does what I want. Below is some of the code and the error I am getting. For context, I am replicating an R script that uses the tidyverse in Python. Here is the R code, which generates the variable just fine:
merged_data <- merged_data %>%
  group_by(year) %>%
  mutate(biggest_democ = if_else(v2x_regime == max(v2x_regime), 1, 0)) %>%
  ungroup()
This works just fine in R, but I cannot replicate it in Python. Here is one of the lines I am having trouble with:
merged_data = merged_data.assign(biggest_democ = np.where(merged_data['v2x_regime'].max(), 1, 0).groupby('year'))
This just comes up with the error:
AttributeError: 'numpy.ndarray' object has no attribute 'groupby'
I have tried other iterations as well but they result in the same error.
I would appreciate any and all help!
Answer
Here's one approach using groupby, transform, and a custom lambda function. Not sure if the example data I made matches your situation.
import pandas as pd

merged_data = pd.DataFrame({
    'country': ['A', 'B', 'C', 'A', 'B', 'C'],
    'v2x_regime': [10, 20, 30, 70, 40, 50],
    'year': [2010, 2010, 2010, 2020, 2020, 2020],
})

# For each year, flag the rows whose v2x_regime equals that year's maximum
merged_data['biggest_democ'] = (
    merged_data
    .groupby('year')['v2x_regime']
    .transform(lambda v: v.eq(v.max()))
    .astype(int)
)

merged_data
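Output: with this example data, biggest_democ is 0, 0, 1 for the 2010 rows (country C has the maximum, 30) and 1, 0, 0 for the 2020 rows (country A has the maximum, 70).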
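If you want something that reads closer to the R line, here is a minimal sketch of a lambda-free variant. It assumes the same made-up example data as above; `yearly_max` is just a helper name chosen for illustration. The error in the question comes from calling `.groupby` on the result of `np.where`, which is a plain NumPy array, so the idea here is to group first and only then compare.

```python
import numpy as np
import pandas as pd

# Same toy data as in the answer above (made-up values, not real data).
merged_data = pd.DataFrame({
    'country': ['A', 'B', 'C', 'A', 'B', 'C'],
    'v2x_regime': [10, 20, 30, 70, 40, 50],
    'year': [2010, 2010, 2010, 2020, 2020, 2020],
})

# Group first: transform('max') broadcasts each year's maximum
# back onto every row of that year (same length as the DataFrame).
yearly_max = merged_data.groupby('year')['v2x_regime'].transform('max')

# Then compare row by row and map True/False to 1/0,
# which mirrors if_else(v2x_regime == max(v2x_regime), 1, 0) in R.
merged_data['biggest_democ'] = np.where(
    merged_data['v2x_regime'] == yearly_max, 1, 0
)

print(merged_data['biggest_democ'].tolist())  # [0, 0, 1, 1, 0, 0]
```

Like the R version, this flags every row that ties for the yearly maximum. One difference worth checking on real data: pandas skips missing values when computing the group maximum, whereas R's `max()` returns `NA` for a group containing `NA` unless `na.rm = TRUE` is set, so results may differ for years with missing v2x_regime values.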