My df has information about US states. I want to rank the states based on their contribution.
My data:
df
State Value Year
0 FL 100 2012
1 CA 150 2013
2 MA 25 2014
3 FL 50 2014
4 CA 50 2015
5 MA 75 2016
Expected answer: compute the state capacity by summing each state's values across all years, then rank the states by that capacity.
df
State Value Year State_Capa. Rank
0 FL 100 2012 150 2
1 CA 150 2013 200 1
2 MA 25 2014 100 3
3 FL 50 2014 150 2
4 CA 50 2015 200 1
5 MA 75 2016 100 3
My approach: I can compute the state capacity using groupby, but I get NaN when I map it back to the df.
state_capacity = df[['State','Value']].groupby(['State']).sum()
df['State_Capa.'] = df['State'].map(dict(state_capacity))
df
State Value Year State_Capa.
0 FL 100 2012 NaN
1 CA 150 2013 NaN
2 MA 25 2014 NaN
3 FL 50 2014 NaN
4 CA 50 2015 NaN
5 MA 75 2016 NaN
Answer
Use transform to broadcast each state's total back to its rows, then rank the result:
df['new'] = df.groupby('State').Value.transform('sum').rank(method='dense',ascending=False)
Out[42]:
0 2.0
1 1.0
2 3.0
3 2.0
4 1.0
5 3.0
Name: Value, dtype: float64
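For completeness, here is a minimal sketch that builds both columns from the expected output, assuming the sample data shown in the question. The column names State_Capa. and Rank come from the question; the variable names totals and mapped are only illustrative. The NaN in the original attempt most likely comes from calling dict() on the grouped DataFrame, which yields column names (here 'Value') as keys rather than state codes, so no state finds a match. Mapping with a Series indexed by state avoids that.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["FL", "CA", "MA", "FL", "CA", "MA"],
    "Value": [100, 150, 25, 50, 50, 75],
    "Year": [2012, 2013, 2014, 2014, 2015, 2016],
})

# transform('sum') returns one value per original row, aligned to df's index,
# so each row receives the total for its own state.
df["State_Capa."] = df.groupby("State")["Value"].transform("sum")

# Dense ranking gives tied rows the same rank with no gaps;
# ascending=False makes the largest total rank 1.
df["Rank"] = df["State_Capa."].rank(method="dense", ascending=False).astype(int)

# Alternative to transform: a Series indexed by state can be passed to map directly.
totals = df.groupby("State")["Value"].sum()
mapped = df["State"].map(totals)

print(df)
```

The astype(int) call is optional; it only converts the float ranks returned by rank into integers to match the expected output.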