My DataFrame holds information about US states. I want to rank the states by their contribution.
My code:
    df
      State  Value  Year
    0    FL    100  2012
    1    CA    150  2013
    2    MA     25  2014
    3    FL     50  2014
    4    CA     50  2015
    5    MA     75  2016
Expected answer: compute the state capacity by summing each state's values across all years, then rank the states by that capacity.
    df
      State  Value  Year  State_Capa.  Rank
    0    FL    100  2012          150     2
    1    CA    150  2013          200     1
    2    MA     25  2014          100     3
    3    FL     50  2014          150     2
    4    CA     50  2015          200     1
    5    MA     75  2016          100     3
My approach: I can compute the state capacity using groupby, but I get NaN when I map it back onto the df.
    state_capacity = df[['State','Value']].groupby(['State']).sum()
    df['State_Capa.'] = df['State'].map(dict(state_capacity))
    df
      State  Value  Year  State_Capa.
    0    FL    100  2012          NaN
    1    CA    150  2013          NaN
    2    MA     25  2014          NaN
    3    FL     50  2014          NaN
    4    CA     50  2015          NaN
    5    MA     75  2016          NaN
Answer
Try groupby with transform to get the per-state sum on every row, then rank the result:

    df['new'] = df.groupby('State').Value.transform('sum').rank(method='dense',ascending=False)
    Out[42]:
    0    2.0
    1    1.0
    2    3.0
    3    2.0
    4    1.0
    5    3.0
    Name: Value, dtype: float64
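For completeness, here is a minimal end-to-end sketch that rebuilds the sample data from the question and produces both expected columns. The column names `State_Capa.` and `Rank` come from the question; the `mapped` variable and the integer cast are my own additions for illustration. It also shows why the map-based attempt most likely returned NaN: calling `dict()` on a DataFrame yields a dictionary keyed by column name (`'Value'`), not by state, so no state code ever matches. Mapping against the summed Series, which is indexed by state, avoids that.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["FL", "CA", "MA", "FL", "CA", "MA"],
    "Value": [100, 150, 25, 50, 50, 75],
    "Year": [2012, 2013, 2014, 2014, 2015, 2016],
})

# Per-state total, broadcast back onto every row of the original frame.
df["State_Capa."] = df.groupby("State")["Value"].transform("sum")

# Dense rank: the largest total gets rank 1, and ties share a rank with no gaps.
# The cast to int is optional; rank() returns floats by default.
df["Rank"] = df["State_Capa."].rank(method="dense", ascending=False).astype(int)

# Alternative to transform: map against a Series indexed by state.
# dict(DataFrame) would be keyed by column name, so every lookup would miss.
state_capacity = df.groupby("State")["Value"].sum()
mapped = df["State"].map(state_capacity)

print(df)
```

Either route gives the same per-row totals; `transform` just skips the intermediate lookup.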