My df has information about US states. I want to rank the states based on their contribution.
My data:
    df
       State  Value  Year
    0  FL     100    2012
    1  CA     150    2013
    2  MA     25     2014
    3  FL     50     2014
    4  CA     50     2015
    5  MA     75     2016
Expected answer: compute the state capacity by summing each state's values across all years, then rank the states by that capacity.
    df
       State  Value  Year  State_Capa.  Rank
    0  FL     100    2012  150          2
    1  CA     150    2013  200          1
    2  MA     25     2014  100          3
    3  FL     50     2014  150          2
    4  CA     50     2015  200          1
    5  MA     75     2016  100          3
My approach: I am able to compute the state capacity using groupby, but I get NaN when I map it back to the df.
state_capacity = df[['State','Value']].groupby(['State']).sum()
df['State_Capa.'] = df['State'].map(dict(state_capacity))
    df
       State  Value Year   State_Capa.  
    0   FL     100   2012    NaN        
    1   CA     150   2013    NaN         
    2   MA     25    2014    NaN         
    3   FL     50    2014    NaN         
    4   CA     50    2015    NaN         
    5   MA     75    2016    NaN 
    
Answer
Use transform to broadcast each state's total back to its rows, then rank the result:
df['new'] = df.groupby('State').Value.transform('sum').rank(method='dense',ascending=False)
Out[42]: 
0    2.0
1    1.0
2    3.0
3    2.0
4    1.0
5    3.0
Name: Value, dtype: float64
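To get both requested columns, the same idea can be split into two steps. The following is a minimal sketch, assuming the sample data from the question; the column names State_Capa. and Rank come from the expected output, and the `mapped` variable is only illustrative. It also shows one likely reason the map attempt returned NaN: calling dict() on the groupby result (a DataFrame) produces a dict keyed by column name ('Value'), not by state, so no state finds a match. Mapping with a Series indexed by state avoids that.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["FL", "CA", "MA", "FL", "CA", "MA"],
    "Value": [100, 150, 25, 50, 50, 75],
    "Year": [2012, 2013, 2014, 2014, 2015, 2016],
})

# Step 1: per-state total, broadcast back onto every row of that state.
df["State_Capa."] = df.groupby("State")["Value"].transform("sum")

# Step 2: dense rank, so the largest total gets rank 1 and ties share a rank.
df["Rank"] = (
    df["State_Capa."].rank(method="dense", ascending=False).astype(int)
)

# Alternative that repairs the original map attempt:
# select the Value column so the groupby result is a Series indexed by State,
# which map can use directly as a lookup.
state_capacity = df.groupby("State")["Value"].sum()
mapped = df["State"].map(state_capacity)

print(df)
```

Both routes give the same State_Capa. values; transform just skips the intermediate lookup table.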