Python Dataframe Sum and Rank the Rows based on the group they belong

My df has information on US states. I want to rank the states based on their contribution.

My data:

df
  State  Value  Year
0    FL    100  2012
1    CA    150  2013
2    MA     25  2014
3    FL     50  2014
4    CA     50  2015
5    MA     75  2016

Expected answer: compute the state capacity by summing each state's values across all years, then rank the states by that total.

df
  State  Value  Year  State_Capa.  Rank
0    FL    100  2012          150     2
1    CA    150  2013          200     1
2    MA     25  2014          100     3
3    FL     50  2014          150     2
4    CA     50  2015          200     1
5    MA     75  2016          100     3

My approach: I can compute the state capacity using groupby, but I get NaN when I map it back to the df.

state_capacity = df[['State','Value']].groupby(['State']).sum()
df['State_Capa.'] = df['State'].map(dict(state_capacity))
df
  State  Value  Year  State_Capa.
0    FL    100  2012          NaN
1    CA    150  2013          NaN
2    MA     25  2014          NaN
3    FL     50  2014          NaN
4    CA     50  2015          NaN
5    MA     75  2016          NaN

Answer

Try transform to broadcast each state's sum back onto its rows, then rank the result:

df['new'] = df.groupby('State')['Value'].transform('sum').rank(method='dense', ascending=False)
df['new']
Out[42]: 
0    2.0
1    1.0
2    3.0
3    2.0
4    1.0
5    3.0
Name: new, dtype: float64
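
If you want both columns from the expected output, here is a minimal sketch that rebuilds the sample frame from the question and keeps the summed total as well as the rank. The NaN in the original attempt most likely comes from calling dict() on a DataFrame: the resulting keys are the column names ('Value'), not the states, so map() finds no match. Selecting the 'Value' column before summing gives a Series indexed by state, which map() can use directly. The name `mapped` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "State": ["FL", "CA", "MA", "FL", "CA", "MA"],
    "Value": [100, 150, 25, 50, 50, 75],
    "Year": [2012, 2013, 2014, 2014, 2015, 2016],
})

# transform('sum') returns one total per original row, aligned to df's index
df["State_Capa."] = df.groupby("State")["Value"].transform("sum")

# Dense ranking: rows with the same total share a rank and no rank is skipped
df["Rank"] = df["State_Capa."].rank(method="dense", ascending=False).astype(int)

# Alternative that keeps the map() idea from the question:
# summing a single column yields a Series indexed by state
state_capacity = df.groupby("State")["Value"].sum()
mapped = df["State"].map(state_capacity)  # hypothetical name, same values as State_Capa.

print(df)
```

The astype(int) call is optional; rank() returns floats, and the cast is safe here only because no row has a missing total.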
