My df holds information about US states. I want to rank the states by their contribution.
My code:
    df
      State  Value  Year
    0    FL    100  2012
    1    CA    150  2013
    2    MA     25  2014
    3    FL     50  2014
    4    CA     50  2015
    5    MA     75  2016
Expected answer: compute state_capacity by summing each state's values across all years, then rank the states by that capacity.
    df
      State  Value  Year  State_Capa.  Rank
    0    FL    100  2012          150     2
    1    CA    150  2013          200     1
    2    MA     25  2014          100     3
    3    FL     50  2014          150     2
    4    CA     50  2015          200     1
    5    MA     75  2016          100     3
My approach: I can compute the state capacity using groupby, but I get NaN when I map it back onto the df.
    state_capacity = df[['State','Value']].groupby(['State']).sum()
    df['State_Capa.'] = df['State'].map(dict(state_capacity))
    df
      State  Value  Year  State_Capa.
    0    FL    100  2012          NaN
    1    CA    150  2013          NaN
    2    MA     25  2014          NaN
    3    FL     50  2014          NaN
    4    CA     50  2015          NaN
    5    MA     75  2016          NaN
Answer
Try transform, which returns the per-state sum aligned to the original rows, then rank:
    df['new'] = df.groupby('State').Value.transform('sum').rank(method='dense',ascending=False)
    Out[42]:
    0    2.0
    1    1.0
    2    3.0
    3    2.0
    4    1.0
    5    3.0
    Name: Value, dtype: float64
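For completeness, here is a minimal sketch that builds both columns from the question on the sample data. It also hints at why the map attempt probably returned NaN: calling dict() on the DataFrame returned by groupby(...).sum() yields a dictionary keyed by column name ('Value'), not by state, so no state finds a match. The names capacity_frame, frame_keys, and mapped are illustrative only, and the cast to int is an assumption about the desired output format.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "State": ["FL", "CA", "MA", "FL", "CA", "MA"],
    "Value": [100, 150, 25, 50, 50, 75],
    "Year": [2012, 2013, 2014, 2014, 2015, 2016],
})

# Per-state total, broadcast back to every row of that state.
df["State_Capa."] = df.groupby("State")["Value"].transform("sum")

# Dense rank: the largest total gets rank 1 and ties share a rank.
# rank() returns floats; the int cast is optional.
df["Rank"] = df["State_Capa."].rank(method="dense", ascending=False).astype(int)

# Why the original map likely gave NaN: dict() of a DataFrame is keyed
# by column labels, so its only key here is "Value", not "FL"/"CA"/"MA".
capacity_frame = df[["State", "Value"]].groupby("State").sum()
frame_keys = list(dict(capacity_frame))

# map-based alternative: select the column first so that sum() returns
# a Series indexed by State, which map() can look states up in.
state_capacity = df.groupby("State")["Value"].sum()
mapped = df["State"].map(state_capacity)

print(df)
```

Either route gives the same State_Capa. column; transform avoids the intermediate lookup object, while map is handy if the totals are needed on their own.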