I have the following dataframe:
| Location | Student Name |
|---|---|
| D | Amy |
| D | Raj |
| E | Mitch |
| F | Poo |
| F | Mack |
I am trying to generate the following dataframe:
| Location | Student Name |
|---|---|
| D | Amy |
| D | Raj |
| Total Students at D | 2 |
| E | Mitch |
| Total Students at E | 1 |
| F | Poo |
| F | Mack |
| Total Students at F | 2 |
| Grand Total | 5 |
How do I do that?
Advertisement
Answer
I will offer a solution without loops.
df = pd.DataFrame({'Location':['D','D','E','F','F'],
'Student Name':['Amy', 'Raj', 'Mitch', 'Poo', 'Mack']})
df1 = df.groupby('Location', as_index = False).agg({'Student Name':'count'})
df1['Location'] = df1['Location'].apply(lambda row : row + 'Total' )
df2 = pd.concat([df, df1]).sort_values(by = 'Location')
df2['Location'] = df2['Location'].apply(lambda x : 'Total Students at ' + x[:len(x)-len('Total')] if x.endswith('Total') else x)
df2 = df2.reset_index()
df2.drop(['index'], axis = 1, inplace = True)
df2 = df2.append({'Location' : 'Grand Total', 'Student Name' : df1['Student Name'].sum()}, ignore_index = True)
Output :
df2
Location Student Name
0 D Amy
1 D Raj
2 Total Students at D 2
3 E Mitch
4 Total Students at E 1
5 F Poo
6 F Mack
7 Total Students at F 2
8 Grand Total 5