Evening, I have a dataframe that I want to reshape. there are duplicate id vars for some columns, and i want the duplicate values to appear as new rows
my data looks like this, and i want to have the ids as a row, with the group as column, and the choices as the values. if there are multiple choices picked per id within a group, then the row should be replicated as shown below. when I use pivot I end up just getting the mean or sum of the combined values e.g. 11.5 for id i1, group1. all tips very welcome thank you
JavaScript
x
7
1
import pandas as pd
2
import numpy as np
3
4
df = pd.DataFrame({'id': ['i1','i1','i1','i2','i2','i2','i2','i2','i3','i3'],
5
'group': ['group1','group1','group2','group3','group1','group2','group2','group3','group1','group2'],
6
'choice':[12,11,12,14,11,19,9,7,8,9]})
7
JavaScript
1
5
1
pd.DataFrame({'id': ['i1','i1','i2','i2','i3'],
2
'group1': [12,11,11,np.nan,8],
3
'group2': [12,np.nan,19,9,9],
4
'group3':[np.nan,np.nan,14,7,np.nan]})
5
Advertisement
Answer
Use GroupBy.cumcount
with Series.unstack
and DataFrame.droplevel
:
JavaScript
1
15
15
1
g = df.groupby(['id','group']).cumcount().add(1)
2
3
df = (df.set_index(['id','group', g])['choice']
4
.unstack(level=1)
5
.droplevel(level=1)
6
.rename_axis(None,axis=1)
7
.reset_index())
8
print (df)
9
id group1 group2 group3
10
0 i1 12.0 12.0 NaN
11
1 i1 11.0 NaN NaN
12
2 i2 11.0 19.0 14.0
13
3 i2 NaN 9.0 7.0
14
4 i3 8.0 9.0 NaN
15