I’m looking to transpose pandas columns and apply a Groupby
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3', 'ID4'],
                   'Code1': ['X60', np.nan, 'X66', np.nan],
                   'Code2': [np.nan, 'X64', 'X78', np.nan],
                   'Code3': [np.nan, 'X66', 'X81', 'X59'],
                   'Code4': [np.nan, np.nan, 'X38', 'X60']})
df

    ID Code1 Code2 Code3 Code4
0  ID1   X60   NaN   NaN   NaN
1  ID2   NaN   X64   X66   NaN
2  ID3   X66   X78   X81   X38
3  ID4   NaN   NaN   X59   X60
How can I get the following expected output, with one row per code, the number of times it occurs (NB), and the IDs it occurs in?
Code  NB  ID
X38    1  ID3
X59    1  ID4
X60    2  ID1, ID4
X64    1  ID2
X66    2  ID2, ID3
X78    1  ID3
X81    1  ID3
Answer
Use DataFrame.stack to reshape the code columns into a single Series and Series.value_counts to count each code (missing values are excluded from the counts). Then sort with Series.sort_index, and use Series.rename_axis and Series.reset_index to get a two-column DataFrame:
# Move ID into the index first so that only the code columns are stacked and counted
out = df.set_index('ID').stack().value_counts().sort_index().rename_axis('Code').reset_index(name='NB')
print(out)

  Code  NB
0  X38   1
1  X59   1
2  X60   2
3  X64   1
4  X66   2
5  X78   1
6  X81   1
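
To make each step of that chain easier to follow, here is the same idea split into separate statements. This is only a sketch: the intermediate names codes and counts are illustrative, and it assumes the sample df from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3', 'ID4'],
                   'Code1': ['X60', np.nan, 'X66', np.nan],
                   'Code2': [np.nan, 'X64', 'X78', np.nan],
                   'Code3': [np.nan, 'X66', 'X81', 'X59'],
                   'Code4': [np.nan, np.nan, 'X38', 'X60']})

# Step 1: keep ID out of the counted values by moving it into the index,
# then stack the four code columns into one long Series.
codes = df.set_index('ID').stack()

# Step 2: count each code; value_counts skips missing values by default.
# sort_index orders the result by code rather than by frequency.
counts = codes.value_counts().sort_index()

# Step 3: name the index 'Code' and turn the Series into a two-column frame.
out = counts.rename_axis('Code').reset_index(name='NB')
print(out)
```

Splitting the chain this way also makes it easy to inspect codes or counts on their own when the result is not what you expect.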
EDIT: Use DataFrame.melt to reshape, then aggregate with size and join in GroupBy.agg:
# melt keeps ID next to each code; groupby drops the rows where Code is missing
out = (df.melt('ID', value_name='Code')
         .groupby('Code', as_index=False)
         .agg(NB=('Code', 'size'), ID=('ID', ', '.join)))
print(out)

  Code  NB        ID
0  X38   1       ID3
1  X59   1       ID4
2  X60   2  ID1, ID4
3  X64   1       ID2
4  X66   2  ID3, ID2
5  X78   1       ID3
6  X81   1       ID3
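
Note that the melt result lists X66 as "ID3, ID2", while the expected output in the question has "ID2, ID3". This happens because melt goes column by column, so ID3 (from Code1) is seen before ID2 (from Code3). If the order inside the joined string matters, one possible variation is to sort before grouping. The sketch below assumes the sample df from the question; the name long_df is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3', 'ID4'],
                   'Code1': ['X60', np.nan, 'X66', np.nan],
                   'Code2': [np.nan, 'X64', 'X78', np.nan],
                   'Code3': [np.nan, 'X66', 'X81', 'X59'],
                   'Code4': [np.nan, np.nan, 'X38', 'X60']})

# Long format: one row per (ID, Code) pair, with the empty cells removed.
long_df = df.melt('ID', value_name='Code').dropna(subset=['Code'])

# Sort by Code and ID so the IDs inside each group are joined in order.
out = (long_df.sort_values(['Code', 'ID'])
              .groupby('Code')
              .agg(NB=('ID', 'size'), ID=('ID', ', '.join))
              .reset_index())
print(out)
```

Here NB is computed from the ID column instead of Code; since size counts the rows in each group, the result is the same either way.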