I’m looking to transpose pandas columns and apply a Groupby
df = pd.DataFrame({'ID' : ['ID1', 'ID2', 'ID3', 'ID4'], 'Code1' : ['X60', np.nan, 'X66', np.nan], 'Code2' : [np.nan, 'X64', 'X78', np.nan], 'Code3' : [np.nan, 'X66', 'X81', 'X59'], 'Code4' : [np.nan, np.nan, 'X38', 'X60']}) df ID Code1 Code2 Code3 Code4 0 ID1 X60 NaN NaN NaN 1 ID2 NaN X64 X66 NaN 2 ID3 X66 X78 X81 X38 3 ID4 NaN NaN X59 X60
How can I achieve this expected output ?
Code NB ID X38 1 ID3 X59 1 ID4 X60 2 ID1, ID4 X64 1 ID2 X66 2 ID2, ID3 X78 1 ID3 X81 1 ID3
Advertisement
Answer
Use DataFrame.stack
for reshape with remove missing values and count values by Series.value_counts
, last Series.sort_index
with Series.rename_axis
and
Series.reset_index
for 2 columns DataFrame:
df = df.stack().value_counts().sort_index().rename_axis('Code').reset_index(name='NB') print (df) Code NB 0 X38 1 1 X59 1 2 X60 2 3 X64 1 4 X66 2 5 X78 1 6 X81 1
EDIT: Use DataFrame.melt
and then aggregate by size
and join
in GroupBy.agg
:
df = (df.melt('ID', value_name='Code') .groupby('Code', as_index=False) .agg(NB=('Code','size'), ID=('ID',', '.join))) print (df) Code NB ID 0 X38 1 ID3 1 X59 1 ID4 2 X60 2 ID1, ID4 3 X64 1 ID2 4 X66 2 ID3, ID2 5 X78 1 ID3 6 X81 1 ID3