Skip to content
Advertisement

Trying to stack several groups of columns into multiple target columns by name

I have the original dataframe like that which contains 1772 columns and 130 rows. I would like to stack them into multiple target columns.

id AA_F1R1 BB_F1R1 AA_F1R2 BB_F1R2 AA_F2R1 BB_F2R2 AA_F7R25 BB_F7R25
001 5 xy xx xx zy 1 4 xx
002 6 zzz yyy zzz xw 2 zzz 3 zzz

I found two different solutions that seem to work but for me is giving an error. Not sure if they work with NaN values.

pd.wide_to_long(df, stubnames=['AA', 'BB'], i='id', j='dropme', sep='_')
  .reset_index()
  .drop('dropme', axis=1)
  .sort_values('id')
Output:
0 rows × 1773 columns

Another solution I tried was

df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()

Output:
150677 rows × 2 columns 

the problem with this last one is I couldn’t keep the columns I wanted.

I appreciate any inputs!

Advertisement

Answer

Use suffix=r'w+' parameter in wide_to_long:

df = pd.wide_to_long(df, stubnames=['AA','BB'], i='id', j='dropme', sep='_', suffix=r'w+')
  .reset_index()
  .drop('dropme', axis=1)
  .sort_values('id')

In second solution add dropna=False to DataFrame.stack:

df.set_index('id', inplace=True)
df.columns = df.columns.str.split("_", expand=True)
df = df.stack(level = 1, dropna=False).reset_index(level = 1, drop = True).reset_index()
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement