I have the original dataframe like that which contains 1772 columns and 130 rows. I would like to stack them into multiple target columns. id AA_F1R1 BB_F1R1 AA_F1R2 BB_F1R2 ... AA_F2R1 BB_F2R2 ... AA_F7R25 BB_F7R25 001 5 xy xx xx zy 1 4 xx 002 6 zzz yyy zzz xw 2 zzz 3 zzz I found two different solutions that

Trying to stack several groups of columns into multiple target columns by name

I have the original dataframe like that which contains 1772 columns and 130 rows. I would like to stack them into multiple target columns.

id	AA_F1R1	BB_F1R1	AA_F1R2	BB_F1R2	…	AA_F2R1	BB_F2R2	…	AA_F7R25	BB_F7R25
001	5	xy	xx	xx	zy	1		4	xx
002	6	zzz	yyy	zzz	xw	2	zzz	3	zzz

I found two different solutions that seem to work but for me is giving an error. Not sure if they work with NaN values.

pd.wide_to_long(df, stubnames=['AA', 'BB'], i='id', j='dropme', sep='_')
  .reset_index()
  .drop('dropme', axis=1)
  .sort_values('id')
Output:
0 rows × 1773 columns

JavaScript
​x
 
pd.wide_to_long(df, stubnames=['AA', 'BB'], i='id', j='dropme', sep='_')
  .reset_index()
  .drop('dropme', axis=1)
  .sort_values('id')
Output:
0 rows × 1773 columns
​

Another solution I tried was

df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()

Output:
150677 rows × 2 columns

JavaScript
 
df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()
​
Output:
150677 rows × 2 columns 
​

the problem with this last one is I couldn’t keep the columns I wanted.

I appreciate any inputs!

Answer

Use suffix=r'w+' parameter in wide_to_long:

df = pd.wide_to_long(df, stubnames=['AA','BB'], i='id', j='dropme', sep='_', suffix=r'w+')
  .reset_index()
  .drop('dropme', axis=1)
  .sort_values('id')

JavaScript
 
df = pd.wide_to_long(df, stubnames=['AA','BB'], i='id', j='dropme', sep='_', suffix=r'w+')
  .reset_index()
  .drop('dropme', axis=1)
  .sort_values('id')
​

In second solution add dropna=False to DataFrame.stack:

df.set_index('id', inplace=True)
df.columns = df.columns.str.split("_", expand=True)
df = df.stack(level = 1, dropna=False).reset_index(level = 1, drop = True).reset_index()

JavaScript
 
df.set_index('id', inplace=True)
df.columns = df.columns.str.split("_", expand=True)
df = df.stack(level = 1, dropna=False).reset_index(level = 1, drop = True).reset_index()
​

Advertisement

Answer