I have the original dataframe like that which contains 1772 columns and 130 rows. I would like to stack them into multiple target columns.
| id | AA_F1R1 | BB_F1R1 | AA_F1R2 | BB_F1R2 | … | AA_F2R1 | BB_F2R2 | … | AA_F7R25 | BB_F7R25 |
|---|---|---|---|---|---|---|---|---|---|---|
| 001 | 5 | xy | xx | xx | zy | 1 | 4 | xx | ||
| 002 | 6 | zzz | yyy | zzz | xw | 2 | zzz | 3 | zzz |
I found two different solutions that seem to work but for me is giving an error. Not sure if they work with NaN values.
pd.wide_to_long(df, stubnames=['AA', 'BB'], i='id', j='dropme', sep='_')
.reset_index()
.drop('dropme', axis=1)
.sort_values('id')
Output:
0 rows × 1773 columns
Another solution I tried was
df.set_index('id', inplace=True)
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()
Output:
150677 rows × 2 columns
the problem with this last one is I couldn’t keep the columns I wanted.
I appreciate any inputs!
Advertisement
Answer
Use suffix=r'w+' parameter in wide_to_long:
df = pd.wide_to_long(df, stubnames=['AA','BB'], i='id', j='dropme', sep='_', suffix=r'w+')
.reset_index()
.drop('dropme', axis=1)
.sort_values('id')
In second solution add dropna=False to DataFrame.stack:
df.set_index('id', inplace=True)
df.columns = df.columns.str.split("_", expand=True)
df = df.stack(level = 1, dropna=False).reset_index(level = 1, drop = True).reset_index()