I have the original dataframe like that which contains 1772 columns and 130 rows. I would like to stack them into multiple target columns.
id | AA_F1R1 | BB_F1R1 | AA_F1R2 | BB_F1R2 | … | AA_F2R1 | BB_F2R2 | … | AA_F7R25 | BB_F7R25 |
---|---|---|---|---|---|---|---|---|---|---|
001 | 5 | xy | xx | xx | zy | 1 | 4 | xx | ||
002 | 6 | zzz | yyy | zzz | xw | 2 | zzz | 3 | zzz |
I found two different solutions that seem to work but for me is giving an error. Not sure if they work with NaN values.
pd.wide_to_long(df, stubnames=['AA', 'BB'], i='id', j='dropme', sep='_') .reset_index() .drop('dropme', axis=1) .sort_values('id') Output: 0 rows × 1773 columns
Another solution I tried was
df.set_index('id', inplace=True) df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_"))) df.stack(level = 1).reset_index(level = 1, drop = True).reset_index() Output: 150677 rows × 2 columns
the problem with this last one is I couldn’t keep the columns I wanted.
I appreciate any inputs!
Advertisement
Answer
Use suffix=r'w+'
parameter in wide_to_long
:
df = pd.wide_to_long(df, stubnames=['AA','BB'], i='id', j='dropme', sep='_', suffix=r'w+') .reset_index() .drop('dropme', axis=1) .sort_values('id')
In second solution add dropna=False
to DataFrame.stack
:
df.set_index('id', inplace=True) df.columns = df.columns.str.split("_", expand=True) df = df.stack(level = 1, dropna=False).reset_index(level = 1, drop = True).reset_index()