I have the original dataframe like that which contains 1772 columns and 130 rows. I would like to stack them into multiple target columns.
id | AA_F1R1 | BB_F1R1 | AA_F1R2 | BB_F1R2 | … | AA_F2R1 | BB_F2R2 | … | AA_F7R25 | BB_F7R25 |
---|---|---|---|---|---|---|---|---|---|---|
001 | 5 | xy | xx | xx | zy | 1 | 4 | xx | ||
002 | 6 | zzz | yyy | zzz | xw | 2 | zzz | 3 | zzz |
I found two different solutions that seem to work but for me is giving an error. Not sure if they work with NaN values.
JavaScript
x
7
1
pd.wide_to_long(df, stubnames=['AA', 'BB'], i='id', j='dropme', sep='_')
2
.reset_index()
3
.drop('dropme', axis=1)
4
.sort_values('id')
5
Output:
6
0 rows × 1773 columns
7
Another solution I tried was
JavaScript
1
7
1
df.set_index('id', inplace=True)
2
df.columns = pd.MultiIndex.from_tuples(tuple(df.columns.str.split("_")))
3
df.stack(level = 1).reset_index(level = 1, drop = True).reset_index()
4
5
Output:
6
150677 rows × 2 columns
7
the problem with this last one is I couldn’t keep the columns I wanted.
I appreciate any inputs!
Advertisement
Answer
Use suffix=r'w+'
parameter in wide_to_long
:
JavaScript
1
5
1
df = pd.wide_to_long(df, stubnames=['AA','BB'], i='id', j='dropme', sep='_', suffix=r'w+')
2
.reset_index()
3
.drop('dropme', axis=1)
4
.sort_values('id')
5
In second solution add dropna=False
to DataFrame.stack
:
JavaScript
1
4
1
df.set_index('id', inplace=True)
2
df.columns = df.columns.str.split("_", expand=True)
3
df = df.stack(level = 1, dropna=False).reset_index(level = 1, drop = True).reset_index()
4