I am new to pandas and really confused about working with a DataFrame that has a multi-level index for its columns.
I want to:
- rename my level 2 column names by appending each column's position (the one used with iloc): _0, _1, …
- add a new column New_Max that holds the max value of the previous 2 columns. The level 0 and level 1 names for New_Max are not important.
Thank you
Current State
Importance | H            | H
Category   | Cat1         | Cat2
           | Total Assets | AUMs
Firm 1     | 100          | 300
Firm 2     | 200          | 3400
Firm 3     | 300          | 800
Firm 4     | NaN          | 800
Desired State
Importance | H              | H      |
Category   | Cat1           | Cat2   |
           | Total Assets_0 | AUMs_1 | New_Max
Firm 1     | 100            | 300    | 300
Firm 2     | 200            | 3400   | 3400
Firm 3     | 300            | 800    | 800
Firm 4     | NaN            | 800    | 800
Answer
Use enumerate to get a counter for the column tuples, then create a new MultiIndex with MultiIndex.from_tuples:
    tups = [(a, b, f'{c}_{i}') for i, (a, b, c) in enumerate(df.columns)]
    df.columns = pd.MultiIndex.from_tuples(tups)
    print(df)
                       H
                    Cat1   Cat2
          Total Assets_0 AUMs_1
    Firm1          100.0    300
    Firm2          200.0   3400
    Firm3          300.0    800
    Firm4            NaN    800
Finally, for the new column, select the last 2 columns by position with DataFrame.iloc and take the row-wise max. For its name, copy the original last tuple and replace its third value with the new column name:
    new = list(tups[-1])
    new[2] = 'New_Max'
    print(new)
    ['H', 'Cat2', 'New_Max']

    df[tuple(new)] = df.iloc[:, -2:].max(axis=1)
    print(df)
                       H
                    Cat1   Cat2
          Total Assets_0 AUMs_1 New_Max
    Firm1          100.0    300   300.0
    Firm2          200.0   3400  3400.0
    Firm3          300.0    800   800.0
    Firm4            NaN    800   800.0
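For anyone who wants to run both steps end to end, here is a minimal self-contained sketch. The way the sample frame is built is my assumption (the question only shows the table, not the code that made it), and passing names=df.columns.names is an optional addition so the level names Importance and Category are kept when the columns are rebuilt.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's sample data.
cols = pd.MultiIndex.from_tuples(
    [("H", "Cat1", "Total Assets"), ("H", "Cat2", "AUMs")],
    names=["Importance", "Category", None],
)
df = pd.DataFrame(
    [[100, 300], [200, 3400], [300, 800], [np.nan, 800]],
    index=["Firm 1", "Firm 2", "Firm 3", "Firm 4"],
    columns=cols,
)

# Step 1: append each column's position to its level 2 label.
tups = [(a, b, f"{c}_{i}") for i, (a, b, c) in enumerate(df.columns)]
# names=... is optional; it carries the existing level names over.
df.columns = pd.MultiIndex.from_tuples(tups, names=df.columns.names)

# Step 2: reuse levels 0 and 1 of the last column for the new column's key.
new_col = tups[-1][:2] + ("New_Max",)
# max(axis=1) skips NaN by default, so Firm 4 gets 800 rather than NaN.
df[new_col] = df.iloc[:, -2:].max(axis=1)

print(df)
```

Note that iloc[:, -2:] only means "the previous 2 columns" here because the frame has exactly two columns before New_Max is added; on a wider frame, select the two columns you need explicitly.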