I am new to pandas. And really confused with working on a dataframe with multi-level index for columns.
I want to
- re-name my level 2 column name by appending with the column.iloc: _0, _1, …
- add a new column New_Max which is the max value of the previous 2 columns. The level 0 & 1 name for New_Max is not important.
Thank you
Current State
Importance| H | H |
Category | Cat1 | Cat2 |
|Total Assets| AUMs |
Firm 1 | 100 | 300 |
Firm 2 | 200 | 3400 |
Firm 3 | 300 | 800 |
Firm 4 | NaN | 800 |
Desired State
Importance| H | H |
Category | Cat1 | Cat2 |
|Total Assets_0| AUMs_1 | New_Max |
Firm 1 | 100 | 300 | 300 |
Firm 2 | 200 | 3400 | 3400 |
Firm 3 | 300 | 800 | 800 |
Firm 4 | NaN | 800 | 800 |
Advertisement
Answer
Use enumerate for counter for tuples and create MultiIndex by MultiIndex.from_tuples:
tups = [(a, b, f'{c}_{i}') for i, (a,b,c) in enumerate(df.columns)]
df.columns = pd.MultiIndex.from_tuples(tups)
print (df)
H
Cat1 Cat2
Total Assets_0 AUMs_1
Firm1 100.0 300
Firm2 200.0 3400
Firm3 300.0 800
Firm4 NaN 800
Last for new column by last 2 columns select them by position by DataFrame.iloc, extract original last tuple and rename last third value to new column name:
new = list(tups[-1])
new[2] = 'New_Max'
print (new)
['H', 'Cat2', 'New_Max']
df[tuple(new)] = df.iloc[:, -2:].max(axis=1)
print (df)
H
Cat1 Cat2
Total Assets_0 AUMs_1 New_Max
Firm1 100.0 300 300.0
Firm2 200.0 3400 3400.0
Firm3 300.0 800 800.0
Firm4 NaN 800 800.0