I am new to pandas and confused about working with a DataFrame that has a multi-level index for its columns.
I want to:
- rename the level 2 column names by appending each column's position (as used with iloc): _0, _1, …
- add a new column New_Max holding the maximum of the previous 2 columns. The level 0 and level 1 names for New_Max are not important.
Thank you
Current State
Importance |      H       |  H   |
Category   |     Cat1     | Cat2 |
           | Total Assets | AUMs |
Firm 1     |     100      | 300  |
Firm 2     |     200      | 3400 |
Firm 3     |     300      | 800  |
Firm 4     |     NaN      | 800  |
Desired State
Importance |       H        |   H    |
Category   |      Cat1      |  Cat2  |
           | Total Assets_0 | AUMs_1 | New_Max |
Firm 1     |      100       |  300   |   300   |
Firm 2     |      200       |  3400  |  3400   |
Firm 3     |      300       |  800   |   800   |
Firm 4     |      NaN       |  800   |   800   |
Answer
Use enumerate to get a counter for the column tuples, then create a new MultiIndex with MultiIndex.from_tuples:
tups = [(a, b, f'{c}_{i}') for i, (a,b,c) in enumerate(df.columns)]

df.columns = pd.MultiIndex.from_tuples(tups)
print (df)
                   H
                Cat1   Cat2
      Total Assets_0 AUMs_1
Firm1          100.0    300
Firm2          200.0   3400
Firm3          300.0    800
Firm4            NaN    800
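To try the renaming step on its own, here is a minimal, self-contained sketch. The construction of df (the Firm1 to Firm4 index labels and the three-level column tuples) is an assumption reconstructed from the printed tables, not code from the original post.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data (not from the original post).
cols = pd.MultiIndex.from_tuples(
    [("H", "Cat1", "Total Assets"), ("H", "Cat2", "AUMs")]
)
df = pd.DataFrame(
    [[100, 300], [200, 3400], [300, 800], [np.nan, 800]],
    index=["Firm1", "Firm2", "Firm3", "Firm4"],
    columns=cols,
)

# Iterating over a MultiIndex yields one tuple per column, so enumerate
# gives the column position i alongside the three level values.
tups = [(a, b, f"{c}_{i}") for i, (a, b, c) in enumerate(df.columns)]
df.columns = pd.MultiIndex.from_tuples(tups)

print(df.columns.tolist())
```

Only the last level is changed here. The first two levels pass through untouched, so the grouping under H / Cat1 / Cat2 is preserved.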
Finally, for the new column, select the last 2 columns by position with DataFrame.iloc, take the original last tuple, and replace its third value with the new column name:
new = list(tups[-1])
new[2] = 'New_Max'
print (new)
['H', 'Cat2', 'New_Max']

df[tuple(new)] = df.iloc[:, -2:].max(axis=1)
print (df)

                   H
                Cat1   Cat2
      Total Assets_0 AUMs_1 New_Max
Firm1          100.0    300   300.0
Firm2          200.0   3400  3400.0
Firm3          300.0    800   800.0
Firm4            NaN    800   800.0
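Note that Firm4 gets 800 rather than NaN because DataFrame.max skips missing values by default (skipna=True). The sketch below isolates that step. The frame construction is again an assumption based on the printed output, and building the new key by slicing the last column tuple is a small variation on the list-based approach above, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame after the renaming step.
cols = pd.MultiIndex.from_tuples(
    [("H", "Cat1", "Total Assets_0"), ("H", "Cat2", "AUMs_1")]
)
df = pd.DataFrame(
    [[100, 300], [200, 3400], [300, 800], [np.nan, 800]],
    index=["Firm1", "Firm2", "Firm3", "Firm4"],
    columns=cols,
)

# Reuse levels 0 and 1 of the last column and swap in the new level 2 name.
new_key = df.columns[-1][:2] + ("New_Max",)

# Row-wise max over the last two columns, selected by position.
# NaN values are skipped by default (skipna=True).
df[new_key] = df.iloc[:, -2:].max(axis=1)

print(new_key)
print(df[new_key].tolist())
```

If a row should instead yield NaN whenever any input is missing, passing skipna=False to max is one option.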