I am trying to find the column names associated with the largest and second-largest values in each row of a DataFrame. Here’s a simplified example (the real one has over 500 columns):
```
Date  val1  val2  val3  val4
1990     5     7     1    10
1991     2     1    10     3
1992    10     9     6     1
1993    50    10     2    15
1994     1    15     7     8
```
This needs to become:
```
Date  1larg  2larg
1990  val4   val2
1991  val3   val4
1992  val1   val2
1993  val1   val4
1994  val2   val4
```
I can find the column name with the largest value (i.e., `1larg` above) with `idxmax`, but how can I find the second largest?
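For reference, here is a minimal sketch of the `idxmax` step, rebuilding the sample frame by hand (how the real data is loaded is an assumption). The `second` line is one possible extension, not the method used in the answer: it hides each row's maximum and calls `idxmax` again, which only yields the runner-up when the row maximum is not tied.

```python
import pandas as pd

# Sample data from the question, with Date as the index
df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
).set_index("Date")

# Label of the column holding each row's maximum
largest = df.idxmax(axis=1)

# Replace each row's maximum with NaN, then take idxmax again;
# NaN is skipped, so this picks the second-largest column
# (assumes the row maximum is not tied; tied maxima would both be hidden)
second = df.mask(df.eq(df.max(axis=1), axis=0)).idxmax(axis=1)

result = pd.DataFrame({"1larg": largest, "2larg": second})
print(result)
```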
Answer
(You don’t have any duplicate maximum values in your rows, so I’ll guess that if you have [1,1,2,2] you want val3 and val4 to be selected.)
One way would be to use the result of `argsort` as an index into the column labels.
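```python
import numpy as np
import pandas as pd

df = df.set_index("Date")
# argsort each row (stable, so ties keep column order), reverse to descending, keep two
top2 = np.argsort(df.to_numpy(), axis=1, kind="stable")[:, ::-1][:, :2]
# map the positions to column labels with 2-D fancy indexing
new_frame = pd.DataFrame(df.columns.to_numpy()[top2], index=df.index)
```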
This produces:

```
         0     1
Date
1990  val4  val2
1991  val3  val4
1992  val1  val2
1993  val1  val4
1994  val2  val4
1995  val4  val3
```
(where I’ve added an extra 1995 row containing [1,1,2,2].)
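To see why the 1995 row comes out as `val4`, `val3`, here is the tie case on its own in NumPy. This sketch assumes `kind="stable"`: the default quicksort does not guarantee any order among equal values, whereas a stable sort keeps tied values in their column order before the result is reversed.

```python
import numpy as np

cols = np.array(["val1", "val2", "val3", "val4"])
row = np.array([1, 1, 2, 2])  # the extra 1995 row

# Ascending stable argsort keeps ties in original column order: [0, 1, 2, 3]
order = np.argsort(row, kind="stable")
# Reversing gives descending order, so among ties the later column comes first
top2 = cols[order[::-1][:2]]
print(top2)  # ['val4' 'val3']
```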
Alternatively, you could probably `melt` the frame into a flat (long) format, pick out the two largest values in each `Date` group, and then pivot it back into a wide table.
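A minimal sketch of that melt route, assuming pandas and the five-row sample from the question (`long_df`, `top2` and the `rank` column are illustrative names, not from the original). Ties here are broken by the stable sort, i.e. by the order rows appear after melting, which is not necessarily the same tie order as the `argsort` version.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": [1990, 1991, 1992, 1993, 1994],
        "val1": [5, 2, 10, 50, 1],
        "val2": [7, 1, 9, 10, 15],
        "val3": [1, 10, 6, 2, 7],
        "val4": [10, 3, 1, 15, 8],
    }
)

# Wide -> long: one row per (Date, column, value)
long_df = df.melt(id_vars="Date", var_name="col", value_name="value")

# Largest values first, then keep the first two rows of each Date group
top2 = long_df.sort_values("value", ascending=False, kind="stable").groupby("Date").head(2)

# Number the rows within each Date (0, 1) and map that to the target headers
top2 = top2.assign(
    rank=top2.groupby("Date").cumcount().map({0: "1larg", 1: "2larg"})
)

# Long -> wide again: one row per Date, one column per rank
result = top2.pivot(index="Date", columns="rank", values="col").reset_index()
result.columns.name = None
print(result)
```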