I am trying to find column name associated with the largest and second largest values in a DataFrame, here’s a simplified example (the real one has over 500 columns):
Date val1 val2 val3 val4 1990 5 7 1 10 1991 2 1 10 3 1992 10 9 6 1 1993 50 10 2 15 1994 1 15 7 8
Needs to become:
Date 1larg 2larg 1990 val4 val2 1991 val3 val4 1992 val1 val2 1993 val1 val4 1994 val2 val4
I can find the column name with the largest value (i,e, 1larg above) with idxmax, but how can I find the second largest?
Advertisement
Answer
(You don’t have any duplicate maximum values in your rows, so I’ll guess that if you have [1,1,2,2]
you want val3
and val4
to be selected.)
One way would be to use the result of argsort
as an index into a Series with the column names.
df = df.set_index("Date") arank = df.apply(np.argsort, axis=1) ranked_cols = df.columns.to_series()[arank.values[:,::-1][:,:2]] new_frame = pd.DataFrame(ranked_cols, index=df.index)
produces
0 1 Date 1990 val4 val2 1991 val3 val4 1992 val1 val2 1993 val1 val4 1994 val2 val4 1995 val4 val3
(where I’ve added an extra 1995 [1,1,2,2]
row.)
Alternatively, you could probably melt
into a flat format, pick out the largest two values in each Date group, and then turn it again.