Skip to content
Advertisement

Find the column name of the second largest value of each row in a Pandas DataFrame

I am trying to find column name associated with the largest and second largest values in a DataFrame, here’s a simplified example (the real one has over 500 columns):

Date  val1  val2 val3 val4
1990   5     7    1    10
1991   2     1    10   3
1992   10    9    6    1
1993   50    10   2    15
1994   1     15   7    8

Needs to become:

Date  1larg   2larg
1990  val4    val2
1991  val3    val4
1992  val1    val2
1993  val1    val4
1994  val2    val4

I can find the column name with the largest value (i,e, 1larg above) with idxmax, but how can I find the second largest?

Advertisement

Answer

(You don’t have any duplicate maximum values in your rows, so I’ll guess that if you have [1,1,2,2] you want val3 and val4 to be selected.)

One way would be to use the result of argsort as an index into a Series with the column names.

df = df.set_index("Date")
arank = df.apply(np.argsort, axis=1)
ranked_cols = df.columns.to_series()[arank.values[:,::-1][:,:2]]
new_frame = pd.DataFrame(ranked_cols, index=df.index)

produces

         0     1
Date            
1990  val4  val2
1991  val3  val4
1992  val1  val2
1993  val1  val4
1994  val2  val4
1995  val4  val3

(where I’ve added an extra 1995 [1,1,2,2] row.)

Alternatively, you could probably melt into a flat format, pick out the largest two values in each Date group, and then turn it again.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement