Pandas: getting the name of the minimum column

I have a Pandas dataframe as below:

import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({'event1': [1,     2     ,np.nan,5     ,6,np.nan,np.nan,11    ,np.nan,15],
                              'event2': [np.nan,1     ,np.nan,3     ,4,7     ,np.nan,12    ,np.nan,17],
                              'event3': [np.nan,np.nan,np.nan,np.nan,6,4     ,9     ,np.nan,3     ,np.nan]})
incomplete_df
   event1  event2  event3
0       1     NaN     NaN
1       2       1     NaN
2     NaN     NaN     NaN
3       5       3     NaN
4       6       4       6
5     NaN       7       4
6     NaN     NaN       9
7      11      12     NaN
8     NaN     NaN       3
9      15      17     NaN

I want to add a reason column containing a fixed text followed by the name of the column that holds the minimum value of that row (ignoring NaNs). In other words, the desired output is:

   event1  event2  event3  reason
0       1     NaN     NaN  'Reason is event1'
1       2       1     NaN  'Reason is event2'
2     NaN     NaN     NaN  'Reason is None'
3       5       3     NaN  'Reason is event2'
4       6       4       6  'Reason is event2'
5     NaN       7       4  'Reason is event3'
6     NaN     NaN       9  'Reason is event3'
7      11      12     NaN  'Reason is event1'
8     NaN     NaN       3  'Reason is event3'
9      15      17     NaN  'Reason is event1'

I can do incomplete_df.apply(lambda x: min(x), axis=1), but this does not ignore NaNs and, more importantly, returns the minimum value rather than the name of the corresponding column.
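A small sketch of the difference, using row 5 of the example frame (event1=NaN, event2=7, event3=4). Python's built-in min compares values with <, and any comparison against NaN is False, so a leading NaN is returned unchanged. Series.min and Series.idxmin skip NaN by default; idxmin returns the index label, which for a row is the column name.

```python
import numpy as np
import pandas as pd

# Row 5 of the example frame, as a Series indexed by column name.
row = pd.Series({'event1': np.nan, 'event2': 7.0, 'event3': 4.0})

# Built-in min: NaN comes first and no value compares "less than" NaN,
# so the result is NaN.
builtin_min = min(row)

# Series.min skips NaN (skipna=True by default) and returns the value.
skipna_min = row.min()

# Series.idxmin also skips NaN but returns the label of the minimum.
label = row.idxmin()

print(builtin_min, skipna_min, label)  # nan 4.0 event3
```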

EDIT:

Having found out about the idxmin() function from EMS’s answer, I timed the two solutions below:

timeit.repeat("incomplete_df.apply(lambda x: x.idxmin(), axis=1)", "from __main__ import incomplete_df", number=1000)
[0.35261858807214175, 0.32040155511039536, 0.3186818508661702]

timeit.repeat("incomplete_df.T.idxmin()", "from __main__ import incomplete_df", number=1000)
[0.17752145781657447, 0.1628651645393262, 0.15563708275042387]

In these runs the transpose approach was about twice as fast (roughly 0.16 s versus 0.32 s per 1000 calls on this small frame).
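The sketch below checks that the two timed approaches give the same labels, and adds a third variant, DataFrame.idxmin(axis=1), which finds the row-wise minimum label without an explicit transpose. It restricts the comparison to rows with at least one value, because all-NaN rows are deprecated input for idxmin in recent pandas versions. Timings will vary with machine and pandas version, so treat the printed numbers as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

nan = np.nan
incomplete_df = pd.DataFrame({'event1': [1, 2, nan, 5, 6, nan, nan, 11, nan, 15],
                              'event2': [nan, 1, nan, 3, 4, 7, nan, 12, nan, 17],
                              'event3': [nan, nan, nan, nan, 6, 4, 9, nan, 3, nan]})

# Keep only rows with at least one non-NaN value.
has_value = incomplete_df.notna().any(axis=1)
subset = incomplete_df[has_value]

via_apply = subset.apply(lambda x: x.idxmin(), axis=1)  # row-by-row apply
via_transpose = subset.T.idxmin()                       # column-wise on the transpose
via_axis = subset.idxmin(axis=1)                        # row-wise directly

# Time each variant (best of 3 repeats, 200 calls each).
variants = {'apply': lambda: subset.apply(lambda x: x.idxmin(), axis=1),
            'transpose': lambda: subset.T.idxmin(),
            'axis=1': lambda: subset.idxmin(axis=1)}
for name, stmt in variants.items():
    best = min(timeit.repeat(stmt, number=200, repeat=3))
    print(f"{name}: {best:.4f} s per 200 calls")
```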

Answer

incomplete_df['reason'] = "Reason is " + incomplete_df.T.idxmin()
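Note that the one-liner above does not produce 'Reason is None' for the all-NaN row 2: in older pandas idxmin returns NaN there, so the concatenation also yields NaN, and pandas 2.1 and later emit a FutureWarning for all-NaN input (later versions are slated to raise). A minimal sketch that matches the desired output, assuming the literal text 'None' is wanted for rows with no values, computes the label only where a row has data and fills the rest:

```python
import numpy as np
import pandas as pd

nan = np.nan
incomplete_df = pd.DataFrame({'event1': [1, 2, nan, 5, 6, nan, nan, 11, nan, 15],
                              'event2': [nan, 1, nan, 3, 4, 7, nan, 12, nan, 17],
                              'event3': [nan, nan, nan, nan, 6, 4, 9, nan, 3, nan]})

# Column label of the row minimum, computed only for rows with a value.
has_value = incomplete_df.notna().any(axis=1)
min_col = incomplete_df[has_value].idxmin(axis=1)

# Realign to the full index; all-NaN rows become the text 'None'.
min_col = min_col.reindex(incomplete_df.index).fillna('None')

incomplete_df['reason'] = 'Reason is ' + min_col
print(incomplete_df)
```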
