I have a Pandas dataframe as below:
    import numpy as np
    import pandas as pd

    incomplete_df = pd.DataFrame({'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
                                  'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
                                  'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan]})
    incomplete_df

       event1  event2  event3
    0       1     NaN     NaN
    1       2       1     NaN
    2     NaN     NaN     NaN
    3       5       3     NaN
    4       6       4       6
    5     NaN       7       4
    6     NaN     NaN       9
    7      11      12     NaN
    8     NaN     NaN       3
    9      15      17     NaN
I want to append a reason column that gives a standard text plus the name of the column holding the minimum value of that row. In other words, the desired output is:
       event1  event2  event3  reason
    0       1     NaN     NaN  'Reason is event1'
    1       2       1     NaN  'Reason is event2'
    2     NaN     NaN     NaN  'Reason is None'
    3       5       3     NaN  'Reason is event2'
    4       6       4       6  'Reason is event2'
    5     NaN       7       4  'Reason is event3'
    6     NaN     NaN       9  'Reason is event3'
    7      11      12     NaN  'Reason is event1'
    8     NaN     NaN       3  'Reason is event3'
    9      15      17     NaN  'Reason is event1'
I can do incomplete_df.apply(lambda x: min(x), axis=1), but this does not ignore NaNs and, more importantly, returns the value rather than the name of the corresponding column.
EDIT:
Having found out about the idxmin() function from EMS's answer, I timed the two solutions below:
    timeit.repeat("incomplete_df.apply(lambda x: x.idxmin(), axis=1)", "from __main__ import incomplete_df", number=1000)
    [0.35261858807214175, 0.32040155511039536, 0.3186818508661702]

    timeit.repeat("incomplete_df.T.idxmin()", "from __main__ import incomplete_df", number=1000)
    [0.17752145781657447, 0.1628651645393262, 0.15563708275042387]
In these runs, the transpose approach is roughly twice as fast.
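For anyone who wants to reproduce the comparison, here is a self-contained sketch. It makes two assumptions that are not part of the timings above: the all-NaN row is dropped first, because idxmin handles such rows differently across pandas versions, and a third candidate, idxmin(axis=1), is added for reference. The names `df`, `candidates` and `timings` are only illustrative, and absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

# Assumption: drop rows where every value is missing, so that all
# candidates behave the same way regardless of the pandas version.
df = df.dropna(how='all')

# Each candidate returns, per row, the label of the column holding the minimum.
candidates = {
    'apply + idxmin': lambda: df.apply(lambda row: row.idxmin(), axis=1),
    'transpose + idxmin': lambda: df.T.idxmin(),
    'idxmin(axis=1)': lambda: df.idxmin(axis=1),
}

# Take the best of three repeats of 100 calls for each candidate.
timings = {
    name: min(timeit.repeat(fn, number=100, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s per 100 calls')
```

All three candidates should return the same labels; only the running time differs.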
Answer
    incomplete_df['reason'] = "Reason is " + incomplete_df.T.idxmin()
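One caveat concerns row 2, where every value is NaN: idxmin has no column to return there, and depending on the pandas version it may yield NaN, emit a warning, or raise. The sketch below is one possible way to get the 'Reason is None' text from the question. It assumes the event columns are known up front; `event_cols`, `has_value` and `min_col` are illustrative names, not part of the answer.

```python
import numpy as np
import pandas as pd

incomplete_df = pd.DataFrame({
    'event1': [1, 2, np.nan, 5, 6, np.nan, np.nan, 11, np.nan, 15],
    'event2': [np.nan, 1, np.nan, 3, 4, 7, np.nan, 12, np.nan, 17],
    'event3': [np.nan, np.nan, np.nan, np.nan, 6, 4, 9, np.nan, 3, np.nan],
})

# Assumption: these are the only columns that should be compared.
event_cols = ['event1', 'event2', 'event3']

# True for rows that contain at least one non-missing value.
has_value = incomplete_df[event_cols].notna().any(axis=1)

# Run idxmin only on rows that have something to compare;
# NaNs inside those rows are skipped by default.
min_col = incomplete_df.loc[has_value, event_cols].idxmin(axis=1)

# Start with the fallback text, then overwrite the rows that have a minimum.
incomplete_df['reason'] = 'Reason is None'
incomplete_df.loc[has_value, 'reason'] = 'Reason is ' + min_col

print(incomplete_df)
```

Restricting idxmin to `event_cols` also keeps the code safe to re-run after the reason column has been added, since the text column is never included in the comparison.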