I have a pandas DataFrame filled with values between 0 and 3. For each row, I would like to either return the column label of the max value, or return 'undecided' if the row contains only 0s and 1s across its columns.
So far I have the first part sorted:
```python
df['result'] = df.idxmax(axis=1)  # ...or 'undecided'?
```
How do I write the second part?
My DataFrame looks like this, and I want to add the 'result' column to it:
```text
Column0 | Column1 | Column2 | Column3 | Column4 | result
0       | 0       | 1       | 2       | 0       | Column3
3       | 1       | 0       | 0       | 0       | Column0
0       | 1       | 0       | 1       | 0       | undecided
```
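Plain `idxmax` cannot express the 'undecided' case by itself: it returns the first column holding each row's maximum, so the third row comes out as `Column1`. A minimal sketch, rebuilding the sample frame from the table above (the variable name `plain` is just illustrative):

```python
import pandas as pd

# Sample data reconstructed from the question (without the 'result' column)
df = pd.DataFrame(
    [[0, 0, 1, 2, 0],
     [3, 1, 0, 0, 0],
     [0, 1, 0, 1, 0]],
    columns=[f"Column{i}" for i in range(5)],
)

# idxmax returns the label of the first column holding each row's maximum
plain = df.idxmax(axis=1)
print(plain.tolist())  # ['Column3', 'Column0', 'Column1']
```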
Answer
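Use `mask` to hide the rows that do not satisfy the constraint (rows where every value is below 2 become all NaN), then take `idxmax` and fill the remaining NaN with 'undecided':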
```python
# Rows containing only 0s and 1s are masked to NaN, so idxmax yields NaN there
df['result'] = df.mask(df.lt(2).all(1)).idxmax(1).fillna('undecided')
print(df)

# Output
   Column0  Column1  Column2  Column3  Column4     result
0        0        0        1        2        0    Column3
1        3        1        0        0        0    Column0
2        0        1        0        1        0  undecided
```
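The condition passed to `mask` is a boolean Series with one entry per row. For integer values in 0..3, "every value is below 2" is the same as "only 0s and 1s"; `isin([0, 1])` states the rule directly and gives the same result on this data. A small sketch on the question's sample frame (the names `by_threshold` and `by_membership` are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 1, 2, 0],
     [3, 1, 0, 0, 0],
     [0, 1, 0, 1, 0]],
    columns=[f"Column{i}" for i in range(5)],
)

# Every value < 2: equivalent to "only 0s and 1s" for integers in 0..3
by_threshold = df.lt(2).all(axis=1)

# States the rule directly, without relying on the 0..3 range
by_membership = df.isin([0, 1]).all(axis=1)

print(by_threshold.tolist())   # [False, False, True]
print(by_membership.tolist())  # [False, False, True]
```

Either Series can be passed to `df.mask(...)` in the one-liner above.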
Output of `mask`:
```python
>>> df.mask(df.lt(2).all(1))
   Column0  Column1  Column2  Column3  Column4
0      0.0      0.0      1.0      2.0      0.0
1      3.0      1.0      0.0      0.0      0.0
2      NaN      NaN      NaN      NaN      NaN
```
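As far as I know, newer pandas releases (2.1 onward) deprecate calling `idxmax` on a row that is entirely NaN, so the masked version may emit a FutureWarning there. A sketch of an alternative that never produces an all-NaN row: take `idxmax` on the unmasked data, then use `Series.where` to swap in 'undecided'. The name `decided` is illustrative, and the sample frame is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 0, 1, 2, 0],
     [3, 1, 0, 0, 0],
     [0, 1, 0, 1, 0]],
    columns=[f"Column{i}" for i in range(5)],
)

# True where the row holds at least one 2 or 3 (the negation of lt(2).all(1))
decided = df.ge(2).any(axis=1)

# idxmax on the unmasked data never sees an all-NaN row; where() keeps the
# label for decided rows and substitutes 'undecided' everywhere else
df["result"] = df.idxmax(axis=1).where(decided, "undecided")
print(df["result"].tolist())  # ['Column3', 'Column0', 'undecided']
```

The right-hand side is evaluated before the assignment, so `idxmax` only sees the five numeric columns.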