Skip to content
Advertisement

select pandas rows based on 2 conditions using or

I have a pandas dataframe filled with values between 0 and 3 and now I would like to either select the max value and return the column label or return ‘undecided’ if a row does only contain 0 and 1 across multiple columns.

so far I have the first part sorted:

df['result'] = df.idxmax(axis=1) or 

How do I write the second part?

My dataframe looks like this and I want to add a column ‘result’ to it:

Column0 | Column1 | Column2 | Column3 | Column4 | result
   0         0         1         2        0       Column3
   3         1         0         0        0       Column0 
   0         1         0         1        0       undecided

Advertisement

Answer

Use mask to hide rows that do not respect the constraint:

df['result'] = df.mask(df.lt(2).all(1)).idxmax(1).fillna('undecided')
print(df)

# Output
   Column0  Column1  Column2  Column3  Column4     result
0        0        0        1        2        0    Column3
1        3        1        0        0        0    Column0
2        0        1        0        1        0  undecided

Output of mask:

>>> df.mask(df.lt(2).all(1))
   Column0  Column1  Column2  Column3  Column4
0      0.0      0.0      1.0      2.0      0.0
1      3.0      1.0      0.0      0.0      0.0
2      NaN      NaN      NaN      NaN      NaN
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement