I have a pandas dataframe filled with values between 0 and 3 and now I would like to either select the max value and return the column label or return ‘undecided’ if a row does only contain 0 and 1 across multiple columns.
so far I have the first part sorted:
df['result'] = df.idxmax(axis=1) or
How do I write the second part?
My dataframe looks like this and I want to add a column ‘result’ to it:
Column0 | Column1 | Column2 | Column3 | Column4 | result 0 0 1 2 0 Column3 3 1 0 0 0 Column0 0 1 0 1 0 undecided
Advertisement
Answer
Use mask
to hide rows that do not respect the constraint:
df['result'] = df.mask(df.lt(2).all(1)).idxmax(1).fillna('undecided') print(df) # Output Column0 Column1 Column2 Column3 Column4 result 0 0 0 1 2 0 Column3 1 3 1 0 0 0 Column0 2 0 1 0 1 0 undecided
Output of mask
:
>>> df.mask(df.lt(2).all(1)) Column0 Column1 Column2 Column3 Column4 0 0.0 0.0 1.0 2.0 0.0 1 3.0 1.0 0.0 0.0 0.0 2 NaN NaN NaN NaN NaN