Skip to content
Advertisement

select non-NaN rows with multiple conditions from a pandas dataframe

Assume there is a dataframe such as

import pandas as pd 
import numpy as np

df = pd.DataFrame({'col1':[1,2,3,4,5],
                   'col2':[11,12,np.nan,24,np.nan]})

df
    col1    col2
0   1       11.0
1   2       12.0
2   3       NaN
3   4       24.0
4   5       NaN

I would like to select non-NaN rows based on multiple conditions such as (1) col1 < 4 and (2) non-nan in col2. The following is my code but I have no idea why I did not get the 1st two rows. Any idea? Thanks

df1 = df[(df['col1'] < 4 & df['col2'].notna())]
df1

col1    col2

Advertisement

Answer

Because of the operator precedence (bitwise operators, e.g. &, have higher precedence than comparison operators, e.g. <).

Currently, your mask is being evaluated as

>>> df['col1'] < (4 & df['col2'].notna())

0    False
1    False
2    False
3    False
4    False
dtype: bool

That is why no rows are being selected. You have to wrap the first condition inside parentheses

>>> df1 = df[(df['col1'] < 4) & df['col2'].notna()]
>>> df1

   col1  col2
0     1  11.0
1     2  12.0
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement