Assume there is a dataframe such as
import pandas as pd 
import numpy as np
df = pd.DataFrame({'col1':[1,2,3,4,5],
                   'col2':[11,12,np.nan,24,np.nan]})
df
    col1    col2
0   1       11.0
1   2       12.0
2   3       NaN
3   4       24.0
4   5       NaN
I would like to select non-NaN rows based on multiple conditions such as (1) col1 < 4 and (2) non-nan in col2. The following is my code but I have no idea why I did not get the 1st two rows. Any idea? Thanks
df1 = df[(df['col1'] < 4 & df['col2'].notna())] df1 col1 col2
Advertisement
Answer
Because of the operator precedence (bitwise operators, e.g. &, have higher precedence than comparison operators, e.g. <).
Currently, your mask is being evaluated as
>>> df['col1'] < (4 & df['col2'].notna()) 0 False 1 False 2 False 3 False 4 False dtype: bool
That is why no rows are being selected. You have to wrap the first condition inside parentheses
>>> df1 = df[(df['col1'] < 4) & df['col2'].notna()] >>> df1 col1 col2 0 1 11.0 1 2 12.0