Assume there is a dataframe such as
    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'col1':[1,2,3,4,5], 'col2':[11,12,np.nan,24,np.nan]})
    df

       col1  col2
    0     1  11.0
    1     2  12.0
    2     3   NaN
    3     4  24.0
    4     5   NaN
I would like to select rows based on multiple conditions: (1) col1 < 4 and (2) col2 is not NaN. My code is below, but I do not understand why it does not return the first two rows. Any ideas? Thanks.
    df1 = df[(df['col1'] < 4 & df['col2'].notna())]
    df1

    col1  col2
Answer
This happens because of operator precedence: bitwise operators such as & bind more tightly than comparison operators such as <.
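The same precedence rule can be seen with plain Python integers, without pandas. The snippet below is only an illustrative sketch; the variables `a` and `b` are made up for the demonstration and are not part of the original question.

```python
# Plain-Python illustration of the precedence rule (no pandas needed).
# a and b are hypothetical values chosen for this demonstration.
a, b = 1, 1

unparenthesized = a < 4 & b      # parsed as a < (4 & b)
explicit_grouping = a < (4 & b)  # the same expression with the grouping spelled out
intended = (a < 4) & b           # comparison first, then bitwise AND

print(4 & b)              # 0, since 0b100 & 0b001 == 0
print(unparenthesized)    # False, because 1 < 0 is False
print(explicit_grouping)  # False
print(intended)           # 1, i.e. True & 1
```

Only the last form compares first and combines afterwards, which is the order the filter in the question needs.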
Currently, your mask is being evaluated as
    >>> df['col1'] < (4 & df['col2'].notna())
    0    False
    1    False
    2    False
    3    False
    4    False
    dtype: bool
That is why no rows are selected. You have to wrap the first condition in parentheses:
    >>> df1 = df[(df['col1'] < 4) & df['col2'].notna()]
    >>> df1
       col1  col2
    0     1  11.0
    1     2  12.0
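If the extra parentheses are easy to forget, two other spellings sidestep the precedence issue. This is a sketch of alternatives, not something from the original answer: the names `small`, `present`, `by_named_masks` and `by_method` are made up here, and `Series.lt` is the method form of `<`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [11, 12, np.nan, 24, np.nan]})

# Option 1: give each condition its own name, then combine the masks.
# Each comparison is fully evaluated before & is applied.
small = df['col1'] < 4
present = df['col2'].notna()
by_named_masks = df[small & present]

# Option 2: use the comparison method Series.lt instead of the < operator.
# A method call binds more tightly than &, so no extra parentheses are needed.
by_method = df[df['col1'].lt(4) & df['col2'].notna()]

print(by_named_masks)
print(by_method.equals(by_named_masks))
```

Both options select the same rows as the parenthesized mask, namely the rows at index 0 and 1.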