Assume there is a dataframe such as
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [11, 12, np.nan, 24, np.nan]})

df
   col1  col2
0     1  11.0
1     2  12.0
2     3   NaN
3     4  24.0
4     5   NaN
```
I would like to select rows that satisfy multiple conditions, namely (1) `col1 < 4` and (2) `col2` is not NaN. Below is my code, but I have no idea why I did not get the first two rows. Any idea? Thanks
```python
df1 = df[(df['col1'] < 4 & df['col2'].notna())]
df1
col1  col2
```
Answer
It's because of operator precedence: in Python, bitwise operators such as `&` have higher precedence than comparison operators such as `<`.
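The same rule can be seen with plain Python values, no pandas involved. This is a minimal sketch where `a` and `b` are hypothetical stand-ins for one value of `col1` and one `notna()` flag. `a < 4 & b` parses as `a < (4 & b)`, and because 4 is binary `100`, it shares no bits with `True` (1) or `False` (0).

```python
# Hypothetical stand-ins: one value of col1 and one notna() result.
a, b = 1, True

unparenthesized = a < 4 & b   # parsed as a < (4 & b), i.e. 1 < 0
explicit = a < (4 & b)        # the same expression with the implicit grouping written out
intended = (a < 4) & b        # what was meant: True & True

print(4 & True, 4 & False)    # 0 0  (4 is 0b100, no bits in common with 1 or 0)
print(unparenthesized, explicit)
print(intended)
```

Both `unparenthesized` and `explicit` come out `False`, while only the explicitly grouped version gives `True`.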
Currently, your mask is being evaluated as
```python
>>> df['col1'] < (4 & df['col2'].notna())
0    False
1    False
2    False
3    False
4    False
dtype: bool
```
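`4 & df['col2'].notna()` is zero (False) for every row, because 4 is `0b100` and so `4 & True` and `4 & False` are both 0, and no value in `col1` is less than 0. That is why no rows are being selected. You have to wrap the first condition in parentheses: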
```python
>>> df1 = df[(df['col1'] < 4) & df['col2'].notna()]
>>> df1
   col1  col2
0     1  11.0
1     2  12.0
```
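If the parentheses are easy to forget, here is a sketch of two equivalent spellings that leave no bare `<` competing with `&`. The names `mask_small` and `mask_valid` are hypothetical; `Series.lt` is pandas' method form of `<`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                   'col2': [11, 12, np.nan, 24, np.nan]})

# Option 1: build each condition as its own boolean Series, then combine them.
mask_small = df['col1'] < 4        # hypothetical name for condition (1)
mask_valid = df['col2'].notna()    # hypothetical name for condition (2)
df1 = df[mask_small & mask_valid]

# Option 2: method-call form; .lt(4) means "< 4" and binds as a single call,
# so & cannot grab the 4 away from the comparison.
df2 = df[df['col1'].lt(4) & df['col2'].notna()]

print(df1)
```

Both produce the same two rows (index 0 and 1). Naming the masks also makes each condition easy to inspect on its own while debugging.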