What is the idiomatic way to split a dataframe into two by a condition? Or put another way, to filter a dataframe by a condition but keep the filtered out rows in a new dataframe?
I was thinking I could filter the original dataframe for the inverse of the ‘filtered out by condition’ rows using the tilde , like this:
JavaScript
x
4
1
# this obviously doesnt work, though it would seem idiomatic to me.
2
# is there a clean / idiomatic solution?
3
vowels = df[~nonvowels]
4
JavaScript
1
34
34
1
import pandas as pd
2
3
dl = [
4
{'letter':'a','number':1,'vowel':True},
5
{'letter':'b','number':2,'vowel':False},
6
{'letter':'c','number':3,'vowel':False},
7
{'letter':'d','number':4,'vowel':False},
8
{'letter':'e','number':5,'vowel':True},
9
{'letter':'f','number':6,'vowel':False},
10
{'letter':'g','number':7,'vowel':False},
11
{'letter':'x','number':24,'vowel':False},
12
{'letter':'y','number':25,'vowel':None},
13
{'letter':'z','number':26,'vowel':False}
14
]
15
16
df = pd.DataFrame(dl)
17
18
print (df)
19
20
# a filter
21
nonvowels = df[df['vowel'] == False]
22
23
# one solution- i know i can do this with the drop method, passing it the 'filtered df''s index to the original df drop method as a list...
24
vowels = df.drop(nonvowels.index.tolist())
25
26
# but i was curious if there was an idiomatic way to store filtered rows of a
27
# dataframe and also store/keep it's 'inverse'
28
# as stated above, my inuition thinks once i have nonvowels in a df, i should be able to do something like:
29
# vowels = df[~nonvowels]
30
# but that doesnt work...
31
32
print (nonvowels)
33
print (vowels)
34
Advertisement
Answer
You can use df.loc
+ df.index.difference
with nonvowels.index
to achieve what you’re looking for (and yes, this is idiomatic):
JavaScript
1
2
1
vowels = df.loc[df.index.difference(nonvowels.index)]
2
Output:
JavaScript
1
6
1
>>> vowels
2
letter number vowel
3
0 a 1 True
4
4 e 5 True
5
8 y 25 None
6