Python

What is the idiomatic way to split a dataframe into two by a condition? Or put another way, to filter a dataframe by a condition but keep the filtered out rows in a new dataframe?

I was thinking I could filter the original dataframe for the inverse of the ‘filtered out by condition’ rows using the tilde , like this:

# this obviously doesnt work, though it would seem idiomatic to me. 
# is there a clean / idiomatic solution?
vowels = df[~nonvowels]

JavaScript
​x
 
# this obviously doesnt work, though it would seem idiomatic to me. 
# is there a clean / idiomatic solution?
vowels = df[~nonvowels]
​

import pandas as pd

dl = [
    {'letter':'a','number':1,'vowel':True}, 
    {'letter':'b','number':2,'vowel':False}, 
    {'letter':'c','number':3,'vowel':False},
    {'letter':'d','number':4,'vowel':False},
    {'letter':'e','number':5,'vowel':True},
    {'letter':'f','number':6,'vowel':False},
    {'letter':'g','number':7,'vowel':False},
    {'letter':'x','number':24,'vowel':False},
    {'letter':'y','number':25,'vowel':None},
    {'letter':'z','number':26,'vowel':False}
    ]

df = pd.DataFrame(dl)

print (df)

# a filter
nonvowels = df[df['vowel'] == False]

# one solution- i know i can do this with the drop method, passing it the 'filtered df''s index to the original df drop method as a list...
vowels = df.drop(nonvowels.index.tolist())

# but i was curious if there was an idiomatic way to store filtered rows of a 
# dataframe and also store/keep it's 'inverse' 
# as stated above, my inuition thinks once i have nonvowels in a df, i should be able to do something like:
# vowels = df[~nonvowels]
# but that doesnt work...

print (nonvowels)
print (vowels)

JavaScript
 
import pandas as pd
​
dl = [
    {'letter':'a','number':1,'vowel':True}, 
    {'letter':'b','number':2,'vowel':False}, 
    {'letter':'c','number':3,'vowel':False},
    {'letter':'d','number':4,'vowel':False},
    {'letter':'e','number':5,'vowel':True},
    {'letter':'f','number':6,'vowel':False},
    {'letter':'g','number':7,'vowel':False},
    {'letter':'x','number':24,'vowel':False},
    {'letter':'y','number':25,'vowel':None},
    {'letter':'z','number':26,'vowel':False}
    ]
​
df = pd.DataFrame(dl)
​
print (df)
​
# a filter
nonvowels = df[df['vowel'] == False]
​
# one solution- i know i can do this with the drop method, passing it the 'filtered df''s index to the original df drop method as a list...
vowels = df.drop(nonvowels.index.tolist())
​
# but i was curious if there was an idiomatic way to store filtered rows of a 
# dataframe and also store/keep it's 'inverse' 
# as stated above, my inuition thinks once i have nonvowels in a df, i should be able to do something like:
# vowels = df[~nonvowels]
# but that doesnt work...
​
print (nonvowels)
print (vowels)
​

Answer

You can use df.loc + df.index.difference with nonvowels.index to achieve what you’re looking for (and yes, this is idiomatic):

vowels = df.loc[df.index.difference(nonvowels.index)]

JavaScript
 
vowels = df.loc[df.index.difference(nonvowels.index)]
​

Output:

>>> vowels
  letter  number vowel
0      a       1  True
4      e       5  True
8      y      25  None

JavaScript
 
>>> vowels
  letter  number vowel
0      a       1  True
4      e       5  True
8      y      25  None
​

pandas – idiomatic way to stash rows filtered out of dataframe

Advertisement

Answer