What is the idiomatic way to split a dataframe into two by a condition? Or put another way, to filter a dataframe by a condition but keep the filtered out rows in a new dataframe?
I was thinking I could filter the original dataframe for the inverse of the ‘filtered out by condition’ rows using the tilde , like this:
# this obviously doesnt work, though it would seem idiomatic to me. # is there a clean / idiomatic solution? vowels = df[~nonvowels]
import pandas as pd dl = [ {'letter':'a','number':1,'vowel':True}, {'letter':'b','number':2,'vowel':False}, {'letter':'c','number':3,'vowel':False}, {'letter':'d','number':4,'vowel':False}, {'letter':'e','number':5,'vowel':True}, {'letter':'f','number':6,'vowel':False}, {'letter':'g','number':7,'vowel':False}, {'letter':'x','number':24,'vowel':False}, {'letter':'y','number':25,'vowel':None}, {'letter':'z','number':26,'vowel':False} ] df = pd.DataFrame(dl) print (df) # a filter nonvowels = df[df['vowel'] == False] # one solution- i know i can do this with the drop method, passing it the 'filtered df''s index to the original df drop method as a list... vowels = df.drop(nonvowels.index.tolist()) # but i was curious if there was an idiomatic way to store filtered rows of a # dataframe and also store/keep it's 'inverse' # as stated above, my inuition thinks once i have nonvowels in a df, i should be able to do something like: # vowels = df[~nonvowels] # but that doesnt work... print (nonvowels) print (vowels)
Advertisement
Answer
You can use df.loc
+ df.index.difference
with nonvowels.index
to achieve what you’re looking for (and yes, this is idiomatic):
vowels = df.loc[df.index.difference(nonvowels.index)]
Output:
>>> vowels letter number vowel 0 a 1 True 4 e 5 True 8 y 25 None