I’ve a df
consisted of person, origin and destination
df = pd.DataFrame({'PersonID':['1','1','2','2','2','3'],'O':['A','B','C','B','A','X'],'D':['B','A','B','A','B','Y']})
the df:
PersonID O D 1 A B 1 B A 2 C B 2 B A 2 A B 3 X Y
I have grouped by the df with df_grouped = df.groupby(['O','D'])
and match them with another dataframe, taxi
.
TaxiID O D T1 B A T2 A B T3 C B
similarly, I group by the taxi
with their O
and D
. Then I merged them after aggregating and counting the PersonID and TaxiID per O-D pair. I did it to see how many taxis are available for how many people.
O D PersonID TaxiID count count A B 2 1 B A 2 1 C B 1 1
Now, I want to perform df.loc
to take only those PersonID that was counted in the merged file. How can I do this? I’ve tried to us:
seek = df.loc[df.PersonID.isin(merged['PersonID'])]
but it returns an empty dataframe. What can I do to do this?
edit: I attach the complete code for this case using dummy data
df = pd.DataFrame({'PersonID':['1','1','2','2','2','3'],'O':['A','B','C','B','A','X'],'D':['B','A','B','A','B','Y']}) taxi = pd.DataFrame({'TaxiID':['T1','T2','T3'],'O':['B','A','C'],'D':['A','B','B']}) df_grouped = df.groupby(['O','D']) taxi_grouped = taxi.groupby(['O','D']) dfm = df_grouped.agg({'PersonID':['count',list]}).reset_index() tgm = taxi_grouped.agg({'TaxiID':['count',list]}).reset_index() merged = pd.merge(dfm, tgm, how='inner') seek = df.loc[df.PersonID.isin(merged['PersonID'])]
Advertisement
Answer
Select MultiIndex
by tuple
with Series.explode
for scalars from nested lists:
seek = df.loc[df.PersonID.isin(merged[('PersonID', 'list')].explode().unique())] print (seek) PersonID O D 0 1 A B 1 1 B A 2 2 C B 3 2 B A 4 2 A B
For better performance is possible use set comprehension
with flatten:
seek = df.loc[df.PersonID.isin(set(z for x in merged[('PersonID', 'list')] for z in x))] print (seek) PersonID O D 0 1 A B 1 1 B A 2 2 C B 3 2 B A 4 2 A B