Skip to content
Advertisement

perform df.loc to groupby df

I’ve a df consisted of person, origin and destination

df = pd.DataFrame({'PersonID':['1','1','2','2','2','3'],'O':['A','B','C','B','A','X'],'D':['B','A','B','A','B','Y']})

the df:

PersonID    O       D
   1        A       B
   1        B       A      
   2        C       B
   2        B       A
   2        A       B
   3        X       Y

I have grouped by the df with df_grouped = df.groupby(['O','D']) and match them with another dataframe, taxi.

TaxiID    O       D
  T1      B       A
  T2      A       B
  T3      C       B

similarly, I group by the taxi with their O and D. Then I merged them after aggregating and counting the PersonID and TaxiID per O-D pair. I did it to see how many taxis are available for how many people.

O     D     PersonID    TaxiID
             count      count
A     B        2          1
B     A        2          1
C     B        1          1

Now, I want to perform df.loc to take only those PersonID that was counted in the merged file. How can I do this? I’ve tried to us:

seek = df.loc[df.PersonID.isin(merged['PersonID'])]

but it returns an empty dataframe. What can I do to do this?

edit: I attach the complete code for this case using dummy data

df = pd.DataFrame({'PersonID':['1','1','2','2','2','3'],'O':['A','B','C','B','A','X'],'D':['B','A','B','A','B','Y']})
taxi = pd.DataFrame({'TaxiID':['T1','T2','T3'],'O':['B','A','C'],'D':['A','B','B']})

df_grouped = df.groupby(['O','D'])
taxi_grouped = taxi.groupby(['O','D'])

dfm = df_grouped.agg({'PersonID':['count',list]}).reset_index()
tgm = taxi_grouped.agg({'TaxiID':['count',list]}).reset_index()

merged = pd.merge(dfm, tgm, how='inner')

seek = df.loc[df.PersonID.isin(merged['PersonID'])]

Advertisement

Answer

Select MultiIndex by tuple with Series.explode for scalars from nested lists:

seek = df.loc[df.PersonID.isin(merged[('PersonID', 'list')].explode().unique())]
print (seek)
  PersonID  O  D
0        1  A  B
1        1  B  A
2        2  C  B
3        2  B  A
4        2  A  B

For better performance is possible use set comprehension with flatten:

seek = df.loc[df.PersonID.isin(set(z for x in merged[('PersonID', 'list')] for z in x))]
print (seek)
  PersonID  O  D
0        1  A  B
1        1  B  A
2        2  C  B
3        2  B  A
4        2  A  B
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement