Skip to content
Advertisement

I want to compare the values in pandas

I have two dataframes. First one:

import pandas as pd
a = [['xxx', 'admin'], ['yyy', 'admin,super admin'], ['zzz', 'guest,admin,superadmin']]
df1 = pd.DataFrame(a, columns=['user', 'groups'])

second one:

b = [['xxx', 'admin,super admin'], ['www', 'admin,super admin'], ['zzz', 'guest,superadmin']]
df2 = pd.DataFrame(b, columns=['user', 'groups'])

this is the first one:

 user  groups
0  xxx  admin
1  yyy  admin,super admin
2  zzz  guest,admin,superadmin

this is the second one:

 user  groups
0  xxx  admin,super admin
1  www  admin,super admin
2  zzz  guest,superadmin

I want to do two things:

  1. if the second one’s user is not in the first one, then print out. like: www is not in the list

  2. if the user is in the list, but group is not equal then print out:
    like xxx user have more: super admin than the list
    zzz user has less: admin than the list.

Advertisement

Answer

If there are same index values ina number of length in both DataFrame and need compare values per rows:

print (df1.index.equals(df2.index))
True

#compare rows for not equal
mask = df1['user'].ne(df2['user'])
#filter rows by mask and column user in df2
a = df2.loc[mask, 'user'].tolist()
print (a)
['www']

#join both DataFrames together
df1 = pd.concat([df1, df2], axis=1, keys=('a','b'))
df1.columns  = df1.columns.map('_'.join)
#filter only same user rows
df1 = df1[~mask]
#split columns by , ans convert to sets
df1['a'] = df1['a_groups'].apply(lambda x: set(x.split(',')))
df1['b'] = df1['b_groups'].apply(lambda x: set(x.split(',')))
#get difference of sets, join to strings with separator ,
df1['a_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['b'],df1['a'] )]
df1['b_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['a'],df1['b'] )]
print (df1)
  a_user                a_groups b_user           b_groups  
0    xxx                   admin    xxx  admin,super admin   
2    zzz  guest,admin,superadmin    zzz   guest,superadmin   

                            a                     b       a_diff b_diff  
0                     {admin}  {admin, super admin}  super admin         
2  {admin, superadmin, guest}   {superadmin, guest}               admin   

#filter by casting set columns to boolean, empty sets are converted to False
b = df1.loc[df1['a_diff'].astype(bool), ['a_user','a_diff']]
print (b)
  a_user       a_diff
0    xxx  super admin

c = df1.loc[df1['b_diff'].astype(bool), ['a_user','b_diff']]
print (c)
  a_user b_diff
2    zzz  admin
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement