I have two dataframes. First one:
JavaScript
x
4
1
import pandas as pd
2
a = [['xxx', 'admin'], ['yyy', 'admin,super admin'], ['zzz', 'guest,admin,superadmin']]
3
df1 = pd.DataFrame(a, columns=['user', 'groups'])
4
second one:
JavaScript
1
3
1
b = [['xxx', 'admin,super admin'], ['www', 'admin,super admin'], ['zzz', 'guest,superadmin']]
2
df2 = pd.DataFrame(b, columns=['user', 'groups'])
3
this is the first one:
JavaScript
1
5
1
user groups
2
0 xxx admin
3
1 yyy admin,super admin
4
2 zzz guest,admin,superadmin
5
this is the second one:
JavaScript
1
5
1
user groups
2
0 xxx admin,super admin
3
1 www admin,super admin
4
2 zzz guest,superadmin
5
I want to do two things:
if the second one’s user is not in the first one, then print out. like: www is not in the list
if the user is in the list, but group is not equal then print out:
likexxx
user have more:super admin
than the list
zzz
user has less:admin
than the list.
Advertisement
Answer
If there are same index values ina number of length in both DataFrame and need compare values per rows:
JavaScript
1
10
10
1
print (df1.index.equals(df2.index))
2
True
3
4
#compare rows for not equal
5
mask = df1['user'].ne(df2['user'])
6
#filter rows by mask and column user in df2
7
a = df2.loc[mask, 'user'].tolist()
8
print (a)
9
['www']
10
JavaScript
1
20
20
1
#join both DataFrames together
2
df1 = pd.concat([df1, df2], axis=1, keys=('a','b'))
3
df1.columns = df1.columns.map('_'.join)
4
#filter only same user rows
5
df1 = df1[~mask]
6
#split columns by , ans convert to sets
7
df1['a'] = df1['a_groups'].apply(lambda x: set(x.split(',')))
8
df1['b'] = df1['b_groups'].apply(lambda x: set(x.split(',')))
9
#get difference of sets, join to strings with separator ,
10
df1['a_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['b'],df1['a'] )]
11
df1['b_diff'] = [', '.join(x.difference(y)) for x, y in zip(df1['a'],df1['b'] )]
12
print (df1)
13
a_user a_groups b_user b_groups
14
0 xxx admin xxx admin,super admin
15
2 zzz guest,admin,superadmin zzz guest,superadmin
16
17
a b a_diff b_diff
18
0 {admin} {admin, super admin} super admin
19
2 {admin, superadmin, guest} {superadmin, guest} admin
20
JavaScript
1
11
11
1
#filter by casting set columns to boolean, empty sets are converted to False
2
b = df1.loc[df1['a_diff'].astype(bool), ['a_user','a_diff']]
3
print (b)
4
a_user a_diff
5
0 xxx super admin
6
7
c = df1.loc[df1['b_diff'].astype(bool), ['a_user','b_diff']]
8
print (c)
9
a_user b_diff
10
2 zzz admin
11