
pandas: treat a Series of lists as sets and take the set difference of two Series

Given two pandas Series whose elements are lists (i.e. each row in the Series is a list), I want to take the row-wise set difference of the two columns.

For example, in the dataframe…

pd.DataFrame({
    'A': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    'B': [[1, 2], [5, 6], [7, 8, 9]]
})

I want to create a new column C that is set(A) - set(B)…

pd.DataFrame({
    'C': [[3], [4], []]
})

Answer

Thanks to: https://www.geeksforgeeks.org/python-difference-two-lists/

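def Diff(li1, li2):
    # Elements of li1 that are not in li2 (one-sided set difference; order is not guaranteed)
    return list(set(li1) - set(li2))

# Apply row by row: C = set(A) - set(B)
df['C'] = df.apply(lambda x: Diff(x['A'], x['B']), axis=1)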

Output

           A          B    C
0  [1, 2, 3]     [1, 2]  [3]
1  [4, 5, 6]     [5, 6]  [4]
2  [7, 8, 9]  [7, 8, 9]   []
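
If the order of the remaining elements matters, a variation along these lines may be worth trying. This is only a sketch: it assumes the list elements are hashable, and the helper name `ordered_diff` is made up for illustration (it is not part of pandas or of the linked article). It rebuilds the example frame from the question so it can be run on its own.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    'A': [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    'B': [[1, 2], [5, 6], [7, 8, 9]],
})

def ordered_diff(a, b):
    """Return the elements of a that are not in b, keeping a's order."""
    exclude = set(b)  # fast membership tests; assumes hashable elements
    return [x for x in a if x not in exclude]

# Pair up the two columns row by row with zip instead of df.apply(axis=1)
df['C'] = [ordered_diff(a, b) for a, b in zip(df['A'], df['B'])]
print(df)
```

Unlike converting the result of a set subtraction back to a list, this keeps duplicates and the original ordering of A, which may or may not be what you want. Iterating with zip also tends to be faster than a row-wise apply on larger frames, though that is worth measuring on your own data.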
User contributions licensed under: CC BY-SA