I have two dataframes with 2 similar columns “date” and “id_number” and I want to find all the id_number missing from the second table to compare.
Here’s my code:
import pandas as pd
data1 = {'date': ['7/09/22', '7/09/22', '7/09/22'], 'second_column': ['first_value', 'second_value', 'third_value'], 'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589'], 'fourth_column':['first_value', 'second_value', 'third_value'], } data2 = {'date': ['7/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22'], 'second_column': ['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'], 'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589','BXV6mkjdd589','GGdbkz589', 'BXhshhsd589'], 'fourth_column':['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'], } df1 = pd.DataFrame(data1) df2 = pd.DataFrame(data2) print (df1) print ('n') print (df2)
Advertisement
Answer
If need compare per date
and id_number
use left join with indicator
parameter:
m = df2.merge(df1, on=['date','id_number'], how='left', indicator=True)['_merge'].ne('both') df = df2[m] print (df) date second_column id_number fourth_column 3 7/09/22 fourth_value BXV6mkjdd589 fourth_value 4 7/09/22 fifth_value GGdbkz589 fifth_value 5 7/09/22 sixth_value BXhshhsd589 sixth_value
Or if need compare only by id_number
use Series.isin
:
df = df2[~df2.id_number.isin(df1.id_number)] print (df) date second_column id_number fourth_column 3 7/09/22 fourth_value BXV6mkjdd589 fourth_value 4 7/09/22 fifth_value GGdbkz589 fifth_value 5 7/09/22 sixth_value BXhshhsd589 sixth_value