Skip to content
Advertisement

find missing datas between two tables with similar columns python

I have two dataframes with 2 similar columns “date” and “id_number” and I want to find all the id_number missing from the second table to compare.

Here’s my code:

import pandas as pd

data1 = {'date':  ['7/09/22', '7/09/22', '7/09/22'],
        'second_column': ['first_value', 'second_value', 'third_value'],
         'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589'],
        'fourth_column':['first_value', 'second_value', 'third_value'],
        }

data2 = {'date':  ['7/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22', '7/09/22'],
        'second_column': ['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],
         'id_number':['AA576bdk89', 'GG6jabkhd589', 'BXV6jabd589','BXV6mkjdd589','GGdbkz589', 'BXhshhsd589'],
        'fourth_column':['first_value', 'second_value', 'third_value','fourth_value', 'fifth_value','sixth_value'],
        }

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

print (df1)
print ('n')
print (df2)

enter image description here

Advertisement

Answer

If need compare per date and id_number use left join with indicator parameter:

m = df2.merge(df1, on=['date','id_number'], how='left', indicator=True)['_merge'].ne('both')
df = df2[m]
print (df)
      date second_column     id_number fourth_column
3  7/09/22  fourth_value  BXV6mkjdd589  fourth_value
4  7/09/22   fifth_value     GGdbkz589   fifth_value
5  7/09/22   sixth_value   BXhshhsd589   sixth_value

Or if need compare only by id_number use Series.isin:

df = df2[~df2.id_number.isin(df1.id_number)]
print (df)
      date second_column     id_number fourth_column
3  7/09/22  fourth_value  BXV6mkjdd589  fourth_value
4  7/09/22   fifth_value     GGdbkz589   fifth_value
5  7/09/22   sixth_value   BXhshhsd589   sixth_value
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement