Skip to content
Advertisement

DataFrame comparison with SQL Server table and upload just the differences

I have an SQL table (table_1) that contains data, and I have a Python script that reads a csf and creates a dataframe.

I want to compare the dataframe with the SQL table data and then insert the missing data from the dataframe into the SQL table.

I went around and read this comparing pandas dataframe with sqlite table via sqlquery post and Compare pandas dataframe columns to sql table dataframe columns, but was not able to do it.

The table and the dataframe have the exact same columns.

The dataframe is:

import pandas as pd

df = pd.DataFrame({'userid':[1,2,3],
           'user': ['Bob', 'Jane', 'Alice'], 
                   'income': [40000, 50000, 42000]})

and the SQL table (using SQLAlchemy):

userid user income
1      Bob  40000
2      Jane 42000

I’d like to compare the df to the SQL table and insert userid 3, Alice, with all her details and it’s the only value missing between them.

Advertisement

Answer

Since you are only interested in inserting new records, and are loading from a CSV so you will have data in local memory already:

# read current userids
sql = pd.read_sql('SELECT userid FROM table_name', conn)

# keep only userids not in the sql table
df = df[~df['userid'].isin(sql['userid'])]

# insert new records
df.to_sql('table_name', conn, if_exists='append')

Other options would require first loading more data into SQL than needed.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement