I have an SQL table (table_1) that contains data, and I have a Python script that reads a csf and creates a dataframe.
I want to compare the dataframe with the SQL table data and then insert the missing data from the dataframe into the SQL table.
I went around and read this comparing pandas dataframe with sqlite table via sqlquery post and Compare pandas dataframe columns to sql table dataframe columns, but was not able to do it.
The table and the dataframe have the exact same columns.
The dataframe is:
import pandas as pd df = pd.DataFrame({'userid':[1,2,3], 'user': ['Bob', 'Jane', 'Alice'], 'income': [40000, 50000, 42000]})
and the SQL table (using SQLAlchemy):
userid user income 1 Bob 40000 2 Jane 42000
I’d like to compare the df to the SQL table and insert userid 3, Alice, with all her details and it’s the only value missing between them.
Advertisement
Answer
Since you are only interested in inserting new records, and are loading from a CSV so you will have data in local memory already:
# read current userids sql = pd.read_sql('SELECT userid FROM table_name', conn) # keep only userids not in the sql table df = df[~df['userid'].isin(sql['userid'])] # insert new records df.to_sql('table_name', conn, if_exists='append')
Other options would require first loading more data into SQL than needed.