I’ve scraped some data from web sources and stored it all in a pandas DataFrame. Now, in order to harness the powerful database tools afforded by SQLAlchemy, I want to convert that DataFrame into a Table() object and eventually upsert all of the data into a PostgreSQL table. If this is practical, what is a workable method of accomplishing it?
Answer
Update: You can save yourself some typing by using this method.
If you are using PostgreSQL 9.5 or later, you can perform the UPSERT by uploading the DataFrame to a temporary table and then merging it into the main table with an INSERT ... ON CONFLICT statement:
import pandas as pd
import sqlalchemy as sa

# …

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
    conn.exec_driver_sql(
        "CREATE TABLE main_table (id int primary key, txt varchar(50))"
    )
    conn.exec_driver_sql(
        "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
    )

    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")],
        columns=["id", "txt"],
    )

    # step 1 - create temporary table and upload DataFrame
    conn.exec_driver_sql(
        "CREATE TEMPORARY TABLE temp_table AS "
        "SELECT * FROM main_table WHERE false"
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.exec_driver_sql(
        """
        INSERT INTO main_table (id, txt)
        SELECT id, txt FROM temp_table
        ON CONFLICT (id) DO
            UPDATE SET txt = EXCLUDED.txt
        """
    )

    # step 3 - confirm results
    result = conn.exec_driver_sql("SELECT * FROM main_table ORDER BY id").all()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
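Since the question mentions Table() objects: SQLAlchemy's PostgreSQL dialect also exposes ON CONFLICT directly, so the merge in step 2 could be written in Core instead of raw SQL. Here is a minimal sketch, assuming it runs inside the same with engine.begin() as conn: block after step 1 has populated temp_table; the explicit Table definitions below are just illustrative (you could instead reflect main_table with autoload_with=engine):

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import insert

# Table definitions matching the DDL above
metadata = sa.MetaData()
main_table = sa.Table(
    "main_table",
    metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("txt", sa.String(50)),
)
temp_table = sa.Table(
    "temp_table",
    metadata,
    sa.Column("id", sa.Integer),
    sa.Column("txt", sa.String(50)),
)

# INSERT INTO main_table (id, txt) SELECT id, txt FROM temp_table ...
upsert = insert(main_table).from_select(
    ["id", "txt"],
    sa.select(temp_table.c.id, temp_table.c.txt),
)
# ... ON CONFLICT (id) DO UPDATE SET txt = EXCLUDED.txt
upsert = upsert.on_conflict_do_update(
    index_elements=["id"],
    set_={"txt": upsert.excluded.txt},
)
conn.execute(upsert)

The generated SQL is equivalent to the raw statement in step 2, so either form works; the Core version just spares you from hand-writing the merge.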