I’ve scraped some data from web sources and stored it all in a pandas DataFrame. Now, in order to harness the powerful database tools afforded by SQLAlchemy, I want to convert that DataFrame into a Table() object and eventually upsert all of the data into a PostgreSQL table. If this is practical, what is a workable way to go about it?
Answer
Update: You can save yourself some typing by using this method.
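As an aside, SQLAlchemy's PostgreSQL dialect also provides insert().on_conflict_do_update(), which can upsert the DataFrame's rows against a reflected Table object without a staging table. The following is only a sketch of that alternative (not necessarily the method referred to above); it assumes engine and df exist as in the example further down:

Python

import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import insert

# assumptions: `engine` is a SQLAlchemy Engine for the target PostgreSQL database
# and `df` is the DataFrame to upsert, as in the example below
metadata = sa.MetaData()
main_table = sa.Table("main_table", metadata, autoload_with=engine)  # reflect existing table

insert_stmt = insert(main_table).values(df.to_dict(orient="records"))
upsert_stmt = insert_stmt.on_conflict_do_update(
    index_elements=["id"],                   # conflict target: the primary key
    set_={"txt": insert_stmt.excluded.txt},  # overwrite txt with the incoming value
)

with engine.begin() as conn:
    conn.execute(upsert_stmt)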
If you are using PostgreSQL 9.5 or later, you can perform the UPSERT using a temporary table and an INSERT ... ON CONFLICT statement:
Python

import pandas as pd
import sqlalchemy as sa

# …
# (assumes `engine` is an existing SQLAlchemy Engine connected to the target PostgreSQL database)

with engine.begin() as conn:
    # step 0.0 - create test environment
    conn.exec_driver_sql("DROP TABLE IF EXISTS main_table")
    conn.exec_driver_sql(
        "CREATE TABLE main_table (id int primary key, txt varchar(50))"
    )
    conn.exec_driver_sql(
        "INSERT INTO main_table (id, txt) VALUES (1, 'row 1 old text')"
    )
    # step 0.1 - create DataFrame to UPSERT
    df = pd.DataFrame(
        [(2, "new row 2 text"), (1, "row 1 new text")], columns=["id", "txt"]
    )

    # step 1 - create temporary table and upload DataFrame
    conn.exec_driver_sql(
        "CREATE TEMPORARY TABLE temp_table AS SELECT * FROM main_table WHERE false"
    )
    df.to_sql("temp_table", conn, index=False, if_exists="append")

    # step 2 - merge temp_table into main_table
    conn.exec_driver_sql(
        """
        INSERT INTO main_table (id, txt)
        SELECT id, txt FROM temp_table
        ON CONFLICT (id) DO
            UPDATE SET txt = EXCLUDED.txt
        """
    )

    # step 3 - confirm results
    result = conn.exec_driver_sql("SELECT * FROM main_table ORDER BY id").all()
    print(result)  # [(1, 'row 1 new text'), (2, 'new row 2 text')]
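The temporary table gives to_sql() a plain bulk-insert target, and the single INSERT ... ON CONFLICT statement then merges those rows into main_table entirely on the database side, so no row-by-row upsert logic has to run in Python.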