I am trying to insert data from a JSON file into my Postgres table with Python (psycopg2). The JSON is huge, and I am wondering what the proper way is to insert all of this data.
What makes it difficult for me is that the target table has a reference to another table: I want to look up the id in that table based on one of the values in my JSON.
Details of my question:
CREATE TABLE market.entities (
    id serial PRIMARY KEY,
    code TEXT NOT NULL,
    name TEXT NOT NULL,
    country TEXT NOT NULL,
    exchange TEXT,
    currency_code TEXT NOT NULL,
    "type" TEXT,
    isin TEXT,
    api_source int REFERENCES market.source_api(id) NOT NULL,
    imported_at date NOT NULL
);
where the entities table is already filled with data like this:
id|code|name|country|exchange|currency_code|type|isin|api_source|imported_at
----------------------------------------------------------------------------
1|A|Agilent Technologies, Inc|USA|NYSE|USD|Common Stock|US00846U1016|1|2021-07-17
----------------------------------------------------------------------------
The table that I want to fill with the data from the json:
CREATE TABLE IF NOT EXISTS market.end_of_days (
    id serial PRIMARY KEY,
    entity_id int REFERENCES market.entities(id),
    import_type int REFERENCES market.import_types(id),
    source_api int REFERENCES market.source_api(id),
    date date,
    open int,
    high int,
    low int,
    close int,
    adjusted_close int,
    volume int,
    imported_at date,
    UNIQUE (date, source_api, entity_id)
);
Part of the JSON:
[
{
"code": "ONTRF",
"exchange_short_name": "US",
"date": "2021-07-08",
"open": 0.1393,
"high": 0.1393,
"low": 0.1393,
"close": 0.1393,
"adjusted_close": 0.1393,
"volume": 0
},
{
"code": "ONTX",
"exchange_short_name": "US",
"date": "2021-07-08",
"open": 5.72,
"high": 6.19,
"low": 5.7,
"close": 6.07,
"adjusted_close": 6.07,
"volume": 324700
}
]
The “code” key in the JSON already exists in the entities table, so in end_of_days.entity_id I want to refer to the id from the entities table based on this “code” value.
Is there a clean solution for this with psycopg2? Normally I would just use psycopg2.extras.execute_values(), but I don’t think that will work in this specific situation.
(P.S. this is my first time posting on stackoverflow so if I need to specify something, or structure my question differently, please let me know.)
Answer
This is not really a complete answer but a suggestion for a possible way to handle this. While figuring it out, break the JSON data into smaller chunks for testing purposes. The idea:
Create a staging table and INSERT the JSON from the file(s) directly into a jsonb field in that table, using the JSON support in psycopg2. Then use the functions from “Table 9.47. JSON Processing Functions” in the Postgres docs to reshape the data and move it into market.end_of_days.
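A minimal sketch of that jsonb staging idea, assuming a single-column staging table (the table and column names here are made up for illustration):

```sql
-- Hypothetical staging table: one jsonb value holds the whole JSON array.
CREATE TABLE symbol_import_raw (doc jsonb);

-- From Python you would load it with the psycopg2 Json adapter, e.g.:
--   cur.execute("INSERT INTO symbol_import_raw VALUES (%s)",
--               (psycopg2.extras.Json(data),))

-- jsonb_to_recordset() then expands the array into typed rows:
SELECT r.*
FROM symbol_import_raw s,
     jsonb_to_recordset(s.doc) AS r(
         code text,
         exchange_short_name text,
         "date" date,
         open numeric,
         high numeric,
         low numeric,
         close numeric,
         adjusted_close numeric,
         volume numeric);
```

The SELECT above can feed an INSERT into market.end_of_days once the entity lookup is added.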
UPDATE
I found it easier to do something like:
CREATE TABLE symbol_import (
    code varchar,
    exchange_short_name varchar,
    "date" date,
    open numeric,
    high numeric,
    low numeric,
    close numeric,
    adjusted_close numeric,
    volume numeric
);
import json
import psycopg2
from psycopg2.extras import execute_batch
json_val = """[
{
"code": "ONTRF",
"exchange_short_name": "US",
"date": "2021-07-08",
"open": 0.1393,
"high": 0.1393,
"low": 0.1393,
"close": 0.1393,
"adjusted_close": 0.1393,
"volume": 0
},
{
"code": "ONTX",
"exchange_short_name": "US",
"date": "2021-07-08",
"open": 5.72,
"high": 6.19,
"low": 5.7,
"close": 6.07,
"adjusted_close": 6.07,
"volume": 324700
}
]"""
# Parse the JSON into a list of dicts, one per row.
data_dict = json.loads(json_val)

con = psycopg2.connect("...")
cur = con.cursor()

# The named placeholders match the keys of each dict in data_dict,
# so execute_batch() can bind each list element as one row.
data_insert = """insert into symbol_import
    (code, exchange_short_name, "date", open, high, low,
     close, adjusted_close, volume)
    values (%(code)s, %(exchange_short_name)s, %(date)s, %(open)s,
            %(high)s, %(low)s, %(close)s, %(adjusted_close)s, %(volume)s)"""
execute_batch(cur, data_insert, data_dict)
con.commit()
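From there, the entity_id lookup the question asks about can be done entirely in SQL with a join on entities.code. A sketch, assuming the symbol_import staging table above; the import_types and source_api ids are placeholders, and note that the int price columns in end_of_days will round the decimal prices, so you may want to change them to numeric:

```sql
INSERT INTO market.end_of_days
    (entity_id, import_type, source_api, date,
     open, high, low, close, adjusted_close, volume, imported_at)
SELECT e.id,            -- resolved from entities via the code value
       1,               -- placeholder import_types id
       1,               -- placeholder source_api id
       s."date",
       s.open, s.high, s.low, s.close, s.adjusted_close, s.volume,
       CURRENT_DATE
FROM symbol_import s
JOIN market.entities e ON e.code = s.code
ON CONFLICT (date, source_api, entity_id) DO NOTHING;
```

The ON CONFLICT clause uses the existing UNIQUE (date, source_api, entity_id) constraint so re-running the import does not fail on duplicates. Rows whose code has no match in entities are silently dropped by the inner join; use a LEFT JOIN first if you want to find them.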