I am trying to insert data from a JSON file into my Postgres table with Python (psycopg2). The JSON is huge, so I am wondering what the proper way is to insert all of this data.
What makes it difficult for me is that the table references another table: I want to set the foreign key to the id of that table based on one of the values in my JSON.
Details of my question:
CREATE TABLE market.entities (
    id serial PRIMARY KEY,
    code TEXT NOT NULL,
    name TEXT NOT NULL,
    country TEXT NOT NULL,
    exchange TEXT,
    currency_code TEXT NOT NULL,
    "type" TEXT,
    isin TEXT,
    api_source int REFERENCES market.source_api(id) NOT NULL,
    imported_at date NOT NULL
);
where this table entities is already filled with data like this:
id | code | name                      | country | exchange | currency_code | type         | isin         | api_source | imported_at
---+------+---------------------------+---------+----------+---------------+--------------+--------------+------------+------------
 1 | A    | Agilent Technologies, Inc | USA     | NYSE     | USD           | Common Stock | US00846U1016 | 1          | 2021-07-17
The table that I want to fill with the data from the json:
CREATE TABLE IF NOT EXISTS market.end_of_days (
    id serial PRIMARY KEY,
    entity_id int REFERENCES market.entities(id),
    import_type int REFERENCES market.import_types(id),
    source_api int REFERENCES market.source_api(id),
    date date,
    open int,
    high int,
    low int,
    close int,
    adjusted_close int,
    volume int,
    imported_at date,
    UNIQUE (date, source_api, entity_id)
);
part of the json:
[ { "code": "ONTRF", "exchange_short_name": "US", "date": "2021-07-08", "open": 0.1393, "high": 0.1393, "low": 0.1393, "close": 0.1393, "adjusted_close": 0.1393, "volume": 0 }, { "code": "ONTX", "exchange_short_name": "US", "date": "2021-07-08", "open": 5.72, "high": 6.19, "low": 5.7, "close": 6.07, "adjusted_close": 6.07, "volume": 324700 } ]
The "code" key in the JSON is already present in the entities table, so for end_of_days.entity_id I want to look up the id from the entities table based on this "code" value.
Is there a pretty solution for this with psycopg2? Normally I would just use psycopg2.extras.execute_values(), but I think this won't work in this specific situation.
(P.S. this is my first time posting on stackoverflow so if I need to specify something, or structure my question differently, please let me know.)
Answer
This is not really an answer but a suggestion for a possible way to handle this. While figuring it out, break the JSON data into smaller chunks for testing purposes. The idea: create a staging table with jsonb fields and INSERT the JSON from the file(s) directly into it, using the JSON capabilities of psycopg2. Then you could use the JSON processing functions ('Table 9.47. JSON Processing Functions' in the PostgreSQL docs) to modify and move the data to market.end_of_days.
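A minimal sketch of that idea, assuming a hypothetical staging table market.raw_import with a single jsonb column doc (neither is part of the original schema). The first statement stages the raw document; the second expands the jsonb array into typed rows with jsonb_to_recordset() and resolves entity_id by joining market.entities on code. Using current_date for imported_at is also an assumption:

```python
# Step 1: stage the raw JSON document. With a live connection you would run:
#   cur.execute(stage_sql, [open("prices.json").read()])
stage_sql = "INSERT INTO market.raw_import (doc) VALUES (%s::jsonb)"

# Step 2: expand the jsonb array into typed rows and resolve entity_id
# by joining market.entities on the "code" value.
move_sql = """
INSERT INTO market.end_of_days
    (entity_id, date, open, high, low, close, adjusted_close, volume, imported_at)
SELECT e.id, r.date, r.open, r.high, r.low, r.close, r.adjusted_close,
       r.volume, current_date
FROM market.raw_import i
CROSS JOIN LATERAL jsonb_to_recordset(i.doc) AS r(
    code text, exchange_short_name text, "date" date,
    open numeric, high numeric, low numeric, close numeric,
    adjusted_close numeric, volume numeric)
JOIN market.entities e ON e.code = r.code
"""
```

Note that market.end_of_days declares the price columns as int, so fractional prices like 0.1393 would be rounded on insert; numeric may be a better fit there.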
UPDATE
I found it easier to do something like:
CREATE TABLE symbol_import (
    code varchar,
    exchange_short_name varchar,
    "date" date,
    open numeric,
    high numeric,
    low numeric,
    close numeric,
    adjusted_close numeric,
    volume numeric
);
import json
import psycopg2
from psycopg2.extras import execute_batch

json_val = """[
  {
    "code": "ONTRF",
    "exchange_short_name": "US",
    "date": "2021-07-08",
    "open": 0.1393,
    "high": 0.1393,
    "low": 0.1393,
    "close": 0.1393,
    "adjusted_close": 0.1393,
    "volume": 0
  },
  {
    "code": "ONTX",
    "exchange_short_name": "US",
    "date": "2021-07-08",
    "open": 5.72,
    "high": 6.19,
    "low": 5.7,
    "close": 6.07,
    "adjusted_close": 6.07,
    "volume": 324700
  }
]"""

data_dict = json.loads(json_val)

con = psycopg2.connect("...")
cur = con.cursor()

data_insert = """insert into symbol_import
    values (%(code)s, %(exchange_short_name)s, %(date)s, %(open)s, %(high)s,
            %(low)s, %(close)s, %(adjusted_close)s, %(volume)s)"""
execute_batch(cur, data_insert, data_dict)
con.commit()
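From there, resolving entity_id is a single INSERT ... SELECT that joins the staging table to market.entities on code. A sketch (using current_date for imported_at is an assumption; run it on the same cursor as above):

```python
# Resolve entity_id by joining the staged rows to market.entities on "code".
move_sql = """
INSERT INTO market.end_of_days
    (entity_id, date, open, high, low, close, adjusted_close, volume, imported_at)
SELECT e.id, s.date, s.open, s.high, s.low, s.close, s.adjusted_close,
       s.volume, current_date
FROM symbol_import s
JOIN market.entities e ON e.code = s.code
"""
# With the connection from above: cur.execute(move_sql); con.commit()
```

Rows whose code has no match in market.entities are silently skipped by the inner join; use a LEFT JOIN first if you need to find them.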