Skip to content
Advertisement

Inserting data into psql database with high performance

Assume, I have a Python program, and I have an Offer object Offer(title='title1', category='cat1', regions=['reg1']). I want to add this Offer into psql db, with minimal number of queries (performance). Inserts of new regions and categories are rare (number of regions and categories is limited (and are unique), but number of offers is unlimited). Basically Regions and Categories can be inserted by query:

INSERT INTO Categories(name)
SELECT 'cat1'
WHERE NOT EXISTS(
        SELECT 1 FROM Categories WHERE name = 'cat1'
    )
RETURNING id;

, but I need to execute another query to get id of region/category (when region/category already exists). I need this id to execute query inserting data to Offers table:

INSERT INTO Offers(title, category)
SELECT 'title1', (SELECT id FROM Regions WHERE name = 'reg1')
WHERE NOT EXISTS(
        SELECT 1 FROM Offers WHERE title = 'title1' AND category = (SELECT id FROM Regions WHERE name = 'reg1')
    );

Currently my code looks like:

INSERT INTO Categories(name)
SELECT 'cat1'
WHERE NOT EXISTS(
        SELECT 1 FROM Categories WHERE name = 'cat1'
    );

INSERT INTO Regions(name)
SELECT 'reg1'
WHERE NOT EXISTS(
        SELECT 1 FROM Regions WHERE name = 'reg1'
    );

INSERT INTO Offers(title, category)
SELECT 'title1', (SELECT id FROM Regions WHERE name = 'reg1')
WHERE NOT EXISTS(
        SELECT 1 FROM Offers WHERE title = 'title1' AND category = (SELECT id FROM Regions WHERE name = 'reg1')
    );

INSERT INTO OfferRegions(offer, region)
SELECT (SELECT id FROM Offers WHERE title = 'title1'), (SELECT id FROM Regions WHERE name = 'reg1')
WHERE NOT EXISTS(
        SELECT 1 FROM OfferRegions WHERE offer = (SELECT id FROM Offers WHERE title = 'title1') AND region = (SELECT id FROM Regions WHERE name = 'reg1')
    );

I don’t know how to do it robust and efficient (without unnecessary SELECTs). I work with Python/psycopg2.


Tables:

CREATE TABLE IF NOT EXISTS Regions
(
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    UNIQUE (name)
);

CREATE TABLE IF NOT EXISTS Categories
(
    id   SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    UNIQUE (name)
);

CREATE TABLE IF NOT EXISTS Offers
(
    id       SERIAL PRIMARY KEY,
    title    TEXT,
    category SERIAL REFERENCES Categories (id) ON UPDATE CASCADE ON DELETE CASCADE,
    UNIQUE (id)
);

CREATE TABLE IF NOT EXISTS OfferRegions
(
    offer  SERIAL REFERENCES Offers (id) ON UPDATE CASCADE ON DELETE CASCADE,
    region SERIAL REFERENCES Regions (id) ON UPDATE CASCADE ON DELETE CASCADE,
    UNIQUE (offer, region)
);

Advertisement

Answer

You are looking for INSERT ... ON CONFLICT DO NOTHING.

For that, you need a unique constraint on the column that identifies the object.

That allows you to retrieve the generated id, for example:

INSERT INTO categories (name) VALUES ('cat1')
ON CONFLICT ON (name) DO NOTHING
RETURNING id;

Dependent tables could be filled like this, using a variable cat_id that was set with the result from the above query:

INSERT INTO offers (category, title)
SELECT CASE WHEN cat_id IS NULL
            THEN (SELECT id FROM categories WHERE name = 'cat1')
            ELSE cat_id
       END,
       /* similar for "title */

There is of course a race condition: someone could delete the categories row before your second INSERT. But perhaps that is good enough.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement