
Parse a CSV file, loop and insert rows into a PostgreSQL database

I use the Python psycopg2 module to copy the content of a CSV file (a list of users) into a PostgreSQL database.
I start by parsing the CSV with the pandas module. Then, in a for loop, I try to insert the data with my SQL queries.
I have two problems:

a) When I execute the role query (query 2, see below) to create new roles in the database, I get 'user' instead of user. How can I insert the roles with the right syntax?
b) Queries 3 and 4 (see below) give the following error:

TypeError: not all arguments converted during string formatting 

What exactly is this problem, and how do I solve it?

Here is the complete code:

import csv, psycopg2
import pandas as pd

conn = psycopg2.connect("host=localhost dbname=vmap user=postgres password=postgres port=5432")
c = conn.cursor()
# Import_CSV
data = pd.read_csv (r'users.csv', sep=';')   
df = pd.DataFrame(data, columns= ['id','login','mdp','mail','date'])
print(df)

for row in df.itertuples():
    print (row.login)
    c.execute("INSERT INTO users (user_id, login, email) VALUES(%s, %s, %s);", (row.id, row.login, row.mail))
    # query2
    c.execute('create role "%s" with encrypted password %s',(row.login, row.mdp))
    # query3 
    c.execute('grant vitis_user, vmap_user to "%s"',(row.login))    
    # query4
    c.execute('grant connect on vmap to "%s"',(row.login))

And the DataFrame I want to parse (the content of the CSV file) is this one:

   id              login  ...                             mail        date
0  10  ldeschamps-diallo  ...  ldeschamps-diallo@monwebsig.com  2022-01-31
1  11            pmarion  ...            pmarion@monwebsig.com  2022-01-31
2  12             cleroy  ...             cleroy@monwebsig.com  2022-01-31
3  13          lcourtois  ...          lcourtois@monwebsig.com  2022-01-31
4  14      rpaul-monnier  ...      rpaul-monnier@monwebsig.com  2022-01-31


Answer

Reading the CSV

First of all, there is probably no need for pandas here, since you only need to open the csv file and parse its contents. Using the built-in csv module should be enough.

You can read the file like this:

import csv
with open("users.csv", "r", encoding="utf-8", newline="") as fid:
    reader = csv.reader(fid, delimiter=";")
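
The snippet above only creates the reader; the rows still have to be consumed while the file is open. Here is a minimal sketch of that step, assuming the file starts with a header line naming the columns from the question (id;login;mdp;mail;date). The sample content, the password values and the read_users helper are made up for illustration, and an in-memory buffer stands in for users.csv.

```python
import csv
import io

# Hypothetical sample mirroring the question's columns; the passwords are invented.
SAMPLE = (
    "id;login;mdp;mail;date\n"
    "11;pmarion;secret1;pmarion@monwebsig.com;2022-01-31\n"
    "12;cleroy;secret2;cleroy@monwebsig.com;2022-01-31\n"
)


def read_users(fid):
    """Return one dict per CSV row, keyed by the header names."""
    reader = csv.DictReader(fid, delimiter=";")
    return list(reader)


# With a real file, pass open("users.csv", "r", encoding="utf-8", newline="") instead.
users = read_users(io.StringIO(SAMPLE))
for user in users:
    print(user["id"], user["login"], user["mail"])
```

Note that the csv module returns every field as a string, so the id would need an explicit int() if the code relies on it being a number.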

Loading

As suggested by Adrian Klaver, you can load the data using the .copy_from method, but this only helps with the first part of the problem, which is inserting the users. Starting with query 2, you will run into the same problem you are facing now.

Explanation

Another thing: psycopg2 provides parametrized queries to protect you from SQL injection, but they are designed to escape values, while you are trying to use them with an identifier. When you first INSERT the user into the table, the username is a value to be inserted and everything works as expected. In the second query, you are referring to an actual role in the database (note that you quote the username with " instead of '). Because psycopg2 thinks the username is a value, it wraps it in additional ' quotes and you end up with a query like this:

create role "'cleroy'" with encrypted password 'password'

To pass the username as an identifier, you might be tempted to use string formatting:

c.execute(f'create role "{row.login}" with encrypted password %s',(row.mdp,))

But this leaves you open to SQL injection. Imagine a username such as tom"; DROP DATABASE mysuperdatabase;. The formatted query would then begin with create role "tom"; DROP DATABASE mysuperdatabase; so the injected statement is sent to the server along with yours, and you risk losing your precious data.
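
To make the risk concrete, here is a small sketch that only builds the query strings, without touching a database. The naive_create_role helper is hypothetical, and the malicious login is a variation on the example above with a trailing SQL comment (--) added so that the leftover text after the injected statement is ignored.

```python
def naive_create_role(login):
    """Build the statement with plain string formatting (unsafe)."""
    return f'create role "{login}" with encrypted password %s'


harmless = naive_create_role("cleroy")
malicious = naive_create_role('tom"; DROP DATABASE mysuperdatabase; --')

print(harmless)
print(malicious)
```

The second string contains a complete DROP DATABASE statement, because the double quote inside the login closes the quoted identifier early.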

The safe way

So what you need to do first is make sure the identifier is properly escaped. You can use the psycopg2.sql module to do this.

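from psycopg2 import sql

# Quote the login as an identifier; the password stays a bound value (%s).
# Use SQL.format(), not an f-string: an f-string would embed the Python repr
# of the Identifier object instead of the quoted name.
login = sql.Identifier(row.login)
query = sql.SQL("create role {} with encrypted password %s").format(login)
c.execute(query, (row.mdp,))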
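
As for problem b), the TypeError in queries 3 and 4 most likely comes from (row.login): parentheses alone do not create a tuple, so the argument is a plain string. As far as I understand psycopg2, execute treats its parameters as a sequence, so a string is consumed character by character and there end up being more arguments than placeholders. The sketch below reproduces the same message with plain Python % formatting. It is only an analogy for what happens inside execute, not psycopg2's actual code, and the login value is taken from the question's sample data.

```python
login = "cleroy"

not_a_tuple = (login)    # just a parenthesised string
one_tuple = (login,)     # the trailing comma makes a 1-tuple

print(type(not_a_tuple).__name__)
print(type(one_tuple).__name__)

template = 'grant connect on vmap to "%s"'

# A string handed over as "the sequence of parameters" is seen as six
# one-character arguments for a single placeholder.
try:
    template % tuple(not_a_tuple)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(template % one_tuple)
```

In the question's loop, writing (row.login,) would avoid the TypeError, but the role name would still be quoted as a value, so the psycopg2.sql approach above is the better fit for queries 3 and 4 as well.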
User contributions licensed under: CC BY-SA