I use the Python psycopg2 module to copy the content of a CSV file (a list of users) into a PostgreSQL database.
I start by parsing the CSV with the Python pandas module. Then, in a for loop, I try to insert my data with my SQL queries.
I have two problems:
a) When I execute the role query (query 2, see below) to create new roles in the database, the role is created as 'user' (with the single quotes) instead of user. How can I insert the roles with the right syntax?
b) Queries 3 and 4 (see below) give the following error:
TypeError: not all arguments converted during string formatting
What exactly is this problem, and how can I solve it?
Here is the complete code:
import csv, psycopg2
import pandas as pd

conn = psycopg2.connect("host=localhost dbname=vmap user=postgres password=postgres port=5432")
c = conn.cursor()

# Import_CSV
data = pd.read_csv(r'users.csv', sep=';')
df = pd.DataFrame(data, columns=['id', 'login', 'mdp', 'mail', 'date'])
print(df)

for row in df.itertuples():
    print(row.login)
    c.execute("INSERT INTO users (user_id, login, email) VALUES(%s, %s, %s);", (row.id, row.login, row.mail))
    # query2
    c.execute('create role "%s" with encrypted password %s', (row.login, row.mdp))
    # query3
    c.execute('grant vitis_user, vmap_user to "%s"', (row.login))
    # query4
    c.execute('grant connect on vmap to "%s"', (row.login))
And here is the DataFrame I want to parse (the content of the CSV file):
   id              login  ...                             mail        date
0  10  ldeschamps-diallo  ...  ldeschamps-diallo@monwebsig.com  2022-01-31
1  11            pmarion  ...            pmarion@monwebsig.com  2022-01-31
2  12             cleroy  ...             cleroy@monwebsig.com  2022-01-31
3  13          lcourtois  ...          lcourtois@monwebsig.com  2022-01-31
4  14      rpaul-monnier  ...      rpaul-monnier@monwebsig.com  2022-01-31
Answer
Reading the CSV
First of all, there is probably no need for pandas here, since you only need to open the CSV file and parse its contents. Using the built-in csv module should be enough.
You can read the file like this:
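import csv
with open("users.csv", "r", encoding="utf-8", newline="") as fid:
    reader = csv.reader(fid, delimiter=";")
    # Consume the rows inside the with block; the file is closed afterwards
    for row in reader:
        print(row)  # a list of strings, header row included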
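If you prefer to access the fields by name (as row.login does with itertuples), csv.DictReader is an option. A minimal sketch, assuming the file starts with a header row id;login;mdp;mail;date (inferred from the column names used in the question); parse_users and read_users are hypothetical helper names:

```python
import csv


def parse_users(lines):
    """Parse semicolon-separated user records into a list of dicts.

    `lines` can be an open file or any iterable of strings. The first
    line is assumed to be the header: id;login;mdp;mail;date
    """
    # DictReader takes its keys from the header row, so each field is
    # accessed by name (user["login"]) instead of by position.
    return list(csv.DictReader(lines, delimiter=";"))


def read_users(path):
    """Open the CSV file (path is an assumption) and parse it."""
    with open(path, "r", encoding="utf-8", newline="") as fid:
        return parse_users(fid)


# Usage (assumes users.csv exists in the working directory):
# for user in read_users("users.csv"):
#     print(user["id"], user["login"], user["mail"])
```

Every value comes back as a string, so convert id with int() if your user_id column is an integer.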
Loading
As suggested by Adrian Klaver, you can load the data with the cursor's .copy_from method, but this only helps with the first part of the problem, inserting the users. You will still run into the same problem you are facing now, starting with query 2.
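A minimal sketch of the copy_from route, assuming the users table has the user_id, login and email columns used in the question's INSERT and that the CSV has a header row. build_copy_buffer and load_users are hypothetical names. Since copy_from reads PostgreSQL's text COPY format, the sketch also assumes no value contains a tab, newline or backslash:

```python
import csv
import io


def build_copy_buffer(lines):
    """Keep only the id, login and mail columns and return them as a
    tab-separated in-memory file, ready for cursor.copy_from.

    Assumes a header row id;login;mdp;mail;date and values free of
    tabs, newlines and backslashes (those would need escaping).
    """
    buf = io.StringIO()
    for rec in csv.DictReader(lines, delimiter=";"):
        buf.write(f"{rec['id']}\t{rec['login']}\t{rec['mail']}\n")
    buf.seek(0)  # rewind so copy_from starts reading at the beginning
    return buf


def load_users(cur, lines):
    """Bulk-load the users with an open psycopg2 cursor `cur`."""
    buf = build_copy_buffer(lines)
    cur.copy_from(buf, "users", sep="\t", columns=("user_id", "login", "email"))
```

psycopg2 does not autocommit by default, so call conn.commit() after load_users(c, fid). The roles still have to be created one by one.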
Explanation
Another thing is that psycopg2 provides parametrized queries to protect you from SQL injection, but they are designed to escape values, not identifiers, and you are trying to use them for an identifier. When you first INSERT the user into the table, the username is a value to be inserted and everything works as expected. In the second query, the username is the name of a role, i.e. an identifier (that is why you wrapped it in " instead of '). Because psycopg2 thinks the username is a value, it quotes it with an additional pair of ' and you end up with a query like this:
create role "'cleroy'" with encrypted password 'password'
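A simplified pure-Python model reproduces this result. psycopg2 interpolates parameters on the client side; quote_literal and render below are hypothetical stand-ins for that step (the real adaptation in psycopg2 is more involved), but they show how the literal quotes end up inside your double quotes:

```python
def quote_literal(value):
    """Simplified model of rendering a value as an SQL string literal:
    wrap it in single quotes and double any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def render(query, params):
    """Hypothetical stand-in for client-side parameter interpolation:
    every %s placeholder receives a quoted literal."""
    return query % tuple(quote_literal(p) for p in params)


# Query 2 from the question: the template already has double quotes
# around %s, and the literal adds single quotes inside them.
result = render('create role "%s" with encrypted password %s', ("cleroy", "password"))
print(result)  # create role "'cleroy'" with encrypted password 'password'
```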
To pass the username as an identifier, the obvious approach is string formatting:
c.execute(f'create role "{row.login}" with encrypted password %s',(row.mdp,))
But this way, you are open to SQL injection. Imagine a username like tom"; DROP TABLE users; --. You will end up with this query:
create role "tom"; DROP TABLE users; --" with encrypted password 'password'
and you risk losing your precious data.
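To see why proper identifier escaping closes this hole, here is a small pure-Python model. quote_ident is a hypothetical helper applying PostgreSQL's rule for quoted identifiers (wrap in double quotes, double any embedded double quote), which is roughly what psycopg2.sql.Identifier takes care of for you; the malicious username is an assumed example:

```python
def quote_ident(name):
    """Quote an SQL identifier: wrap it in double quotes and double any
    embedded double quote, so the value can never close the identifier."""
    return '"' + name.replace('"', '""') + '"'


login = 'tom"; DROP TABLE users; --'  # assumed malicious username

# Naive f-string: the embedded " ends the identifier early and the rest
# of the value becomes a second SQL statement.
unsafe = f'create role "{login}" with encrypted password %s'

# Escaped identifier: the whole value stays one (odd-looking) role name.
safe = f"create role {quote_ident(login)} with encrypted password %s"

print(unsafe)  # create role "tom"; DROP TABLE users; --" with encrypted password %s
print(safe)    # create role "tom""; DROP TABLE users; --" with encrypted password %s
```

With escaping, the worst case is a role with a strange name rather than an extra statement.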
The safe way
So the first thing to do is make sure the identifier is properly escaped. You can use the psycopg2.sql module to do this.
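from psycopg2 import sql

# Compose the role name as an identifier with .format(); an f-string would insert
# the repr of the Identifier object instead of the quoted name.
# The password stays a regular %s parameter, since it is a value.
query = sql.SQL("create role {} with encrypted password %s").format(sql.Identifier(row.login))
c.execute(query, (row.mdp,))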
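Problem b) from the question comes from the parameter tuples: (row.login) is just a string in parentheses, whereas (row.login,) with a trailing comma, as in (row.mdp,) above, is a one-element tuple. psycopg2 effectively treats a bare string as a sequence of parameters, one per character, hence "not all arguments converted during string formatting". The sketch reproduces the same error with Python's own % operator (an analogy, not psycopg2's internals). The commented lines show how queries 3 and 4 might look with psycopg2.sql, assuming vmap is the database from the connection string; CONNECT is a database privilege, so it is granted ON DATABASE, and as far as I know the question's grant connect on vmap would be read as a table grant and rejected:

```python
login = "cleroy"

not_a_tuple = (login)  # parentheses alone do not create a tuple: this is still a str
one_tuple = (login,)   # the trailing comma is what makes a one-element tuple

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple

# A str is itself a sequence: unpacked, it gives one argument per
# character ('c', 'l', 'e', 'r', 'o', 'y') for a single %s placeholder.
try:
    "grant vitis_user, vmap_user to %s" % tuple(not_a_tuple)
    message = ""
except TypeError as exc:
    message = str(exc)
print(message)  # not all arguments converted during string formatting

# Queries 3 and 4 with psycopg2.sql (requires `from psycopg2 import sql`, plus the
# cursor `c` and loop variable `row` from the question). No value parameters
# remain, so no tuple is passed at all:
# c.execute(sql.SQL("grant vitis_user, vmap_user to {}").format(sql.Identifier(row.login)))
# c.execute(sql.SQL("grant connect on database vmap to {}").format(sql.Identifier(row.login)))
```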