Skip to content
Advertisement

Best practise for creating database objects from .csv file

My situation is: having .csv file with some columns e.g. name, surname, age, activity, and others I want to create objects in my relational database with row from this file being single object. Column names correspond to model fields and with others being multiple columns defined by user who creates .csv (those others land in model’s filed comment separated with commas). Having relational db, base on model_1's object I create model_2 and model_3, all with the info from this .csv file.

My project is based on django rest framework and react.

Up till now my solution for this was to analize .csv file on with react using FileReader and readAsArrayBuffer which did it’s job most of the time.

With react I not only have to analyze the file, but for every row make at least 3 x POSTing with axios. Sequentional posting is not ideal all the time.

My question is: > should files be analyzed on front- or back-end side ?

Heaving it done on the backend side with python seems a lot easier, but there might be a better solution of which I can’t think of.

EDIT: The database I use is postgresql.

Csv file size could go up to a 6000 rows. With this many rows I could just completely ram up on posts from react. Also making a single post with file to handle the work to django could also result in a overtime for response from server.

EDIT2:

Example .csv file, might be a 6k rows long or more.

name,surname,activity,comment,Add_1,Add_2
franko1,romano2,asd1,smth1,1,2
franko2,romano3,asd2,smth2,2,3
franko3,romano4,asd3,smth3,3,4
franko4,romano5,asd4,smth4,4,5
franko5,romano6,asd5,smth5,5,6
franko6,romano7,asd6,smth6,6,7
franko7,romano8,asd7,smth7,7,8
franko8,romano9,asd8,smth8,8,9
franko9,romano10,asd9,smth9,9,10

Example models:

class Users(models.Model):
    name = models.CharField()
    surname = models.CharField()

class Clients(models.Model):
    user = models.ForeignKey(Users)
    activity = models.CharField()

class Returns(models.Model):
    client = models.ForeignKey(Clients)
    comment = models.CharField() <-- Add_1, Add_2 appended here

These models and .csv file are made up, as I am not allowed to provide those right ones, but dependencies between them are similar.

Users to Clients is OneToOne, Returns to Clients is ManyToOne relation. My .csv contain a info to make all of those models sequentially: Users, Clients and Returns. Unfortunetally I’m not able to cut it down even to two models.

I have found a great node package for .csv files loadup – papaparse, opensource, great documentation and unbelivable functionality!

With this package I end up with already .json format for my upload.

Advertisement

Answer

I would use pandas, as it’s easiest to deal with CSV files quickly

import pandas as pd
from my_app.models import Users,Clients,Returns

def convert_csv_to_models(csv_abs_path):
    df = pd.read_csv(csv_abs_path)
    df['users'] = df.apply(lambda row: Users(name=row['name'], surname=row['surname']), axis=1) # axis = 1 means the apply will run per row and per columns
    df['clients'] = df.apply(lambda row: Clients(user=row['users'], activity=row['activity']), axis=1)
    df['returns'] = df.apply(
        lambda row: Returns(client=row['clients'], comment=f'{row.comment} {row.Add_1} {row.Add_2}'), axis=1)

    cols_to_models = {'users': Users, 'clients': Clients, 'returns': Returns}
    for model, col in cols_to_models.items():
        # convert the column of objs to a list and then bulk create it
        objs = df[col].tolist()
        # ignore conflicts will ignore skip an object if it already exist
        model.objects.bulk_create(objs, ignore_conflicts=True)

I didn’t quite understand what You wanted to achieve with the Returns and comments column, so I joined the comment, Add_1, and Add_2 columns to one string as the comment, you can modify as you need.

Also, it’s not common practice to call Django Models in plural, you should do:

class User(models.Model):
    name = models.CharField()
    surname = models.CharField()

class Client(models.Model):
    user = models.ForeignKey(Users)
    activity = models.CharField()

class Return(models.Model):
    client = models.ForeignKey(Clients)
    comment = models.CharField()

User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement