Skip to content
Advertisement

How to store CSV file in database?

There is a output file from Python Pandas with a lot of columns with headers.

I need to be able handle this file by script and get CSV files in different columns positions.

For example, initial file has columns

Name, Age, Country
Oli.  18.  USA

As variation I need to get it in different sequence:

Age, Country Name
 18.  USA.    Oli

I wonder what is the best way to store this file to get data from it by specific columns

Advertisement

Answer

When you need to load data into the table from CSV file in which the column ordering is not definite then you may use the code similar to below one:

SET @name1 = NULL;
SET @name2 = NULL;
SET @name3 = NULL;

LOAD DATA INFILE 'file_path_name'
INTO TABLE table_name
{loading options, but no line ignoring}
(@column1, @column2, @column3)
SET
    @name1 = COALESCE(@name1, @column1),
    @name2 = COALESCE(@name2, @column2),
    @name3 = COALESCE(@name3, @column3),
    Name = CASE WHEN (@name1, @name2, @name3) = (@column1,@column2,@column3) THEN NULL
                WHEN @name1 = 'Name' Then @column1
                WHEN @name2 = 'Name' Then @column2
                WHEN @name3 = 'Name' Then @column3
                ELSE NULL
                END,
    Age = CASE WHEN (@name1, @name2, @name3) = (@column1,@column2,@column3) THEN NULL
               WHEN @name1 = 'Age' Then @column1
               WHEN @name2 = 'Age' Then @column2
               WHEN @name3 = 'Age' Then @column3
               ELSE NULL
               END,
    Country = CASE WHEN (@name1, @name2, @name3) = (@column1,@column2,@column3) THEN NULL
                   WHEN @name1 = 'Country' Then @column1
                   WHEN @name2 = 'Country' Then @column2
                   WHEN @name3 = 'Country' Then @column3
                   ELSE NULL
                   END;

DELETE FROM table_name WHERE Name IS NULL AND Age IS NULL AND Country IS NULL;                   

Three variables are cleared.

Then the file is loaded and parsed with input preprocessing line by line.

First line of CSV contains columns names. First 3 assiginings stores the columns ordering in the variables. For all next rows these value won’t be altered.

Then we fill the table columns values by selecting the value which matches the column from correct variable with according CASEs. The most first condition in each CASE will fire again for the first line only. We cannot skip line in SET clause but we can set the values to some incorrect predefined values for future detection. These values shouldn’t be NULL, for example, you may set Name to ‘This row must be deleted’ and/or Age to 999. Also you may use not all but only some columns values for the detection (but you must assign correct values to NOT NULL columns, so this condition can be removed for the CASE of the column only when the column is nullable).

After all lines loaded we delete excess line inserted while the header line is loaded.

User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement