There is a output file from Python Pandas with a lot of columns with headers.
I need to be able handle this file by script and get CSV files in different columns positions.
For example, initial file has columns
Name, Age, Country Oli. 18. USA
As variation I need to get it in different sequence:
Age, Country Name 18. USA. Oli
I wonder what is the best way to store this file to get data from it by specific columns
Advertisement
Answer
When you need to load data into the table from CSV file in which the column ordering is not definite then you may use the code similar to below one:
SET @name1 = NULL; SET @name2 = NULL; SET @name3 = NULL; LOAD DATA INFILE 'file_path_name' INTO TABLE table_name {loading options, but no line ignoring} (@column1, @column2, @column3) SET @name1 = COALESCE(@name1, @column1), @name2 = COALESCE(@name2, @column2), @name3 = COALESCE(@name3, @column3), Name = CASE WHEN (@name1, @name2, @name3) = (@column1,@column2,@column3) THEN NULL WHEN @name1 = 'Name' Then @column1 WHEN @name2 = 'Name' Then @column2 WHEN @name3 = 'Name' Then @column3 ELSE NULL END, Age = CASE WHEN (@name1, @name2, @name3) = (@column1,@column2,@column3) THEN NULL WHEN @name1 = 'Age' Then @column1 WHEN @name2 = 'Age' Then @column2 WHEN @name3 = 'Age' Then @column3 ELSE NULL END, Country = CASE WHEN (@name1, @name2, @name3) = (@column1,@column2,@column3) THEN NULL WHEN @name1 = 'Country' Then @column1 WHEN @name2 = 'Country' Then @column2 WHEN @name3 = 'Country' Then @column3 ELSE NULL END; DELETE FROM table_name WHERE Name IS NULL AND Age IS NULL AND Country IS NULL;
Three variables are cleared.
Then the file is loaded and parsed with input preprocessing line by line.
First line of CSV contains columns names. First 3 assiginings stores the columns ordering in the variables. For all next rows these value won’t be altered.
Then we fill the table columns values by selecting the value which matches the column from correct variable with according CASEs. The most first condition in each CASE will fire again for the first line only. We cannot skip line in SET clause but we can set the values to some incorrect predefined values for future detection. These values shouldn’t be NULL, for example, you may set Name to ‘This row must be deleted’ and/or Age to 999. Also you may use not all but only some columns values for the detection (but you must assign correct values to NOT NULL columns, so this condition can be removed for the CASE of the column only when the column is nullable).
After all lines loaded we delete excess line inserted while the header line is loaded.