Skip to content
Advertisement

Import csv: remove filename from column names in first row

I am using Python 3.5. I have several csv files:

The csv files are named according to a fixed structure. They have a fixed prefix (always the same) plus a varying filename part:

099_2019_01_01_filename1.csv
099_2019_01_01_filename2.csv

My original csv files look like this:

filename1-Streetname filename1-ZIPCODE
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Street1 2012932
Street2 3023923

filename2-Name filename2-Phone
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
TEXT TEXT
Name1 2012932
Name2 3023923

I am manipulating these files using the following code (I am reading the csv files from a source folder and writing them to a destination folder. I am skipping certain rows as I do not want to include this information):

I cut off the TEXT rows, as I do not need them:

import csv
    
skiprows = (1,2,3,4,5,6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder,file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
        with open(os.path.join(destinationfolder,file), 'w', newline='') as fp_out:
            writer = csv.writer(fp_out)
            writer.writerows(rows)

(this code works) gives

filename1-Streetname filename1-ZIPCODE
Street1 2012932
Street2 3023923

filename2-Name filename2-Phone
Name1 2012932
Name2 3023923

The first row contains the header. In the header names there is always the filename (however without the 099_2019_01_01_ prefix) plus a “-“. The filename ending .csv is missing. I want to remove this “filename-” for each csv file.

The core part now is to get the first row and only for this row to perform a replace. I need to cut off the prefix and the .csv and then perform a general replace. The first replace could be something like this:

  1. Either I could start with a function to cut off the first n signs, as the length is fixed or
  2. According to this solution just use string.removeprefix('099_2019_01_01_')

As I have Python 3.5 I cannot use removeprefix so I try to just simple replace it.

string.replace(“099_2019_01_01_”,””)

Then I need to remove the .csv which is easy:

string.replace(".csv","")

I put this together and I get (string.replace("099_2019_01_01_","")).replace(".csv",""). (Plus at the end the “-” needs to be removed too, see in the code below). I am not sure if this works.

My main problem is now for this csv import code that I do not know how I can manipulate only the first row when reading/writing the csv. So I want to replace this only in the first row. I tried something like this:

import csv
    
skiprows = (1,2,3,4,5,6)
for file in os.listdir(sourcefolder):
    with open(os.path.join(sourcefolder,file)) as fp_in:
        reader = csv.reader(fp_in, delimiter=';')
        rows = [row for i, row in enumerate(reader) if i not in skiprows]
        with open(os.path.join(destinationfolder,file), 'w', newline='') as fp_out:
            writer = csv.writer(fp_out)
            rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")
            writer.writerows(rows)

This gives an error as the idea with rows[0] is not working. How can I do this?

(I am not sure if I should try to include this replacing in the code or to put it into a second code which runs after the first code. However, then I would read and write csv files again I assume. So I think it would be most efficient to implement it into this code. Otherwise I need to open and change and save every file again. However, if it is not possible to include it into this code I would be also fine with a code which runs stand-alone and just does the replacing assuming the csv file have the rows 0 as header and then the data comes.)

Please note that I do want to go this way with csv and not use pandas.

EDIT: At the end the csv files should look like this:

Streetname ZIPCode
Street1 9999
Street2 9848

Name Phone
Name1 23421
Name2 23232

Advertisement

Answer

Try by replacing this:

rows[0].replace((file.replace("099_2019_01_01_","")).replace(".csv","")+"-","")

By this in your code:

x=file.replace('099_2019_01_01_','').replace('.csv', '')
rows[0]=[i.replace(x+'-', '') for i in rows[0]]

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement