Skip to content
Advertisement

Compare 2 csv files and update columns

I have 2 csv files with 2 rows 3 columns (id, name, value) that I want to compare. If there’s a new row added to one of the files, the other one is updated as well. Likewise, if a value in one of the column changes the other file is updated.

Here’s what I tried

a = '/path/to/file'
b = 'path/to/file'

with open(a, 'r') as f1, open(b, 'r') as f2:
    file1 = csv.DictReader(f1)
    file2 = csv.DictReader(f2)

    for row_new in file2:
        for row_old in file1:
            if row_new['id'] == row_old['id']:
                for k1, v1 in row_new.items():
                    for k, v in row_old.items():
                        if row_old[k1] == row_new[k]:
                            if v1 != v:
                                print(f'updated value for col {k}')
                                v1 = v
                            else:
                                print('Nothing to update')
            else:
                print(f'create row {row_new["id"]}')

I noticed that the iteration takes place only once. Am I doing something wrong here?

Advertisement

Answer

I noticed that the iteration takes place only once…?

The inner loop is probably reaching the end of the file before the outer loop has a chance to make its next iteration. Try moving the file object’s pointer back to the beginning after the inner loop stops.

with open(a, 'r') as f1, open(b, 'r') as f2:
    ...
    for row_new in file2:
        for row_old in file1:
            if row_new['id'] == row_old['id']:
                ...
            else:
                print(f'create row {row_new["id"]}')
        f1.seek(0)

Some would say that the nested for loops is what you are doing wrong. Here are some SO questions/answers to consider.

python update a column value of a csv file according to another csv file Python Pandas: how to update a csv file from another csv file
search python csv update one csv file based on another site:stackoverflow.com

Basically you should try to just read each file once and use data types that allow for fast membership testing like sets or dicts.

Your DictReaders will give you an {'id':x,'name':y,'value':z} dict for each row – causing you to use nested for loops to compare all the rows from one file to each row in the other. You could create a single dictionary using the id column for the keys and the dictionary values could be lists – {id:[name,value],id:[name,value],...} which may make the processing easier.


You also opened both your files for reading, open(...,'r'), so you’ll probably find your files unchanged after you fix other things.

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement