How to merge CSV files such that rows with a unique identifier are added into the same row of the output?

Question

I'm using Python to merge 4 headerless CSVs into one output file. Each CSV has a unique number in the first column, as shown in the 2 example CSV files below: 1.csv 2.csv I have generated the output from these CSVs using the following code. This works fine and outputs a file. The data ends up as follows Is there

Accepted Answer

You could use a dictionary to hold all the data, as

    {1: [1, "Ringo", "Beatles", 1, "TSLA", "XNAS", 1.0, "USD", 1, 3, 1, 100, 309.0125], 2: [2, ...], 3: [3, ...], ...}

and later write it all to a new file.

So first create an empty dictionary, i.e. new_rows = {}

Next, get a row from a file, get its ID, and check whether it already exists in the dictionary. If it does not exist, create it with a list that holds only the ID: new_rows[key] = [key]

Next, add the other values from the row to this list: new_rows[key] += values

Repeat this for all rows in all files. Later you can use this dictionary to write all rows to the new file.

I use io only to simulate files in memory, but you should use open().

    text1 = '''1,Ringo,Beatles
    2,John,Beatles
    3,Mick,Rolling Stones
    4,Keith,Rolling Stones
    5,Rivers,Weezer'''

    text2 = '''1,TSLA,XNAS,1.0,USD
    2,AAPL,XNAS,1.0,USD
    3,SPY,ARCX,1.0,USD
    4,BP LN,XLON,1.0,GBP
    5,ESUD,XCME,1.0,USD'''

    import csv
    import io

    new_rows = {}  # dict: ID -> merged row

    filenames = [text1, text2]
    #filenames = ['1.csv', '2.csv', '3.csv', '4.csv']

    for fname in filenames:
        #with open(fname, newline='') as infile:
        with io.StringIO(fname) as infile:
            reader = csv.reader(infile)
            for row in reader:
                key = row[0]      # ID
                values = row[1:]  # rest

                # create key if it does not exist yet
                if key not in new_rows:
                    new_rows[key] = [key]

                new_rows[key] += values  # add two lists

                # OR
                #if key not in new_rows:
                #    new_rows[key] = values    # only for the first file
                #else:
                #    new_rows[key] += values   # for other files - add two lists

    # --- write it ---

    # newline='' stops the csv module from writing extra blank lines on Windows
    with open('output_file', 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        all_rows = new_rows.values()
        writer.writerows(all_rows)   # `writerows` with `s` to write a list of many rows

BTW: In older Python (before 3.7) a dict didn't have to keep insertion order, so it could write the new rows in a different order. There you would need to sort the list of rows before saving, or use collections.OrderedDict().
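
The steps above can also be wrapped into a small function that reads real files with open(). This is only a sketch under some assumptions: the name merge_csv_by_id and its sort_numeric parameter are made up for illustration, every file is assumed to be headerless with the ID in the first column, and sorting assumes every ID is an integer.

```python
import csv


def merge_csv_by_id(in_paths, out_path, sort_numeric=False):
    """Merge headerless CSV files, joining rows that share a first-column ID.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    merged = {}  # ID -> [ID, values from file 1, values from file 2, ...]

    for path in in_paths:
        with open(path, newline="") as infile:
            for row in csv.reader(infile):
                if not row:  # skip completely empty lines
                    continue
                key, values = row[0], row[1:]
                # start the row with the ID the first time it is seen,
                # then append this file's remaining columns
                merged.setdefault(key, [key]).extend(values)

    rows = list(merged.values())
    if sort_numeric:
        # assumption: every ID can be converted with int()
        rows.sort(key=lambda r: int(r[0]))

    with open(out_path, "w", newline="") as outfile:
        csv.writer(outfile).writerows(rows)

    return rows
```

It could be called as merge_csv_by_id(['1.csv', '2.csv', '3.csv', '4.csv'], 'output_file'). Without sort_numeric=True the rows come out in the order their IDs were first seen, which relies on dict keeping insertion order (Python 3.7+).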

Answer