
In Excel, differentiate delimiters from content characters

I have an Excel .csv file that looks like this:

id,ticker1,ticker2,Company,Supplier
9,MMM,DHR,3M Company,Danaher Corporation
10,MMM,IBM,3M Company,International Business Machines Corporation
11,MMM,UNP,3M Company,Union Pacific Corporation
12,MMM,TMO,3M Company,Thermo Fisher Scientific Inc.
13,MMM,NSC,3M Company,Norfolk Southern Corporation
61,No,0,,
14,MMM,TER,3M Company,"Teradyne, Inc."

The data here is comma-separated, so I use delimiter=',', but in some cases, as in the last line, there is a comma inside the text too. In my CSV file, all strings that contain a comma are enclosed in double quotes (" "). I have tried to read the file with open():

with open('file.csv', 'r') as f:
    for row in f.read().split('\n'):
        fields = row.split(',')  # naive split; also cuts "Teradyne, Inc." in two

But this way I have to handle the embedded commas and the empty values myself. So what is the correct way to work with CSV files like this? Should I use regular expressions, or a library like pandas?
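A quick illustration of why splitting on every comma breaks, using the last sample line as a hard-coded string (a sketch, not part of the original question). The naive split yields six fields and leaves the quote marks in place, while the standard-library csv.reader honours the quoting and yields the expected five.

```python
import csv

# Last line of the sample file, hard-coded for illustration
line = '14,MMM,TER,3M Company,"Teradyne, Inc."'

# Naive approach: split on every comma, ignoring the quotes
naive = line.split(",")
print(len(naive), naive)  # the supplier name is torn into two pieces

# csv.reader accepts any iterable of lines, so a one-element list works here
parsed = next(csv.reader([line]))
print(len(parsed), parsed)  # quoted comma stays inside a single field
```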


Answer

You should use Python's csv module. It's designed for exactly this purpose: it takes away the complexity of parsing CSV data, such as a delimiter that appears inside a quoted field.
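A minimal sketch of the plain csv.reader on the whole sample, inlined via io.StringIO so it runs without a file (with a real file you would pass open("file.csv", newline="") instead). The reader's default "excel" dialect uses a comma delimiter and double-quote quoting, which matches this file. As a check, the rows are written back with csv.writer, whose default QUOTE_MINIMAL rule quotes only fields containing the delimiter, a quote, or a line break, so it should reproduce the input text exactly.

```python
import csv
import io

# The question's sample data, inlined so the sketch is self-contained
SAMPLE = '''id,ticker1,ticker2,Company,Supplier
9,MMM,DHR,3M Company,Danaher Corporation
10,MMM,IBM,3M Company,International Business Machines Corporation
11,MMM,UNP,3M Company,Union Pacific Corporation
12,MMM,TMO,3M Company,Thermo Fisher Scientific Inc.
13,MMM,NSC,3M Company,Norfolk Southern Corporation
61,No,0,,
14,MMM,TER,3M Company,"Teradyne, Inc."
'''

# Default "excel" dialect: comma delimiter, '"' as the quote character
reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # first row: the column names
rows = list(reader)    # remaining rows, each a list of strings

for row in rows:
    print(len(row), row)  # every row has one value per column

# Round trip: QUOTE_MINIMAL (the default) quotes only fields that need it
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue() == SAMPLE)
```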

Here is a simple example. I use the csv module's DictReader, which reads the first line of the CSV as the key names and then maps each subsequent row onto those keys, giving one dict per row. It automatically takes care of the comma inside the Supplier field.

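import csv

# newline="" is what the csv docs recommend when opening files for the csv module
with open("sample.csv", newline="") as my_csv:
    # DictReader uses the header row as the keys for every following row
    for row in csv.DictReader(my_csv):
        print(f'id {row["id"]} has supplier "{row["Supplier"]}"')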

OUTPUT

id 9 has supplier "Danaher Corporation"
id 10 has supplier "International Business Machines Corporation"
id 11 has supplier "Union Pacific Corporation"
id 12 has supplier "Thermo Fisher Scientific Inc."
id 13 has supplier "Norfolk Southern Corporation"
id 61 has supplier ""
id 14 has supplier "Teradyne, Inc."
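The empty supplier for id 61 comes back as an empty string, which is how the csv module represents an empty field. If you would rather have None for missing values (the question mentions having to handle them), a small post-processing step is enough. This is a sketch: blank_to_none is a hypothetical helper, not part of the csv module, and the data is a two-line excerpt of the sample inlined with io.StringIO.

```python
import csv
import io

# Two rows from the question's sample, inlined for a self-contained example
SAMPLE = '''id,ticker1,ticker2,Company,Supplier
61,No,0,,
14,MMM,TER,3M Company,"Teradyne, Inc."
'''

def blank_to_none(row):
    """Hypothetical helper: replace empty-string fields with None."""
    return {key: (value if value != "" else None) for key, value in row.items()}

records = [blank_to_none(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
for record in records:
    print(record)
```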
User contributions licensed under: CC BY-SA