I am using the CSV reader to read a TSV in Python. The code is:
    f = csv.reader(open('sample.csv'), delimiter='\t')
    for chunk in f:
        print(chunk)
One row from the tab-separated file looks like this (csv hosted here):
doc | unit1_toks | unit2_toks | unit1_txt1 | unit2_txt2 | s1_toks | s2_toks | unit1_sent | unit2_sent | dir |
---|---|---|---|---|---|---|---|---|---|
GUM_bio_galois | 156-160 | 161-170 | ” We zouden dan voorstellen | dat de auteur al zijn werk zou moeten publiceren | 107-182 | 107-182 | Poisson declared Galois ‘ work ” incomprehensible ” , declaring that ” [ Galois ‘ ] argument is not sufficient . ” [ 16 ] | Poisson declared Galois ‘ work ” incomprehensible ” , declaring that ” [ Galois ‘ ] argument would then suggest that the author should publish the opinion . ” [ 16 ] | 1>2 |
I am getting the following output (the CSV reader fails to split on some of the tabs):
['GUM_bio_galois', '156-160', '161-170', ' We zouden dan voorstellen\tdat de auteur al zijn werk zou moeten publiceren\t107-182\t107-182\tPoisson declared Galois ' work incomprehensible " , declaring that " [ Galois ' ] argument is not sufficient . " [ 16 ]', 'Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument would then suggest that the author should publish the opinion . " [ 16 ]', '1>2']
I want it to look like this:
['GUM_bio_galois', '156-160', '161-170', '" We zouden dan voorstellen', 'dat de auteur al zijn werk zou moeten publiceren', '107-182', '107-182', 'Poisson declared Galois ' work incomprehensible " , declaring that " [ Galois ' ] argument is not sufficient . " [ 16 ]', 'Poisson declared Galois ' work " incomprehensible " , declaring that " [ Galois ' ] argument would then suggest that the author should publish the opinion . " [ 16 ]', '1>2']
How can I get the CSV reader to handle incomplete quotes and retain them in my output?
Answer
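    import csv

    with open('sample.csv') as f:
        # QUOTE_NONE makes the reader treat quote characters as ordinary data,
        # so an unbalanced " no longer swallows the tabs that follow it.
        rdr = csv.reader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
        header = next(rdr)  # first row holds the column names
        for line in rdr:
            print(line)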
Or, using csv.DictReader:
    import csv

    with open('sample.csv') as f:
        rdr = csv.DictReader(f, quoting=csv.QUOTE_NONE, delimiter='\t')
        for line in rdr:
            print(line)
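To see why the quoting setting matters, here is a small self-contained sketch that parses the same text twice, once with the default quoting and once with `csv.QUOTE_NONE`. The `SAMPLE` text is a shortened, made-up stand-in for a row like the one in the question (it is not the real file), and `read_rows` is a hypothetical helper used only for this illustration.

```python
import csv
import io

# Made-up, shortened stand-in for the question's data: a header plus one row
# whose third field starts with an unbalanced double quote.
SAMPLE = (
    'doc\tunit1_toks\tunit1_txt\tunit2_txt\tunit1_sent\tdir\n'
    'GUM_bio_galois\t156-160\t" We zouden\tdat de auteur\twork " incomprehensible "\t1>2\n'
)


def read_rows(text, **reader_kwargs):
    """Parse tab-separated text and return all rows as lists of strings."""
    return list(csv.reader(io.StringIO(text, newline=''), delimiter='\t', **reader_kwargs))


# Default quoting: the leading " opens a quoted field, so the tabs up to the
# next " are kept inside that field instead of splitting it.
default_rows = read_rows(SAMPLE)

# QUOTE_NONE: quote characters are plain data, so every tab splits a field
# and the quotes stay in the output.
raw_rows = read_rows(SAMPLE, quoting=csv.QUOTE_NONE)

print(len(default_rows[1]), default_rows[1])
print(len(raw_rows[1]), raw_rows[1])
```

With the default settings the data row comes back with fewer fields than the header, which is the symptom described in the question. With `quoting=csv.QUOTE_NONE` the row has one field per column and the quote characters are preserved.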