I am reading a large CSV file line by line, and I want to count the number of delimiters in each line.
However, a delimiter that is part of a data value should not be counted.
A few records from the data set:
com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"
com.mtt.rder, News Maharashtra, Time Internet Limited"
com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"
In all 3 lines, the number of actual commas (the ones that divide the data into columns) is only 2,
but the code snippet below outputs:
- 4 commas for line 1
- 2 for line 2
- 3 for line 3
Code Snippet:
file1 = open('file_name.csv', 'r')
while True:
    line = file1.readline()
    if not line:
        break
    print(line.count(','))
Answer
One simple way is to use a regex to remove everything between each pair of double quotes ("), so that the commas inside quoted values aren't counted.
import re

# Strip every double-quoted span, then count the commas that remain.
with open('input.csv', 'r') as file1:
    for line in file1:
        line = re.sub(r'".*?"', '', line)
        print(line.count(','))
Output:
2
2
2
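As an alternative to the regex, Python's standard `csv` module can do the quote handling itself. The following is only a minimal sketch, not part of the original answer: the helper name `count_delimiters` is hypothetical, the sample lines are adapted from the records in the question, and it assumes that each record fits on a single line and that quoted values may be preceded by a space after the comma (hence `skipinitialspace=True`).

```python
import csv

def count_delimiters(line, delimiter=","):
    """Count the field-separating delimiters in one CSV line (hypothetical helper)."""
    # skipinitialspace=True lets a quote that follows ", " open a quoted field,
    # so commas inside the quotes stay part of the value.
    reader = csv.reader([line], delimiter=delimiter, skipinitialspace=True)
    fields = next(reader, [])
    # N fields are separated by N - 1 delimiters; an empty line has none.
    return max(len(fields) - 1, 0)

# Sample lines adapted from the question's data (assumed, for illustration only).
sample_lines = [
    'com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"',
    'com.mtt.rder, News Maharashtra, Time Internet Limited',
    'com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"',
]

for line in sample_lines:
    print(count_delimiters(line))  # prints 2 for each line
```

Unlike the regex, this also copes with escaped quotes written as `""` inside a quoted value. If a quoted value can span several physical lines, it is probably safer to pass the open file object to `csv.reader` directly instead of parsing line by line.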