Count the number of delimiters in a line, ignoring delimiters that are part of a data value

I am reading a large CSV file line by line, and I want to count the number of delimiters in each line.

If a delimiter is part of a data value, it should not be counted.

A few records from the data set:

com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"
com.mtt.rder, News Maharashtra, Time Internet Limited"
com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"

In all 3 lines, the number of actual commas (those that divide the data into columns) is only 2,

but the code snippet below outputs:

  • 4 commas for line 1
  • 2 for line 2
  • 3 for line 3

Code Snippet:

# Open the file in a context manager so it is closed automatically
with open('file_name.csv', 'r') as file1:
    for line in file1:
        # Counts every comma, including those inside quoted values
        print(line.count(','))

Answer

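One simple way is to use a regular expression to remove everything between a pair of double quotes, so that the commas inside quoted values aren't counted. This assumes every quoted value opens and closes on the same line.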

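import re

# Open the file in a context manager so it is closed automatically
with open('input.csv', 'r') as file1:
    for line in file1:
        # Drop each double-quoted span so commas inside values are ignored
        line = re.sub(r'".*?"', '', line)
        print(line.count(','))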

Output:

2
2
2
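
As an alternative to the regex, here is a minimal sketch that lets Python's standard csv module do the quote handling and counts the separators as one less than the number of parsed fields. The helper name count_delimiters is hypothetical, and skipinitialspace=True is an assumption based on the sample data, where each comma is followed by a space before the opening quote.

```python
import csv

def count_delimiters(line, delimiter=","):
    """Return the number of delimiters that actually separate fields.

    Hypothetical helper: parses a single line with csv.reader so that
    delimiters inside double-quoted values are not counted.
    """
    # skipinitialspace=True lets the reader recognise a quote that
    # follows ", " (comma plus space), as in the sample records.
    reader = csv.reader([line], delimiter=delimiter, skipinitialspace=True)
    row = next(reader, [])
    # N fields are separated by N - 1 delimiters; an empty line has none.
    return max(len(row) - 1, 0)

sample_lines = [
    'com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"',
    'com.mtt.rder, News Maharashtra, Time Internet Limited"',
    'com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"',
]

for line in sample_lines:
    print(count_delimiters(line))
```

Parsing one line at a time like this would miscount quoted values that span several lines; in that case it is probably safer to pass the open file object to csv.reader directly and count the fields of each parsed row.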
User contributions licensed under: CC BY-SA