
Parsing data containing escaped quotes and separators in python

I have data that is structured like this:

1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah"

It always starts with a Unix timestamp, but I can't know in advance how many other fields follow or what they are called.

The goal is to parse this into a dictionary as such:

{"timestamp": 1661171420,
 "foo": "bar",
 "test": 'This, is a "TEST"',
 "count": 5,
 "com": "foo, bar=blah"}

I'm having trouble parsing this, especially the escaped quotes and the commas inside the values. What would be the best way to parse this correctly? Preferably without any third-party modules.

Answer

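If changing the format of the input data is not an option (JSON would be much easier to handle, but if this comes from an API, as you say, you might be stuck with it), the following works, assuming the data follows the given structure more or less. It is not the cleanest solution, but it does the job: escaped quotes are temporarily swapped for a placeholder, quoted strings and numbers are extracted with two regular expressions, and the placeholder is then turned back into a quote.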

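import re

# Swap every escaped quote (\") for a ''' placeholder so that each remaining
# double quote is guaranteed to be a string delimiter.
d = r'''1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah", fraction=-0.11'''.replace(r'\"', "'''")

# key="value" pairs; the value may contain commas and '=' but no double quotes.
string_pattern = re.compile(r'''(\w+)="([^"]*)"''')

matches = re.finditer(string_pattern, d)

parsed_data = {}
# The timestamp is everything before the first ", ".
parsed_data['timestamp'] = int(d.partition(", ")[0])
for match in matches:
    # Restore the placeholder to a plain double quote.
    parsed_data[match.group(1)] = match.group(2).replace("'''", '"')

# key=number pairs: optional sign, digits, optional fractional part.
# Note: this pattern is also applied inside quoted values, so text such as
# x=5 within a string would be picked up as a separate field.
number_pattern = re.compile(r'''(\w+)=([+-]?\d+(?:\.\d+)?)''')

matches = re.finditer(number_pattern, d)
for match in matches:
    try:
        parsed_data[match.group(1)] = int(match.group(2))
    except ValueError:
        # Not an integer literal, so fall back to float.
        parsed_data[match.group(1)] = float(match.group(2))

print(parsed_data)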
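
As a possible alternative to the placeholder trick above, the fields could be read in a single pass with one regular expression that understands backslash escapes inside quoted values. This is only a sketch: the names `FIELD`, `to_number` and `parse_line` are made up for illustration, and it assumes that quotes are escaped with a backslash, that unquoted values contain no commas or whitespace, and that unquoted values which are not numbers should stay as strings.

```python
import re

# Hypothetical pattern: key= followed by either a quoted value (which may
# contain backslash-escaped characters) or a bare token up to the next comma.
FIELD = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|([^,\s]+))')


def to_number(text):
    """Return text as int or float if possible, otherwise unchanged."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text


def parse_line(line):
    """Parse one 'timestamp, key=value, ...' line into a dict."""
    # Everything before the first comma is the Unix timestamp.
    timestamp, _, rest = line.partition(",")
    result = {"timestamp": int(timestamp)}
    for match in FIELD.finditer(rest):
        key, quoted, bare = match.groups()
        if quoted is not None:
            # Drop the backslash in front of each escaped character.
            result[key] = re.sub(r'\\(.)', r'\1', quoted)
        else:
            result[key] = to_number(bare)
    return result


sample = r'1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah"'
print(parse_line(sample))
```

Because a quoted value is consumed as one match, text such as bar=blah inside a string is not mistaken for a separate field, and the keys come out in the order they appear in the input.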
User contributions licensed under: CC BY-SA