Read a file and convert certain lines into the correct format

Question

I have a problem. I am reading in a file that contains abbreviations, and I only want to read the abbreviations. This works, but the result is not in the format I expected: I would like to save the abbreviations cleanly, one per line (see below for the desired output). The problem is that I'm getting something like 't\acro{.... How can

Accepted Answer

You can use re.findall() to capture all of the abbreviations, then use the json module to dump them into a file. Your approach could work, but you'd have to do a lot of manual string parsing, which would be a pretty massive headache. (Note that a program that can parse arbitrary LaTeX would need something more powerful than regular expressions; however, since we're parsing a very small subset of LaTeX, regular expressions will do fine here.)

    import re
    import json

    # Raw string, so the LaTeX backslashes are kept literally.
    data = r"""
    \chapter*{Short}
    \addcontentsline{toc}{chapter}{Short}
    \markboth{Short}{Short}

    \begin{acronym}[TESTERER]
        \acro{knmi}[KNMI]{Koninklijk Nederlands Meteorologisch Instituut}
        \acro{example}[e.g.]{For example}
    \end{acronym}
    """

    # The backslash, braces and brackets are regex metacharacters and must be escaped.
    pattern = re.compile(r"\\acro\{(.+)\}\[(.+)\]\{(.+)\}")
    regex_result = re.findall(pattern, data)

    final_output = {}
    for index, (symbol, shortform, longform) in enumerate(regex_result, start=1):
        final_output[f'abbreviation{index}'] = \
            dict(symbol=symbol, shortform=shortform, longform=longform)

    with open('output.json', 'w') as output_file:
        json.dump(final_output, output_file, indent=4)

output.json contains the following:

    {
        "abbreviation1": {
            "symbol": "knmi",
            "shortform": "KNMI",
            "longform": "Koninklijk Nederlands Meteorologisch Instituut"
        },
        "abbreviation2": {
            "symbol": "example",
            "shortform": "e.g.",
            "longform": "For example"
        }
    }
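
One caveat about the pattern above: its (.+) groups are greedy, so it relies on every \acro entry sitting on its own line. The following is a minimal sketch of a stricter variant, not part of the original answer. It assumes that the symbol, short form and long form never contain nested braces or brackets; the helper name extract_acronyms and the group names are made up for illustration.

```python
import re

# Hypothetical variant of the answer's pattern (an assumption, not the
# original code). Negated character classes stop each group at the first
# closing delimiter, so two \acro entries on one line are matched separately.
ACRO_PATTERN = re.compile(
    r"\\acro\{(?P<symbol>[^}]+)\}"
    r"\[(?P<shortform>[^\]]+)\]"
    r"\{(?P<longform>[^}]+)\}"
)


def extract_acronyms(text):
    """Return one dict per acro entry found in text."""
    return [match.groupdict() for match in ACRO_PATTERN.finditer(text)]


# Both entries deliberately share a single line here.
sample = (
    r"\acro{knmi}[KNMI]{Koninklijk Nederlands Meteorologisch Instituut} "
    r"\acro{example}[e.g.]{For example}"
)
entries = extract_acronyms(sample)
for entry in entries:
    print(entry)
```

To apply this to a real file, read its contents with open(...).read() and pass the resulting string to extract_acronyms. Entries whose long form contains nested braces would be cut off at the first closing brace, so this remains a sketch for simple acronym lists only.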

Answer