Using Python maintain clean a text file on a specific format consistent across each line

Question

I have got a unique use case. I have got a txt file in the following format where every line starts with "APA" and ends with "||" (varies in length and content, does not matter) In some lines however, due to unknown reasons, some of these lines are split like so: Technically this line should have been like: The file

Accepted Answer

It appears that you don&#8217;t only have || at the end of the line, but also in between columns. If this is a typo, the normal .split() function would be sufficient, however if it&#8217;s not, you can use re.split() to only split on || at the end of lines. Afterwards, remove the line breaks from the resulting list elements and finally join everything back together:import redata = """APA lEDGER|5023|124223|STAFF NAME|XYZ|123||APA lEDGER|5023|124223|STAFF NAME|XYZ|131|12r2gw||APA lEDGER|5023|124223|STAFF NAME|XYZ|43s|12|123sdfq||prime||APA lEDGER|5023|hello|40937 / 903.01for period: 2021|8|332.48||"""r = re.compile(r'||$', re.M)splits = re.split(r, data)clean_lines = (line.replace('n', ' ').strip() for line in splits)clean_file = '||n'.join(clean_lines)print(clean_file.splitlines())Output:['APA lEDGER|5023|124223|STAFF NAME|XYZ|123||', 'APA lEDGER|5023|124223|STAFF NAME|XYZ|131|12r2gw||', 'APA lEDGER|5023|124223|STAFF NAME|XYZ|43s|12|123sdfq||prime||', 'APA lEDGER|5023|hello| 40937 / 903.01 for period: 2021|8|332.48||']

Advertisement

Answer