Python to remove extra delimiter

Question

We have a 100MB pipe delimited file that has 5 column/4 delimiters each separated by a pipe. However there are few rows where the second column has an extra pipe. For these few rows total delimiter are 5. For example, in the below 4 rows, the 3rd is a problematic one as it has an extra pipe. Is there any

Accepted Answer

Source: my_file.txt1|B|3|D|5A|1|2|34|5D|This is a |text|3|5|7B|4|5|5|6E|1 |9 |2 |8 |Not| a |text|!!!|3|7|4Code# If using Python3.10, this can be Parenthesized context managers# https://docs.python.org/3.10/whatsnew/3.10.html#parenthesized-context-managerswith open('./my_file.txt') as file_src, open('./my_file_parsed.txt', 'w') as file_dst:    for line in file_src.readlines():        # Split the line by the character '|'        line_list = line.split('|')        if len(line_list) <= 5:            # If the number of columns doesn't exceed, just write the original line as is.            file_dst.write(line)        else:            # If the number of columns exceeds, count the number of columns that should be merged.            to_merge_columns_count = (len(line_list) - 5) + 1            # Merge the columns from index 1 to index x which includes all the columns to be merged.            merged_column = "".join(line_list[1:1+to_merge_columns_count])            # Replace all the items from index 1 to index x with the single merged column            line_list[1:1+to_merge_columns_count] = [merged_column]            # Write the updated line.            file_dst.write("|".join(line_list))Result: my_file_parsed.txt1|B|3|D|5A|1|2|34|5D|This is a text|3|5|7B|4|5|5|6E|1 9 2 8 Not a text!!!|3|7|4

Advertisement

Answer