I have a text file in this format:
000000.png 712,143,810,307,0
000001.png 599,156,629,189,3 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1
Each line holds an image name followed by one or more blocks of the form [number1,number2,number3,number4,number5]. I want to remove every block whose last number is not 1 or 5, and then drop any line that has no blocks left.
The above text file should look like this in the end:
000001.png 387,181,423,203,1 676,163,688,193,5
000002.png 657,190,700,223,1
000003.png 614,181,727,284,1
000004.png 280,185,344,215,1 365,184,406,205,1
My code:
import os

with open("data.txt", "r") as input:
    with open("newdata.txt", "w") as output:
        # iterate all lines from file
        for line in input:
            # if substring contain in a line then don't write it
            if ",0" or ",2" or ",3" or ",4" or ",6" not in line.strip("\n"):
                output.write(line)
I tried something like this, and obviously it didn’t work.
Answer
No need for Regex, this might help you:
with open("data.txt", "r") as input:  # Read all data lines.
    data = input.readlines()

with open("newdata.txt", "w") as output:  # Create output file.
    for line in data:  # Iterate over data lines.
        line_elements = line.split()  # Split line by spaces.
        # Initialize fixed line (without undesired patterns) with image's name.
        line_updated = [line_elements[0]]
        for i in line_elements[1:]:  # Iterate over groups of numbers in current line.
            tmp = i.split(',')  # Split current group by commas.
            if len(tmp) == 5 and (tmp[-1] == '1' or tmp[-1] == '5'):
                line_updated.append(i)  # If the pattern is Ok, append group to fixed line.
        if len(line_updated) > 1:  # If the fixed line is valid, write it to output file.
            output.write(f"{' '.join(line_updated)}\n")
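As a side note, the condition in the question always passes: `",0" or ",2" or ... not in line` is evaluated as `",0" or (...)`, and a non-empty string such as `",0"` is truthy, so every line gets written. Below is a minimal sketch of the same filtering idea pulled into a small function so it can be checked without touching any files. The names `filter_line` and `KEEP_LABELS` are hypothetical, and the sample lines are copied from the question.

```python
# Assumption: the labels to keep are "1" and "5", as stated in the question.
KEEP_LABELS = {"1", "5"}


def filter_line(line, keep=KEEP_LABELS):
    """Return the line with only the wanted groups, or None if none survive."""
    parts = line.split()
    if not parts:  # Blank line: nothing to keep.
        return None
    name, groups = parts[0], parts[1:]
    kept = []
    for group in groups:
        fields = group.split(",")
        # Keep a group only if it has five fields and its last field is wanted.
        if len(fields) == 5 and fields[-1] in keep:
            kept.append(group)
    if not kept:  # Image with no wanted groups: drop the whole line.
        return None
    return " ".join([name] + kept)


sample = [
    "000000.png 712,143,810,307,0",
    "000001.png 599,156,629,189,3 387,181,423,203,1 676,163,688,193,5",
    "000002.png 657,190,700,223,1",
]
result = [out for out in map(filter_line, sample) if out is not None]
print("\n".join(result))
```

To process the real file, the same function could be applied to each line read from data.txt, writing only the non-None results (each followed by a newline) to newdata.txt.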