Here is the code I have to extract blocks of text from a file. Each block starts with “Start Text” and runs until the next “Start Text”.

with open('temp.txt', "r") as f:
    buff = []
    i = 1
    for line in f:
        if line.strip():  # skips the empty lines
            buff.append(line)
        if line.startswith("Start Text"):
            output = open('file' + '%d.txt' % i, 'w')
            output.write(''.join(buff))
            output.close()
            i += 1
            buff = []  # buffer reset
INPUT: “temp.txt” has the following structure:
Start Text - ABCD
line1
line2
line3
Start Text - EFG
line4
Start Text - P3456
line5
line6
DESIRED OUTPUT: I am trying to create the following text files, each containing one extracted block of text.
file1.txt

Start Text - ABCD
line1
line2
line3

file2.txt

Start Text - EFG
line4

file3.txt

Start Text - P3456
line5
line6
UNDESIRED OUTPUT (what the code produces):

file1.txt

Start Text - ABCD

file2.txt

line1
line2
line3
Start Text - EFG

file3.txt

line4
Start Text - P3456
Here is the issue I am facing. The code creates three files, but each “Start Text” line ends up at the end of the previous block instead of at the start of its own block, and line5 and line6 are never written at all. I am not sure what I am missing. I would appreciate any pointers.
Answer
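When the code sees “Start Text” in a line, it appends that line to the buffer and then writes the whole buffer (all the lines collected since the previous header, followed by the header itself) to the output file.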
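This explains why the first output file contains only the header: that is the first line in the input file, so there aren’t any previous lines. It also explains why line5 and line6 never appear anywhere: no header follows them, so the buffer holding them is never flushed.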
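To make the flush order visible, here is a sketch that replays the question's loop on an in-memory list instead of a file. `buggy_split` is a hypothetical helper name used only for illustration, and it collects each block into a list rather than writing it to disk:

```python
def buggy_split(lines):
    """Replicates the question's loop: the buffer is flushed *after* the header is appended."""
    blocks = []
    buff = []
    for line in lines:
        if line.strip():  # skip empty lines
            buff.append(line)
        if line.startswith("Start Text"):
            blocks.append(buff)  # the header ends up as the LAST line of this block
            buff = []
    return blocks  # anything still left in buff (line5, line6) is silently dropped


sample = ["Start Text - ABCD", "line1", "line2", "line3",
          "Start Text - EFG", "line4",
          "Start Text - P3456", "line5", "line6"]
for block in buggy_split(sample):
    print(block)
# ['Start Text - ABCD']
# ['line1', 'line2', 'line3', 'Start Text - EFG']
# ['line4', 'Start Text - P3456']
```

Each printed list corresponds to one output file from the question, which shows that the header is always one block "late" and that the final block is lost.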
It seems like what you really want is for the header and the following lines to be written.
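I’ve updated your code to check for the header before appending it to the buffer (so each header starts a new block instead of ending the old one), to not write a file after seeing the very first header, and also to write a file after the input file is exhausted.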
buff = []
i = 1
with open('temp.txt', "r") as f:
    for line in f:
        if line.startswith("Start Text"):
            # write a file only if buff isn't empty. (if it is
            # empty, this must be the very first header, so we
            # don't need to write an output file yet)
            if buff:
                output = open('file' + '%d.txt' % i, 'w')
                output.write(''.join(buff))
                output.close()
                i += 1
                buff = []  # buffer reset
        if line.strip():
            buff.append(line)

# write the final section
if buff:
    output = open('file' + '%d.txt' % i, 'w')
    output.write(''.join(buff))
    output.close()
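The same logic can also be factored into a generator, so the "flush before the header, then flush once more at the end" rule lives in one place and can be tested on a plain list. This is a minimal sketch, not the code above verbatim: `split_blocks` and `write_blocks` are hypothetical names, and it assumes the header always starts with the literal text `Start Text` and that the `file1.txt`, `file2.txt`, ... naming is kept. As in the code above, any non-empty lines before the first header would be written as their own block.

```python
from typing import Iterable, Iterator, List


def split_blocks(lines: Iterable[str], marker: str = "Start Text") -> Iterator[List[str]]:
    """Yield each block of non-empty lines; a new block starts at every header line."""
    buff: List[str] = []
    for line in lines:
        if line.startswith(marker) and buff:
            yield buff  # flush the previous block BEFORE the new header is added
            buff = []
        if line.strip():  # skip empty lines
            buff.append(line)
    if buff:  # the last block has no following header, so flush it here
        yield buff


def write_blocks(src: str = "temp.txt") -> int:
    """Write file1.txt, file2.txt, ... from src and return how many files were written."""
    count = 0
    with open(src, "r") as f:
        for count, block in enumerate(split_blocks(f), start=1):
            # 'with' closes each output file even if the write raises
            with open(f"file{count}.txt", "w") as out:
                out.writelines(block)  # lines read from the file keep their "\n"
    return count
```

Calling `write_blocks()` in the directory that contains `temp.txt` should produce the three files listed under DESIRED OUTPUT. Because `split_blocks` accepts any iterable of strings, it works equally well on an open file or on a list built in a test.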