
Extract blocks of text that start with “Start Text” and run until the next “Start Text”

Here is the code I have to extract blocks of text from a file, where each block starts with “Start Text” and runs until the next “Start Text”.

 with open('temp.txt', "r") as f:
     buff = []
     i = 1
     for line in f:
         if line.strip():  # skips the empty lines
             buff.append(line)
         if line.startswith("Start Text"):
             output = open('file' + '%d.txt' % i, 'w')
             output.write(''.join(buff))
             output.close()
             i += 1
             buff = []  # buffer reset

INPUT: “temp.txt” has the following structure:

Start Text - ABCD  
line1  
line2  
line3  
Start Text - EFG  
line4  
Start Text - P3456  
line5  
line6  

DESIRED OUTPUT: I am trying to create the text files below, one per extracted block of text.

file1.txt

Start Text - ABCD  
line1  
line2  
line3 

file2.txt

Start Text - EFG  
line4 

file3.txt

Start Text - P3456  
line5  
line6

UNDESIRED OUTPUT (What the code produces)

file1.txt

Start Text - ABCD   

file2.txt

line1 
line2 
line3 
Start Text - EFG  

file3.txt

line4 
Start Text - P3456  

Here is the issue I am facing. The code creates three files, but each “Start Text” line ends up in the wrong block instead of at the top of its own. I am not sure what I am missing. I would appreciate any pointers.


Answer

When the code sees “Start Text” in a line, it writes that line and all the previous lines to the output file.

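This explains why the first output file contains only the header: it is the first line in the input file, so there are no previous lines to write. It also explains why line5 and line6 never appear in any file: no header follows them, so nothing triggers a final write.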
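
To see this concretely, here is a small in-memory sketch of the original loop. It is only an illustration: the helper name `simulate_original` is made up for this example, and it collects the would-be file contents in a dictionary instead of writing to disk.

```python
def simulate_original(lines, marker="Start Text"):
    """Mimic the question's loop, returning ({filename: contents}, leftover buffer)."""
    files = {}
    buff = []
    i = 1
    for line in lines:
        if line.strip():  # skip empty lines
            buff.append(line)
        if line.startswith(marker):
            # The header was already appended above, so it closes the
            # block that came *before* it.
            files[f"file{i}.txt"] = "".join(buff)
            i += 1
            buff = []
    # Whatever is still in buff is never written, just as in the original.
    return files, buff


sample = [
    "Start Text - ABCD\n", "line1\n", "line2\n", "line3\n",
    "Start Text - EFG\n", "line4\n",
    "Start Text - P3456\n", "line5\n", "line6\n",
]
files, leftover = simulate_original(sample)
for name, text in files.items():
    print(name, repr(text))
print("never written:", leftover)
```

Each header ends up at the bottom of the previous block, and the lines after the last header stay in the buffer.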

It seems like what you really want is for the header and the following lines to be written.

I’ve updated your code to not write a file after seeing the very first header, and also to write a file after the input file is exhausted.

buff = []
i = 1

with open('temp.txt', 'r') as f:
    for line in f:
        if line.startswith("Start Text"):
            # Write a file only if buff isn't empty. (If it is
            # empty, this must be the very first header, so we
            # don't need to write an output file yet.)
            if buff:
                with open('file%d.txt' % i, 'w') as output:
                    output.write(''.join(buff))
                i += 1
                buff = []  # buffer reset
        if line.strip():  # skip empty lines
            buff.append(line)

# Write the final section, which has no following header to trigger it.
if buff:
    with open('file%d.txt' % i, 'w') as output:
        output.write(''.join(buff))
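
If the duplicated write step bothers you, one possible variation is to separate "finding the blocks" from "writing the files". The sketch below is just one way to do that, not the only one. The names `iter_blocks` and `split_file` and the `out_dir` parameter are made up for illustration; the file names follow the question.

```python
from pathlib import Path


def iter_blocks(lines, marker="Start Text"):
    """Yield lists of non-empty lines, starting a new list at each marker line."""
    block = []
    for line in lines:
        if line.startswith(marker) and block:
            yield block
            block = []
        if line.strip():  # skip empty lines
            block.append(line)
    if block:  # the last block has no following marker
        yield block


def split_file(src="temp.txt", out_dir="."):
    """Write each block of src to file1.txt, file2.txt, ... inside out_dir."""
    with open(src, "r") as f:
        for i, block in enumerate(iter_blocks(f), start=1):
            Path(out_dir, f"file{i}.txt").write_text("".join(block))
```

Calling `split_file()` with the defaults reads `temp.txt` and writes the numbered files to the current directory. As with the code above, any non-empty lines that appear before the first header end up in a block of their own.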