How to extract a specific line from a text file

I am text mining a large document. I want to extract a specific line.

CONTINUED ON NEXT PAGE   CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 4 OF 16 PAGES  

SPE2DH-20-T-0133   SECTION B  

PR: 0081939954   NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP   RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:

I want to extract the description immediately under ITEM DESCRIPTION.

I have made many unsuccessful attempts.

My latest attempt was:

for line in text:
    if 'ITEM' and 'DESCRIPTION'in line:
        print ('Possible Descript:\n', line)

But it did not find the text.

Is there a way to find ITEM DESCRIPTION and get the line after it or something similar?

Answer

The following function returns the line that follows a given pattern, e.g. “ITEM DESCRIPTION”, ignoring any blank lines in between. Beware that it does not handle the special case where the pattern is present but no description follows it: if the pattern is the last non-blank line, lines.index(pattern)+1 is out of range and the function raises an IndexError.

txt = '''
CONTINUED ON NEXT PAGE CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED:    PAGE 4 OF 16 PAGES

SPE2DH-20-T-0133 SECTION B

PR: 0081939954 NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
'''

I’ve assumed your text is available as a single string, so the function below first splits it into a list of lines.
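
As an aside, that assumption may also explain why the loop in the question printed nothing. If `text` there is a single string (the question does not say), then `for line in text` visits one character at a time, so no "line" can ever contain DESCRIPTION. The short sketch below illustrates this, along with the fact that `'ITEM' and 'DESCRIPTION' in line` only tests the second word. The variable names are made up for the illustration.

```python
sample = "ITEM DESCRIPTION\n\nBOTTLE, SAFETY CAP\n"

# Iterating over a str yields one character at a time, never a whole line.
chars = [piece for piece in sample]

# splitlines() yields whole lines, without the line terminators.
lines = sample.splitlines()

print(chars[:4])   # ['I', 'T', 'E', 'M']
print(lines)       # ['ITEM DESCRIPTION', '', 'BOTTLE, SAFETY CAP']

# "'ITEM' and 'DESCRIPTION' in line" is parsed as
# "'ITEM' and ('DESCRIPTION' in line)", so only 'DESCRIPTION' is tested.
# Spell out both membership checks instead.
matches = [line for line in lines if 'ITEM' in line and 'DESCRIPTION' in line]
print(matches)     # ['ITEM DESCRIPTION']
```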

pattern = "ITEM DESCRIPTION" # to search for

def find_pattern_in_txt(txt, pattern):
    # Split on newlines and drop empty lines
    lines = [line for line in txt.split("\n") if line]
    if pattern in lines:
        # Return the line that follows the pattern
        return lines[lines.index(pattern) + 1]
    return None

print(find_pattern_in_txt(txt, pattern)) # prints: "BOTTLE, SAFETY CAP"
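
If the missing-description case matters, or if the heading line may carry trailing spaces (some lines in the question's text do), a slightly more forgiving variant could look like the sketch below. It is only one possible approach; `find_line_after` and `sample` are hypothetical names introduced for this example.

```python
def find_line_after(txt, pattern):
    """Return the first non-blank line after the line equal to `pattern`, or None."""
    # Strip each line so surrounding whitespace does not defeat the comparison.
    lines = (line.strip() for line in txt.splitlines())
    for line in lines:
        if line == pattern:
            # Keep consuming the same generator: skip blanks, take the next real line.
            # The default of None covers the case where nothing follows the pattern.
            return next((following for following in lines if following), None)
    return None  # pattern not found at all

sample = "PR: 0081939954\n\nITEM DESCRIPTION  \n\nBOTTLE, SAFETY CAP\n"
print(find_line_after(sample, "ITEM DESCRIPTION"))                # BOTTLE, SAFETY CAP
print(find_line_after("ITEM DESCRIPTION\n", "ITEM DESCRIPTION"))  # None
```

Returning None in both failure cases keeps the caller's check simple, at the cost of not distinguishing "pattern missing" from "description missing".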
User contributions licensed under: CC BY-SA