I am text mining a large document. I want to extract a specific line.
CONTINUED ON NEXT PAGE
CONTINUATION SHEET
REFERENCE NO. OF DOCUMENT BEING CONTINUED:
PAGE 4 OF 16 PAGES
SPE2DH-20-T-0133
SECTION B
PR: 0081939954
NSN/MATERIAL: 6530015627381
ITEM DESCRIPTION
BOTTLE, SAFETY CAP
BOTTLE, SAFETY CAP
RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT
RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:

I want to extract the description immediately under ITEM DESCRIPTION.
I have made many unsuccessful attempts.
My latest attempt was:
for line in text:
    if 'ITEM' and 'DESCRIPTION' in line:
        print('Possibe Descript:\n', line)
But it did not find the text.
Is there a way to find ITEM DESCRIPTION and get the line after it, or something similar?
Answer
The following function finds the description on the line below a given pattern, e.g. "ITEM DESCRIPTION", and ignores any blank lines in between. The match is on the whole line, so the pattern must sit on a line by itself. Beware, however, that the function does not handle the special case where the pattern exists but the description does not: if the pattern is the last non-empty line, the lookup of the following line raises an IndexError.
txt = '''
CONTINUED ON NEXT PAGE
CONTINUATION SHEET
REFERENCE NO. OF DOCUMENT BEING CONTINUED:
PAGE 4 OF 16 PAGES
SPE2DH-20-T-0133
SECTION B
PR: 0081939954
NSN/MATERIAL: 6530015627381
ITEM DESCRIPTION
BOTTLE, SAFETY CAP
BOTTLE, SAFETY CAP
RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT
RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
'''
I've assumed you have your text as a single string, so the function below splits it into a list of lines.
pattern = "ITEM DESCRIPTION"  # to search for

def find_pattern_in_txt(txt, pattern):
    lines = [line for line in txt.split("\n") if line]  # remove empty lines
    if pattern in lines:
        return lines[lines.index(pattern) + 1]
    return None

print(find_pattern_in_txt(txt, pattern))  # prints: BOTTLE, SAFETY CAP
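As for why the attempt in the question likely found nothing: if text was a single string, `for line in text` iterates over characters rather than lines, and `'ITEM' and 'DESCRIPTION' in line` only tests `'DESCRIPTION' in line`, because the non-empty string `'ITEM'` is always truthy. The sketch below is one possible variant of the function above that also covers the special case where the pattern is the last non-empty line. The name `find_line_after` and the short `sample` string are made up for illustration, and the variant makes two assumptions the original does not: that surrounding whitespace can be stripped, and that a line merely containing the pattern should count as a match.

```python
def find_line_after(txt, pattern):
    """Return the first non-blank line after the first line containing
    `pattern`, or None if the pattern is absent or nothing follows it."""
    # Strip each line and drop the ones that are empty after stripping
    lines = [line.strip() for line in txt.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if pattern in line:  # substring match, not whole-line equality
            # Guard against the pattern being the last non-blank line
            return lines[i + 1] if i + 1 < len(lines) else None
    return None


# Hypothetical shortened sample with a blank line after the pattern
sample = """NSN/MATERIAL: 6530015627381
ITEM DESCRIPTION

BOTTLE, SAFETY CAP
RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT"""

print(find_line_after(sample, "ITEM DESCRIPTION"))  # BOTTLE, SAFETY CAP
print(find_line_after("ITEM DESCRIPTION\n", "ITEM DESCRIPTION"))  # None
```

Returning None in both the "pattern not found" and "nothing after the pattern" cases keeps the caller's check simple. If the two cases need to be told apart, raising an exception for one of them would be a reasonable alternative.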