I am text mining a large document. I want to extract a specific line.
CONTINUED ON NEXT PAGE CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 4 OF 16 PAGES

SPE2DH-20-T-0133 SECTION B

PR: 0081939954 NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
I want to extract the description immediately under ITEM DESCRIPTION.
I have made many unsuccessful attempts.
My latest attempt was:
for line in text:
    if 'ITEM' and 'DESCRIPTION'in line:
        print ('Possibe Descript:\n', line)
But it did not find the text.
Is there a way to find ITEM DESCRIPTION and get the line after it or something similar?
Answer
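A side note on why the attempt in the question does not do what it looks like it does: Python parses `'ITEM' and 'DESCRIPTION' in line` as `'ITEM' and ('DESCRIPTION' in line)`, so only 'DESCRIPTION' is actually tested, and the loop prints the matching line itself rather than the line after it. If `text` is a single string rather than a list of lines (an assumption, since the question does not say), `for line in text` would also iterate over characters. The sketch below illustrates the condition only; `naive_match` and `both_words_match` are hypothetical names used for the comparison.

```python
def naive_match(line):
    # Parsed as: 'ITEM' and ('DESCRIPTION' in line).
    # The non-empty string 'ITEM' is always truthy, so only the
    # second test decides the result.
    return bool('ITEM' and 'DESCRIPTION' in line)


def both_words_match(line):
    # Each word is tested against the line explicitly.
    return 'ITEM' in line and 'DESCRIPTION' in line


print(naive_match("PART DESCRIPTION"))       # True
print(both_words_match("PART DESCRIPTION"))  # False
print(both_words_match("ITEM DESCRIPTION"))  # True
```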
The following function finds the description on the line below a given pattern, e.g. "ITEM DESCRIPTION", and ignores any blank lines in between. Beware, however, that it does not handle the special case where the pattern exists but no description follows it: in that case the index lookup raises an IndexError.
txt = '''
CONTINUED ON NEXT PAGE CONTINUATION SHEET REFERENCE NO. OF DOCUMENT BEING CONTINUED: PAGE 4 OF 16 PAGES

SPE2DH-20-T-0133 SECTION B

PR: 0081939954 NSN/MATERIAL: 6530015627381

ITEM DESCRIPTION

BOTTLE, SAFETY CAP

BOTTLE, SAFETY CAP RPOO1: DLA PACKAGING REQUIREMENTS FOR PROCUREMENT

RAQO1: THIS DOCUMENT INCORPORATES TECHNICAL AND/OR QUALITY REQUIREMENTS (IDENTIFIED BY AN 'R' OR AN 'I' NUMBER) SET FORTH IN FULL TEXT IN THE DLA MASTER LIST OF TECHNICAL AND QUALITY REQUIREMENTS FOUND ON THE WEB AT:
'''
I’ve assumed your text is a single string, so the function below splits it into a list of lines.
pattern = "ITEM DESCRIPTION"  # to search for

def find_pattern_in_txt(txt, pattern):
    lines = [line for line in txt.split("\n") if line]  # remove empty lines
    if pattern in lines:
        return lines[lines.index(pattern) + 1]
    return None

print(find_pattern_in_txt(txt, pattern))  # prints: BOTTLE, SAFETY CAP
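If the special case mentioned above matters for your documents, one possible variant is sketched here. It is not the function above but a hypothetical alternative named `find_line_after`, and it makes two assumptions that go beyond the original: the pattern may appear anywhere within a line (substring match rather than an exact whole-line match), and lines containing only whitespace count as blank. It returns None both when the pattern is missing and when nothing follows it.

```python
def find_line_after(txt, pattern):
    """Return the first non-blank line after a line containing `pattern`,
    or None if the pattern is absent or nothing follows it."""
    # Strip surrounding whitespace and drop blank lines.
    lines = [line.strip() for line in txt.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if pattern in line:  # substring match (assumption, see above)
            # Guard against the pattern being the last non-blank line.
            return lines[i + 1] if i + 1 < len(lines) else None
    return None


sample = "HEADER\n\nITEM DESCRIPTION  \n\n  BOTTLE, SAFETY CAP\nMORE TEXT\n"
print(find_line_after(sample, "ITEM DESCRIPTION"))                # BOTTLE, SAFETY CAP
print(find_line_after("ITEM DESCRIPTION\n", "ITEM DESCRIPTION"))  # None
print(find_line_after(sample, "NOT PRESENT"))                     # None
```

Because of the substring match, a pattern that also occurs inside an earlier, longer line would match there first, so pick a pattern that is distinctive for your documents.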