I am trying to extract the text between two words. The pattern below repeats, with variations between the start keyword and the end keyword, throughout the text document. The document also has paragraphs and other text before and after these patterns, which I don't want to extract. Can anyone help me with a regex that would extract all occurrences?
Start keyword: RIASWIX. End keyword: Sky Access.
----Document Start------- Paragraph* RIASWIX.* ABCDEF1 NONE WORKING: HELLO(READ) BOOLEAN Access: SADGRE3, VJFKES3, JGJKEWW, IS4DWF44(A), DFEAWE2(G), DW4444W, IHFK3MF3 BAZAAR Access: No resource with BAZAAR Access GHAR Access: No resource with GHAR Access WATER Access: ADMINDDD(A), GEDDE33 SKY None: No Resource with Sky Access RIASWIX.@7483NFJ.* HFDFDF3 NONE WORKING: BYE(READ) BOOLEAN Access: GRREGGG, GREFEFF, GFGGGG, FDFDFDF(A), RERERE3(G), GFFWEF44, FFRF44F BAZAAR Access: No resource with BAZAAR Access GHAR Access: No resource with GHAR Access WATER Access: ADMINEWW(A), FFRFRGR SKY None: No Resource with Sky Access RIASWIX.@7483KXX.* HFDFDF3 NONE WORKING: TATA(READ) BOOLEAN Access: GRDSD33, FASDE, GFGGGG, RWERW33(A), NMUYHT4(G), BAZAAR Access: XCDFEFE3, FREFE33R GHAR Access: No resource with GHAR Access WATER Access: DASDEFG(A), SJMFEIOE(P) SKY None: No Resource with Sky Access *Text ----Document End-------
Answer
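Use the (?s) flag (equivalent to re.DOTALL in Python) so that "." also matches newline characters. For background, see the related question "Regex match all characters between two strings".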
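import re
# re.DOTALL has the same effect as the inline (?s) flag; passing it as an argument
# avoids the error Python 3.11+ raises when (?s) appears in the middle of a pattern.
print(re.findall(r'RIASWIX(.*?)Sky Access', str1, flags=re.DOTALL))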
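To illustrate the approach end to end, here is a minimal sketch run against a shortened stand-in for the document in the question. The `sample_text` contents and the names `pattern` and `blocks` are assumptions made for this example; the real document will have more fields per record.

```python
import re

# Shortened, hypothetical stand-in for the document in the question.
sample_text = """Paragraph before the records
RIASWIX.* ABCDEF1 NONE
WORKING: HELLO(READ)
SKY None: No Resource with Sky Access
RIASWIX.@7483NFJ.* HFDFDF3 NONE
WORKING: BYE(READ)
SKY None: No Resource with Sky Access
Text after the records"""

# re.DOTALL lets "." match newlines, so one record can span several lines.
# ".*?" is non-greedy, so each match stops at the nearest "Sky Access".
pattern = re.compile(r"RIASWIX(.*?)Sky Access", flags=re.DOTALL)

# Collect the text between the keywords for every occurrence.
blocks = [m.group(1).strip() for m in pattern.finditer(sample_text)]

for block in blocks:
    print(block)
    print("---")
```

The non-greedy quantifier matters here: with a greedy `.*`, the single match would run from the first RIASWIX to the last Sky Access and swallow every record in between. Note also that the match is case-sensitive, so the all-caps "SKY" at the start of the last line of each record does not end the match early.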