Hello, I have this string and I need to extract some substrings from it according to some delimiters:
string = """
1538 a
123
skua456
789
5
g
15563 blu55g
b
456
16453 a
789
5
16524 blu
g
55
1734 a
987
987
55
aasf
552
18278 blu
ttry
"""
And I need to extract exactly these strings:
string1 = """
1538 a
123
skua456
789
5
g
15563 blu55g
"""
string2 = """
16453 a
789
5
16524 blu
"""
string3 = """
1734 a
987
987
55
aasf
552
18278 blu
"""
I have tried several functions: re.findall, re.search, re.match. But I never got the expected result.
For example, the code below returns the whole string:
re.split(r"a(.*)blu", a)[0]
Answer
You do not need a regex for this. You can collect the lines from a line containing a up to the next line containing blu:
text = "1538 a\n123\nskua456\n789\n5\ng\n15563 blu55g\nb\n456\n16453 a\n789\n5\n16524 blu\ng\n55\n1734 a\n987\n987\n55\naasf\n552\n18278 blu\nttry"
f = False    # True while we are inside a block
result = []  # finished blocks
block = []   # lines of the block being collected
for line in text.splitlines():
    if 'a' in line:
        f = True  # a line containing "a" opens a block
    if f:
        block.append(line)
    if 'blu' in line and f:
        # a line containing "blu" closes the current block
        f = False
        result.append("\n".join(block))
        block = []
print(result)
# => ['1538 a\n123\nskua456\n789\n5\ng\n15563 blu55g', '16453 a\n789\n5\n16524 blu', '1734 a\n987\n987\n55\naasf\n552\n18278 blu']
See the Python demo.
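If you need this in more than one place, the same loop can be wrapped in a function. The following is only a minimal sketch: the name extract_blocks and its start/end parameters are made up for illustration, and the sample text is a shortened version of the input above.

```python
def extract_blocks(text, start="a", end="blu"):
    """Collect runs of lines from one containing `start` through the next containing `end`."""
    blocks = []
    current = None  # None means we are currently outside a block
    for line in text.splitlines():
        if current is None and start in line:
            current = []  # a line containing `start` opens a block
        if current is not None:
            current.append(line)
            if end in line:
                # a line containing `end` closes the block
                blocks.append("\n".join(current))
                current = None
    return blocks


# Shortened sample, assumed to have the same shape as the real input
sample = "1538 a\n123\nskua456\n15563 blu55g\nb\n456\n16453 a\n789\n16524 blu\nttry"
print(extract_blocks(sample))
```

A block that is opened but never closed by a line containing the end marker is dropped, just as in the loop above.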
With regex, you can use
import re
print(re.findall(r'(?m)^.*a(?s:.*?)blu.*', text))
print(re.findall(r'(?m)^.*a(?:\n.*)*?\n.*blu.*', text))
See this Python demo.
The first regex means:
- (?m)^ : multiline mode on, so ^ matches at any line start position
- .*a : any zero or more chars other than line break chars, as many as possible, and then a
- (?s:.*?) : any zero or more chars including line break chars, as few as possible
- blu.* : blu and then any zero or more chars other than line break chars, as many as possible.
The second regex matches
- (?m)^ : start of a line
- .*a : any zero or more chars other than line break chars, as many as possible, and then a
- (?:\n.*)*? : zero or more lines, as few as possible
- \n.*blu.* : a newline, any zero or more chars other than line break chars, as many as possible, blu and any zero or more chars other than line break chars, as many as possible.
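To check that the two patterns described above behave the same on this kind of input, you can run them side by side. This is just a sketch: the variable names are made up, and the sample is a shortened version of the original text.

```python
import re

# Shortened sample, assumed to have the same shape as the real input
sample = "1538 a\n123\n15563 blu55g\nb\n456\n16453 a\n789\n16524 blu\nttry"

# Pattern 1: the scoped (?s:...) group lets the lazy part cross line breaks.
lazy_dotall = re.compile(r"(?m)^.*a(?s:.*?)blu.*")
# Pattern 2: consumes whole lines explicitly, as few as possible.
line_by_line = re.compile(r"(?m)^.*a(?:\n.*)*?\n.*blu.*")

matches_1 = lazy_dotall.findall(sample)
matches_2 = line_by_line.findall(sample)
print(matches_1)
print(matches_1 == matches_2)
```

One difference to keep in mind: the second pattern requires at least one line break between a and blu, so a single line containing both is matched only by the first pattern.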