Hello, I have this string and I need to extract some substrings from it according to certain delimiters:
string = """
1538 a
123
skua456
789
5
g
15563 blu55g
b
456
16453 a
789
5
16524 blu
g
55
1734 a
987
987
55
aasf
552
18278 blu
ttry
"""
And I need to extract exactly these strings:
string1 = """
1538 a
123
skua456
789
5
g
15563 blu55g
"""
string2 = """
16453 a
789
5
16524 blu
"""
string3 = """
1734 a
987
987
55
aasf
552
18278 blu
"""
I have tried several approaches: re.findall, re.search, re.match. But I never got the expected result.
For example, the code below returns the whole string:
re.split(r"a(.*)blu", a)[0]
Answer
You do not need a regex for this: you can collect the lines from a line containing 'a' up to the next line containing 'blu':
text = "1538 a\n123\nskua456\n789\n5\ng\n15563 blu55g\nb\n456\n16453 a\n789\n5\n16524 blu\ng\n55\n1734 a\n987\n987\n55\naasf\n552\n18278 blu\nttry"
f = False      # True while we are inside an a ... blu block
result = []
block = []
for line in text.splitlines():
    if 'a' in line:
        f = True
    if f:
        block.append(line)
    if 'blu' in line and f:
        f = False
        result.append("\n".join(block))
        block = []
print(result)
# => ['1538 a\n123\nskua456\n789\n5\ng\n15563 blu55g', '16453 a\n789\n5\n16524 blu', '1734 a\n987\n987\n55\naasf\n552\n18278 blu']
See the Python demo.
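If the same scan is needed in more than one place, the loop above could be wrapped in a small generator. This is only a sketch: the name blocks_between and its start/end parameters are made up for illustration and are not part of the original answer.

```python
def blocks_between(lines, start="a", end="blu"):
    """Yield each run of lines from one containing `start` to the next one containing `end`."""
    block = []
    inside = False
    for line in lines:
        if not inside and start in line:
            inside = True            # a block opens on this line
        if inside:
            block.append(line)
            if end in line:          # the block closes on this line
                yield "\n".join(block)
                block = []
                inside = False


text = (
    "1538 a\n123\nskua456\n789\n5\ng\n15563 blu55g\nb\n456\n"
    "16453 a\n789\n5\n16524 blu\ng\n55\n"
    "1734 a\n987\n987\n55\naasf\n552\n18278 blu\nttry"
)
result = list(blocks_between(text.splitlines()))
print(result)
```

As in the loop version, a block that is opened but never closed by a line containing the end marker is silently dropped.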
With regex, you can use:
import re
print( re.findall(r'(?m)^.*a(?s:.*?)blu.*', text) )
print( re.findall(r'(?m)^.*a(?:\n.*)*?\n.*blu.*', text) )
See this Python demo.
The first regex means:
(?m)^ - multiline mode on, so ^ matches at any line start position
.*a - any zero or more chars other than line break chars, as many as possible, and then a
(?s:.*?) - any zero or more chars including line break chars, as few as possible
blu.* - blu and then any zero or more chars other than line break chars, as many as possible.
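The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. The following is a sketch of that idea, not part of the original answer; the variable names are illustrative. Note that the scoped (?s:...) group needs Python 3.6 or later.

```python
import re

text = (
    "1538 a\n123\nskua456\n789\n5\ng\n15563 blu55g\nb\n456\n"
    "16453 a\n789\n5\n16524 blu\ng\n55\n"
    "1734 a\n987\n987\n55\naasf\n552\n18278 blu\nttry"
)

# Same pattern as the first regex, with the (?m) flag passed as re.MULTILINE
pattern = re.compile(
    r"""
    ^ .* a        # line start, as many non-newline chars as possible, then a
    (?s: .*? )    # as few chars as possible, line breaks included
    blu .*        # blu, then the rest of that line
    """,
    re.MULTILINE | re.VERBOSE,
)

matches = pattern.findall(text)
for m in matches:
    print(repr(m))
```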
The second regex matches:
(?m)^ - start of a line
.*a - any zero or more chars other than line break chars, as many as possible, and then a
(?:\n.*)*? - zero or more lines, as few as possible
\n.*blu.* - a newline, any zero or more chars other than line break chars, as many as possible, then blu and any zero or more chars other than line break chars, as many as possible.
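One practical difference between the two patterns follows from the breakdowns above: the second always consumes at least one newline before blu, so it cannot match when a and blu sit on the same line, while the first can. A small check, using made-up sample strings that are not from the question:

```python
import re

FIRST = r"(?m)^.*a(?s:.*?)blu.*"
SECOND = r"(?m)^.*a(?:\n.*)*?\n.*blu.*"

# Hypothetical inputs for illustration only
multi = "x\n1 a\n2\n3 blu y\nz"   # a and blu on different lines
single = "1 a 2 blu 3"            # a and blu on the same line

multi_first = re.findall(FIRST, multi)
multi_second = re.findall(SECOND, multi)
single_first = re.findall(FIRST, single)
single_second = re.findall(SECOND, single)

print(multi_first == multi_second)   # both patterns agree on the multi-line input
print(single_first, single_second)   # only the first pattern matches a single line
```

For the data in the question, where every a and its closing blu are on separate lines, both patterns return the same three blocks.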