I have a test string that looks like this:
These are my food preferences mango and I also like bananas and I like grapes too.
I am trying to write a regex in Python that returns the text according to these rules:
- Search for the keyword: preferences
- Make a group of 1 to 7 words up to the word ‘like’, and repeat this step as many times as possible
- Make a final group of 1 to 7 words
My current expression is (live: https://regex101.com/r/1CSSNc/1/):
    (?P<Start>\bpreferences\b)(?:\s*(?:(?P<Name>\w*)\s*){1,7}like)*?(\s*(?P<Last>\w*\s*){1,7})
which returns
    Match 1      18-64   preferences mango and I also like bananas and
    Group Start  18-29   preferences
    Group 3      29-64   mango and I also like bananas and
    Group Last   60-64   and
I expected/wanted the output to be:
    Match 1      18-64   preferences mango .. grapes too
    Group Start  18-29   preferences
    Group 3      29-64   mango and I also
    Group 4      xx xx   bananas and I
    Group Last   60-64   grapes too
My implementation is missing some concepts here.
Answer
You can use
    (?P<Start>\bpreferences\b)(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?
See the regex demo.
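The pattern collects all the "like" chunks in one group and splits them afterwards. This is needed because Python's re module keeps only the last capture of a group that sits inside a repeated group. A minimal sketch of that behaviour follows; the group name Chunk is made up for this illustration and is not part of the pattern above:

```python
import re

text = "These are my food preferences mango and I also like bananas and I like grapes too."

# The named group is inside a repeated (?:...)+, so each iteration
# overwrites the previous capture and only the last chunk survives.
repeated = re.search(
    r"\bpreferences\b(?:\s+(?P<Chunk>\w+(?:\s+\w+){0,6}?)\s+like)+",
    text,
)
print(repeated.group("Chunk"))  # bananas and I
```

The first chunk, "mango and I also", is matched but cannot be read back from the match object, which is why a separate group per repetition cannot be obtained this way.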
Details:
- (?P<Start>\bpreferences\b) – Group “Start”: a whole word preferences
- (?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+) – Group “Mid”: one or more repetitions of
  - \s+ – one or more whitespaces
  - \w+(?:\s+\w+){0,6}? – one or more word chars and then zero to six occurrences of one or more whitespaces and then one or more word chars, as few as possible
  - \s+like – one or more whitespaces and then the word like
- (?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))? – an optional occurrence of
  - \s+ – one or more whitespaces
  - (?P<Last>\w+(?:\s+\w+){1,7}) – Group “Last”: one or more word chars and then one to seven occurrences of one or more whitespaces and one or more word chars
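As a sketch of the breakdown above, the same pattern can be written with re.VERBOSE so that each part carries its own comment. The name VERBOSE_PATTERN is only illustrative; the pattern itself is unchanged:

```python
import re

VERBOSE_PATTERN = re.compile(
    r"""
    (?P<Start>\bpreferences\b)          # the whole word preferences
    (?P<Mid>                            # everything up to the last "like"
        (?:
            \s+ \w+ (?:\s+\w+){0,6}?    # one to seven words, as few as possible
            \s+ like                    # followed by the word like
        )+
    )
    (?:
        \s+
        (?P<Last>\w+ (?:\s+\w+){1,7})   # one word plus one to seven more words
    )?
    """,
    re.VERBOSE,
)

text = "These are my food preferences mango and I also like bananas and I like grapes too."
m = VERBOSE_PATTERN.search(text)
print(m.group("Start"))  # preferences
print(m.group("Last"))   # grapes too
```

Note that, as written, Last needs at least two words, because the quantifier {1,7} requires at least one extra word after the first.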
See the Python demo:
    import re

    text = "These are my food preferences mango and I also like bananas and I like grapes too."
    pattern = r"(?P<Start>\bpreferences\b)(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?"
    match = re.search(pattern, text)
    if match:
        print(match.group("Start"))
        # Split the "Mid" group on the word "like" to get the individual chunks
        print(re.split(r"\s*\blike\b\s*", match.group("Mid").strip()))
        print(match.group("Last"))
Output:
    preferences
    ['mango and I also', 'bananas and I', '']
    grapes too
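The empty string at the end of the list is there because Mid always ends with the word like, so re.split leaves an empty item after the last separator. If that item is unwanted, it can be filtered out. A possible sketch, where extract_groups is a hypothetical helper name and not something from the demo above:

```python
import re

PATTERN = re.compile(
    r"(?P<Start>\bpreferences\b)"
    r"(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)"
    r"(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?"
)

def extract_groups(text):
    """Return (start, chunks, last), or None when the pattern does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    parts = re.split(r"\s*\blike\b\s*", match.group("Mid").strip())
    # Drop the empty trailing item produced by the final "like"
    chunks = [part for part in parts if part]
    # "Last" is optional in the pattern, so this may be None
    return match.group("Start"), chunks, match.group("Last")

result = extract_groups(
    "These are my food preferences mango and I also like bananas and I like grapes too."
)
print(result)
# ('preferences', ['mango and I also', 'bananas and I'], 'grapes too')
```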