
Python regex to match many tokens in sequence

I have a test string that looks like

These are my food preferences mango and I also like bananas and I like grapes too.

I am trying to write a regex in Python that returns the text according to these rules:

  • Search for the keyword: preferences
  • Make a group of 1 to 7 words ending at the word ‘like’, and repeat this step as many times as possible
  • Make a final group of 1 to 7 words

My current expression is: (live: https://regex101.com/r/1CSSNc/1/ )

JavaScript

which returns

JavaScript

I expected/wanted the output to be:

JavaScript

My implementation is missing some concepts here.


Answer

You can use

(?P<Start>\bpreferences\b)(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?

See the regex demo.

Details:

  • (?P<Start>\bpreferences\b) – Group “Start”: the whole word preferences
  • (?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+) – Group “Mid”: one or more repetitions of
    • \s+ – one or more whitespaces
    • \w+(?:\s+\w+){0,6}? – one or more word chars and then zero to six occurrences of one or more whitespaces and then one or more word chars, as few as possible
    • \s+like – one or more whitespaces and then the word like
  • (?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))? – an optional occurrence of
    • \s+ – one or more whitespaces
    • (?P<Last>\w+(?:\s+\w+){1,7}) – Group “Last”: one or more word chars and then one to seven occurrences of one or more whitespaces and one or more word chars
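
Putting the three parts described above together, the pattern can be tried against the test string from the question. This is a minimal sketch assembled from those descriptions; the variable names (text, pattern, match, groups) are illustrative choices, not taken from the original demo.

```python
import re

# Test string from the question.
text = ("These are my food preferences mango and I also like "
        "bananas and I like grapes too.")

# The three parts from the breakdown above, concatenated in order.
pattern = re.compile(
    r"(?P<Start>\bpreferences\b)"                   # whole word "preferences"
    r"(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)"  # 1-7 words + "like", repeated
    r"(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?"         # optional trailing words
)

match = pattern.search(text)
# groupdict() maps each named group to the text it captured.
groups = match.groupdict() if match else {}
print(groups)
```

On this input, Mid should run from mango through the second like (including the leading whitespace), and Last should capture grapes too. Note that \s+like has no trailing word boundary, so it would also accept a word such as likes; appending \b after like is one way to tighten it if that matters.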

See the Python demo:

JavaScript

Output:

JavaScript
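
Python's re module keeps only the last capture of a repeated group, so the “Mid” group comes back as one string rather than one item per repetition. If the individual repetitions are needed, one possible approach (an addition here, not part of the answer above) is to rescan the “Mid” text with the inner sub-pattern. The names chunk and chunks are hypothetical.

```python
import re

text = ("These are my food preferences mango and I also like "
        "bananas and I like grapes too.")

pattern = re.compile(
    r"(?P<Start>\bpreferences\b)"
    r"(?P<Mid>(?:\s+\w+(?:\s+\w+){0,6}?\s+like)+)"
    r"(?:\s+(?P<Last>\w+(?:\s+\w+){1,7}))?"
)

# Same sub-pattern as one repetition inside "Mid", minus the leading whitespace.
chunk = re.compile(r"\w+(?:\s+\w+){0,6}?\s+like")

match = pattern.search(text)
# findall returns whole matches here because chunk has no capturing groups.
chunks = chunk.findall(match.group("Mid")) if match else []
print(chunks)
```

Because the rescan uses the same lazy quantifier as the main pattern, each item in chunks should correspond to one repetition that the “Mid” group consumed.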