Regex to match a list of one or more comma-separated words, unless the string ends in a comma

I wrote the following regex a long time ago. It must match at least 3 words and works with both Latin and Cyrillic characters: regex = '([^ ,;\d]{2,}[ ,;]{1,}){2,}[^ ,;\d]{2,}'

I would like to rewrite it so that it matches "hello" but fails to match "hello," because of the trailing comma. However, I would still like it to match "hello, and, more, words".

Example matches: "hello", "hello, test69", "hello, test69, matches"

Example non-matches: "hello,", "hello test69", "hello test69 matches"

Answer

You can use

^\w+(?:, *\w+)*$

In Python, you can use a shorter version if you use re.fullmatch:

re.fullmatch(r'\w+(?:, *\w+)*', text)
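
As a quick check, the re.fullmatch version can be run against the examples from the question. This is a minimal sketch; the helper name is_word_list and the two example lists are illustrative and not part of the original answer.

```python
import re

# Pattern from the answer: one word, then zero or more ", word" groups.
PATTERN = re.compile(r'\w+(?:, *\w+)*')


def is_word_list(text: str) -> bool:
    """Return True if text is one or more comma-separated words
    with no trailing comma (hypothetical helper for illustration)."""
    return PATTERN.fullmatch(text) is not None


# Examples taken from the question.
should_match = ["hello", "hello, test69", "hello, test69, matches"]
should_fail = ["hello,", "hello test69", "hello test69 matches"]

for s in should_match + should_fail:
    print(repr(s), is_word_list(s))
```

Because re.fullmatch must consume the whole string, no ^ or $ anchors are needed in the pattern.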

Note that if your spaces can be any whitespace, replace the space with \s in the regex. If your words can only contain letters, replace each \w with [^\W\d_]. If your words can only contain letters and digits, replace every \w with [^\W_].
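
The three variants from the note above might look like this in Python. This is a sketch only: the constant names and the matches helper are made up for illustration, and it assumes Python 3 str patterns, where \w is Unicode-aware by default and so covers Cyrillic as well as Latin letters.

```python
import re

# Hypothetical names for the three variants described in the note.
ANY_WHITESPACE = re.compile(r'\w+(?:,\s*\w+)*')              # space -> \s
LETTERS_ONLY = re.compile(r'[^\W\d_]+(?:, *[^\W\d_]+)*')      # \w -> [^\W\d_]
LETTERS_AND_DIGITS = re.compile(r'[^\W_]+(?:, *[^\W_]+)*')    # \w -> [^\W_]


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None


print(matches(ANY_WHITESPACE, "hello,\ttest69"))       # tab after the comma
print(matches(LETTERS_ONLY, "hello, test69"))          # digits are rejected
print(matches(LETTERS_ONLY, "привет, мир"))            # Cyrillic letters
print(matches(LETTERS_AND_DIGITS, "hello, test_69"))   # underscore is rejected
```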

Details:

  • ^ – start of string
  • \w+ – one or more word chars
  • (?:, *\w+)* – zero or more repetitions of a comma, zero or more spaces, and then one or more word chars
  • $ – end of string.
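
The breakdown above can also be written directly into the pattern with re.VERBOSE, which allows comments inside the regex. This is a sketch under one assumption worth noting: in verbose mode unescaped spaces are ignored, so the literal space is written as [ ] here. The name WORD_LIST is illustrative.

```python
import re

# Anchored pattern from the answer, annotated piece by piece.
WORD_LIST = re.compile(r"""
    ^            # start of string
    \w+          # one or more word chars
    (?:          # group repeated zero or more times:
        ,        #   a comma
        [ ]*     #   zero or more spaces (bracketed so VERBOSE keeps them)
        \w+      #   one or more word chars
    )*
    $            # end of string
""", re.VERBOSE)

print(WORD_LIST.search("hello, test69") is not None)
print(WORD_LIST.search("hello,") is not None)
```

One detail to keep in mind: in Python, $ also matches just before a single trailing newline, so "hello\n" passes this anchored form. If that matters, \Z in place of $, or re.fullmatch without anchors, is stricter.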