I wrote the following regex a long time ago. It must match at least 3 words, and it works with both Latin and Cyrillic characters: regex = '([^ ,;\d]{2,}[ ,;]{1,}){2,}[^ ,;\d]{2,}'
I would like to rewrite it to match "hello" but fail to match "hello," because of the comma. However, I would still like it to match "hello, and, more, words".
Example matches: "hello", "hello, test69", "hello, test69, matches"
Example non-matches: "hello,", "hello test69", "hello test69 matches"
Answer
You can use
^\w+(?:, *\w+)*$
In Python, you can use a shorter version if you use re.fullmatch:
re.fullmatch(r'\w+(?:, *\w+)*', text)
See the regex demo.
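As a quick check, the pattern can be run against the examples from the question. This is only a sketch: the helper name is_word_list and the Cyrillic sample are made up for illustration, and it assumes Python 3, where \w matches Unicode word characters in str patterns.

```python
import re

# Pattern from the answer: words separated by a comma and optional spaces.
PATTERN = re.compile(r'\w+(?:, *\w+)*')

def is_word_list(text):  # hypothetical helper name, not from the original answer
    """Return True if the whole string is a comma-separated list of words."""
    return PATTERN.fullmatch(text) is not None

# Strings the question expects to match (plus one Cyrillic sample).
for sample in ['hello', 'hello, test69', 'hello, test69, matches', 'привет, мир']:
    assert is_word_list(sample)

# Strings the question expects to be rejected.
for sample in ['hello,', 'hello test69', 'hello test69 matches']:
    assert not is_word_list(sample)
```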
Note that if your spaces can be any whitespace, replace the space with \s in the regex. If your words can only contain letters, replace each \w with [^\W\d_]. If your words can only contain letters and digits, replace every \w with [^\W_].
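The three variations described in the note might look like this in Python. The variable names and sample strings are illustrative assumptions, not part of the original answer.

```python
import re

# Separator may be followed by any whitespace, not only spaces.
any_whitespace = re.compile(r'\w+(?:,\s*\w+)*')
# Words made of letters only (word chars minus digits and underscore).
letters_only = re.compile(r'[^\W\d_]+(?:, *[^\W\d_]+)*')
# Words made of letters and digits (word chars minus underscore).
letters_digits = re.compile(r'[^\W_]+(?:, *[^\W_]+)*')

assert any_whitespace.fullmatch('hello,\ttest69')
assert letters_only.fullmatch('hello, мир')
assert letters_only.fullmatch('hello, test69') is None
assert letters_digits.fullmatch('hello, test69')
assert letters_digits.fullmatch('hello, test_69') is None
```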
Details:
^ – start of string
\w+ – one or more word chars
(?:, *\w+)* – zero or more repetitions of a comma, zero or more spaces, and then one or more word chars
$ – end of string.
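The breakdown above can also be kept next to the pattern itself with re.VERBOSE. This is a sketch rather than part of the original answer; the name WORD_LIST is made up, and the literal space is written as [ ] because verbose mode ignores unescaped whitespace in the pattern. The anchors are omitted because re.fullmatch already requires the whole string to match.

```python
import re

WORD_LIST = re.compile(r'''
    \w+            # one or more word chars
    (?:            # start of a non-capturing group
        ,[ ]*      # a comma, then zero or more spaces
        \w+        # one or more word chars
    )*             # repeat the group zero or more times
''', re.VERBOSE)

print(bool(WORD_LIST.fullmatch('hello, and, more, words')))  # True
print(bool(WORD_LIST.fullmatch('hello,')))                   # False
```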