Why is Python re not splitting multiple instances of punctuation?

Question

I am trying to split inputted text at spaces and at all special characters like punctuation, while keeping the delimiters. My re pattern works exactly the way I want, except that it will not split multiple instances of the punctuation. Here is my re pattern: wordsWithPunc = re.split(r'([^-\w]+)', words) If I …

Accepted Answer

You need to remove the + from your pattern if you want to capture single non-word characters. With the +, a whole run of consecutive non-word characters is matched as one delimiter. Something like:

    import re

    words = 'My name is mud!!!'
    splitted = re.split(r'([^-\w])', words)
    # ['My', ' ', 'name', ' ', 'is', ' ', 'mud', '!', '', '!', '', '!', '']

This will also produce 'empty' matches between non-word characters (because you're splitting on each of them), but you can mitigate that by postprocessing the result to remove empty matches:

    splitted = [match for match in re.split(r'([^-\w])', words) if match]
    # ['My', ' ', 'name', ' ', 'is', ' ', 'mud', '!', '!', '!']

You can further strip spaces in the list comprehension (i.e. ... if match.strip() ...) if you want to get rid of the space matches as well.
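
The split-then-filter approach above can be wrapped in a small helper. This is only a sketch: the function name split_keep_delimiters and the drop_spaces flag are made up for illustration and are not part of the re module. It assumes the intended character class is "anything that is not a word character or a hyphen".

```python
import re


def split_keep_delimiters(text, drop_spaces=False):
    """Split text at each single character that is neither a word
    character nor a hyphen, keeping every delimiter as its own item.

    Hypothetical helper for illustration only.
    """
    # The capturing group makes re.split return the delimiters too.
    parts = re.split(r'([^-\w])', text)
    if drop_spaces:
        # str.strip() turns whitespace-only items into '', which is falsy,
        # so this drops both empty strings and space delimiters.
        return [part for part in parts if part.strip()]
    # Drop only the empty strings produced between adjacent delimiters.
    return [part for part in parts if part]


print(split_keep_delimiters('My name is mud!!!'))
# ['My', ' ', 'name', ' ', 'is', ' ', 'mud', '!', '!', '!']
print(split_keep_delimiters('My name is mud!!!', drop_spaces=True))
# ['My', 'name', 'is', 'mud', '!', '!', '!']
```

Because the hyphen is excluded from the delimiter class, a hyphenated word such as 'well-known' stays in one piece.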

Answer