How can I use regex to remove repeated characters from a string?

I have the following string, from which I tried to remove consecutive repeated characters.

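import re
input = "abccbcbbb"
# Repeat the substitution, since removing one run can create a new one.
for i in input:
    # Raw string, so that \1 is a backreference and not the character \x01.
    input = re.sub(r"(.)\1+", "", input)
print(input)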

Now I need to let the user specify the value of k. I am using the following Python code to do it, but I get the error message: TypeError: can only concatenate str (not "int") to str

import re
input = "abccbcbbb";
k=3
for i in input :
   input= re.sub("(.)\1+{"+(k-1)+"}", "",input)
print(input)

Answer

If I were you, I would do it as suggested before. But since I've already spent time answering this question, here is my handmade solution.

The pattern below defines a named group called "letter" that captures a single lowercase letter. As the regex engine scans the string, the group is rebound at each position: first a, then b, and so on. A lookahead then checks whether the captured letter is immediately followed by one or more copies of itself.

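So it matches every letter that is immediately followed by the same letter and replaces it with an empty string. Only the last letter of each run survives, which collapses every run of repeated letters to a single letter.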

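import re

input = 'abccbcbbb'
result = 'abcbcb'  # expected output: each run collapsed to one letter
# Match a letter only when the same letter follows it (the lookahead consumes nothing).
pattern = r'(?P<letter>[a-z])(?=(?P=letter)+)'
substituted = re.sub(pattern, '', input)
assert substituted == result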
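
On the TypeError itself: "..." + (k-1) + "..." tries to concatenate an int to a str, so the number has to be converted with str() first. Also, \1+{n} stacks two quantifiers, which re does not accept; \1{n} alone is enough. Here is a minimal sketch of how that might look, assuming k means "remove any k identical characters in a row" and that the removal should be repeated until nothing changes. The function name remove_k_repeats and the k >= 2 check are my own additions, not something from the question.

```python
import re


def remove_k_repeats(text, k):
    """Repeatedly delete k identical consecutive characters (hypothetical helper)."""
    if k < 2:
        # Assumption: a "repeat" needs at least two identical characters.
        raise ValueError("k must be at least 2")
    # One captured character followed by k-1 copies of it, i.e. k in total.
    # str() avoids the TypeError; the raw string keeps \1 as a backreference.
    pattern = re.compile(r"(.)\1{" + str(k - 1) + "}")
    while True:
        reduced = pattern.sub("", text)
        if reduced == text:  # nothing left to remove
            return reduced
        text = reduced


print(remove_k_repeats("abccbcbbb", 3))  # abccbc
print(remove_k_repeats("abccbcbbb", 2))  # acb
```

Note that this deletes the repeated characters entirely, as in the question, whereas the lookahead pattern above keeps one character from each run.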