
Add a character at start of a regex match in Pandas

I have a dataframe that has two columns, id and text

df = pd.DataFrame([[1, 'Hello world 28'], [2, 'Hi how are you 9'], [3, '19 Hello']], columns=['id','text'])

   id   text
    1   Hello world 28
    2   Hi how are you 9
    3   19 Hello
    

In the text field, whenever there is a digit preceded by a space, I want to add a # before the digit. The resultant dataframe that I am looking for would be as follows:

   id   text
    1   Hello world #28
    2   Hi how are you #9
    3   19 Hello 

I have tried the following method to capture the regex pattern and add the # character before the digit by following the example in this link:

df['text'] = df['text'].replace(r'(\s\d{1,2})', "#\1", regex=True)

However, this gives me the following result: it replaces the entire match with # instead of adding # at the start of the match:

   id   text
    1   Hello world #
    2   Hi how are you #
    3   19 Hello 

Any pointers on how I can add the # character before a regex match? Thanks!


Answer

Try:

df['text'].replace(r"\s(\d{1,2})", r" #\1", regex=True)

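That is, move the parentheses so they surround only the digit part; the captured digit(s) are then available as \1 in the replacement. Make the replacement string raw with the r prefix so the backslash in \1 is kept literally, and put a space before # to restore the whitespace that the match consumed.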
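For reference, here is a minimal runnable sketch of the above, using the sample dataframe from the question. The variable names (with_group, with_lookbehind) are only illustrative, and the lookbehind variant is an alternative I am adding, not part of the original suggestion. It leaves the whitespace out of the match, so nothing has to be re-inserted.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "Hello world 28"], [2, "Hi how are you 9"], [3, "19 Hello"]],
    columns=["id", "text"],
)

# Without the r prefix, "\1" is an octal escape (the character \x01),
# not a reference to capture group 1.
not_raw = "#\1"
raw = r"#\1"

# Capture only the digits; the matched whitespace is replaced by a literal space.
with_group = df["text"].replace(r"\s(\d{1,2})", r" #\1", regex=True)

# Alternative (an assumption, not from the answer): a lookbehind checks for the
# whitespace without consuming it, so the original whitespace is left untouched.
with_lookbehind = df["text"].str.replace(r"(?<=\s)(\d{1,2})", r"#\1", regex=True)

print(with_group.tolist())
print(with_lookbehind.tolist())
```

Both calls return a new Series, so assign the result back (df['text'] = ...) if you want to update the dataframe.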
