I have a dataframe that has two columns, id and text:

df = pd.DataFrame([[1, 'Hello world 28'], [2, 'Hi how are you 9'], [3, '19 Hello']], columns=['id','text'])

id  text
1   Hello world 28
2   Hi how are you 9
3   19 Hello
In the text field, whenever there is a digit preceded by a space, I want to add a # before the digit. The resultant dataframe that I am looking for would be as follows:

id  text
1   Hello world #28
2   Hi how are you #9
3   19 Hello
I have tried the following method to capture the regex pattern and add the # character before the digit, following a linked example:

df['text'] = df['text'].replace(r'(\s\d{1,2})', "#\1", regex=True)
However, this gives me the following result: it replaces the entire digit with # instead of adding it at the start of the regex match:

id  text
1   Hello world #
2   Hi how are you #
3   19 Hello
Any pointers on how I can add the # character before a regex match? Thanks!
Answer
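Try:

df['text'].replace(r"\s(\d{1,2})", r" #\1", regex=True)

That is, move the parentheses so they surround only the digit part; the captured digit(s) are then inserted by \1. Make the replacement string raw with the r prefix so that the backslash in \1 reaches the regex engine as a backreference instead of being interpreted as a string escape. Also put a space before #, because the matched space is replaced along with the digits. Assign the result back to df['text'] to keep it.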
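To see why the raw string matters, here is a minimal sketch that reproduces both patterns with Python's standard re module instead of pandas. It assumes pandas hands the pattern and replacement to re for ordinary object-dtype string columns; the names texts, broken and fixed are made up for illustration.

```python
import re

texts = ['Hello world 28', 'Hi how are you 9', '19 Hello']

# Without the r prefix, "\1" is the control character chr(1),
# not a backreference to group 1.
assert "#\1" == "#" + chr(1)

# The question's attempt: the group captures the space and the digits,
# and the replacement is "#" plus chr(1), so the whole match is overwritten.
broken = [re.sub(r"(\s\d{1,2})", "#\1", t) for t in texts]

# The answer's fix: capture only the digits and re-insert them with a raw
# r"\1"; the leading space is restored by hand in the replacement.
fixed = [re.sub(r"\s(\d{1,2})", r" #\1", t) for t in texts]

print(fixed)  # ['Hello world #28', 'Hi how are you #9', '19 Hello']
```

The third string is unchanged because its digits sit at the start of the string, with no whitespace in front of them.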