Tag: regex

How to standardize column in pandas

I have a DataFrame with an id column containing the following sample values. I want to standardise them to XXXXXXXX-XXXX (i.e. 8 and 4 digits separated by a dash). How can I achieve that using Python? Here’s my code. Answer: You can use the DataFrame.replace() function with a regular expression like this: Here’s …
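The sample values and the answer's code are cut off in this excerpt, so the following is only a minimal sketch of the regex idea. It assumes each raw id holds exactly 12 digits, with an optional non-digit separator between the first 8 and the last 4; the function name `standardize_id` and the sample inputs are hypothetical.

```python
import re

# Assumption: a valid raw id is 8 digits, optional non-digit separators, then 4 digits.
ID_PATTERN = re.compile(r"^\s*(\d{8})\D*(\d{4})\s*$")

def standardize_id(value):
    """Return the id as XXXXXXXX-XXXX, or the input unchanged if it does not fit."""
    # Backreferences \1 and \2 are the 8-digit and 4-digit groups.
    return ID_PATTERN.sub(r"\1-\2", value)

# Hypothetical sample values, not taken from the original question.
for raw in ["123456781234", "12345678 1234", "12345678-1234", "abc"]:
    print(raw, "->", standardize_id(raw))
```

With pandas, the same pattern and replacement string could probably be applied to the column directly, for example `df["id"].replace(ID_PATTERN.pattern, r"\1-\2", regex=True)`, or through `df["id"].map(standardize_id)`.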

Regex For Special Character (S with line on top)

I was trying to write a regex in Python to replace all non-ASCII characters with an underscore, but if one of the characters is “S̄” (an ‘S’ with a line on top), it adds an extra ‘S’… Is there a way to account for this character as well? I believe it’s a valid UTF-8 char…
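A likely cause is that “S̄” is not a single code point: it is an ASCII ‘S’ followed by a combining macron (U+0304), so a plain non-ASCII pattern replaces only the macron and leaves the ‘S’ behind. The sketch below treats a base character plus its combining marks as one unit. It assumes only marks from the Combining Diacritical Marks block (U+0300 to U+036F) need handling, and the name `replace_non_ascii` is hypothetical.

```python
import re

# First alternative: any character followed by one or more combining marks.
# Second alternative: any single non-ASCII character.
NON_ASCII = re.compile(r".[\u0300-\u036f]+|[^\x00-\x7f]")

def replace_non_ascii(text, repl="_"):
    """Replace each non-ASCII character (or base + combining marks) with repl."""
    return NON_ASCII.sub(repl, text)

print(replace_non_ascii("S\u0304ample"))  # 'S' + combining macron is replaced as one unit
print(replace_non_ascii("caf\u00e9"))     # precomposed 'é' is a single code point
```

The first alternative has to come before the second so that the base letter and its mark are consumed together.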

How can I use regex to remove repeated characters from a string

I have a string as follows where I tried to remove similar consecutive characters. Now I need to let the user specify the value of k. I am using the following Python code to do it, but I got the error message TypeError: can only concatenate str (not “int”) to str. Answer: If I were you, I would pref…
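This TypeError typically appears when an integer such as k is joined to a pattern string with `+`. The original code and the exact meaning of k are not shown in the excerpt, so the sketch below makes an assumption: k is the maximum allowed run length, and longer runs of the same character are shortened to k. The function name `limit_repeats` is hypothetical. Formatting k into the pattern with an f-string avoids the str/int concatenation.

```python
import re

def limit_repeats(text, k):
    """Shorten every run of one repeated character to at most k characters."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Doubled braces are literal braces in an f-string, so for k=2
    # the pattern becomes (.)\1{2,}, i.e. a run of three or more.
    pattern = re.compile(rf"(.)\1{{{k},}}")
    return pattern.sub(lambda m: m.group(1) * k, text)

print(limit_repeats("aaabbbbcc", 2))
print(limit_repeats("aaabbbbcc", 1))
```

If the pattern is built with `+` instead, wrapping the number as `str(k)` resolves the same error.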

pyspark regex extract all

I have a dataframe like the one below. I am trying to extract the next word after function or var. My code is here. Because it captures only one word, the final row returns only AWS and not Twitter, so I would like to capture all matches. My Spark version is less than 3, so I tried df.withColumn(‘output’, f.…
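The dataframe and the attempted code are truncated here, so this is only a sketch of the matching logic in plain Python. It assumes the "next word" is a run of word characters separated from `function` or `var` by whitespace; the name `extract_next_words` and the sample string are hypothetical.

```python
import re

# Capture the word that follows either keyword.
NEXT_WORD = re.compile(r"\b(?:function|var)\s+(\w+)")

def extract_next_words(text):
    """Return every word that directly follows 'function' or 'var'."""
    if text is None:
        return []  # null column values yield an empty list
    return NEXT_WORD.findall(text)

# Hypothetical sample row, not the original data.
print(extract_next_words("function AWS and var Twitter"))
```

On a Spark version without a built-in way to return all matches, one option is to wrap this function as a Python UDF, for example `f.udf(extract_next_words, ArrayType(StringType()))`, and use it inside `df.withColumn`. A Python UDF is generally slower than native column functions.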