
Tag: regex

Find indices of target words without the surrounding brackets

I have a set of sentences with target words (target["text"]) surrounded by brackets, braces, or parentheses, some of which overlap or are nested. I want to extract these target words as well as their correct indices in the sentence, without the brackets, braces, or parentheses. I have managed to do so with the brackets and so on: Now I want to remove the brackets/braces/parentheses from the target["text"]s and find
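The excerpt above is cut off, so the following is only a minimal sketch of one way to get bracket-free indices, not the asker's or an answerer's actual code. It assumes the three bracket types shown, pairs each closer with the most recent opener of the same type (so overlapping spans such as "[a (b] c)" still resolve), and uses a hypothetical function name, extract_targets.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}


def extract_targets(sentence):
    """Return the sentence without brackets plus each bracketed span's
    text and its start/end offsets in the cleaned sentence."""
    clean = []                                         # characters kept in the output
    open_starts = {close: [] for close in PAIRS.values()}
    spans = []
    for ch in sentence:
        if ch in PAIRS:
            # Remember where this span starts in the cleaned text.
            open_starts[PAIRS[ch]].append(len(clean))
        elif ch in open_starts:
            # Closer: pair it with the latest opener of the same type, if any.
            if open_starts[ch]:
                spans.append((open_starts[ch].pop(), len(clean)))
        else:
            clean.append(ch)
    text = "".join(clean)
    targets = [{"text": text[s:e], "start": s, "end": e} for s, e in sorted(spans)]
    return text, targets


text, targets = extract_targets("The [quick {brown}] fox")
print(text)
print(targets)
```

Because the offsets are recorded against the cleaned text as it is built, no second pass is needed to correct them after the brackets are dropped. Unmatched closers are silently discarded in this sketch.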

How to standardize column in pandas

I have a dataframe which contains an id column with the following sample values. I want to standardise them to XXXXXXXX-XXXX (i.e. 8 and 4 digits separated by a dash). How can I achieve that using Python? Here’s my code. Answer: You can use the DataFrame.replace() function with a regular expression like this: Here’s example code with sample data. Output: If any value does not
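The sample values and the answer's code are not included in the excerpt, so this is only a hedged sketch of the normalisation step on plain strings. It assumes each id carries exactly 12 digits, possibly with stray separators, and that values which do not fit are left unchanged; standardize_id is a hypothetical helper name.

```python
import re


def standardize_id(value):
    """Format a 12-digit id as XXXXXXXX-XXXX; return other values unchanged."""
    digits = re.sub(r"\D", "", str(value))   # keep digits only
    if len(digits) != 12:
        return value                          # assumption: leave non-conforming ids alone
    return f"{digits[:8]}-{digits[8:]}"


for raw in ["123456781234", "12345678 1234", "1234-56781234", "1234"]:
    print(raw, "->", standardize_id(raw))
```

On a dataframe, a function like this could be applied to the id column with Series.map, as an alternative to the regex-based DataFrame.replace() call the answer mentions.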

Why isn’t my re.sub finding all instances using my regex?

I’m using Python 3.10 on Windows 10 and trying the search below: If I use just “JohnnyB Cool”, the “B” gets a space before it. Why isn’t the “JohnnyB” substituted in the first search? I’ve also tried: To be clear, I want the final answer to be: Johnny B Cool & Joe Cool. Answer: You may use this Python code:
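The asker's pattern and the answer's code are missing from the excerpt, so the following is a sketch of one common approach rather than the posted answer. It assumes the input looks like "JohnnyB Cool & JoeCool" (a guess based on the desired output) and inserts a space at every lowercase-to-uppercase boundary using zero-width lookarounds.

```python
import re


def split_joined_names(text):
    # Zero-width match between a lowercase letter and an uppercase letter,
    # so nothing is consumed and adjacent boundaries are all found.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)


print(split_joined_names("JohnnyB Cool & JoeCool"))
```

Because the lookarounds consume no characters, one match cannot swallow text that a later match needs, which is a frequent reason a consuming pattern misses some instances.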

Regex For Special Character (S with line on top)

I was trying to write a regex in Python to replace all non-ASCII characters with an underscore, but if one of the characters is “S̄” (an ‘S’ with a line on the top), it adds an extra ‘S’… Is there a way to account for this character as well? I believe it’s a valid UTF-8 character, but not ASCII. Here’s the code:
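“S̄” is usually stored as two code points: an ASCII “S” followed by U+0304 (combining macron). A pattern that replaces only non-ASCII code points therefore keeps the “S” and replaces just the mark. The sketch below is one possible workaround, not the asker's code: it treats a base character plus any following combining marks as a single unit, using the standard unicodedata module. The function name is hypothetical.

```python
import unicodedata


def ascii_or_underscore(text):
    """Replace each non-ASCII character, including a base letter followed
    by combining marks, with a single underscore."""
    out = []
    for ch in text:
        if unicodedata.combining(ch):
            # A combining mark makes the preceding character non-ASCII as a unit.
            if out:
                out[-1] = "_"
            else:
                out.append("_")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append("_")
    return "".join(out)


print(ascii_or_underscore("S\u0304ample"))
```

This handles both decomposed sequences such as "S" + U+0304 and precomposed characters such as "ï", each of which becomes one underscore.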

With pandas.DataFrame.replace in python how to replace all ä with ae?

With pandas.DataFrame.replace in Python, how do I replace ä with ae, but only the occurrences between ${ and }? Below is the Python code I tried, but it didn’t work: df.replace({'Desc': r'${.*ä}'}, {'Desc': r'${.*ae}'}, regex=True) As a first example: Actual Result: Lorem Ipsum is ä simply dummy text ${Männer} Lorem Ipsum is simply dummy text ä. Expected
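In the attempted pattern the unescaped $ is an end-of-string anchor, so it can never be followed by a literal {. The expected result is cut off in the excerpt, so the following is only a sketch based on the stated requirement: it matches each ${...} placeholder and rewrites ä inside the match with a replacement function. It is shown on a plain string; the helper name is hypothetical.

```python
import re

# Escaped "$" and "{" so they match literally; [^}]* stays inside one placeholder.
PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def replace_in_placeholders(text):
    """Replace ä with ae only inside ${...} placeholders."""
    return PLACEHOLDER.sub(lambda m: m.group(0).replace("ä", "ae"), text)


sample = "Lorem Ipsum is ä simply dummy text ${Männer} Lorem Ipsum is simply dummy text ä."
print(replace_in_placeholders(sample))
```

On a dataframe column, the same pattern and replacement function could be passed to Series.str.replace with regex=True, which accepts a callable replacement. Uppercase Ä is not handled in this sketch.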

Pyspark: regex search with text in a list withColumn

I am new to Spark and I am having a silly “what’s-the-best-approach” issue. Basically, I have a map (dict) that I would like to loop over. During each iteration, I want to search through a column in a Spark dataframe using an rlike regex and assign the key of the dict to a new column using withColumn. The data sample is shown

How I can use regex to remove repeated characters from string

I have a string as follows where I tried to remove similar consecutive characters. Now I need to let the user specify the value of k. I am using the following Python code to do it, but I got the error message TypeError: can only concatenate str (not "int") to str. Answer: If I were you, I would prefer to
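The TypeError arises when an integer k is joined to the pattern string with +; the integer has to be converted first, for example with str(k) or an f-string. The asker's code and the exact meaning of k are not in the excerpt, so the sketch below assumes k is the maximum number of consecutive repeats to keep. The function name is hypothetical.

```python
import re


def limit_runs(text, k):
    """Collapse any run of one repeated character to at most k copies."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Doubled braces produce literal braces in an f-string: for k=2 the
    # pattern is (.)\1{2,}, i.e. a character followed by 2+ copies of itself.
    pattern = rf"(.)\1{{{k},}}"
    return re.sub(pattern, lambda m: m.group(1) * k, text)


print(limit_runs("aaabbbcc", 1))
print(limit_runs("aaabbbcc", 2))
```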

pyspark regex extract all

I have a dataframe like below. I am trying to extract the next word after function or var. My code is here. As it captures only one word, the final row returns only AWS and not Twitter, so I would like to capture all matches. My Spark version is less than 3, so I tried df.withColumn('output', f.expr("regexp_extract_all(js, '(func)\s+(\w+)|(var)\s+(\w+)', 4)")).show()
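The dataframe and the rest of the question are not shown, so this is only a sketch of the matching logic in plain Python, not a Spark solution. It assumes the column holds JavaScript-like text and that the wanted output is every identifier that follows function or var; the sample string is made up. A single capture group lets re.findall return just the identifiers.

```python
import re

# One non-capturing group for the keyword, one capturing group for the word after it.
NEXT_WORD = re.compile(r"\b(?:function|var)\s+(\w+)")


def words_after_keywords(js):
    """Return every identifier that directly follows 'function' or 'var'."""
    return NEXT_WORD.findall(js)


print(words_after_keywords("function AWS() { var Twitter = 1; }"))
```

On Spark versions without a built-in extract-all function, a function like this could be wrapped in a Python UDF that returns an array of strings, at the cost of UDF overhead.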
