Tag: regex

Find indices of target words without the surrounding brackets

I have a set of sentences in which the target words (target["text"]) are surrounded by brackets, braces, or parentheses, and some of these are overlapping or nested. I want to extract the target words together with their correct indices in the sentence, without the brackets/braces/parentheses. I have managed to do so with the brackets and so on: Now I want to remove the brackets/braces/parentheses from the target["text"] values and find ...
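One way to approach this kind of task is a single pass with a stack, sketched below. The function name `extract_targets` and the output keys `start` and `end` are made up for illustration, and the sample sentence is not from the question. Indices refer to the sentence after all brackets have been removed, and an opener that is never closed is simply dropped.

```python
def extract_targets(sentence):
    """Strip brackets and return (clean_text, targets).

    Each target is a dict with the bracket-free text and its start/end
    offsets in the bracket-free sentence (end is exclusive).
    """
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    clean = []   # characters of the sentence without brackets
    stack = []   # (opening bracket, start offset in the clean text)
    spans = []   # (start, end) offsets, in the order the brackets close

    for ch in sentence:
        if ch in openers:
            stack.append((ch, len(clean)))
        elif ch in pairs:
            # Search from the top so overlapping pairs such as "[a (b] c)" work.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == pairs[ch]:
                    _, start = stack.pop(i)
                    spans.append((start, len(clean)))
                    break
            else:
                clean.append(ch)  # unmatched closer: keep it as literal text
        else:
            clean.append(ch)

    text = "".join(clean)
    targets = [{"text": text[s:e], "start": s, "end": e} for s, e in spans]
    return text, targets


text, targets = extract_targets("The [quick (brown)] fox {jumps}")
print(text)     # The quick brown fox jumps
print(targets)
```

Targets are reported in the order their closing brackets appear, so an inner target comes before the one that encloses it.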

Split string with a certain keyword outside a string but not inside a string

python regex string

I have a question about how to use regex for this condition (any other solution in Python is fine too): I want to split on the colon ':' if it is found outside a string, but not split on it if it is inside a string, as in the example below: Regex I use: (?!\B"[^"]*):(?![^"]*"\B) string_to_split: str = '"A: String 1": ...
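A common technique for this, sketched here under the assumption that double quotes are always balanced and never escaped, is to split only on colons that are followed by an even number of quote characters. The helper name `split_outside_quotes` and the sample string are hypothetical.

```python
import re

# A colon counts as "outside" when an even number of double quotes follows it.
COLON_OUTSIDE_QUOTES = re.compile(r':(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_outside_quotes(text):
    """Split on ':' outside double-quoted strings and trim each part."""
    return [part.strip() for part in COLON_OUTSIDE_QUOTES.split(text)]


print(split_outside_quotes('"key: a": "value: b": plain'))
# ['"key: a"', '"value: b"', 'plain']
```

The lookahead rescans the rest of the string for every colon, so for very long inputs a small character-by-character tokenizer may be the better choice.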

How to standardize column in pandas

pandas python regex

I have a dataframe which contains an id column with the following sample values. I want to standardise them to XXXXXXXX-XXXX (i.e. 8 and 4 digits separated by a dash). How can I achieve that using Python? Here's my code: Answer: You can use the DataFrame.replace() function with a regular expression like this: Here's example code with sample data. Output: If any value does not ...
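The sample values and the answer's code are not included in the excerpt, so the following is only a sketch of the idea using the standard `re` module. It assumes each id holds exactly 8 digits followed by 4 digits, possibly with stray non-digit separators; anything else is left unchanged. The names `ID_PATTERN` and `standardize_id` are illustrative. The same pattern and replacement string could presumably be passed to `DataFrame.replace(..., regex=True)`.

```python
import re

# 8 digits, optional non-digit separators, then 4 digits, and nothing else.
ID_PATTERN = re.compile(r"^\D*(\d{8})\D*(\d{4})\D*$")


def standardize_id(value):
    """Return the id as XXXXXXXX-XXXX, or unchanged if it does not fit."""
    return ID_PATTERN.sub(r"\1-\2", value)


for raw in ["123456781234", "12345678 1234", "12345678-1234", "1234"]:
    print(raw, "->", standardize_id(raw))
```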

Why isn’t my re.sub finding all instances using my regex?

python python-re regex

I’m using Python 3.10 on Windows 10 and trying the search below: If I use just “JohnnyB Cool”, the “B” gets a space before it. Why isn’t the “JohnnyB” substituted in the first search? I’ve also tried: To be clear, I want the final answer to be: Johnny B Cool & Joe Cool. Answer: You may use this Python code:

Regex For Special Character (S with line on top)

ascii python regex

I was trying to write a regex in Python to replace all non-ASCII characters with an underscore, but if one of the characters is “S̄” (an ‘S’ with a line on top), it adds an extra ‘S’... Is there a way to account for this character as well? I believe it’s a valid UTF-8 character, but not ASCII. Here’s the code:
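A likely explanation is that “S̄” is not one code point but an ASCII ‘S’ followed by a combining macron (U+0304), so a pattern that replaces only non-ASCII characters leaves the ‘S’ behind. The sketch below treats a base character plus its trailing combining marks as one unit. It only covers the Combining Diacritical Marks block (U+0300 to U+036F), which is an assumption; the name `to_ascii_placeholder` is made up.

```python
import re

# Either a base character followed by one or more combining marks,
# or any single non-ASCII character.
NON_ASCII_UNIT = re.compile(
    r"[^\u0300-\u036F][\u0300-\u036F]+|[^\x00-\x7F]"
)


def to_ascii_placeholder(text):
    """Replace every non-ASCII unit with a single underscore."""
    return NON_ASCII_UNIT.sub("_", text)


print(to_ascii_placeholder("S\u0304ample caf\u00e9"))  # _ample caf_
```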

With pandas.DataFrame.replace in python how to replace all ä with ae?

pandas python regex

With pandas.DataFrame.replace in Python, how can I replace all ä with ae, but only the ones that are in between ${}? Below is the Python code that I tried, but it didn’t work: df.replace({'Desc': r'${.*ä}'}, {'Desc': r'${.*ae}'}, regex=True) As a first example: Actual result: Lorem Ipsum is ä simply dummy text ${Männer} Lorem Ipsum is simply dummy text ä. Expected ...
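One possible approach, shown here with plain `re` rather than pandas, is to match each `${...}` placeholder as a whole and rewrite only the matched text in a replacement function. The name `fix_placeholders` is illustrative, and the expected output below follows the stated goal, since the question's own expected result is cut off. As far as I know, `Series.str.replace` also accepts a callable replacement when `regex=True`, so the same pattern and function could be reused on the `Desc` column.

```python
import re

# "$" and "{" are regex metacharacters and must be escaped.
PLACEHOLDER = re.compile(r"\$\{[^}]*\}")


def fix_placeholders(text):
    """Replace ä with ae, but only inside ${...} placeholders."""
    return PLACEHOLDER.sub(lambda m: m.group(0).replace("ä", "ae"), text)


sample = "Lorem Ipsum is ä simply dummy text ${Männer} Lorem Ipsum is simply dummy text ä."
print(fix_placeholders(sample))
# Lorem Ipsum is ä simply dummy text ${Maenner} Lorem Ipsum is simply dummy text ä.
```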

Pyspark: regex search with text in a list withColumn

apache-spark pyspark python regex

I am new to Spark and I am having a silly “what’s-the-best-approach” issue. Basically, I have a map (dict) that I would like to loop over. During each iteration, I want to search through a column in a Spark dataframe using an rlike regex and assign the key of the dict to a new column using withColumn. The data sample is shown ...

How can I use regex to remove repeated characters from a string

python regex

I have a string as follows, where I tried to remove similar consecutive characters. Now I need to let the user specify the value of k. I am using the following Python code to do it, but I got the error message TypeError: can only concatenate str (not "int") to str Answer: If I were you, I would prefer to ...
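The error message suggests the pattern was built by concatenating a string with the integer k. A minimal sketch of one fix is to format k into the pattern with an f-string. The exact semantics the asker wants are not visible in the excerpt, so this assumes that runs of at least k identical characters should collapse to a single character; `collapse_runs` is a hypothetical name.

```python
import re


def collapse_runs(text, k):
    """Collapse every run of k or more identical characters to one character."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Doubled braces are literal braces; {k - 1} is the interpolated number.
    # For k = 3 the pattern is (.)\1{2,}: a character plus 2 or more repeats.
    pattern = re.compile(rf"(.)\1{{{k - 1},}}")
    return pattern.sub(r"\1", text)


print(collapse_runs("aaabbbbcd", 3))  # abcd
print(collapse_runs("aabbbc", 3))     # aabc
```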

python regex: capture different strings line by line from .txt file

python python-3.x regex

I need to extract names/strings from a .txt file line by line, and I am trying to use regex to do this. E.g., in the line below I want to extract the names “Victor Lau” and “Siti Zuan” and the string “TELEGRAPHIC TRANSFER” into three different lists, then output them into an Excel file. You may see the txt file also TELEGRAPHIC ...

pyspark regex extract all

apache-spark bigdata pyspark python regex

I have a dataframe like below. I am trying to extract the next word after function or var. My code is here. As it captures only one word, the final row returns only AWS and not Twitter, so I would like to capture all matches. My Spark version is less than 3, so I tried df.withColumn('output', f.expr("regexp_extract_all(js, '(func)\s+(\w+)|(var)\s+(\w+)', 4)")).show()
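The "capture every match" part can be illustrated with plain Python, independent of Spark. The sketch below uses `re.findall` with a single capturing group so that each match yields just the word after the keyword. The name `declared_names` and the sample string are made up, and the pattern is a simplification of the one quoted above. On a Spark version without `regexp_extract_all`, such a function could be wrapped in a UDF that returns an array of strings; that wiring is not shown here.

```python
import re

# One capturing group, so findall returns only the word after the keyword.
DECLARATION = re.compile(r"\b(?:function|var)\s+(\w+)")


def declared_names(js):
    """Return every identifier that directly follows 'function' or 'var'."""
    return DECLARATION.findall(js)


print(declared_names("var AWS = 1; function Twitter() {}"))
# ['AWS', 'Twitter']
```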