I have a set of sentences with target words, target[“text”], surrounded by brackets/braces/parentheses, some of which overlap or are nested. I want to extract these target words along with their correct indices in the sentence, without the brackets/braces/parentheses. I have manag…
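A minimal sketch of one way to approach this, assuming the brackets are properly nested (truly overlapping pairs such as "[a (b] c)" would be mispaired). The function name `extract_targets` and the "start"/"end" keys are hypothetical, not taken from the question:

```python
OPENERS = set("([{")
CLOSERS = set(")]}")

def extract_targets(sentence):
    """Return the sentence without brackets plus the bracketed spans.

    Offsets refer to the cleaned sentence (brackets removed).
    Assumes properly nested brackets; bracket types are not matched.
    """
    clean = []      # characters of the sentence with brackets stripped
    stack = []      # start offsets (in the cleaned text) of open brackets
    targets = []
    for ch in sentence:
        if ch in OPENERS:
            stack.append(len(clean))
        elif ch in CLOSERS and stack:
            start = stack.pop()
            targets.append(
                {"text": "".join(clean[start:]), "start": start, "end": len(clean)}
            )
        else:
            # Ordinary character (or a closer with nothing open): keep it.
            clean.append(ch)
    return "".join(clean), targets
```

Inner spans are reported before the spans that enclose them, because each span is emitted when its closing bracket is reached.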
Tag: regex
Split string with a certain keyword outside a string but not inside a string
I have a question about how to use a regex in this situation (any Python solution is fine): I want to split on the colon ‘:’ when it appears outside a string, but not when it appears inside a string, as in the example below: Regex I use: (?!B”[^”…
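One possible approach, sketched under the assumption that strings are delimited by plain double quotes and contain no escaped quotes: a colon is outside a string when an even number of quotes follows it. The helper name `split_outside_quotes` is hypothetical:

```python
import re

# Match ':' only if the rest of the text contains an even number of '"',
# which means the colon itself is not inside a quoted string.
_COLON_OUTSIDE_QUOTES = re.compile(r':(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_outside_quotes(text):
    """Split on colons that are not inside double-quoted strings."""
    return _COLON_OUTSIDE_QUOTES.split(text)
```

For input with escaped quotes or mixed quote styles, a small tokenizer would likely be more robust than a single pattern.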
How to standardize column in pandas
I have a dataframe that contains an id column with the following sample values. I want to standardise them to XXXXXXXX-XXXX (i.e. 8 and 4 digits separated by a dash). How can I achieve that using Python? Here’s my code Answer: You can use the DataFrame.replace() function with a regular expression like this: Here’s …
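Since the sample values are not shown here, the following is only a sketch in plain Python, assuming each raw id contains exactly 12 digits mixed with arbitrary separators. The function name `standardize_id` is hypothetical:

```python
import re

def standardize_id(raw):
    """Format a raw id as XXXXXXXX-XXXX (8 digits, dash, 4 digits).

    Assumption: a valid id has exactly 12 digits once separators are
    removed. Anything else is returned unchanged.
    """
    digits = re.sub(r"\D", "", str(raw))  # drop every non-digit character
    if len(digits) != 12:
        return raw
    return f"{digits[:8]}-{digits[8:]}"
```

On a dataframe, a function like this could be applied to the column with `df["id"].map(standardize_id)`.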
Why isn’t my re.sub finding all instances using my regex?
I’m using Python 3.10 on Windows 10 and trying the search below: If I use just “JohnnyB Cool”, the “B” gets a space before it. Why isn’t the “JohnnyB” substituted in the first search? I’ve also tried: To be clear, I want the final answer to be, Johnny B Co…
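The original search is not shown here, so the cause is a guess. One common reason is that re.sub only replaces non-overlapping matches, so a pattern that consumes the characters around a hit can skip an adjacent one. A sketch that avoids this with zero-width lookarounds (the helper name is hypothetical):

```python
import re

def space_before_capitals(text):
    """Insert a space wherever a lowercase letter is directly followed
    by an uppercase letter. The lookarounds consume no characters, so
    neighbouring matches cannot interfere with each other."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
```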
Regex For Special Character (S with line on top)
I was trying to write a regex in Python to replace all non-ASCII characters with an underscore, but if one of the characters is “S̄” (an ‘S’ with a line on the top), it adds an extra ‘S’… Is there a way to account for this character as well? I believe it’s a valid UTF-8 char…
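“S̄” is normally stored as two code points, an ASCII ‘S’ followed by the combining macron U+0304, which would explain why the ‘S’ survives a plain non-ASCII replacement. A sketch that treats a base character plus its combining marks as one unit, assuming only the Combining Diacritical Marks block (U+0300 to U+036F) needs to be handled; the function name is hypothetical:

```python
import re

# First alternative: any character followed by one or more combining marks.
# Second alternative: any single non-ASCII character.
_NON_ASCII_UNIT = re.compile(r".[\u0300-\u036f]+|[^\x00-\x7f]", re.DOTALL)

def replace_non_ascii(text, repl="_"):
    """Replace each non-ASCII character, or base character with combining
    marks attached, by a single replacement string."""
    return _NON_ASCII_UNIT.sub(repl, text)
```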
With pandas.DataFrame.replace in python how to replace all ä with ae?
With pandas.DataFrame.replace in Python, how can I replace all ä with ae, but only the ones that are in between ${}? Below is the Python code that I tried, but it didn’t work: df.replace({‘Desc’: r’${.*ä}’} , {‘Desc’: r’${.*ae}’}, regex=True) As a first e.g. A…
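In the attempted pattern the `$` is unescaped, so it acts as an end-of-string anchor rather than a literal dollar sign. A sketch in plain Python that matches each `${...}` block and rewrites only its contents, assuming the blocks are not nested; the function name is hypothetical:

```python
import re

# A literal "${", then anything up to the next "}".
_PLACEHOLDER = re.compile(r"\$\{[^}]*\}")

def transliterate_in_placeholders(text):
    """Replace 'ä' with 'ae', but only inside ${...} blocks."""
    return _PLACEHOLDER.sub(lambda m: m.group(0).replace("ä", "ae"), text)
```

On the dataframe, the same pattern and callable should work with `df["Desc"].str.replace(...)` and `regex=True`, since that method accepts a callable replacement.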
Pyspark: regex search with text in a list withColumn
I am new to Spark and I am having a silly “what’s-the-best-approach” issue. Basically, I have a map(dict) that I would like to loop over. During each iteration, I want to search through a column in a spark dataframe using rlike regex and assign the key of the dict to a new column using withC…
How I can use regex to remove repeated characters from string
I have a string as follows where I tried to remove similar consecutive characters. Now I need to let the user specify the value of k. I am using the following Python code to do it, but I got the error message TypeError: can only concatenate str (not “int”) to str Answer: If I were you, I would pref…
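The TypeError arises when an int such as k is joined to a pattern string with `+`; it has to be converted with str() (or formatted in) first. The exact rule the asker wants is not shown, so this sketch assumes that runs of k or more identical characters should collapse to a single character; the function name is hypothetical:

```python
import re

def collapse_runs(text, k):
    """Collapse every run of at least k identical characters to one.

    Assumption: k >= 2. The quantifier is built with str(k - 1) because
    the captured character itself already counts as one occurrence.
    """
    pattern = r"(.)\1{" + str(k - 1) + r",}"
    return re.sub(pattern, r"\1", text)
```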
python regex: capture different strings line by line from .txt file
I need to extract names/strings from a .txt file line by line. I am trying to use regex to do this. E.g., in the line below I want to extract the names “Victor Lau” and “Siti Zuan” and the string “TELEGRAPHIC TRANSFER” into three different lists, then output them into an Excel file…
pyspark regex extract all
I have a dataframe like the one below. I am trying to extract the next word after function or var. My code is here. As it captures only one word, the final row returns only AWS and not Twitter, so I would like to capture all matches. My Spark version is less than 3, so I tried df.withColumn(‘output’, f.…
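The dataframe is not shown here, so this is only a plain-Python sketch of the matching step, assuming the wanted word is separated from function or var by whitespace. The helper name is hypothetical:

```python
import re

# Capture the word that follows either keyword.
_AFTER_KEYWORD = re.compile(r"\b(?:function|var)\s+(\w+)")

def words_after_keywords(text):
    """Return every word that directly follows 'function' or 'var'."""
    return _AFTER_KEYWORD.findall(text)
```

On Spark versions without a built-in "extract all", a function like this could be wrapped in a UDF that returns an array of strings, at some cost in performance compared with native column functions.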