Grouping speaker dialogue in a written transcript

I have a .txt file containing a transcript. Example content: I would like to write some Python code that will give the following output: So if Travis de Ronde is talking, for example, I want all of his dialogue to be on one “line” under his name until he is finished speaking or another speaker begins talking. Answer This is a very good job for itertools.groupby, not regular expressions: This yields
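The example transcript and the answer's code are not reproduced here, so the following is only a minimal sketch of the itertools.groupby idea. It assumes each transcript line has the form "Speaker: text"; the sample lines, speaker names, and the helper names split_line and group_dialogue are hypothetical.

```python
from itertools import groupby

# Hypothetical transcript lines; the real file format is an assumption.
lines = [
    "Alice: Hello there.",
    "Alice: How are you?",
    "Bob: Fine, thanks.",
    "Alice: Good to hear.",
]


def split_line(line):
    """Split 'Speaker: text' into a (speaker, text) pair."""
    speaker, _, text = line.partition(": ")
    return speaker, text


def group_dialogue(lines):
    """Merge consecutive lines spoken by the same speaker into one entry."""
    pairs = [split_line(line) for line in lines if line.strip()]
    return [
        (speaker, " ".join(text for _, text in chunk))
        for speaker, chunk in groupby(pairs, key=lambda pair: pair[0])
    ]


grouped = group_dialogue(lines)
for speaker, speech in grouped:
    print(speaker)
    print(speech)
```

groupby only merges adjacent items with the same key, which is what is wanted here: a speaker who talks again later gets a new entry instead of being merged with their earlier turn.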

Walrus operator for filtering regex searches in list comprehension

I have a Python list of strings. I want to do the regex search on each element, filtering only those elements where I managed to capture the regex group. I think I can do the regex search only once using the walrus operator from Python 3.8. So far I have: The logic is: I take the found group if the regex search returned anything, which means it is not None. The problem is, the behaviour is weird – I can print() before this list comprehension, the program finishes with code 0, but there is no result and print() after the
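The asker's own code is not shown, so this is just a small sketch of the pattern being described: one regex search per element, with the match object bound by the walrus operator in the filter clause. The pattern and the sample list are hypothetical, and Python 3.8 or later is assumed.

```python
import re

# Hypothetical pattern and data, for illustration only.
pattern = re.compile(r"id=(\d+)")
items = ["id=12", "no match here", "prefix id=7 suffix"]

# The search runs once per element; `m` is reused in the output expression.
found = [m.group(1) for s in items if (m := pattern.search(s)) is not None]
print(found)
```

The parentheses around the assignment expression matter: without them, `m` would be bound to the result of the `is not None` comparison instead of the match object.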

Python regular expression SyntaxError: unexpected character after line continuation character

I must have totally messed up my regular expression. I’m trying to find the ID and value in the following str (and eventually will need to find description and its value too): But for some reason, I get the above error message. This is my code: I tested in https://pythex.org/ and the regular expression of ID": "([A-Za-z0-9-]*)" worked, but running it in the program has issues with the semicolon so I tried to replace it with \s* but it’s having issues with line continuation. Any ideas? Right now I need to just capture the ID, and then I’ll need the
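The original string and code are missing from this excerpt, so the actual cause cannot be confirmed. In general, Python raises this SyntaxError when a backslash ends up outside a string literal, which can happen when double quotes inside the pattern close the string early. A minimal sketch that avoids this, assuming a JSON-like input (the sample text is hypothetical), wraps the pattern in a single-quoted raw string:

```python
import re

# Hypothetical input; the string from the question is not shown.
text = '{"ID": "abc-123", "description": "sample item"}'

# Single quotes delimit the raw string, so the double quotes and the
# backslash in \s* all stay inside the literal.
pattern = re.compile(r'"ID":\s*"([A-Za-z0-9-]*)"')

match = pattern.search(text)
item_id = match.group(1) if match else None
print(item_id)
```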

Extracting codes with regex (irregular regex keys)

I'm extracting codes from a list of strings that come from email titles, which look something like: So far what I tried is: My issue is that I'm not able to extract the code that follows one of the markers [‘PN’, ‘P/N’, ‘PN:’, ‘P/N:’], especially if the code starts with a letter (e.g. ‘M’) or contains hyphens (e.g. 26-59-29). My desired output would be: Answer In your pattern the character class [p/n:]\s+ will match one of the listed characters followed by 1+ whitespace chars. In the example data that will match a forward
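The sample titles and the answer's pattern are cut off here, so the following is only one possible sketch. It spells the marker out as a sequence (P, optional slash, N, optional colon) instead of a character class, and allows letters, digits, and hyphen-separated groups in the code. The titles and the names PART_NUMBER and extract_codes are hypothetical.

```python
import re

# Marker "PN", "P/N", "PN:" or "P/N:", optional whitespace, then a code made
# of alphanumeric groups optionally joined by hyphens.
PART_NUMBER = re.compile(
    r"\bP/?N:?\s*([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)",
    re.IGNORECASE,
)

# Hypothetical email titles, for illustration only.
titles = [
    "Quote request PN 12345",
    "RFQ P/N: M83248-1",
    "Order pn:26-59-29 urgent",
    "No code here",
]


def extract_codes(titles):
    """Return every code that follows a part-number marker."""
    return [m.group(1) for title in titles for m in PART_NUMBER.finditer(title)]


codes = extract_codes(titles)
print(codes)
```

Because the match is case-insensitive and the separator is optional, an ordinary word that begins with "pn" would also be picked up; real data may need a stricter marker.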

Python find all occurrences of hyphenated word and replace at position

I have to replace all occurrences of hyphenated patterns like c-c-c-c-come or oh-oh-oh-oh, etc. with the last token, i.e. come or oh in this example, where: the number of characters between hyphens is arbitrary (one or more characters); the token to match is the last token in the hyphenation, hence come in c-c-come; and the input string may have one or more occurrences, as in the following sentences: "c-c-c-c-come to home today", "c-c-c-c-come to me", "oh-oh-oh-oh it’s a bad life oh-oh-oh-oh". Need to find the start and end position of the matched token via finditer [UPDATE]
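No answer is included in this excerpt, so this is a sketch of one way to approach it, not the accepted solution. It assumes a stutter is a fragment repeated with hyphens and followed by a final token that starts with that fragment; the names STUTTER, collapse_stutters, and token_positions are hypothetical.

```python
import re

# Group 1: the repeated fragment. Group 2: the final token, which must
# begin with that fragment (so "well-known" is left alone).
STUTTER = re.compile(r"\b(\w+)(?:-\1)*-(\1\w*)\b")


def collapse_stutters(text):
    """Replace each stuttered run with its last token."""
    return STUTTER.sub(lambda m: m.group(2), text)


def token_positions(text):
    """Return (token, start, end) of the last token in each stuttered run."""
    return [(m.group(2), m.start(2), m.end(2)) for m in STUTTER.finditer(text)]


sentence = "c-c-c-c-come to home today"
print(collapse_stutters(sentence))
print(token_positions(sentence))
```

The positions refer to the original string, so they are useful for locating the token before replacement; after collapsing, the offsets shift.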

Regex Python – Keep only ASCII and copyright symbol

I have the following function to keep only ASCII characters: But now I also want to keep the copyright symbol (©). What should I add to the pattern? Answer Add the copyright symbol’s hex escape \xA9 (source) to your match group:
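The original function is not shown, so this sketch assumes it used a negated character class over the ASCII range with re.sub. The function name keep_ascii_and_copyright is hypothetical.

```python
import re

# Anything that is neither ASCII (\x00-\x7F) nor the copyright sign (\xA9).
NON_ASCII_EXCEPT_COPYRIGHT = re.compile(r"[^\x00-\x7F\xA9]+")


def keep_ascii_and_copyright(text):
    """Drop every character except ASCII and the copyright symbol."""
    return NON_ASCII_EXCEPT_COPYRIGHT.sub("", text)


cleaned = keep_ascii_and_copyright("© 2020 Café ™")
print(cleaned)
```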

Search and filter pandas dataframe with regular expressions

I’d appreciate your help. I have a pandas dataframe. I want to search 3 columns of the dataframe using a regular expression, then return all rows that meet the search criteria, sorted by one of my columns. I would like to write this as a function so I can implement this logic with other criteria if possible, but am not quite sure how to do this. For example, I know how to pull the results of a search thusly (with col1 being a column name): but I can’t figure out how to take this type of action, and perform it with
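The question is cut off and no answer is shown, so this is only a sketch under assumptions: the searched columns hold strings, and a row qualifies if the pattern matches in any of them. The function name search_columns, the column names other than col1, and the sample data are hypothetical.

```python
import pandas as pd


def search_columns(df, pattern, columns, sort_by):
    """Return rows where `pattern` matches in any of `columns`, sorted by `sort_by`."""
    mask = pd.Series(False, index=df.index)
    for col in columns:
        # na=False treats missing values as non-matches.
        mask |= df[col].str.contains(pattern, regex=True, na=False)
    return df[mask].sort_values(sort_by)


# Hypothetical data, for illustration only.
df = pd.DataFrame(
    {
        "col1": ["apple pie", "banana", "cherry"],
        "col2": ["x", "apple", "y"],
        "col3": ["z", "w", "v"],
        "score": [3, 1, 2],
    }
)

result = search_columns(df, r"^apple", ["col1", "col2", "col3"], "score")
print(result)
```

To require a match in every column instead, the mask could start as all True and be combined with `&=`.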

Using a regular expression to replace upper case repeated letters in python with a single lowercase letter

I am trying to replace any instances of uppercase letters that repeat themselves twice in a string with a single instance of that letter in lower case. I am using the following regular expression and it is able to match the repeated uppercase letters, but I am unsure how to make the letter that is being replaced lower case. How can I make the “\1” lower case? Should I not be using a regular expression to do this? Answer Pass a function as the repl argument. The MatchObject is passed to this function and .group(1) gives the
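The asker's pattern is not shown, so this sketch assumes a capture group for one uppercase letter followed by a backreference to it. The function name collapse_doubled_uppercase is hypothetical.

```python
import re

# One uppercase letter (group 1) immediately followed by the same letter.
DOUBLED_UPPER = re.compile(r"([A-Z])\1")


def collapse_doubled_uppercase(text):
    """Replace each doubled uppercase letter with one lowercase copy."""
    # The callable receives the match object and returns the replacement.
    return DOUBLED_UPPER.sub(lambda m: m.group(1).lower(), text)


converted = collapse_doubled_uppercase("HeLLo WWorld")
print(converted)
```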

Delete digits in Python (Regex)

I am trying to delete all digits from a string. However, the following code also deletes digits contained in any word. Obviously, I don’t want that. I have been trying many regular expressions with no success. Thanks! Result: This must not b deleted, but the number at the end yes Answer Add a space before the \d+. Edit: After looking at the comments, I decided to form a more complete answer. I think this accounts for all the cases.
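Neither the question's code nor the answer's final pattern survives in this excerpt. As a sketch of the same idea, word boundaries can be used so that only standalone numbers are removed; this is a swapped-in variant, not the answer's own code, and the sample sentence and function name are hypothetical.

```python
import re

# A run of digits with word boundaries on both sides, plus any whitespace
# directly before it. Digits inside a word such as "b3" have no boundary
# on the left, so they are left alone.
STANDALONE_NUMBER = re.compile(r"\s*\b\d+\b")


def delete_standalone_numbers(text):
    """Remove numbers that stand alone, keeping digits embedded in words."""
    return STANDALONE_NUMBER.sub("", text)


sample = "This must not b3 deleted, but the number at the end yes 134411"
stripped = delete_standalone_numbers(sample)
print(stripped)
```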