Tag: regex

re.sub erroring with “Expected string or bytes-like object”

I have read multiple posts about this error, but I still can’t figure it out. The error appears when I loop through my function. Answer: As you stated in the comments, some of the values appear to be floats, not strings. You will need to convert them to strings before passing them to re.sub. The simplest way
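A minimal sketch of that conversion, not necessarily the approach the answer goes on to describe. The helper name `strip_digits`, the pattern, and the sample values are assumptions made for illustration; the original function is not shown in this excerpt.

```python
import re

def strip_digits(value):
    # Hypothetical helper: str() turns floats (including NaN) into text,
    # so re.sub never receives a non-string argument.
    return re.sub(r"\d+", "", str(value))

# Invented sample values mixing strings and floats
values = ["abc123", 4.5, float("nan")]
cleaned = [strip_digits(v) for v in values]
```

Without the `str()` call, the second and third values would raise the "expected string or bytes-like object" TypeError.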

Extracting Dialogs from movie scripts using Regex

python regex

I would like to extract movie script dialogue like so: the character name in uppercase, followed by the dialog up to the line break, so that the narration is not captured as well. Current regex: ((\s[^\w].\s[A-Z]+)\n+.+) The problem is that it only extracts the character name and the first sentence of the dialog. Here’s the testing data: EDIT: New regex: (\w[A-Z]+\n\s).+?(?=\n) Answer: You can use the following regex:
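The regex from the answer is not included in this excerpt, so the following is only a sketch of one way to approach the problem, not the answer's pattern. It assumes a character name is a line of uppercase letters (spaces, dots and apostrophes allowed) immediately followed by one or more non-blank dialog lines. The sample script and the names `DIALOG` and `extract_dialogs` are invented.

```python
import re

# Group 1: an all-uppercase name on its own line.
# Group 2: every following non-blank line, stopping at the first blank line.
DIALOG = re.compile(
    r"^[ \t]*([A-Z][A-Z .']*[A-Z])[ \t]*\n((?:[ \t]*\S.*(?:\n|$))+)",
    re.MULTILINE,
)

def extract_dialogs(script):
    """Return (character, dialog) pairs, joining multi-line dialog with spaces."""
    pairs = []
    for match in DIALOG.finditer(script):
        lines = [line.strip() for line in match.group(2).splitlines()]
        pairs.append((match.group(1), " ".join(lines)))
    return pairs

# Invented sample; the question's real test data is not shown here.
sample = (
    "INT. ROOM - DAY\n\n"
    "John enters the room.\n\n"
    "JOHN\nHello there.\nHow are you?\n\n"
    "MARY\nFine.\n"
)
dialogs = extract_dialogs(sample)
```

The repeated inner group is what lets the match continue past the first sentence: it consumes whole lines until a blank line ends the dialog block.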

Python27 TypeError: unsupported operand type(s) for +=: ‘int’ and ‘str’

python python-2.7 regex

Please help me understand what’s happening here. My aim is to create a function that will read “input.txt” and return the min, max, and average for each line within the text document. The text within the document is as follows: My code looks like this: Everything prints out fine except for numSum, which gives the error mentioned in the heading.
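This error typically means a string read from the file was added to an integer total without being converted first. The sketch below assumes each line of the file holds integers separated by spaces or commas; the real contents of “input.txt” are not shown in this excerpt, and the function name `line_stats` is hypothetical.

```python
import re

def line_stats(lines):
    """Return one (min, max, average) tuple per line that contains numbers."""
    stats = []
    for line in lines:
        # re.findall returns strings; int() converts them so that
        # min, max and sum operate on numbers rather than text.
        numbers = [int(token) for token in re.findall(r"-?\d+", line)]
        if not numbers:
            continue  # skip blank lines
        average = float(sum(numbers)) / len(numbers)
        stats.append((min(numbers), max(numbers), average))
    return stats

# Invented sample lines standing in for the file contents
results = line_stats(["1 2 3\n", "10,20\n", "\n"])
```

A file object can be passed directly, for example `with open("input.txt") as handle: line_stats(handle)`. The explicit `float()` keeps the average from being truncated under Python 2.7.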

Python regex to remove punctuation except from URLs and decimal numbers

nltk python regex

People, I need a regex to remove punctuation from a string while keeping accents and URLs. I also have to keep the mentions and hashtags in that string. I tried the code below, but unfortunately it also replaces the accented characters, which I want to keep. The output for the following text “Apenas um teste com
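One possible approach, sketched under assumptions: match the pieces to protect (URLs, mentions, hashtags, decimal numbers) as capture groups ahead of a catch-all punctuation alternative, and let a replacement callback keep the protected pieces. The pattern, the name `strip_punctuation`, and the sample text are invented for illustration. In Python 3, `\w` matches accented letters by default, so accents survive.

```python
import re

PATTERN = re.compile(
    r"(https?://\S+)"   # group 1: URLs, kept as-is
    r"|([@#]\w+)"       # group 2: mentions and hashtags
    r"|(\d+[.,]\d+)"    # group 3: decimal numbers
    r"|[^\w\s]"         # any other character that is not a word char or whitespace
)

def strip_punctuation(text):
    # Protected groups are returned unchanged; bare punctuation becomes "".
    return PATTERN.sub(
        lambda m: m.group(1) or m.group(2) or m.group(3) or "", text
    )

# Invented sample text
result = strip_punctuation(
    "Olá, @user! Veja https://example.com/a?b=1 #teste 3.14."
)
```

Two limitations of this sketch: punctuation glued to the end of a URL is treated as part of the URL, and underscores are kept because `\w` includes them.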

Treat regular expression between dashes

python regex

Could you help me use “sub” to change the numbers in these expressions: &AFL-03-123456 &AFL-01-12345 &AFL-02-123 context: samsung-j7-duos-dual-chip-desbloqueado-oi-android-5.1-tela-5.5-16gb-wi-fi-4g-camera-13mp-branco&AFL-03-171644black I need to replace the numbers after the second dash with other numbers (let’s say 987654). The number after the second dash, as you can see in the examples, may vary in number of digits, but it is always numeric. The
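A minimal sketch of such a substitution, assuming the prefix always has the form `&AFL-` followed by digits and a dash. The function name `replace_afl_number` is hypothetical. The `\g<1>` form is used because a plain `\1` followed by the digits 987654 would be read as a different group reference.

```python
import re

def replace_afl_number(text, new_number="987654"):
    # Group 1 captures "&AFL-NN-"; the digits after it are replaced.
    return re.sub(r"(&AFL-\d+-)\d+", r"\g<1>" + new_number, text)

replaced = replace_afl_number("16gb-wi-fi-4g-camera-13mp-branco&AFL-03-171644black")
```

Anchoring on `&AFL-` keeps the other dash-separated numbers in the surrounding string untouched.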

String/regex search over Excel in Python issue

excel openpyxl python regex

I’m new to SO and relatively new to Python, so I’m sorry if this is a simple fix or an inappropriate question. Firstly, my program generally works, but I’m trying to implement some redundancy/catch-alls to make it robust. The program looks over a directory (and sub-dirs) of Excel files, opens them individually, scours for data (on a specific

Extract digits from string by condition

python regex string

I want to extract digits from a short string, based on the condition that the digits come directly before a certain character (the S flag). Example and result: I can split the string into a list to get the individual elements, but how can I get just the 18 and 10? Answer: Use re.findall with the regex r'(\d+)S'. This matches all
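A short sketch of that call. The input string is invented, since the question's example is not shown in this excerpt; it is chosen so that 18 and 10 are the digit runs followed by S.

```python
import re

# Hypothetical input: only the numbers directly before "S" should be returned
flags = "18S5M10S"
before_s = re.findall(r"(\d+)S", flags)
```

Because the pattern contains a single capture group, `re.findall` returns only the captured digits as strings, without the trailing S.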

Beautiful Soup if Class “Contains” or Regex?

beautifulsoup python regex web-scraping

If my class names are constantly different, say for example: Normally I could do: There are way too many class names to work with here, so a bunch of these are out. I know Python doesn’t have the “.contains” I would normally use, but it does have “in”. Though I haven’t been able to work out a way to

Python regex to extract html paragraph

html html-parsing python regex

I’m trying to extract paragraphs from HTML by using the following line of code: but it returns None even though I know there are paragraphs. Why? Answer: Why not use an HTML parser to, well, parse HTML? Example using BeautifulSoup: Note that text=True helps to filter out empty paragraphs.
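The BeautifulSoup example from the answer is not included in this excerpt. As a dependency-free illustration of the same idea (parse the HTML rather than regex it), here is a sketch using the standard library's `html.parser` instead of BeautifulSoup. The class and function names are hypothetical, and nested `<p>` tags are not handled.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text of each <p> element, skipping empty paragraphs."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            text = "".join(self._buffer).strip()
            if text:  # drop paragraphs that contain only whitespace
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)

def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs

# Invented sample markup
found = extract_paragraphs("<div><p>First <b>bold</b> one.</p><p> </p><p>Second</p></div>")
```

The whitespace check plays a role similar to the empty-paragraph filtering mentioned in the answer.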

How to remove string value from column in pandas dataframe

dataframe lambda pandas python regex

I am trying to write some code that splits a string in a dataframe column at each comma (so it becomes a list) and removes a certain string from that list if it is present. After removing the unwanted string, I want to join the list elements again with commas. My dataframe looks like this: So basically my goal is to
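The split, filter and join steps can be sketched as a plain function that works on one cell value at a time. The name `drop_item` and the sample values are assumptions, since the dataframe itself is not shown in this excerpt.

```python
def drop_item(cell, unwanted):
    """Split a comma-separated string, remove `unwanted`, and rejoin with commas."""
    parts = [part.strip() for part in cell.split(",")]
    return ",".join(part for part in parts if part != unwanted)

# Invented sample values
kept = drop_item("apple,banana,cherry", "banana")
```

With pandas, a function like this could be applied to a column through `Series.apply`, for example `df["col"].apply(lambda s: drop_item(s, "banana"))`, where `df` and `"col"` are placeholder names.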