
Tag: regex

scrapy/regex get json_object from html

I’m crawling reviews from a website with Scrapy in Python and want to get all the reviews from the following part of the raw HTML as a dictionary. Getting window.cj.listings is no problem, but I can’t seem to extract window.cj.app_data with regex. The following code works for getting the listings, but I get nothing from window.cj.app_data when I
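A minimal sketch of one way to approach this, assuming the page assigns a valid JSON object literal to window.cj.app_data inside a script tag (the excerpt does not show the real HTML, so the sample string and the helper name `extract_js_object` are hypothetical). Instead of matching the whole object with a regex, it uses the regex only to find the assignment and lets the standard-library JSON decoder read the object from that position:

```python
import json
import re


def extract_js_object(html, name):
    """Return the JSON value assigned to `name` in `html`, or None.

    Assumes the assigned value is strict JSON (quoted keys, no trailing
    commas); a looser JavaScript literal would need a different parser.
    """
    # Find "name = " and stop right before the value begins.
    match = re.search(re.escape(name) + r"\s*=\s*", html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index
        # and ignores whatever follows it (e.g. ";" and more script).
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return value


# Hypothetical sample resembling the situation in the question.
sample_html = (
    '<script>window.cj.listings = {"a": 1}; '
    'window.cj.app_data = {"reviews": [{"text": "ok; fine}"}]};</script>'
)
app_data = extract_js_object(sample_html, "window.cj.app_data")
```

Letting the decoder find the end of the object avoids the usual failure mode of a non-greedy `\{.*?\}` pattern, which stops at the first closing brace of a nested object.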

Extract names of a sentence with regex

I’m very new to regex syntax, though I have already read a bit about the library. I’m trying to extract names from a simple sentence, but I’ve run into trouble; below I show an example of what I’ve done. Can anyone explain what is wrong and how to proceed? Answer I think your regex has two problems. You want to

Regex: allow comma-separated strings, including characters and non-characters

I’m finding it difficult to complete this regex. The following regex checks for the validity of comma-separated strings: ^(\w+)(,\s*\w+)*$ So, this will match the following comma-separated strings: Then, I can do the same for non-word characters, using ^(\W+)(,\s*\W+)*$, which will match: I would like to create a regex which matches strings that include special characters, hyphens and underscores, e.g. foo-bar,
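A rough sketch of how the pattern above might be extended, assuming that "special characters" here means only the hyphen in addition to word characters (`\w` already covers the underscore). The names `ITEM`, `LIST_RE` and `is_valid_list` are made up for illustration, and the character class would need widening for any other symbols:

```python
import re

# One list item: word characters (letters, digits, underscore) or hyphens.
ITEM = r"[\w-]+"

# One item, followed by zero or more ", item" groups.
LIST_RE = re.compile(rf"{ITEM}(?:,\s*{ITEM})*")


def is_valid_list(text):
    """Return True if `text` is a comma-separated list of valid items."""
    # fullmatch anchors the pattern at both ends, so ^ and $ are not needed.
    return LIST_RE.fullmatch(text) is not None
```

With this sketch, `is_valid_list("foo-bar,baz_qux, abc")` is accepted, while an empty item such as in `"foo-bar,,baz"` is rejected.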

Grouping speaker dialogue in a written transcript

I have a txt file for a transcript. Example content: I would like to write some python code that will give the following output: So if Travis de Ronde is talking, for example, I want all of his dialogue to be on one “line” under his name until he is finished speaking or another speaker begins talking. Answer This is

groupdict in regex

I have the following string: I wrote a regex for this which will find the first name and last name of the user: Result: Sometimes, I don’t get the last name. In that case, my regex fails. Does anyone have any idea how to handle this? Answer You may use See the regex demo Regex details ^ – start of string (?:(?:M(?:(?:is|r)?s|r)|[JS]r)\.?\s+)? – an
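To illustrate the idea of an optional last name with groupdict, here is a small sketch. It reuses the optional-title prefix quoted in the answer, but the named groups, the allowed name characters and the helper `parse_name` are assumptions, since the excerpt does not show the original string or the full pattern:

```python
import re

NAME_RE = re.compile(
    r"^(?:(?:M(?:(?:is|r)?s|r)|[JS]r)\.?\s+)?"  # optional title: Mr, Mrs, Ms, Miss, Jr, Sr
    r"(?P<first_name>[A-Za-z-]+)"               # required first name
    r"(?:\s+(?P<last_name>[A-Za-z-]+))?$"       # optional last name
)


def parse_name(text):
    """Return a dict with first_name and last_name, or None if no match."""
    match = NAME_RE.match(text.strip())
    # groupdict() reports an unmatched optional group as None.
    return match.groupdict() if match else None
```

Wrapping the last name in an optional non-capturing group means the match still succeeds when only a first name is present, and `last_name` simply comes back as `None`.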

Get rid of default text

I am trying to parse a user’s event descriptions after obtaining access to their Google Calendar through the Google Calendar API. When I input the description into my program, I want to get rid of default (and useless) text such as Zoom meeting invitations. If the following is the description string, how can I parse it so that only
