Tag: regex

How to remove escape sequences like ‘\xe2’ or ‘\x0c’ in Python

I am working on a project (content-based search), and for that I am using the ‘pdftotext’ command-line utility in Ubuntu, which writes all the text from a PDF to a text file. But it also writes bullets, so when I read the file to index each word, some escape sequences get indexed too (like ‘\x01’). I know it’s because of the bullets (•). I
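One way to keep such characters out of an index is to decode the pdftotext output and replace control characters and the bullet before splitting the text into words. This is a minimal sketch, assuming the file is UTF-8 encoded; the name clean_text and the exact set of characters to drop are illustrative choices, not something taken from the question.

```python
import re

# Control characters except tab, newline and carriage return, plus the
# bullet character U+2022. \x0c is the form feed pdftotext emits between pages.
UNWANTED = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u2022]")

def clean_text(raw_bytes):
    """Decode bytes read from the text file and blank out unwanted characters."""
    text = raw_bytes.decode("utf-8", errors="replace")
    return UNWANTED.sub(" ", text)

# Hypothetical sample standing in for a line of pdftotext output.
sample = "• item one\x0cnext page".encode("utf-8")
words = clean_text(sample).split()
```

Replacing with a space rather than an empty string keeps words on either side of a page break from being glued together.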

Do Python regular expressions have an equivalent to Ruby’s atomic grouping?

python regex ruby

Ruby’s regular expressions have a feature called atomic grouping, (?>regexp). Is there any equivalent in Python’s re module? Answer Python does not directly support this feature, but you can emulate it by using a zero-width lookahead assertion ((?=RE)), which matches from the current point with the same semantics you want, putting a named group ((?P<name>RE)) inside the lookahead,
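The lookahead-plus-named-group idea can be sketched as follows. The excerpt is cut off, so this is one common way to complete the emulation (a backreference to the captured group), not necessarily the exact code from the answer; the group name run is hypothetical. As far as I know, Python 3.11 and later also accept (?>...) directly in the re module.

```python
import re

# Plain pattern: a+ may backtrack and give one "a" back to the trailing "ab".
plain = re.compile(r"a+ab")

# Emulated atomic group: the lookahead captures a+ into the group "run"
# without consuming input, then the backreference consumes exactly that
# text. The engine does not backtrack into the lookahead, so the run of
# "a" characters cannot be shortened afterwards.
emulated = re.compile(r"(?=(?P<run>a+))(?P=run)ab")

plain_result = plain.search("aaab")        # matches "aaab"
emulated_result = emulated.search("aaab")  # no match, like (?>a+)ab
```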

regular expression match starting clause with end

python regex

I want to be able to capture the value of an HTML attribute with a Python regexp. Currently I use My problem is that I want the regular expression to “remember” whether the attribute started with a single or a double quote. I found the bug in my current approach with the following attribute my regex catches Answer You can
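A backreference is the usual way to make a pattern "remember" its opening quote. The question's own pattern and the answer are not shown in the excerpt, so the following is only a sketch under assumptions: the attribute name href, the names HREF_RE and attr_value, and the sample inputs are all hypothetical.

```python
import re

# (["']) captures the opening quote as group 1; \1 then requires the same
# character to close the value, so the other quote type may appear inside.
HREF_RE = re.compile(r"""href=(["'])(.*?)\1""")

def attr_value(html):
    """Return the href value of the first match, or None if there is none."""
    m = HREF_RE.search(html)
    return m.group(2) if m else None

double_quoted = attr_value('<a href="it\'s.html">')
single_quoted = attr_value("<a href='say \"hi\".html'>")
```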

Regular expression in Python won’t match end of a string

python regex

I’m just learning Python, and I can’t seem to figure out regular expressions. I want this code to print ‘yes’, but it obstinately prints ‘no’. I’ve also tried each of the following: Plus countless other variations. I’ve been searching for quite a while, but can’t find/understand anything that solves my problem. Can someone help out a newbie? Answer You’ve tried

Using a RegEx to match IP addresses

python regex

I’m trying to make a test for checking whether a sys.argv input matches the RegEx for an IP address… As a simple test, I have the following… However when I pass random values into it, it returns “Acceptable IP address” in most cases, except when I have an “address” that is basically equivalent to \d+. Answer You have to modify
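Symptoms like this often come from a pattern that is not anchored at both ends, so any input that merely begins with something address-like is accepted. The original pattern and the answer's fix are not visible in the excerpt, so this is an independent sketch for dotted-quad IPv4 only; OCTET, IPV4_RE and is_ipv4 are hypothetical names.

```python
import re

# One octet in the range 0-255, written without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4_RE = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

def is_ipv4(text):
    """True only if the whole string is a dotted-quad IPv4 address."""
    # fullmatch requires the pattern to cover the entire string, which
    # rejects trailing junk that re.match would silently accept.
    return IPV4_RE.fullmatch(text) is not None
```

For input taken from sys.argv, the standard library's ipaddress.ip_address is another option; it raises ValueError on invalid input instead of returning a boolean.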

heavy regex – really time consuming

html-parsing performance python regex

I have the following regex to detect start and end script tags in an HTML file: meaning, in short, it will catch: <script “NOT THIS</s” > “NOT THIS</s” </script> It works but takes a really long time to detect <script>, even minutes or hours for long strings. The lite version works perfectly even for long strings: however, the extended pattern I
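Neither pattern is shown in the excerpt, so the slow regex itself cannot be repaired here. As an alternative that sidesteps regex backtracking altogether, script tags can be located with the standard library's html.parser. This is a sketch of that swapped-in technique, not the approach from the question; ScriptTagFinder and the sample document are hypothetical.

```python
from html.parser import HTMLParser

class ScriptTagFinder(HTMLParser):
    """Record where <script> start and end tags occur in a document."""

    def __init__(self):
        super().__init__()
        self.events = []  # list of (kind, (line, column)) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.events.append(("start", self.getpos()))

    def handle_endtag(self, tag):
        if tag == "script":
            self.events.append(("end", self.getpos()))

finder = ScriptTagFinder()
finder.feed("<p>hi</p><script>var x = 1;</script>")
finder.close()
kinds = [kind for kind, _ in finder.events]
```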

Check for camel case in Python

camelcasing python regex

I would like to check whether a string is camel case or not (boolean). I am inclined to use a regex but any other elegant solution would work. I wrote a simple regex Would this be correct? Or am I missing something? Edit I would like to capture names in a collection of text documents of the format Edit2
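"Camel case" has no single agreed definition, and the question's own regex is not shown, so the check below fixes one interpretation as an assumption: the string starts with a lowercase letter, contains only ASCII letters and digits, and has at least one uppercase letter. The names CAMEL_RE and is_camel_case are hypothetical.

```python
import re

# A lowercase start, then one or more "humps", each beginning with an
# uppercase letter. Consecutive capitals (as in "HTTP") count as humps.
CAMEL_RE = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)+")

def is_camel_case(s):
    """Return True if s is lowerCamelCase under the assumptions above."""
    return CAMEL_RE.fullmatch(s) is not None
```

To accept names such as CamelCase as well, the leading [a-z] would need to become [A-Za-z]; which variant is right depends on the documents being searched.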

Python Regex to find a string in double quotes within a string

python regex

I’m looking for code in Python using regex that can perform something like this Input: Regex should return “String 1” or “String 2” or “String3” Output: String 1,String2,String3 I tried r'"*"' Answer Here’s all you need to do: result: As pointed out by Li-aung Yip: To elaborate, .+? is the “non-greedy” version of .+. It makes the regular expression
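The difference between .+ and .+? is easiest to see side by side. The answer's code and the question's input are missing from the excerpt, so the input string below is made up for illustration, and re.findall is one reasonable way to apply the pattern rather than a reconstruction of the original.

```python
import re

# Hypothetical input containing three double-quoted strings.
text = 'first "String 1", then "String 2" and finally "String3"'

# Non-greedy: each match stops at the nearest closing quote.
lazy = re.findall(r'"(.+?)"', text)

# Greedy: a single match runs from the first quote to the last one.
greedy = re.findall(r'"(.+)"', text)

joined = ",".join(lazy)
```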

Get the string within brackets in Python

brackets python regex

I have a sample string <alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] …>, created=1324336085, description='Customer for My Test App', livemode=False> I only want the value cus_Y4o9qMEZAugtnW and NOT card (which is inside another []). How could I do it in the easiest possible way in Python? Maybe by using RegEx (which I am not good at)? Answer How about: For me this prints: Note that the
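Because re.search stops at the first match, a pattern for "anything between square brackets" is enough to pick out the customer id and ignore the later [card]. The answer's own code is not included in the excerpt, so this is a sketch of one straightforward approach using the sample string from the question.

```python
import re

sample = (
    "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] …>, "
    "created=1324336085, description='Customer for My Test App', livemode=False>"
)

# \[ and \] match literal brackets; [^\]]+ captures everything up to the
# first closing bracket. search() returns only the first occurrence.
match = re.search(r"\[([^\]]+)\]", sample)
customer_id = match.group(1) if match else None
```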

Regular expression to find any number in a string

python regex

What’s the notation for any number in re? Like if I’m searching a string for any number, positive or negative. I’ve been using \d+ but that can’t find 0 or -1 Answer Searching for positive, negative, and/or decimals, you could use [+-]?\d+(?:\.\d+)? This isn’t very smart about leading/trailing zeros, of course: Edit: Corrected the above regex to find single-digit numbers.
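The pattern from the answer can be tried out with re.findall. The demonstration code that followed in the original is not part of the excerpt, so the sample string here is hypothetical; it includes 007 to show the leading-zero behavior the answer mentions.

```python
import re

# Optional sign, one or more digits, optional fractional part.
NUMBER_RE = re.compile(r"[+-]?\d+(?:\.\d+)?")

sample = "values: 0, -1, +3.14 and 007"
numbers = NUMBER_RE.findall(sample)
```

Here numbers is a list of strings; they can be passed to int or float if numeric values are needed.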