Regex lookahead AND lookbehind

I have the following 2 variations of scraped data:

    txt = '''Käuferprovision: 3 % zzgl. gesetzl. MwSt.''' # variation 1

and

    txt = '''Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist''' # variation 2

I’d like a single regular expression that extracts the percentage as a float: 3.0 in the first case and 3.57 in the second.

I’ve tried this so far:

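import re

# grab 4 characters before the "%" and 5 characters after it
m = re.search(r'.{3}.%.{5}', txt)
# keep only the part after the "%" and normalise the decimal comma
txt = m.group().split("%")[1:]
txt = ("".join(txt)).replace(",", ".")
print(txt)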

This works for variation 2 but not for variation 1.

Answer

You can match a number that is followed by % using a lookahead, or a number that is preceded by % using a lookbehind, and then convert each match to a float:

>>> import re
>>> arr = ['Käuferprovision: 3 % zzgl. gesetzl. MwSt.', 'Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist']
>>> rx = re.compile(r'\d+(?:[.,]\d+)*(?=\s*%)|(?<=%)\s*\d+(?:[.,]\d+)*')
>>> for s in arr:
...     for m in rx.finditer(s): print(float(m.group().replace(',', '.')))
...
3.0
3.57
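
If you need this in a script rather than at the prompt, the same pattern could be wrapped in a small helper. This is only a sketch: the names PERCENT_RX and extract_percent are made up for illustration, and it assumes you only want the first percentage in each string and None when there is none.

```python
import re
from typing import Optional

# Same pattern as above: a number directly before "%" (lookahead branch)
# or directly after "%" (lookbehind branch); "." or "," may act as the
# decimal separator.
PERCENT_RX = re.compile(r'\d+(?:[.,]\d+)*(?=\s*%)|(?<=%)\s*\d+(?:[.,]\d+)*')


def extract_percent(text: str) -> Optional[float]:
    """Return the first percentage in text as a float, or None (hypothetical helper)."""
    m = PERCENT_RX.search(text)
    if m is None:
        return None
    # The lookbehind branch can capture leading whitespace, so strip it
    # before swapping the decimal comma for a dot.
    return float(m.group().strip().replace(',', '.'))


print(extract_percent('Käuferprovision: 3 % zzgl. gesetzl. MwSt.'))
print(extract_percent('Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist'))
print(extract_percent('Keine Angabe'))
```

This should print 3.0, 3.57 and None. Note that a value with a thousands separator such as 1.000,5 would still break the float conversion, so you may need to tighten the pattern if your data contains those.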

User contributions licensed under: CC BY-SA