I have the following 2 variations of scraped data:
txt = '''Käuferprovision: 3 % zzgl. gesetzl. MwSt.''' # variation 1
and
txt = '''Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist''' # variation 2
I’d like to write a single regular expression that extracts the percentage as a float, so 3.0 in the first case and 3.57 in the second.
I’ve tried this so far:
m = re.search(r'.{3}.%.{5}',txt)
txt = m.group().split("%")[1:]
txt = ("".join(txt)).replace(",",".")
print(txt)
This works for variation 2 but not for variation 1.
Answer
You can use the following code to extract the percentage values and convert them to floats. The pattern matches a number either directly before or directly after the % sign:
>>> import re
>>> arr = ['Käuferprovision: 3 % zzgl. gesetzl. MwSt.', 'Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist']
>>> rx = re.compile(r'\d+(?:[.,]\d+)*(?=\s*%)|(?<=%)\s*\d+(?:[.,]\d+)*')
>>> for s in arr:
...     for m in rx.finditer(s): print(float(m.group().replace(',', '.')))
...
3.0
3.57
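If you need this in a script rather than at the prompt, the same idea can be wrapped in a small helper. This is only a sketch: the names `PERCENT_RX` and `extract_percent` are made up for illustration, and it assumes each string contains at most one percentage of interest, written with either a comma or a dot as the decimal separator. It uses two capture groups instead of lookarounds, so the matched number never includes surrounding whitespace.

```python
import re
from typing import Optional

# Hypothetical helper names; not part of the original answer.
# Alternative 1: number followed by optional whitespace and '%'  (e.g. "3 %")
# Alternative 2: '%' followed by optional whitespace and a number (e.g. "% 3,57")
PERCENT_RX = re.compile(r'(\d+(?:[.,]\d+)?)\s*%|%\s*(\d+(?:[.,]\d+)?)')


def extract_percent(text: str) -> Optional[float]:
    """Return the first percentage found in text as a float, or None."""
    m = PERCENT_RX.search(text)
    if m is None:
        return None
    # Exactly one of the two groups is filled, depending on which side matched.
    number = m.group(1) or m.group(2)
    # Normalise a German decimal comma to a dot before converting.
    return float(number.replace(',', '.'))


samples = [
    'Käuferprovision: 3 % zzgl. gesetzl. MwSt.',
    'Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist',
]
for s in samples:
    print(extract_percent(s))
```

Returning None when nothing matches lets the caller decide how to handle listings without a commission figure, instead of failing on `m.group()` as the original attempt would.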