I have the following 2 variations of scraped data:
txt = '''Käuferprovision: 3 % zzgl. gesetzl. MwSt.''' # variation 1
and
txt = '''Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist''' # variation 2
I’d like to make one regular expression that gets the percentage as a float, so in the first instance 3.0 and in the second 3.57
I’ve tried this so far:
m = re.search(r'.{3}.%.{5}',txt)
txt = m.group().split("%")[1:]
txt = ("".join(txt)).replace(",",".")
print(txt)
This works for variation 2 but not for variation 1.
Answer
You may try this code to grab your percent values and convert them to floats:
>>> import re
>>> arr = ['Käuferprovision: 3 % zzgl. gesetzl. MwSt.', 'Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist']
>>> rx = re.compile(r'\d+(?:[.,]\d+)*(?=\s*%)|(?<=%)\s*\d+(?:[.,]\d+)*')
>>> for s in arr:
...     for m in rx.finditer(s): print(float(m.group().replace(',', '.')))
...
3.0
3.57
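The pattern above has two alternatives: a number followed by optional whitespace and a percent sign, or a number preceded by a percent sign and optional whitespace. If you only need the first percentage in each string, the same idea could be wrapped in a small function. This is a minimal sketch, not the only way to do it: the name `extract_percent`, the use of capture groups instead of lookarounds, and returning `None` when nothing matches are all assumptions, and it assumes at most one decimal separator (no thousands separators).

```python
import re
from typing import Optional

# Hypothetical helper, not part of the original answer.
PERCENT_RX = re.compile(
    r'(\d+(?:[.,]\d+)?)\s*%'   # number before the percent sign, e.g. "3 %"
    r'|%\s*(\d+(?:[.,]\d+)?)'  # number after the percent sign, e.g. "% 3,57"
)

def extract_percent(text: str) -> Optional[float]:
    """Return the first percentage in text as a float, or None if absent."""
    m = PERCENT_RX.search(text)
    if m is None:
        return None
    # Exactly one of the two groups is set, depending on which side matched.
    number = m.group(1) or m.group(2)
    # Normalise a German decimal comma before converting.
    return float(number.replace(',', '.'))

print(extract_percent('Käuferprovision: 3 % zzgl. gesetzl. MwSt.'))
print(extract_percent('Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist'))
```

Using capture groups means the match itself may include the percent sign, but only the digits are passed to `float`, so no stripping of whitespace is needed.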