I have the following 2 variations of scraped data:
txt = '''Käuferprovision: 3 % zzgl. gesetzl. MwSt.''' # variation 1
and
txt = '''Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist''' # variation 2
I'd like to write one regular expression that extracts the percentage as a float: 3.0 in the first case and 3.57 in the second.
I’ve tried this so far:
m = re.search(r'.{3}.%.{5}',txt)
txt = m.group().split("%")[1:]
txt = ("".join(txt)).replace(",",".")
print(txt)
This works for variation 2 but not for variation 1.
Answer
You may try this code to grab the percent values and convert them to floats:
>>> import re
>>> arr = ['Käuferprovision: 3 % zzgl. gesetzl. MwSt.', 'Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist']
>>> rx = re.compile(r'\d+(?:[.,]\d+)*(?=\s*%)|(?<=%)\s*\d+(?:[.,]\d+)*')
>>> for s in arr:
...     for m in rx.finditer(s): print(float(m.group().replace(',', '.')))
...
3.0
3.57
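The pattern has two branches: a number followed by optional whitespace and a `%`, or a `%` followed by optional whitespace and a number. As a minimal sketch, the same idea can be wrapped in a reusable function. The names `PERCENT_RX` and `extract_percent` are hypothetical and not part of the original answer, and this variant uses capture groups instead of lookarounds and allows at most one decimal separator.

```python
import re

# Hypothetical helper, not from the original answer: the same two-branch idea,
# written with re.VERBOSE so each part of the pattern can carry a comment.
PERCENT_RX = re.compile(
    r"""
    (\d+(?:[.,]\d+)?)   # group 1: a number such as 3 or 3,57 ...
    \s*%                # ... followed by optional whitespace and a percent sign
    |
    %\s*                # or: a percent sign and optional whitespace ...
    (\d+(?:[.,]\d+)?)   # ... followed by group 2: the number
    """,
    re.VERBOSE,
)


def extract_percent(text):
    """Return the first percentage found in text as a float, or None."""
    m = PERCENT_RX.search(text)
    if m is None:
        return None
    # Exactly one of the two groups is set, depending on which branch matched.
    number = m.group(1) or m.group(2)
    # German decimal comma -> decimal point before converting.
    return float(number.replace(",", "."))


print(extract_percent("Käuferprovision: 3 % zzgl. gesetzl. MwSt."))  # 3.0
print(extract_percent("Käuferprovision: Die Courtage i.H.v. % 3,57 inkl. MwSt. ist"))  # 3.57
print(extract_percent("no percentage here"))  # None
```

Returning `None` when nothing matches avoids the `AttributeError` that calling `m.group()` on a failed `re.search` would raise, which can matter with scraped text that does not always contain a percentage.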