Python

I would like to ask you how I can extract substrings related to some keywords.

For example, I have the following text:

mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"

I would like to extract the numeric value that follows certain keywords, for example “Discount” and “Other discount”. I tried the following code:

test = re.compile(r"""(
    (Discount\s\d*)
    (Other\sdiscount\s\d*)
    )""", re.VERBOSE)

pr = test.findall(mystring)

In this case I would like to obtain the pairs Discount: 0,0120 and Other discount: 0,0000. A list like the following one would also be enough:

["Discount 0,0120", "Other discount 0,0000"]

Thanks in advance for any help.

Answer

I had better luck with re.search. Also, your pattern was missing \d+,\d+ to capture the digits before and after the comma.

import re

mystring = "Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % Other discount 0,0000 YEB 4,0700 % Industrial 0,3856"

pattern = r"(Discount\s\d+,\d+)(.*)(Other\sdiscount\s\d+,\d+)"

p = re.search(pattern, mystring)

p.groups()
>> ('Discount 0,0120',
 ' Transport 0,0690 F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % ',
 'Other discount 0,0000')
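
If the middle group is not needed, one possible alternative (a sketch, not the approach above) is to put the keywords into a single alternation and use re.findall, which returns the list the question asks for. The names keywords, pairs, as_list and as_dict are made up for this illustration, and the sketch assumes keywords are matched case-sensitively, so "Payment discount" is not picked up.

```python
import re

mystring = ("Commission 0,0000 Packaging 0,0426 Discount 0,0120 Transport 0,0690 "
            "F YEB 0,0000 Commission 0,0000 Payment discount 0,0000 % "
            "Other discount 0,0000 YEB 4,0700 % Industrial 0,3856")

# Hypothetical keyword list; extend it with whatever labels are needed.
keywords = ["Discount", "Other discount"]

# re.escape guards against keywords that contain regex metacharacters.
alternation = "|".join(re.escape(k) for k in keywords)

# (?<!\w) stops a keyword from matching in the middle of a longer word.
# Group 1 is the keyword, group 2 is the number with a decimal comma.
pattern = re.compile(rf"(?<!\w)({alternation})\s+(\d+,\d+)")

# With two groups, findall returns a list of (keyword, value) tuples.
pairs = pattern.findall(mystring)

as_list = [f"{key} {value}" for key, value in pairs]
as_dict = dict(pairs)

print(as_list)
print(as_dict)
```

Matching is case-sensitive, so the lowercase "discount" in "Payment discount" is skipped. If a keyword occurs more than once, as_list keeps every occurrence, while as_dict keeps only the last value for that keyword.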

Python – Finding substrings related to specific keywords
