Get only numbers at the end (regex)

Question

I'd like to get only the numbers (integers) at the end of the phrases below: I mean: 600, 1400, 100000. I'll add each one of them to a database later. I tried to use regex: (?<=\s)(\d*\s*)|(\d*\.\d*)$ But it didn't work properly. Any ideas? PS: We use dots, not commas to represent a thousand: 1…

Accepted Answer

In the pattern that you tried, the part (?<=\s)(\d*\s*) matches optional digits followed by optional whitespace chars, with a whitespace char required directly to the left. Because both the digits and the whitespace in the match are optional, it also matches the empty string at every position in the string that has a whitespace char to the left.

In the part (\d*\.\d*)$ the digits are optional, so it could also match just a dot at the end of the string.

If there has to be a whitespace char before the number at the end, you can use:

    (?<=\s)\d{1,3}(?:\.\d{3})*$

The pattern matches:

    (?<=\s)       Positive lookbehind, assert a whitespace char to the left of the current position
    \d{1,3}       Match 1-3 digits
    (?:\.\d{3})*  Optionally repeat a dot and 3 digits
    $             End of string

If the number can also be by itself, you could assert a whitespace boundary to the left with (?<!\S) instead:

    (?<!\S)\d{1,3}(?:\.\d{3})*$

For example, using str.extract and wrapping the pattern in a capture group:

    import pandas as pd

    strings = [
        "VISTA AES TIETE E UNT N2 600",
        "VISTA IT AUUNIBANCO PN N1 1.400",
        "OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000"
    ]

    df = pd.DataFrame(strings, columns=["colName"])
    # str.extract returns the first capture group, so the number is wrapped in (...)
    df['lastNumbers'] = df['colName'].str.extract(r"(?<=\s)(\d{1,3}(?:\.\d{3})*)$")
    print(df)

Output

                                                 colName lastNumbers
    0                       VISTA AES TIETE E UNT N2 600         600
    1                    VISTA IT AUUNIBANCO PN N1 1.400       1.400
    2  OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000     100.000
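Since the goal in the question is to store plain integers (600, 1400, 100000), the matched text still needs its thousands dots removed before conversion. The following is a minimal sketch of that step using only the standard `re` module. It assumes the whitespace-boundary variant of the pattern from this answer; the helper name `last_integer` is hypothetical and not part of the original answer.

```python
import re

# Whitespace-boundary variant of the pattern: a number at the end of the
# string, written with dots as thousands separators.
LAST_NUMBER = re.compile(r"(?<!\S)\d{1,3}(?:\.\d{3})*$")


def last_integer(text):
    """Return the trailing number of `text` as an int, or None if absent.

    Hypothetical helper for illustration only.
    """
    match = LAST_NUMBER.search(text)
    if match is None:
        return None
    # Dots are thousands separators here, so drop them before converting.
    return int(match.group().replace(".", ""))


phrases = [
    "VISTA AES TIETE E UNT N2 600",
    "VISTA IT AUUNIBANCO PN N1 1.400",
    "OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000",
]
values = [last_integer(p) for p in phrases]
print(values)  # [600, 1400, 100000]
```

Note that the pattern expects groups of three digits after each dot, so a trailing number such as 1234 written without a separator is not matched and the helper returns None for it.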


Answer