Get only numbers at the end (regex)

I’d like to get only the numbers (integers) at the end of the phrases below:

VISTA AES TIETE E UNT N2 600 
VISTA IT AUUNIBANCO PN N1 1.400
OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000

I mean: 600, 1400, 100000. I’ll add each one of them to a database later.

I tried to use this regex: (?<=\s)(\d*\s*)|(\d*\.\d*)$

But it didn’t work properly. Any ideas?

PS: We use dots, not commas, as the thousands separator: 1.000 instead of 1,000.

Answer

In the pattern that you tried, the part (?<=\s)(\d*\s*) matches optional digits followed by optional whitespace chars, and requires a whitespace char directly to the left.

Because both the digits and the whitespace chars in the match are optional, it also produces an empty match at every position in the string that has a whitespace char to its left.

In the part (\d*\.\d*)$ the digits are optional, so it could also match just a dot at the end of the string.


If there has to be a whitespace char before the number at the end, you can use:

(?<=\s)\d{1,3}(?:\.\d{3})*$

The pattern matches:

  • (?<=\s) Positive lookbehind, assert a whitespace char directly to the left of the current position
  • \d{1,3} Match 1-3 digits
  • (?:\.\d{3})* Optionally repeat a dot and 3 digits
  • $ End of string

See a regex demo.
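
As a quick check outside of pandas, here is a minimal sketch that applies the pattern with Python's built-in re module to the three phrases from the question. The variable names are only illustrative.

```python
import re

# Whitespace to the left, 1-3 digits, then optional groups of
# a dot plus 3 digits, anchored at the end of the string.
pattern = re.compile(r"(?<=\s)\d{1,3}(?:\.\d{3})*$")

phrases = [
    "VISTA AES TIETE E UNT N2 600",
    "VISTA IT AUUNIBANCO PN N1 1.400",
    "OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000",
]

matches = []
for phrase in phrases:
    m = pattern.search(phrase)
    # m is None when the phrase does not end with such a number
    matches.append(m.group() if m else None)

print(matches)  # ['600', '1.400', '100.000']
```

Note that the pattern is anchored with $, so a phrase with trailing spaces will not match; calling phrase.rstrip() first is one way to handle that.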

If the number can also be by itself, you could assert a whitespace boundary to the left with (?<!\S):

(?<!\S)\d{1,3}(?:\.\d{3})*$

See another regex demo.
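
To illustrate the difference between the two lookbehinds, here is a small sketch with made-up sample strings (they are not from the question). The first sample is a number by itself, the last one has the number glued to a preceding word.

```python
import re

# Requires an actual whitespace char before the number
with_space = re.compile(r"(?<=\s)\d{1,3}(?:\.\d{3})*$")
# Only requires that there is no non-whitespace char before the number,
# which is also true at the very start of the string
boundary = re.compile(r"(?<!\S)\d{1,3}(?:\.\d{3})*$")

samples = ["1.400", "N1 1.400", "N11.400"]

results = {
    s: (bool(with_space.search(s)), bool(boundary.search(s)))
    for s in samples
}

for s, (a, b) in results.items():
    print(f"{s!r}: (?<=\\s) -> {a}, (?<!\\S) -> {b}")
```

Only the second pattern matches the bare "1.400", both match "N1 1.400", and neither matches "N11.400".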

With pandas, you can use str.extract and wrap the pattern in a capture group:

import pandas as pd

strings = [
    "VISTA AES TIETE E UNT N2 600",
    "VISTA IT AUUNIBANCO PN N1 1.400",
    "OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000"
]

df = pd.DataFrame(strings, columns=["colName"])
df['lastNumbers'] = df['colName'].str.extract(r"(?<=\s)(\d{1,3}(?:\.\d{3})*)$")

print(df)

Output

                                             colName lastNumbers
0                       VISTA AES TIETE E UNT N2 600         600
1                    VISTA IT AUUNIBANCO PN N1 1.400       1.400
2  OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000     100.000
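
The extracted values are still strings that contain the thousands dots. Since the question asks for the integers 600, 1400 and 100000, one possible follow-up step is to remove the dots and convert with int. This is a sketch in plain Python, and last_integer is a hypothetical helper name, not something from pandas or the re module.

```python
import re
from typing import Optional

PATTERN = re.compile(r"(?<!\S)\d{1,3}(?:\.\d{3})*$")

def last_integer(phrase: str) -> Optional[int]:
    """Return the dot-grouped number at the end of phrase as an int.

    Hypothetical helper: returns None when there is no match.
    """
    m = PATTERN.search(phrase.rstrip())  # ignore trailing whitespace
    if m is None:
        return None
    # Dots are thousands separators here, so they can simply be dropped
    return int(m.group().replace(".", ""))

phrases = [
    "VISTA AES TIETE E UNT N2 600 ",
    "VISTA IT AUUNIBANCO PN N1 1.400",
    "OPCAO DE VENDA 04/21 COGNP450ON 4,50COGNE 100.000",
]

numbers = [last_integer(p) for p in phrases]
print(numbers)  # [600, 1400, 100000]
```

In pandas, the same idea could be applied to the extracted column with .str.replace(".", "", regex=False).astype(int), assuming every row has a match; astype(int) fails if the column contains missing values.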