Using Python and regex to get the last occurrence and the remaining part

I'm trying to use Python and regex to get the last set of integers in a filename (a string). The code below does that, but I also want to return the remaining part of the string, i.e. everything the regex did not match. How can I do that?

Here is the regex: ([0-9]+|#+)(?!.*([0-9]+|#+))

import re

values = [
    'image.0001',
    'image###',
    '###image###',
    'image001',
    'image_001',
    '001',
    '0001.image',
    '001image',
    '001_image',
    'image',
    '01_image01',
    '03_image01',
]

pattern = '([0-9]+|#+|@+)'
regex = '{0}(?!.*{0})'.format(pattern)

for v in values:
    result = re.search(regex, v)
    if result:
        print(result.groups())

Currently it returns something like ('01', None). I'd like it to return something like ('image', '0001').

Updated

Optionally, is there a way to split the strings by groups of numbers? For example:

'image.0001' > ['image.', '0001']
'image###' > ['image', '###']
'###image###' > ['###', 'image', '###']
'image001' > ['image', '001']
'image_001' > ['image_', '001']
'001' > ['001']
'0001.image' > ['0001', '.image']
'001image' > ['001', 'image']
'001_image' > ['001', '_image']
'image' > ['image']
'01_image01' > ['01', '_image', '01']
'03_image01' > ['03', '_image', '01']

Answer

EDIT:

Use

re.findall(r'\d+|#+|@+|[^#@\d]+', v)

Explanation

--------------------------------------------------------------------------------
  \d+                     digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  #+                       '#' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  @+                       '@' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  [^#@\d]+                any character except: '#', '@', digits (0-
                           9) (1 or more times (matching the most
                           amount possible))
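
As a quick check, the findall pattern can be run against a few of the sample values from the question. This is only a minimal sketch: the names TOKEN_RE and split_tokens are made up for illustration and are not part of the re module.

```python
import re

# Hypothetical helper names, introduced only for this example.
# Alternatives are tried left to right: a run of digits, a run of '#',
# a run of '@', or a run of anything that is none of those.
TOKEN_RE = re.compile(r'\d+|#+|@+|[^#@\d]+')


def split_tokens(name):
    """Return the consecutive runs that make up `name`."""
    return TOKEN_RE.findall(name)


for v in ['image.0001', '###image###', '001', 'image', '01_image01']:
    print(v, '>', split_tokens(v))
```

Because every character belongs to exactly one of the four alternatives, joining the returned list should give back the original string.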

ORIGINAL: Use re.split and wrap the pattern in a capturing group so that the matched part is kept in the result:

import re

values = [
    'image.0001',
    'image###',
    '###image###',
    'image001',
    'image_001',
    '001',
    '0001.image',
    '001image',
    '001_image',
    'image',
    '01_image01',
    '03_image01',
]

pattern = '[0-9]+|#+|@+'
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))
for v in values:
    print(regex.split(v))

Results:

['image.', '0001', '']
['image', '###', '']
['###image', '###', '']
['image', '001', '']
['image_', '001', '']
['', '001', '']
['', '0001', '.image']
['', '001', 'image']
['', '001', '_image']
['image']
['01_image', '01', '']
['03_image', '01', '']
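
The empty strings in these results appear whenever the match sits at the very start or end of the input. If they are not wanted, one possible way to drop them is a list comprehension over the split result. This is a sketch only; LAST_RUN_RE and split_on_last_run are illustrative names, not anything from the question.

```python
import re

pattern = '[0-9]+|#+|@+'
# Same regex as in the re.split approach: a run of digits, '#' or '@'
# that is not followed by another such run later in the string.
LAST_RUN_RE = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))


def split_on_last_run(name):
    """Split around the last run and discard empty pieces."""
    return [part for part in LAST_RUN_RE.split(name) if part]


for v in ['image.0001', '001', '0001.image', 'image', '01_image01']:
    print(split_on_last_run(v))
```

Note that filtering loses the information about whether the run was at the start or the end of the string, so keep the unfiltered list if the position matters.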

Explanation

--------------------------------------------------------------------------------
  (                        group and capture to 1:
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    #+                       '#' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    @+                       '@' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
  )                        end of 1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      #+                       '#' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      @+                       '@' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
  )                        end of look-ahead
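
To get the last run together with the remaining parts of the string, as the question asks, the same look-ahead regex could also be used with re.search and the match positions. The following is a minimal sketch under that assumption; partition_last_run is a hypothetical helper name, and returning None for "no run found" is just one possible convention.

```python
import re

pattern = '[0-9]+|#+|@+'
# Matches a run of digits, '#' or '@' with no further run after it.
LAST_RUN_RE = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))


def partition_last_run(name):
    """Return (text before, last run, text after) for `name`.

    If `name` contains no run at all, the middle element is None and
    the whole string is returned as the first element.
    """
    m = LAST_RUN_RE.search(name)
    if m is None:
        return (name, None, '')
    # Slice the original string around the matched span.
    return (name[:m.start()], m.group(1), name[m.end():])


for v in ['image.0001', '0001.image', 'image', '03_image01']:
    print(partition_last_run(v))
```

This behaves much like str.rpartition, except that the separator is found by the regex rather than given as a fixed string.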
User contributions licensed under: CC BY-SA