I'm trying to use Python and regex to get the last set of integers in a filename (a string). The method below does what I need, but I also want to return the inverse, that is, the remaining parts of the string that the regex did not match. How can I do that?
Here is the regex: ([0-9]+|#+)(?!.*([0-9]+|#+))
    import re

    values = [
        'image.0001',
        'image###',
        '###image###',
        'image001',
        'image_001',
        '001',
        '0001.image',
        '001image',
        '001_image',
        'image',
        '01_image01',
        '03_image01',
    ]

    pattern = '([0-9]+|#+|@+)'
    regex = '{0}(?!.*{0})'.format(pattern)

    for v in values:
        result = re.search(regex, v)
        if result:
            print(result.groups())
Currently it returns, for example: ('01', None)
I’d like it to return something like ('image', '0001')
Updated
Optionally, is there a way to split the strings by groups of numbers? For example:
    'image.0001'  > ['image.', '0001']
    'image###'    > ['image', '###']
    '###image###' > ['###', 'image', '###']
    'image001'    > ['image', '001']
    'image_001'   > ['image_', '001']
    '001'         > ['001']
    '0001.image'  > ['0001', '.image']
    '001image'    > ['001', 'image']
    '001_image'   > ['001', '_image']
    'image'       > ['image']
    '01_image01'  > ['01', '_image', '01']
    '03_image01'  > ['03', '_image', '01']
Answer
EDIT:
Use re.findall:
    re.findall(r'\d+|#+|@+|[^#@\d]+', v)
See proof.
Explanation
--------------------------------------------------------------------------------
  \d+                      digits (0-9) (1 or more times (matching
                           the most amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  #+                       '#' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  @+                       '@' (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  [^#@\d]+                 any character except: '#', '@', digits
                           (0-9) (1 or more times (matching the most
                           amount possible))
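As a quick check, the findall approach can be run against a few of the sample filenames from the question. This is a minimal sketch; the names TOKEN_RE and split_tokens are made up for illustration and are not part of the original answer.

```python
import re

# Same alternation as above: a run of digits, '#', '@', or anything else.
TOKEN_RE = re.compile(r'\d+|#+|@+|[^#@\d]+')

def split_tokens(name):
    """Split a filename into alternating number/placeholder runs and text runs."""
    return TOKEN_RE.findall(name)

for name in ['image.0001', '###image###', '01_image01', 'image']:
    print(name, '>', split_tokens(name))
# image.0001 > ['image.', '0001']
# ###image### > ['###', 'image', '###']
# 01_image01 > ['01', '_image', '01']
# image > ['image']
```

Because every character falls into exactly one of the four alternatives, joining the returned pieces gives back the original string.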
ORIGINAL:
Use re.split, and add a capturing group so that the matched part is kept in the result:
    import re

    values = [
        'image.0001',
        'image###',
        '###image###',
        'image001',
        'image_001',
        '001',
        '0001.image',
        '001image',
        '001_image',
        'image',
        '01_image01',
        '03_image01',
    ]

    pattern = '[0-9]+|#+|@+'
    regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))

    for v in values:
        print(regex.split(v))
See Python proof
Results:
    ['image.', '0001', '']
    ['image', '###', '']
    ['###image', '###', '']
    ['image', '001', '']
    ['image_', '001', '']
    ['', '001', '']
    ['', '0001', '.image']
    ['', '001', 'image']
    ['', '001', '_image']
    ['image']
    ['01_image', '01', '']
    ['03_image', '01', '']
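The empty strings in these results appear because re.split always reports the text before and after the match, even when the match touches the start or end of the string. If they are unwanted, they can be filtered out. This is a small sketch; the helper name split_last_number is hypothetical.

```python
import re

pattern = '[0-9]+|#+|@+'
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))

def split_last_number(name):
    """Split around the last number/placeholder run, dropping empty pieces."""
    return [part for part in regex.split(name) if part]

print(split_last_number('image.0001'))  # ['image.', '0001']
print(split_last_number('001'))         # ['001']
print(split_last_number('01_image01'))  # ['01_image', '01']
```

Note that filtering loses the information about which piece was the match, so it only suits cases where the position of the number does not matter.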
See regex proof.
Explanation
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    #+                       '#' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    @+                       '@' (1 or more times (matching the most
                             amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      #+                       '#' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      @+                       '@' (1 or more times (matching the
                               most amount possible))
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
  )                        end of look-ahead
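If a tuple is preferred over a list, closer to the ('image', '0001') shape asked for in the question, one possible sketch is to use re.search with the same regex and slice the string around the match. The function name partition_last_number is made up for illustration, and the separator (such as '.') stays attached to the surrounding text here, which is an assumption about the desired output.

```python
import re

pattern = '[0-9]+|#+|@+'
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))

def partition_last_number(name):
    """Return (before, number, after), or None if there is no number/placeholder run."""
    match = regex.search(name)
    if match is None:
        return None
    # Slice the original string around the span of the last run.
    return name[:match.start()], match.group(1), name[match.end():]

print(partition_last_number('image.0001'))  # ('image.', '0001', '')
print(partition_last_number('0001.image'))  # ('', '0001', '.image')
print(partition_last_number('image'))       # None
```

This mirrors str.partition: the three parts always concatenate back to the original name, and the middle element is the last run of digits, '#', or '@'.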