I’m trying to use Python and a regex to get the last set of integers in a filename (string). The method below does what I need, but I also want to return the inverse, i.e. the remaining parts of the string that the regex did not match. How can I do that?
Here is the regex: ([0-9]+|#+)(?!.*([0-9]+|#+)) (the code below also adds @+ for '@' padding).
import re
values = [
'image.0001',
'image###',
'###image###',
'image001',
'image_001',
'001',
'0001.image',
'001image',
'001_image',
'image',
'01_image01',
'03_image01',
]
pattern = '([0-9]+|#+|@+)'
regex = '{0}(?!.*{0})'.format(pattern)
for v in values:
    result = re.search(regex, v)
    if result:
        print(result.groups())
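Currently it only returns the matched run plus a second group that is always None, e.g. ('01', None) for the last value, '03_image01'. I’d like it to also return the remaining part, something like ('image', '0001') for 'image.0001'.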
Updated
Optionally, is there a way to split the strings into runs of numbers (or #) and the text between them? For example:
'image.0001' > ['image.', '0001']
'image###' > ['image', '###']
'###image###' > ['###', 'image', '###']
'image001' > ['image', '001']
'image_001' > ['image_', '001']
'001' > ['001']
'0001.image' > ['0001', '.image']
'001image' > ['001', 'image']
'001_image' > ['001', '_image']
'image' > ['image']
'01_image01' > ['01', '_image', '01']
'03_image01' > ['03', '_image', '01']
Answer
EDIT:
Use
re.findall(r'\d+|#+|@+|[^#@\d]+', v)
Explanation
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
#+ '#' (1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
@+ '@' (1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
[^#@\d]+ any character except: '#', '@', digits
(0-9) (1 or more times (matching the most
amount possible))
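To check the findall pattern against the splits the question asks for, one can run it over the same filenames. A minimal sketch; the `expected` dict just restates the desired output listed in the question, and the name `TOKEN_RE` is illustrative:

```python
import re

# Four non-overlapping branches: a run of digits, a run of '#',
# a run of '@', or a run of anything else.
TOKEN_RE = re.compile(r'\d+|#+|@+|[^#@\d]+')

# Desired splits, copied from the question.
expected = {
    'image.0001': ['image.', '0001'],
    'image###': ['image', '###'],
    '###image###': ['###', 'image', '###'],
    'image001': ['image', '001'],
    'image_001': ['image_', '001'],
    '001': ['001'],
    '0001.image': ['0001', '.image'],
    '001image': ['001', 'image'],
    '001_image': ['001', '_image'],
    'image': ['image'],
    '01_image01': ['01', '_image', '01'],
    '03_image01': ['03', '_image', '01'],
}

for name, want in expected.items():
    got = TOKEN_RE.findall(name)
    print(name, got, got == want)
```

Because every character is either a digit, '#', '@' or something else, the branches never overlap and together cover the whole string, so ''.join(TOKEN_RE.findall(name)) gives back the original name.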
ORIGINAL:
Use re.split and wrap the pattern in a capturing group so the matched part is kept in the result:
import re
values = [
'image.0001',
'image###',
'###image###',
'image001',
'image_001',
'001',
'0001.image',
'001image',
'001_image',
'image',
'01_image01',
'03_image01',
]
pattern = '[0-9]+|#+|@+'
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))
for v in values:
    print(regex.split(v))
Results:
['image.', '0001', '']
['image', '###', '']
['###image', '###', '']
['image', '001', '']
['image_', '001', '']
['', '001', '']
['', '0001', '.image']
['', '001', 'image']
['', '001', '_image']
['image']
['01_image', '01', '']
['03_image', '01', '']
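Since the lookahead allows at most one match, split returns either a single element (no digits, '#' or '@' at all) or exactly three: the text before the last run, the run itself, and the text after it (the empty strings appear when the run sits at either end). That makes it easy to unpack into a tuple like the one the question asks for. A minimal sketch; `split_last` is a hypothetical helper name, and note that any separator before the number (e.g. the '.' in 'image.0001') stays in the head:

```python
import re

pattern = '[0-9]+|#+|@+'
# Same regex as in the answer: capture the last run of digits, '#' or '@'.
regex = re.compile(r'({0})(?!.*(?:{0}))'.format(pattern))


def split_last(name):
    """Hypothetical helper: return (head, frame, tail) around the last run.

    If the name contains no digits, '#' or '@', frame and tail are ''.
    """
    parts = regex.split(name)
    if len(parts) == 1:
        # No match: split returned the whole string unchanged.
        return parts[0], '', ''
    head, frame, tail = parts
    return head, frame, tail


for v in ['image.0001', '0001.image', '03_image01', 'image']:
    print(v, split_last(v))
```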
Explanation
--------------------------------------------------------------------------------
( group and capture to 1:
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
#+ '#' (1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
@+ '@' (1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
) end of 1
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
#+ '#' (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
@+ '@' (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
) end of look-ahead
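If the negative lookahead feels hard to read, a different technique (not the one used in the answer) avoids it: find every run with re.finditer, keep the last match object, and slice the original string with its start() and end() offsets. This also sidesteps the lookahead re-scanning the rest of the string at each candidate position, though for short filenames the difference is unlikely to matter. A minimal sketch; `RUN_RE` and `split_last_run` are illustrative names:

```python
import re

# A run of digits, '#' padding, or '@' padding (no lookahead needed).
RUN_RE = re.compile(r'[0-9]+|#+|@+')


def split_last_run(name):
    """Illustrative helper: return (head, run, tail) around the last run.

    Returns (name, '', '') when the name has no digits, '#' or '@'.
    """
    matches = list(RUN_RE.finditer(name))
    if not matches:
        return name, '', ''
    last = matches[-1]
    # The match object records where it sits, so the "inverse" parts are
    # just the slices before and after it.
    return name[:last.start()], last.group(), name[last.end():]


for v in ['image.0001', '###image###', '01_image01', 'image']:
    print(v, split_last_run(v))
```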