python regex: duplicate names in named groups

Is there a way to use same name in regex named group in python? e.g.(?P<n>foo)|(?P<n>bar).

Use case: I am trying to capture type and id with this regex:
/(?=videos)((?P<type>videos)/(?P<id>\d+))|(?P<type>\w+)/?(?P<v>v)?/?(?P<id>\d+)?
from these strings:

  • /channel/v/123
  • /ch/v/41500082
  • /channel
  • /videos/41500082

For now I am getting this error: redefinition of group name 'id' as group 6; was group 3

Answer

The answer is: Python re does not support identically named groups.
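If you have to stay with the standard library, one possible workaround is to give each alternative its own group names and merge them after matching. This is only a sketch for the four URLs in the question: the names type_a, id_a, type_b, id_b, URL_RX and parse are made up for illustration, and the pattern is a simplified variant of the original one, not an exact equivalent.

```python
import re

# The standard library rejects a pattern that reuses a group name.
try:
    re.compile(r"(?P<n>foo)|(?P<n>bar)")
    duplicate_names_allowed = True
except re.error:
    duplicate_names_allowed = False

# Workaround sketch: distinct (hypothetical) names per alternative,
# merged into a single "type" / "id" pair after matching.
URL_RX = re.compile(
    r"/(?:(?P<type_a>videos)/(?P<id_a>\d+)"
    r"|(?P<type_b>\w+)(?:/v)?(?:/(?P<id_b>\d+))?)$"
)


def parse(path):
    """Return {'type': ..., 'id': ...} for a path, or None if it does not match."""
    m = URL_RX.match(path)
    if m is None:
        return None
    d = m.groupdict()
    # Only one alternative can match, so at most one of each pair is set.
    return {"type": d["type_a"] or d["type_b"], "id": d["id_a"] or d["id_b"]}


for path in ["/channel/v/123", "/ch/v/41500082", "/channel", "/videos/41500082"]:
    print(path, parse(path))
```

For /channel the id comes back as None, because the optional id part did not take part in the match.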

The third-party regex module from PyPI supports identically named capturing groups. Its documentation says:

The same name can be used by more than one group, with later captures ‘overwriting’ earlier captures. All of the captures of the group will be available from the captures method of the match object.

And here is a Python 2.7 demo:

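import regex  # third-party module from PyPI: pip install regex

s = "foo bar"
# Both alternatives capture into the same group name "n".
rx = regex.compile(r"(?P<n>foo)|(?P<n>bar)")
print([x.group("n") for x in rx.finditer(s)])
# => ['foo', 'bar']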

In other cases, when you want to match several alternatives and capture a part of each into one group, you can use the branch reset feature of the regex module:

Branch reset

(?|...|...)

Capture group numbers will be reused across the alternatives, but groups with different names will have different group numbers.

Examples:

>>> regex.match(r"(?|(first)|(second))", "first").groups()
('first',)
>>> regex.match(r"(?|(first)|(second))", "second").groups()
('second',)

Note that there is only one group.
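For contrast, the standard re module has no branch reset, so each alternative keeps its own group number. A rough way to get a similar result there is to read the group indicated by lastindex. This is a sketch that assumes a flat alternation with exactly one capturing group per branch; the helper name branch_value is hypothetical.

```python
import re

PATTERN = r"(first)|(second)"

# Without branch reset there are two groups, and the unused one is None.
print(re.match(PATTERN, "second").groups())


def branch_value(pattern, text):
    """Return the text of the last capturing group that matched, or None."""
    m = re.match(pattern, text)
    if m is None or m.lastindex is None:
        return None
    # lastindex is the index of the last matched capturing group.
    return m.group(m.lastindex)


print(branch_value(PATTERN, "first"))
print(branch_value(PATTERN, "second"))
```

With nested groups lastindex points at the outer group, so this trick is not a general replacement for (?|...).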
