
How to capture a group only if it occurs twice in a line

import re

text = """
Tumble Trouble Twwixt Two Towns!
Was the Moon soon in the Sea
Or soon in the sky?
Nobody really knows YET.
"""


How can I make the match happen only when the occurrence is found twice in a line?

I need a regular expression that highlights two ‘o’s appearing beside each other, but only if another pair of adjacent ‘o’s appears later in the same line.


Answer

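You can match one or more word characters in a capture group, repeat them with a backreference, and wrap that in another group.

Groups are numbered by the position of their opening parenthesis, so the outer group is group 1 and the nested word-character group is group 2.

Then use a positive lookahead to assert that the text captured by group 1 occurs again later in the same line. Because the dot does not match a newline by default, the lookahead cannot look past the end of the current line.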

((\w+)\2)(?=.*?\1)

The pattern matches:

  • ( Capture group 1
    • (\w+)\2 Match one or more word characters in capture group 2, followed by a backreference to group 2 to match the same text again
  • ) Close group 1
  • (?=.*?\1) Positive lookahead to assert that the value captured in group 1 appears again later in the line
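To see the group numbering in practice, here is a minimal sketch that reuses the sample text from the question and lists what each nested group captures with re.finditer. The variable name matches is only for illustration. For this input there should be a single match, the "oo" in "Moon": the "oo" in "soon" on the same line satisfies the lookahead, while the later "oo"s have no further pair after them on their own lines.

```python
import re

# Same sample text as in the question.
text = """
Tumble Trouble Twwixt Two Towns!
Was the Moon soon in the Sea
Or soon in the sky?
Nobody really knows YET.
"""

pattern = re.compile(r"((\w+)\2)(?=.*?\1)")

# group(1) is the outer group (the whole repeated run),
# group(2) is the nested group (the unit that gets repeated).
matches = [(m.group(1), m.group(2)) for m in pattern.finditer(text)]
print(matches)  # [('oo', 'o')]
```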


Example

print(re.compile(r"((\w+)\2)(?=.*?\1)").sub(r"{\g<1>}", text.strip()))

Output

Tumble Trouble Twwixt Two Towns!
Was the M{oo}n soon in the Sea
Or soon in the sky?
Nobody really knows YET.
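The lookahead only searches the current line because the dot does not match a newline unless re.DOTALL is set. A small sketch to check this, using a hypothetical two-line string where the second pair of "o"s sits on the next line:

```python
import re

# Hypothetical input: "oo" on the first line, another "oo" only on the second line.
sample = "Moon\nsoon"
pattern = r"((\w+)\2)(?=.*?\1)"

# Default flags: "." stops at "\n", so the lookahead cannot reach "soon".
default_hits = [m.group(1) for m in re.finditer(pattern, sample)]

# re.DOTALL: "." also matches "\n", so the lookahead can see the next line.
dotall_hits = [m.group(1) for m in re.finditer(pattern, sample, re.DOTALL)]

print(default_hits)  # []
print(dotall_hits)   # ['oo']
```

So as long as DOTALL is not enabled, running the pattern over the whole multi-line string behaves as if each line were checked on its own.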