How to append something to the beginning of Regex matches?

This is the regex code:

without_header = re.findall('/sports/[a-z0-9/.-:]*[0-9.]+cms', without_header_url)

It returns every URL path that is missing the https:// prefix. For example:

/sports/cricket/ipl/top-stories/kxip-vs-csk-shane-watson-faf-du-plessis-infuse-life-into-csks-ipl-campaign-shape-confidence-boosting-win-over-kxip/articleshow/78481088.cms
/sports/football/epl/top-stories/epl-manchester-united-humiliated-as-mourinhos-spurs-win-6-1-at-old-trafford/articleshow/78481304.cms

I want to prepend “https://example.com” to each of these matches. I don’t want a for loop; is there an efficient way of doing it using re.sub?

Answer

You may use this regex in re.sub:

(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)

Code:

s = re.sub(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)', r'https://example.com\1', s)
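
As a runnable illustration of the call above, here is a minimal sketch. The input string s is a hypothetical stand-in with shortened paths (the question does not show the full text being searched), and the replacement is assumed to be the prefix https://example.com followed by the backreference \1, which re-inserts whatever group #1 matched.

```python
import re

# Hypothetical input: two shortened relative paths, one per line.
s = (
    "/sports/cricket/ipl/top-stories/kxip-vs-csk/articleshow/78481088.cms\n"
    "/sports/football/epl/top-stories/epl-manchester-united/articleshow/78481304.cms"
)

# Same pattern as in the answer; group 1 captures the whole relative path.
pattern = re.compile(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)')

# \1 in the replacement stands for the text captured by group 1,
# so each match is rewritten as the prefix plus the original path.
result = pattern.sub(r'https://example.com\1', s)

print(result)
```

re.sub replaces every non-overlapping match in a single call, so no explicit loop is needed in your own code.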

RegEx Details:

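  • (?<!:/): Negative lookbehind asserting that the match is not immediately preceded by :/
  • (/sports/[a-z0-9/.:-]*[0-9.]+cms): Matches your text and captures it in group #1. The hyphen sits last in the character class so that it is a literal hyphen rather than part of the range .-: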
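
The two parts described above can be probed separately. The sketch below uses hypothetical test strings, chosen only for illustration, to show what the lookbehind does and does not reject, and why the position of the hyphen inside the character class matters. Note that the lookbehind only inspects the two characters directly before /sports/, so a path that follows a host name still matches.

```python
import re

PATTERN = re.compile(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)')

# Hypothetical inputs, chosen only to exercise each part of the pattern.
relative = '/sports/top-stories/1.cms'                      # nothing before the path
after_scheme = 'http://sports/top-stories/1.cms'            # ':/' directly before '/sports/'
after_host = 'https://example.com/sports/top-stories/1.cms' # 'om' directly before '/sports/'

matches_relative = PATTERN.search(relative) is not None
matches_after_scheme = PATTERN.search(after_scheme) is not None
matches_after_host = PATTERN.search(after_host) is not None

# Inside a character class, '.-:' is a range from '.' to ':' and does not
# include '-', whereas a trailing '-' is a literal hyphen.
range_class_ok = re.fullmatch(r'[a-z0-9/.-:]*', 'top-stories') is not None
literal_class_ok = re.fullmatch(r'[a-z0-9/.:-]*', 'top-stories') is not None

print(matches_relative, matches_after_scheme, matches_after_host)
print(range_class_ok, literal_class_ok)
```

If the text being searched may already contain absolute URLs with a host name, a stricter guard than this lookbehind would be needed; that case is outside what the question shows.
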
User contributions licensed under: CC BY-SA