How to prepend something to the beginning of regex matches?

This is the regex code:

without_header = re.findall('/sports/[a-z0-9/.-:]*[0-9.]+cms', without_header_url)

It returns every matching URL path, but without the https:// scheme and host in front. For example:

/sports/cricket/ipl/top-stories/kxip-vs-csk-shane-watson-faf-du-plessis-infuse-life-into-csks-ipl-campaign-shape-confidence-boosting-win-over-kxip/articleshow/78481088.cms
/sports/football/epl/top-stories/epl-manchester-united-humiliated-as-mourinhos-spurs-win-6-1-at-old-trafford/articleshow/78481304.cms

I want to prepend "https://example.com" to each of these matches. I don't want to use a for loop; is there an efficient way of doing it with re.sub?

Answer

You may use this regex in re.sub:

(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)

Code:

s = re.sub(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)', r'https://example.com\1', s)
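
As a minimal, self-contained sketch of this substitution, assuming the host to prepend is https://example.com as in the question: the helper name add_host and the shortened sample paths are hypothetical, chosen only for illustration.

```python
import re

# Pattern from the answer: group 1 captures the relative /sports/... path.
PATTERN = re.compile(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)')

def add_host(text):
    # \1 re-inserts the captured path right after the literal prefix.
    return PATTERN.sub(r'https://example.com\1', text)

# Hypothetical, shortened sample paths (not the full URLs from the question).
sample = (
    "/sports/cricket/ipl/articleshow/78481088.cms\n"
    "/sports/football/epl/articleshow/78481304.cms"
)
result = add_host(sample)
print(result)
```

Because re.sub rewrites every match in one pass over the string, no explicit for loop is needed.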

RegEx Details:

(?<!:/): Negative lookbehind asserting that the match is not immediately preceded by :/
(/sports/[a-z0-9/.:-]*[0-9.]+cms): Matches the relative URL and captures it in group #1, so the replacement string can reuse it
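
The effect of the lookbehind can be checked with a small sketch. The three input strings are hypothetical and were chosen only to exercise the assertion; the variable names are illustrative as well.

```python
import re

pattern = re.compile(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)')

# Hypothetical inputs chosen only to exercise the lookbehind.
plain = pattern.findall('/sports/a/1.cms')
after_colon_slash = pattern.findall('https://sports/a/1.cms')
after_host = pattern.findall('https://example.com/sports/a/1.cms')

print(plain)              # ['/sports/a/1.cms']
print(after_colon_slash)  # []
print(after_host)         # ['/sports/a/1.cms']
```

The lookbehind inspects only the two characters directly before /sports, so it rejects a path that follows :/ but still matches one that follows a host name. If the input may already contain full URLs with a host, a different guard would probably be needed to avoid prefixing them twice.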
