I'm using this to try to remove URLs from a string:
text = re.sub(r'https?://[A-Za-z0-9./]+', '', text)
Unfortunately it works for simple URLs but not for complex ones.
So something like http://www.example.com/somestuff.html
will be removed, but something like http://www.example.com/somestuff.html?query=python
will leave the trailing ?query=python behind, since ? and = are not in the character class.
I think I’m at the limits of my re knowledge so any help will be much appreciated. Thx.
Answer
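Try:
r"https?:[^\s]+"
[^\s]+ (the same as \S+) matches every non-whitespace character, so the query string is consumed along with the rest of the URL.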
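For a quick check, here is a minimal sketch that compares the original character class with a non-whitespace match, `[^\s]+`. The names `strip_urls_narrow`, `strip_urls` and `sample` are just made up for the illustration, and the sketch assumes URLs are always delimited by whitespace.

```python
import re

# Original pattern from the question: stops at the first character
# that is not a letter, digit, '.' or '/'.
NARROW = re.compile(r"https?://[A-Za-z0-9./]+")

# Non-whitespace pattern: consumes everything up to the next whitespace.
BROAD = re.compile(r"https?:[^\s]+")


def strip_urls_narrow(text: str) -> str:
    """Remove URLs using the original, too-narrow character class."""
    return NARROW.sub("", text)


def strip_urls(text: str) -> str:
    """Remove URLs by matching every non-whitespace character after the scheme."""
    return BROAD.sub("", text)


sample = "see http://www.example.com/somestuff.html?query=python for details"

print(repr(strip_urls_narrow(sample)))  # the query string is left behind
print(repr(strip_urls(sample)))         # the whole URL is removed
```

One thing to watch for: because the pattern takes everything up to the next whitespace, punctuation that directly follows a URL (a closing bracket or a full stop at the end of a sentence) gets removed too. If that matters for your text, you would need to trim it separately.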