
Removing URLs from a string using Python’s re

I'm using this to try to remove URLs from a string:

text = re.sub(r'https?://[A-Za-z0-9./]+', '', text)

Unfortunately it works for simple URLs but not for complex ones. Something like http://www.example.com/somestuff.html is removed completely, but for something like http://www.example.com/somestuff.html?query=python the match stops at the first character outside the character class (here the ?), so the trailing ?query=python is left behind.

I think I’m at the limits of my re knowledge so any help will be much appreciated. Thx.


Answer

Try matching everything up to the next whitespace character:

r"https?:[^\s]+"
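
A minimal sketch of how this could be plugged into re.sub, assuming a URL is simply http:// or https:// followed by a run of non-whitespace characters. Here \S is used as shorthand for "any non-whitespace character", the // after the scheme is kept from the question's pattern, and the names URL_PATTERN and strip_urls are made up for illustration:

```python
import re

# Assumption: a URL starts with http:// or https:// and runs until the next
# whitespace character. \S+ means "one or more non-whitespace characters".
URL_PATTERN = re.compile(r"https?://\S+")


def strip_urls(text):
    """Return text with every http/https URL removed (hypothetical helper)."""
    return URL_PATTERN.sub("", text)


sample = "See http://www.example.com/somestuff.html?query=python for details"
print(strip_urls(sample))
```

Note that the spaces on both sides of the URL are kept, and that any punctuation directly attached to the URL (a closing parenthesis or a full stop, say) is removed along with it, so this may need tightening depending on the input.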

User contributions licensed under: CC BY-SA