Hi there, I have the following problem:
I extracted a list of URLs from a .txt file with Python using this:
import re

with open('html.txt') as f:
    urls = f.read()

links = re.findall('"((http)s?://.*?)"', urls)
for url in links:
    print(url[0])
For some files, the output contains the following:
https://url.com/?download_file=259&#038;order=wc_order_xDxDxD&#038;email=testmail%40gmail.com&#038;key=1234-1234-1234-1234-8c368abd9c22
The problem is:
As you can see, the output contains “#038;”. I think that translates to “&”, but there is already an “&” in front of it, and if I follow the link it is invalid.
However, if I delete every “#038;”, the link works just fine.
How can I print the links so that they don't contain “#038;” and still work?
Thanks so much
Answer
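This looks like an HTML entity encoding issue rather than a URL encoding issue: “&#038;” is the HTML numeric character reference for “&”. Since you are only printing, you can use the string replace method. Include the trailing semicolon in the text you remove, and print the result, because replace returns a new string instead of changing the original:
for url in links:
    print(url[0].replace("#038;", ""))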
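If you would rather decode the entities than delete them by hand, something along these lines might work. It is only a sketch: the helper name extract_links and the sample string are made up for illustration, and it assumes the file contains HTML-escaped links inside double quotes, as in the question. html.unescape from the standard library turns “&#038;” (and “&amp;”) back into “&”.

```python
import html
import re


def extract_links(text):
    """Return quoted http(s) URLs from text, with HTML entities decoded."""
    # Single capture group, so findall returns plain strings, not tuples.
    raw_links = re.findall(r'"(https?://.*?)"', text)
    # html.unescape converts character references such as &#038; into "&".
    return [html.unescape(link) for link in raw_links]


# Hypothetical sample input, modelled on the URL from the question.
sample = (
    '<a href="https://url.com/?download_file=259'
    '&#038;order=wc_order_xDxDxD&#038;key=1234">download</a>'
)

for link in extract_links(sample):
    print(link)
```

One caveat: html.unescape also decodes some named references that lack a trailing semicolon, so a raw parameter that happens to start with an entity name (for example “&copy=1”) could be altered. If your links only ever contain “&#038;”, a plain replace("&#038;", "&") may be the safer choice.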