
Python 404’ing on urllib.request

The basics of the code are below. I know for a fact that the way I'm retrieving these pages works for other URLs, as I just wrote a script scraping a different page in the same way. However, with this specific URL it keeps throwing "urllib.error.HTTPError: HTTP Error 404: Not Found" in my face. When I replace the URL with a different one (https://www.premierleague.com/clubs), it works completely fine. I'm very new to Python, so perhaps there's a really basic step or piece of knowledge I'm missing, but the resources I've found online relating to this didn't seem relevant. Any advice would be great, thanks.

Below is the barebones of the script:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import csv

myurl = "https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1"

# This call raises urllib.error.HTTPError: HTTP Error 404: Not Found
uClient = uReq(myurl)


Answer

The problem is most likely that the site you are trying to access is actively blocking requests that look like automated crawlers; you can change the user agent to get around it. See this question for more information (the solution prescribed in that post seems to work for your URL too).

If you want to stick with urllib, this post shows how to alter the user agent.
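For reference, here is a minimal sketch of that approach using urllib.request.Request with a custom User-Agent header; the "Mozilla/5.0" string is just an example of a browser-like value, not anything specific to that post:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup

myurl = "https://www.transfermarkt.co.uk/premier-league/startseite/wettbewerb/GB1"

# Send a browser-like User-Agent so the request is not treated as an
# anonymous script (example value only)
req = Request(myurl, headers={"User-Agent": "Mozilla/5.0"})

uClient = urlopen(req)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

From there the rest of your scraping code should work unchanged, since page_soup is the same BeautifulSoup object you were building before.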
