I am trying to save all the `<a>` links within the Python homepage into a folder named 'Downloaded Pages'. However, after 2 iterations through the for loop I receive the following error:
```
www.python.org#content <_io.BufferedWriter name='Downloaded Pages/www.python.org#content'>
www.python.org#python-network <_io.BufferedWriter name='Downloaded Pages/www.python.org#python-network'>

Traceback (most recent call last):
  File "/Users/Lucas/Python/AP book exercise/Web Scraping/linkVerification.py", line 26, in <module>
    downloadedPage = open(os.path.join('Downloaded Pages', os.path.basename(linkUrlToOpen)), 'wb')
IsADirectoryError: [Errno 21] Is a directory: 'Downloaded Pages/'
```
I am unsure why this happens, as it appears the pages are being saved correctly — the output `<_io.BufferedWriter name='Downloaded Pages/www.python.org#content'>` suggests to me that the path is right.
This is my code:
```python
import requests, os, bs4

# Create a new folder to download webpages to
os.makedirs('Downloaded Pages', exist_ok=True)

# Download webpage
url = 'https://www.python.org/'
res = requests.get(url)
res.raise_for_status()  # Check if the download was successful

soupObj = bs4.BeautifulSoup(res.text, 'html.parser')  # Collects all text from the webpage

# Find all 'a' links on the webpage
linkElem = soupObj.select('a')
numOfLinks = len(linkElem)

for i in range(numOfLinks):
    linkUrlToOpen = 'https://www.python.org' + linkElem[i].get('href')
    print(os.path.basename(linkUrlToOpen))

    # save each downloaded page to the 'Downloaded Pages' folder
    downloadedPage = open(os.path.join('Downloaded Pages', os.path.basename(linkUrlToOpen)), 'wb')
    print(downloadedPage)

    if linkElem == []:
        print('Error, link does not work')
    else:
        for chunk in res.iter_content(100000):
            downloadedPage.write(chunk)
        downloadedPage.close()
```
Appreciate any advice, thanks.
Answer
The problem is that when you parse the basename of a URL that ends with a page name such as an `.html` file, it works — but for a URL that doesn't specify one, like `http://python.org/`, the basename is actually empty (you can try printing first the URL and then the basename between brackets or something to see what I mean). In that case `os.path.join('Downloaded Pages', '')` points at the directory itself, which is why you get `IsADirectoryError`. To work around that, the easiest solution would be to use absolute paths, as @Thyebri said.
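To see concretely why the basename comes back empty, here is a quick self-contained illustration using only the standard library:

```python
import os.path

# A URL that ends with a path component has a normal basename:
print(os.path.basename('https://www.python.org/downloads'))  # 'downloads'

# But a bare domain URL ending in '/' yields an empty basename,
# so joining it onto the folder name gives back the folder itself:
print(repr(os.path.basename('https://www.python.org/')))  # ''
print(os.path.join('Downloaded Pages', os.path.basename('https://www.python.org/')))
```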
And also, remember that the file name you write cannot contain characters like `/`, `\` or `?`.
So, I don't know if the following code is messy or not, but using the `re` library I would do the following:
```python
filename = re.sub('[/*:"?]+', '-', linkUrlToOpen.split("://")[1])
downloadedPage = open(os.path.join('Downloaded_Pages', filename), 'wb')
```
So first I remove the `https://` part, and then with the regular expressions library I replace all the usual symbols that are present in URL links with a dash `-`, and that is the name that will be given to the file.
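As a quick sanity check, that sanitizing step can be wrapped in a small helper (`url_to_filename` is just an illustrative name, not part of the original answer):

```python
import re

def url_to_filename(url):
    """Strip the scheme and replace filesystem-unsafe characters with '-'.

    Same approach as above; for Windows you might also want to add
    '\\\\', '<', '>' and '|' to the character class.
    """
    return re.sub('[/*:"?]+', '-', url.split('://')[1])

print(url_to_filename('https://www.python.org/'))          # 'www.python.org-'
print(url_to_filename('https://www.python.org/#content'))  # 'www.python.org-#content'
```

Note that unlike `os.path.basename`, this never produces an empty name for a bare domain URL, so the `IsADirectoryError` goes away.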
Hope it works!