Is urlopen lazily evaluated?

Question

Would the following only fetch the headers, or would it fetch the whole document? I'm using this to check whether a URL is an HTML page, so that I can avoid downloading (big) files.

Accepted Answer

It appears that only the headers and a small part of the body are retrieved in the urlopen() call, and the rest is fetched when you call read().

Testing this with something like Wireshark shows that even if you call urllib.urlopen('VeryLargeFile'), you initially receive only the headers and a few packets of the body.

However, if you only want to know the size of a page or its content type, you can use requests to send a HEAD request:

    import requests

    rh = requests.head(url)
    header = rh.headers
    content_type = header.get('content-type')
    print(content_type)  # text/html; charset=UTF-8
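If you would rather stay with urlopen, the behaviour described above suggests checking the headers first and calling read() only for HTML responses. The following is a minimal sketch of that idea, not a definitive implementation. It assumes Python 3's urllib.request.urlopen (the answer refers to Python 2's urllib.urlopen); the names is_html and fetch_if_html and the opener parameter are hypothetical and exist only for illustration.

```python
from urllib.request import urlopen


def is_html(content_type):
    """Return True if a Content-Type header value denotes an HTML page."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_if_html(url, opener=urlopen):
    """Return the body as bytes if the URL serves HTML, otherwise None.

    `opener` is a hypothetical hook (defaulting to urlopen) so the function
    can be exercised without network access.
    """
    # The opener returns once the status line and headers have been parsed;
    # the remaining body is pulled from the connection by read().
    with opener(url) as response:
        if not is_html(response.headers.get("Content-Type")):
            return None  # leave the block without reading the body
        return response.read()
```

Calling fetch_if_html('http://example.com/') would return the page bytes for an HTML response and None for anything else. A server may still push a few packets of the body before the connection is closed, as noted above, so this limits the download rather than eliminating it.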
