
Is urlopen lazily evaluated?

from urllib.request import urlopen

# Get the content type of a URL
def get_url_type(url: str) -> str:
    # The with-block closes the response (and its connection) when done
    with urlopen(url) as r:
        return r.headers.get_content_type()

Would the following only fetch the headers or would it fetch the whole document?

I’m using this to check whether or not it is an HTML page, to avoid downloading (big) files.


Answer

It appears that the urlopen() call retrieves only the headers and a small part of the body; the rest of the body is fetched when you call read().
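The claim can be checked locally without a packet sniffer. The sketch below starts a throwaway server on 127.0.0.1 (the handler `SlowBodyHandler`, the `release_body` event and the `result` dict are hypothetical names made up for this demo) that sends the status line and headers, then holds the body back until the client signals that urlopen() has returned. If urlopen() waited for the body, the server's wait would time out instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BODY = b"<html><body>hello</body></html>"
release_body = threading.Event()  # set by the client once urlopen() returns
result = {}

class SlowBodyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()  # status line and headers are written to the socket now
        # Hold the body back. wait() returns True only if the client set the
        # event, i.e. urlopen() returned before any body byte was sent.
        result["released_by_client"] = release_body.wait(timeout=5)
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # silence per-request logging

server = HTTPServer(("127.0.0.1", 0), SlowBodyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

with urlopen(f"http://127.0.0.1:{server.server_port}/", timeout=10) as r:
    content_type = r.headers.get_content_type()  # headers are already available
    release_body.set()  # now let the server send the body
    body = r.read()     # blocks until the body has arrived

server.shutdown()
server.server_close()
print(content_type, result["released_by_client"], len(body))
```

Printing `True` for `released_by_client` means the headers were usable before the server had sent a single byte of the body.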

Testing this with a tool such as Wireshark shows that even if you call urllib.urlopen('VeryLargeFile') (urllib.request.urlopen in Python 3), you initially receive only the headers and a few packets of the body.
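Building on that behavior, the check from the question can avoid the big download by looking at the content type before calling read(), and closing the response otherwise. This is a minimal sketch, not a library API: `fetch_if_html`, its `max_bytes` cap and the local `DemoHandler` server are hypothetical. Any body packets the kernel already buffered (the "few packets" above) are simply discarded on close.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional
from urllib.request import urlopen

def fetch_if_html(url: str, max_bytes: int = 1_000_000) -> Optional[bytes]:
    """Return up to max_bytes of the body if the URL serves HTML, else None."""
    with urlopen(url, timeout=10) as r:
        if r.headers.get_content_type() != "text/html":
            return None  # body never read; leaving the with-block closes the socket
        return r.read(max_bytes)

# --- Hypothetical local server so the sketch runs without network access ---
class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/page":
            ctype, body = "text/html; charset=utf-8", b"<p>hi</p>"
        else:
            ctype, body = "application/octet-stream", b"\0" * 5_000_000
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        try:
            self.wfile.write(body)
        except ConnectionError:
            pass  # the client hung up early, which is the point for big files

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

page = fetch_if_html(base + "/page")
big = fetch_if_html(base + "/big.bin")
server.shutdown()
server.server_close()
print(page, big)  # prints b'<p>hi</p>' None
```

Note that this trusts the server's Content-Type header; a server that mislabels its responses will fool the check either way.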

However, if you only want to know a page's size or content type, you can send a HEAD request with requests, which asks the server for the headers only:

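import requests

url = 'https://example.com/'  # placeholder URL, replace with your own
# HEAD returns only the headers; requests.head() does not follow redirects
# unless allow_redirects=True is passed.
rh = requests.head(url, allow_redirects=True, timeout=10)
content_type = rh.headers.get('content-type')

print(content_type)
# e.g. text/html; charset=UTF-8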
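The same HEAD request is possible with the standard library alone, since urllib.request.Request accepts a `method` argument. A sketch under the assumption that the server answers HEAD properly (some servers reply 405, in which case a GET that stops before read() is the fallback); `head_info` and the local `HeadOnlyHandler` server are hypothetical names for this demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple
from urllib.request import Request, urlopen

def head_info(url: str) -> Tuple[str, Optional[str]]:
    """Return (content type, Content-Length header or None) via a HEAD request."""
    req = Request(url, method="HEAD")  # ask for headers only, no body
    with urlopen(req, timeout=10) as r:
        return r.headers.get_content_type(), r.headers.get("Content-Length")

# --- Hypothetical local server so the sketch runs offline ---
class HeadOnlyHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", "123456")  # size a GET body would have
        self.end_headers()

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), HeadOnlyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
info = head_info(f"http://127.0.0.1:{server.server_port}/")
server.shutdown()
server.server_close()
print(info)  # prints ('text/html', '123456')
```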
User contributions licensed under: CC BY-SA