    from urllib.request import urlopen

    # Get the content type of a URL
    def get_url_type(url: str) -> str:
        r = urlopen(url)
        header = r.headers
        return header.get_content_type()
Would the code above fetch only the headers, or would it fetch the whole document?
I'm using this to check whether a URL points to an HTML page, so that I can avoid downloading (big) files.
Answer
It appears that only the headers and a small part of the body are retrieved by the urlopen() call, and the rest is fetched when you call read().
If you test this with a tool such as Wireshark, you can see that even if you call urllib.request.urlopen('VeryLargeFile'), you initially receive only the headers and a few packets of the body.
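Given that behaviour, one way to keep the download small is to read the headers and close the response without ever calling read(). The following is a minimal sketch, assuming Python 3; it reuses the get_url_type name from the question, and the with-block is an addition that is not in the original code.

```python
from urllib.request import urlopen


def get_url_type(url: str) -> str:
    """Return the media type of a URL without calling read() on the body."""
    # Leaving the with-block closes the response, so this code never
    # reads the rest of the body. A few packets may still have arrived
    # before the close, as described above.
    with urlopen(url) as response:
        return response.headers.get_content_type()
```

Note that this still sends a GET request, so the server starts sending the body; closing early only stops your side from reading more of it.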
However, if you only wish to know the size of a page or its content type, you can send a HEAD request with requests, which asks the server for the headers without the body:
    import requests

    rh = requests.head(url)
    header = rh.headers
    content_type = header.get('content-type')
    print(content_type)  # text/html; charset=UTF-8
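If you would rather not add a third-party dependency, a HEAD request can also be sketched with the standard library alone. This is only an illustration, assuming Python 3; head_content_type and is_html are hypothetical helper names, and the set of media types treated as HTML is an assumption you may want to adjust.

```python
from urllib.request import Request, urlopen


def head_content_type(url: str, timeout: float = 10.0) -> str:
    """Send a HEAD request and return the bare media type, e.g. 'text/html'."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        # get_content_type() drops parameters such as "; charset=UTF-8"
        # and returns the type in lowercase.
        return response.headers.get_content_type()


def is_html(url: str) -> bool:
    """Hypothetical helper: True if the reported media type looks like HTML."""
    return head_content_type(url) in ("text/html", "application/xhtml+xml")
```

Some servers do not handle HEAD well and may answer with an error status, so falling back to a GET that is closed before read() can be a reasonable safety net.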