I want to build a canonical url for my website: my.com
here are the requirements:
- always include www subdomain
- always use https protocol
- remove default 80 and 443 ports
- remove trailing slash
Example:
http://my.com => https://www.my.com http://my.com/ => https://www.my.com https://my.com:80/ => https://www.my.com https://sub.my.com/ => https://sub.my.com https://sub.my.com?term=t1 => https://sub.my.com?term=t1
This is what I have tried:
from urllib.parse import urlparse, urljoin def build_canonical_url(request): absolute = request.build_absolute_uri(request.path) parsed = urlparse(absolute) parsed.scheme == 'https' if parsed.hostname.startswith('my.com'): parsed.hostname == 'www.my.com' if parsed.port == 80 or parsed.port == 443: parsed.port == None # how to join this url components? # canonical = join parsed.scheme, parsed.hostname, parsed.port and parsed.query
But I don’t know how to join these url components?
Advertisement
Answer
You just need to write a simple function,
In [1]: def build_canonical_url(url): ...: parsed = urlparse(url) ...: port = '' ...: if parsed.hostname.startswith('my.com') or parsed.hostname.startswith('www.my.com'): ...: hostname = 'www.my.com' ...: else: ...: hostname = parsed.hostname ...: if parsed.port == 80 or parsed.port == 443: ...: port = '' ...: scheme = 'https' ...: parsed_url = f'{scheme}://{hostname}' ...: if port: ...: parsed_url = f'{parsed_ur}:{port}/' ...: if parsed.query: ...: parsed_url = f'{parsed_url}?{parsed.query}' ...: return parsed_url ...:
Execution,
In [2]: urls = ["http://my.com", "http://my.com/", "https://my.com:80/", "https://sub.my.com/", "https://sub.my.com?term=t1"] In [3]: for url in urls: ...: print(f'{url} >> {build_canonical_url(url)}') ...: http://my.com >> https://www.my.com http://my.com/ >> https://www.my.com https://my.com:80/ >> https://www.my.com https://sub.my.com/ >> https://sub.my.com https://sub.my.com?term=t1 >> https://sub.my.com?term=t1
Few issues of your code,
parsed.scheme == ‘https’ -> It’s not the right way to assign a value, It’s a statement gives True
or False
And parsed.scheme doesn’t allow to setttr.