I want to build a canonical URL for my website, my.com.
Here are the requirements:
- always include www subdomain
- always use https protocol
- remove default 80 and 443 ports
- remove trailing slash
Example:

    http://my.com => https://www.my.com
    http://my.com/ => https://www.my.com
    https://my.com:80/ => https://www.my.com
    https://sub.my.com/ => https://sub.my.com
    https://sub.my.com?term=t1 => https://sub.my.com?term=t1

This is what I have tried:

    from urllib.parse import urlparse, urljoin

    def build_canonical_url(request):
        absolute = request.build_absolute_uri(request.path)
        parsed = urlparse(absolute)

        parsed.scheme == 'https'
        if parsed.hostname.startswith('my.com'):
            parsed.hostname == 'www.my.com'
        if parsed.port == 80 or parsed.port == 443:
            parsed.port == None

        # how to join this url components?
        # canonical = join parsed.scheme, parsed.hostname, parsed.port and parsed.query

But I don't know how to join these URL components back into a single URL.
Answer
You just need to write a simple function that rebuilds the URL from the parsed parts:

    In [1]: from urllib.parse import urlparse
       ...:
       ...: def build_canonical_url(url):
       ...:     parsed = urlparse(url)
       ...:     # Force the www subdomain for the bare domain only
       ...:     if parsed.hostname in ('my.com', 'www.my.com'):
       ...:         hostname = 'www.my.com'
       ...:     else:
       ...:         hostname = parsed.hostname
       ...:     # Drop the default ports 80 and 443, keep any other explicit port
       ...:     if parsed.port in (None, 80, 443):
       ...:         port = ''
       ...:     else:
       ...:         port = f':{parsed.port}'
       ...:     # Always use https and strip the trailing slash from the path
       ...:     path = parsed.path.rstrip('/')
       ...:     parsed_url = f'https://{hostname}{port}{path}'
       ...:     if parsed.query:
       ...:         parsed_url = f'{parsed_url}?{parsed.query}'
       ...:     return parsed_url
       ...:

Execution:

    In [2]: urls = ["http://my.com", "http://my.com/", "https://my.com:80/", "https://sub.my.com/", "https://sub.my.com?term=t1"]

    In [3]: for url in urls:
       ...:     print(f'{url} >> {build_canonical_url(url)}')
       ...:
    http://my.com >> https://www.my.com
    http://my.com/ >> https://www.my.com
    https://my.com:80/ >> https://www.my.com
    https://sub.my.com/ >> https://sub.my.com
    https://sub.my.com?term=t1 >> https://sub.my.com?term=t1

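A few issues with your code:
- parsed.scheme == 'https' is not an assignment. == is a comparison, so the line only evaluates to True or False and the result is discarded.
- parsed.scheme cannot be set anyway. urlparse returns a named tuple, so assigning to its attributes (directly or with setattr) raises AttributeError.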
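
Since the parsed result is a read-only named tuple, one way to "join the components" without string formatting is to build a modified copy with `_replace()` and serialize it with `geturl()`. The following is a minimal sketch under a few assumptions: the function name `canonicalize` is hypothetical, the host and port are rewritten through `netloc` (because `hostname` and `port` are derived properties, not tuple fields), and any user:password part of the URL is dropped.

```python
from urllib.parse import urlparse


def canonicalize(url):
    """Sketch: rebuild a canonical URL via ParseResult._replace() and geturl()."""
    parsed = urlparse(url)

    # hostname and port are read-only properties computed from netloc
    host = parsed.hostname or ''
    if host in ('my.com', 'www.my.com'):
        host = 'www.my.com'

    # Keep only non-default ports in the rebuilt netloc
    port = parsed.port
    netloc = host if port in (None, 80, 443) else f'{host}:{port}'

    # _replace() returns a new tuple; the original stays untouched
    canonical = parsed._replace(
        scheme='https',
        netloc=netloc,
        path=parsed.path.rstrip('/'),
    )
    return canonical.geturl()


if __name__ == '__main__':
    for u in ('http://my.com/', 'https://my.com:80/', 'https://sub.my.com?term=t1'):
        print(u, '>>', canonicalize(u))
```

In a Django view, this could presumably be called as `canonicalize(request.build_absolute_uri())`, which passes the full URL including the query string rather than only `request.path`.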