I have a list of dictionaries and I want to know how many different domains it contains.
It looks something like this:
```python
list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25}
]
```
The desired result would look like this:
```python
unique_domains = [
    {'url': 'https://stackoverflow.com'},
    {'url': 'https://stackexchange.com'}
]
```
Or maybe just:
```python
unique_domains = ['stackoverflow.com', 'stackexchange.com']
```
Either would be fine, whichever is easier or faster.
I think I could use a regex for this, but is there a more Pythonic and/or efficient way to do it?
Thanks!
Answer
You can use `urllib.parse.urlparse` (from the standard library) together with a set comprehension (to avoid duplicates):
```python
from urllib.parse import urlparse

# given_list is the list of dictionaries from the question
unique_domains = {urlparse(item['url']).netloc for item in given_list}
```
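For reference, here is a self-contained version run against the sample data from the question (renamed to `given_list`). Since the question asks *how many* domains there are, it also prints the count with `len()`; sorting is only there to make the printed list deterministic, because sets are unordered.

```python
from urllib.parse import urlparse

# Sample data from the question, renamed so it doesn't shadow the builtin `list`
given_list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]

# The set comprehension keeps each network location only once
unique_domains = {urlparse(item['url']).netloc for item in given_list}

print(len(unique_domains))     # 2
print(sorted(unique_domains))  # ['stackexchange.com', 'stackoverflow.com']
```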
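If you need a list, convert the set with `list(unique_domains)`. This is more reliable than a regex, because `urlparse` splits the URL into its standard components (scheme, netloc, path, and so on) instead of relying on a hand-written pattern.
(Please don't name a variable `list`; it shadows the useful built-in.)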
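A set does not preserve the order in which domains first appear. If that order matters, one option (a sketch, assuming first-seen order is what you want) is `dict.fromkeys`, which drops duplicates while keeping insertion order in Python 3.7+. The second part rebuilds `scheme://netloc` to produce the list-of-dicts shape from the question; treating that as the intended meaning of the first desired format is an assumption.

```python
from urllib.parse import urlparse

given_list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]

# Plain list of domains, de-duplicated in first-seen order
domains = list(dict.fromkeys(urlparse(item['url']).netloc for item in given_list))

# Assumed target shape: [{'url': 'scheme://netloc'}, ...], also in first-seen order
origins = dict.fromkeys(
    f"{parts.scheme}://{parts.netloc}"
    for parts in (urlparse(item['url']) for item in given_list)
)
unique_domains = [{'url': origin} for origin in origins]

print(domains)         # ['stackoverflow.com', 'stackexchange.com']
print(unique_domains)  # [{'url': 'https://stackoverflow.com'}, {'url': 'https://stackexchange.com'}]
```

Note that `netloc` keeps any port or `user:password@` prefix, so `https://stackoverflow.com:443/x` would count as a separate entry; `urlparse(...).hostname` returns just the host name, lowercased, if that is what you want to compare.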