I have a list (with dictionaries inside) and I want to know how many different domains are inside it.
I have something like this:
list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]
The desired result would look like this:
unique_domains = [ {'url': 'https://stackoverflow.com'}, {'url': 'https://stackexchange.com'} ]
Or maybe just:
unique_domains = ['stackoverflow.com', 'stackexchange.com']
Both would be OK, so whatever is easier or faster I guess.
I think I could use Regex for this, but maybe there are more pythonic and/or efficient ways to do this?
Thanks!
Answer
You can use urllib.parse.urlparse (from the standard library) together with a set comprehension (to avoid duplicates):
from urllib.parse import urlparse

unique_domains = {urlparse(item['url']).netloc for item in given_list}
If you need a list, you can convert the set with list(unique_domains). This is more reliable than a regex-based solution.
(Please don't name a variable list; it shadows the useful builtin.)
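For completeness, here is a minimal sketch that runs the approach on the sample data from the question and produces both desired output formats. The names given_list, base_url, unique_urls and unique_url_dicts are only illustrative, and dict.fromkeys is used in place of a set here because it keeps first-seen order (guaranteed from Python 3.7), which a plain set does not.

```python
from urllib.parse import urlparse

# Sample data from the question, renamed so it does not shadow the builtin `list`.
given_list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]


def base_url(url):
    """Return scheme plus network location, e.g. 'https://stackoverflow.com'."""
    parsed = urlparse(url)
    return f'{parsed.scheme}://{parsed.netloc}'


# Second format from the question: bare domains, duplicates dropped, order kept.
unique_domains = list(dict.fromkeys(urlparse(item['url']).netloc for item in given_list))
print(unique_domains)  # ['stackoverflow.com', 'stackexchange.com']

# First format from the question: one dict per unique scheme + domain.
unique_urls = dict.fromkeys(base_url(item['url']) for item in given_list)
unique_url_dicts = [{'url': url} for url in unique_urls]
print(unique_url_dicts)
```

Note that netloc also includes a port or user info if the URL has one (for example 'example.com:8080'); if only the host name matters, urlparse(...).hostname may be a better fit.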