I have a list (with dictionaries inside) and I want to know how many different domains are inside it.

I have something like this:

list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40}, 
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25}
]

The desired result would look like this:

unique_domains = [
    {'url': 'https://stackoverflow.com'},
    {'url': 'https://stackexchange.com'}
]

Or maybe just:

unique_domains = ['stackoverflow.com', 'stackexchange.com']

Either would be OK, so whichever is easier or faster, I guess.

I think I could use a regex for this, but are there more Pythonic and/or efficient ways to do it?

Thanks!

Answer

You can use urllib.parse.urlparse (from the standard library) together with a set comprehension (which drops duplicates automatically):

from urllib.parse import urlparse

# given_list is the list of dicts from the question (renamed from list)
unique_domains = {urlparse(item['url']).netloc for item in given_list}
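
A self-contained sketch that runs the one-liner above on the question's data (renamed to given_list) and also builds the list-of-dicts format the question asked for. The origins / unique_domain_dicts part is an illustrative addition, not part of the original answer, and it assumes every URL includes a scheme such as https://.

```python
from urllib.parse import urlparse

# The question's data, renamed from `list` so the built-in is not shadowed
given_list = [
    {'url': 'https://stackoverflow.com/questions', 'number': 10},
    {'url': 'https://stackoverflow.com/users', 'number': 40},
    {'url': 'https://stackexchange.com/tour', 'number': 40},
    {'url': 'https://stackexchange.com/whatever/whatever', 'number': 25},
]

# Set comprehension: duplicate domains collapse automatically
unique_domains = {urlparse(item['url']).netloc for item in given_list}
print(len(unique_domains))     # 2
print(sorted(unique_domains))  # ['stackexchange.com', 'stackoverflow.com']

# The question's other format: scheme + domain, wrapped in dicts.
# dict.fromkeys drops duplicates but keeps first-seen order, unlike a set.
parsed = (urlparse(item['url']) for item in given_list)
origins = dict.fromkeys(f"{p.scheme}://{p.netloc}" for p in parsed)
unique_domain_dicts = [{'url': origin} for origin in origins]
print(unique_domain_dicts)
# [{'url': 'https://stackoverflow.com'}, {'url': 'https://stackexchange.com'}]
```

A set has no defined order, so sorted() is used for stable output; dict.fromkeys is used for the second format because dicts preserve insertion order (guaranteed since Python 3.7).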

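If you need a list, convert the set with list(unique_domains). This is more reliable than a regex-based solution, because urlparse splits the URL according to its standard structure instead of relying on a hand-written pattern.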
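
One caveat: netloc is the whole network-location part of the URL, so it keeps any port or user info and preserves the original case. If such URLs can appear in your data, the parsed result's hostname attribute (lowercased, with port and user info stripped) may be a better key. The URLs below are made-up examples, not from the question.

```python
from urllib.parse import urlparse

# Hypothetical URLs that point at the same two hosts in different ways
urls = [
    'https://stackoverflow.com/questions',
    'https://StackOverflow.com:443/users',       # mixed case + explicit port
    'https://user@stackexchange.com/tour?x=1',   # user info + query string
]

netlocs = {urlparse(u).netloc for u in urls}
hosts = {urlparse(u).hostname for u in urls}

print(len(netlocs))   # 3: port, case and user info make them differ
print(sorted(hosts))  # ['stackexchange.com', 'stackoverflow.com']
```

Note that urlparse only fills in netloc when the URL has a scheme (or starts with //); a bare string like 'stackoverflow.com/questions' ends up in path instead, and hostname is then None.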

(Please don't name a variable list: it shadows the useful built-in.)

Python – Find distinct domains inside a list of dictionaries


Answer