Remove duplicate values from list of dictionaries

I’m trying to deduplicate my list of dictionaries by two keys. I have a huge list of items and I need a way to filter out the items that repeat both the ‘id’ and ‘updated_at’ values of an earlier item.

Here is the item list example:

items = [{
        'id': 1,
        'updated_at': '11/11/2020T00:00:00',
        'title': 'Some title',
        'value': 'Some value',
        'replies': 1
    }, {
        'id': 1,
        'updated_at': '11/11/2020T00:00:00',
        'title': 'This is duplicate by id and updated',
        'value': 'This item should be removed',
        'replies': 1
    }, {
        'id': 1,
        'updated_at': '11/11/2020T17:00:10',
        'title': 'This is only duplicate by id',
        'value': 'Some value',
        'replies': 1
    }]

I want to remove those dictionaries that have the same ‘id’ and ‘updated_at’. What would be the correct way of doing this?

Answer

Instead of a list of dictionaries, why not a dictionary of dictionaries keyed by the (id, updated_at) pair? Here list_of_dicts stands for your items list:

filtered_dict = {(d['id'], d['updated_at']): d for d in list_of_dicts}

Since you mention no preference in your question, note that this keeps the last duplicate for each key, because a later item overwrites an earlier one stored under the same key.

You could create your own dict subclass with a special hash, but this seems easier. If you want a list back, use list(filtered_dict.values()); in Python 3, values() on its own returns a view rather than a list.
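
As a minimal sketch of the dictionary approach, here it is run against a shortened copy of the sample data from the question (the 'value' and 'replies' fields are left out for brevity, which is my simplification and not part of the original data):

```python
# Shortened copy of the sample data from the question.
items = [
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'Some title'},
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'This is duplicate by id and updated'},
    {'id': 1, 'updated_at': '11/11/2020T17:00:10', 'title': 'This is only duplicate by id'},
]

# Later items overwrite earlier ones that share the same (id, updated_at) key.
filtered_dict = {(d['id'], d['updated_at']): d for d in items}

# values() returns a view in Python 3, so wrap it in list() to get a real list.
filtered_list = list(filtered_dict.values())

for d in filtered_list:
    print(d['title'])
# This is duplicate by id and updated
# This is only duplicate by id
```

The second item replaces the first because both map to the same key, so the surviving title is the one from the later duplicate.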

If you want to keep the first match instead, you will have to add a few lines of code:

existing_dicts = set()  # (id, updated_at) pairs seen so far
filtered_list = []
for d in list_of_dicts:
    key = (d['id'], d['updated_at'])
    if key not in existing_dicts:  # keep only the first item for each key
        existing_dicts.add(key)
        filtered_list.append(d)
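
If you need this in more than one place, the first-match loop could be wrapped in a small helper. This is only a sketch: the function name dedupe_first and its keys parameter are hypothetical names chosen for illustration, and the sample data is again a shortened copy of the items from the question.

```python
def dedupe_first(dicts, keys=('id', 'updated_at')):
    """Return a list keeping only the first dict seen for each combination of keys."""
    seen = set()
    result = []
    for d in dicts:
        key = tuple(d[k] for k in keys)  # e.g. (1, '11/11/2020T00:00:00')
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


# Shortened copy of the sample data from the question.
items = [
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'Some title'},
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'This is duplicate by id and updated'},
    {'id': 1, 'updated_at': '11/11/2020T17:00:10', 'title': 'This is only duplicate by id'},
]

for d in dedupe_first(items):
    print(d['title'])
# Some title
# This is only duplicate by id
```

Both approaches make a single pass over the list and rely on the key values being hashable, which holds for the integers and strings used here.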
User contributions licensed under: CC BY-SA