I’m trying to filter a list of dictionaries by two keys. I have a huge list of items, and I need to remove the items whose ‘id’ and ‘updated_at’ values are both repeated.
Here is the item list example:
items = [
    {
        'id': 1,
        'updated_at': '11/11/2020T00:00:00',
        'title': 'Some title',
        'value': 'Some value',
        'replies': 1
    },
    {
        'id': 1,
        'updated_at': '11/11/2020T00:00:00',
        'title': 'This is duplicate by id and updated',
        'value': 'This item should be removed',
        'replies': 1
    },
    {
        'id': 1,
        'updated_at': '11/11/2020T17:00:10',
        'title': 'This is only duplicate by id',
        'value': 'Some value',
        'replies': 1
    }
]
I want to remove those dictionaries that have the same ‘id’ and ‘updated_at’. What would be the correct way of doing this?
Answer
Instead of a list of dictionaries, why not a dictionary of dictionaries, keyed by the (id, updated_at) pair?
filtered_dict = {(d['id'], d['updated_at']): d for d in list_of_dicts}
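Here list_of_dicts stands for your items list. Since you mention no preference in your question, note that this keeps the last duplicate: a later entry overwrites an earlier one that has the same key.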
You could create your own dict object with a special hash, but this seems easier. If you want a list back, just take list(filtered_dict.values()).
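As a quick check, here is a minimal sketch of the dictionary approach run against a trimmed copy of the sample data from the question (the 'value' and 'replies' fields are left out only to keep it short). It assumes Python 3.7 or later, where dicts preserve insertion order.

```python
# Trimmed copy of the question's sample data (assumption: extra fields omitted).
items = [
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'Some title'},
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'This is duplicate by id and updated'},
    {'id': 1, 'updated_at': '11/11/2020T17:00:10', 'title': 'This is only duplicate by id'},
]

# Key each dictionary by its (id, updated_at) pair.
# A later entry with the same key overwrites the earlier one.
filtered_dict = {(d['id'], d['updated_at']): d for d in items}

# Convert the values view back into a plain list.
filtered_list = list(filtered_dict.values())

for d in filtered_list:
    print(d['title'])
```

Two items remain, and the one kept for the duplicated pair is the second of the two, which is the "last duplicate wins" behaviour described above.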
If you want to keep only the first match instead, you will have to add a few lines of code:
existing_dicts = set()   # (id, updated_at) pairs seen so far
filtered_list = []
for d in list_of_dicts:
    key = (d['id'], d['updated_at'])
    if key not in existing_dicts:
        existing_dicts.add(key)
        filtered_list.append(d)
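If you need this in more than one place, the same loop could be wrapped in a small helper. The sketch below is only an illustration: the function name dedupe_keep_first and its keys parameter are hypothetical, not something from the question, and it assumes every dictionary contains all the listed keys with hashable values.

```python
def dedupe_keep_first(dicts, keys=('id', 'updated_at')):
    """Return a new list keeping only the first dict for each combination of `keys`."""
    seen = set()      # key tuples encountered so far
    result = []
    for d in dicts:
        key = tuple(d[k] for k in keys)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result


# Trimmed copy of the question's sample data (assumption: extra fields omitted).
items = [
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'Some title'},
    {'id': 1, 'updated_at': '11/11/2020T00:00:00', 'title': 'This is duplicate by id and updated'},
    {'id': 1, 'updated_at': '11/11/2020T17:00:10', 'title': 'This is only duplicate by id'},
]

first_only = dedupe_keep_first(items)
for d in first_only:
    print(d['title'])
```

This version makes a single pass over the list and keeps the original order, so it should cope with a large list as long as the set of seen keys fits in memory.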