So, instead of trying to explain things first, I will just show you what I have and what I want (this is easier):
What I have:
dict_list = [
    {'some': 1.2, 'key': 1.3, 'words': 3.9, 'label': 0},
    {'other': 1.2, 'wordly': 1.3, 'words': 3.9, 'label': 1},
    {'other': 10, 'work': 1.3, 'like': 3.9, 'label': 1},
]
What I want to get from what I have:
dict_dict = {
    "0": {'some': 1.2, 'key': 1.3, 'words': 3.9},
    "1": {'other': 10, 'wordly': 1.3, 'work': 1.3, 'like': 3.9, 'words': 3.9},
}
Explanation:
So, I want to create a dictionary by using the “label” values as the main keys in that new dictionary. I also need to merge dictionaries that have the same label. During this merging, I need to keep the highest value if there is a duplicate key (as with the “other” key in the example).
Why don’t I do all of this before I create the original list of dicts?
Because dict_list is the result of a joblib (multiprocessing) process. Sharing some objects between processes slows down the multiprocessing. So, instead of sharing, I have decided to run the heavy work on multiple cores and then do the organizing afterwards. I am not sure whether this approach will help, but I can’t know without testing.
Answer
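The Counter class from the collections module has a nice merging feature, a | b, which joins two counters and keeps the higher value for each key. Be aware that Counter's binary operators drop any entry whose result is zero or negative, so this works here because all the values are positive.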
from collections import Counter

dict_dict = {}
for dictionary in dict_list:
    # Note: pop() removes 'label' from the original dictionaries in dict_list
    label = str(dictionary.pop('label'))
    dict_dict[label] = dict_dict.get(label, Counter()) | Counter(dictionary)

# If you don't need Counters, just convert back to dictionaries
dict_dict = {i: dict(v) for i, v in dict_dict.items()}
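If some of the values might be zero or negative, or you would rather not modify dict_list in place, the same merge can be written with plain dictionaries. This is only a sketch of an alternative, not part of the original answer: merge_by_label and its label_key parameter are names made up here for illustration.

```python
def merge_by_label(dict_list, label_key="label"):
    """Group dicts by their label, keeping the highest value per key.

    Hypothetical helper: unlike Counter's | operator it keeps zero and
    negative values, and it leaves the input dictionaries untouched.
    """
    merged = {}
    for record in dict_list:
        # Use the label (as a string) as the top-level key
        label = str(record[label_key])
        target = merged.setdefault(label, {})
        for key, value in record.items():
            if key == label_key:
                continue
            # Store the value if the key is new or the value is larger
            if key not in target or value > target[key]:
                target[key] = value
    return merged


dict_list = [
    {'some': 1.2, 'key': 1.3, 'words': 3.9, 'label': 0},
    {'other': 1.2, 'wordly': 1.3, 'words': 3.9, 'label': 1},
    {'other': 10, 'work': 1.3, 'like': 3.9, 'label': 1},
]
dict_dict = merge_by_label(dict_list)
```

With the sample data from the question, dict_dict should match the desired output, with 'other' set to 10 under label "1".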