I’m struggling to wrap my head around this one. I’ve got a list with multiple dictionaries that I would like to aggregate based on two values. Example code:
>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 44 },
...     { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 46 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
... ]
I’m trying to merge the dicts that share the same age and regex, summing their count values across all matching entries. Example output would be:
>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 244 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 90 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
... ]
I would like to do this without pandas or add-on modules; a solution from the std lib would be ideal, if at all possible.
Thanks!
Answer
Assuming you do not want to use any imports, you can first collect the data in a dictionary aggregated_data in which each key is a tuple of (regex, age) and each value is the running total of count for that pair. Once you have built this dictionary, you can rebuild the original list-of-dicts structure from it:
data = [
    { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
    { "regex": ".*nft-r.*", "age": 23, "count": 44 },
    { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
    { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
    { "regex": ".*nft-r.*", "age": 23, "count": 46 },
    { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
]

# Sum the counts per (regex, age) pair; .get(key, 0) starts unseen pairs at 0.
aggregated_data = {}
for dictionary in data:
    key = (dictionary['regex'], dictionary['age'])
    aggregated_data[key] = aggregated_data.get(key, 0) + dictionary['count']

# Turn each (regex, age) -> count entry back into a dict of the original shape.
data = [{'regex': key[0], 'age': key[1], 'count': value} for key, value in aggregated_data.items()]
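If a standard-library import is acceptable (the question only rules out pandas and add-on modules), the same idea can be sketched with collections.Counter, which treats missing keys as 0. This is only one possible variant, not a replacement for the approach above; the function name aggregate_counts and the variable names sample and result are made up for illustration.

```python
from collections import Counter


def aggregate_counts(rows):
    """Sum 'count' over all rows that share the same (regex, age) pair.

    aggregate_counts is a hypothetical helper name, not part of any library.
    """
    totals = Counter()
    for row in rows:
        # Missing keys start at 0 in a Counter, so no .get() default is needed.
        totals[(row["regex"], row["age"])] += row["count"]
    # Counter is a dict subclass, so keys come back in first-seen order
    # (guaranteed from Python 3.7 onward).
    return [
        {"regex": regex, "age": age, "count": count}
        for (regex, age), count in totals.items()
    ]


sample = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

result = aggregate_counts(sample)
for row in result:
    print(row)
```

Wrapping the loop in a function leaves the input list untouched instead of rebinding data, which can be convenient if the raw rows are still needed afterwards.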