Aggregating dicts within a list based on key value

I’m struggling to wrap my head around this one. I have a list of dictionaries that I would like to aggregate based on two of their values. Example data:

>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 44 },
...     { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 46 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
...     ]

I’m trying to merge the dicts that have the same age and regex, summing the count key across all matching entries. The expected output would be:

>>> data = [
...     { "regex": ".*ccc-r.*", "age": 44, "count": 244 },
...     { "regex": ".*nft-r.*", "age": 23, "count": 90 },
...     { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
...     { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
...     ]

I would like to do this without pandas or other add-on modules, and would prefer a solution from the standard library if at all possible.

Thanks!

Answer

Assuming you do not want to use any imports, you can first collect the data in a dictionary, aggregated_data, whose keys are (regex, age) tuples and whose values are the summed counts. Once this dictionary is built, you can convert it back into the original list-of-dicts structure:

data = [
    { "regex": ".*ccc-r.*", "age": 44, "count": 224 },
    { "regex": ".*nft-r.*", "age": 23, "count": 44 },
    { "regex": ".*ccc-r.*", "age": 44, "count": 20 },
    { "regex": ".*ccc-r.*", "age": 32, "count": 16 },
    { "regex": ".*nft-r.*", "age": 23, "count": 46 },
    { "regex": ".*zxy-r.*", "age": 16, "count": 55 }
]

aggregated_data = {}

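# Sum the counts for every distinct (regex, age) pair.
for dictionary in data:
    key = (dictionary['regex'], dictionary['age'])
    aggregated_data[key] = aggregated_data.get(key, 0) + dictionary['count']

# Rebuild the original list-of-dicts structure from the totals.
data = [
    {'regex': regex, 'age': age, 'count': count}
    for (regex, age), count in aggregated_data.items()
]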
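
Since the question allows the standard library, the same idea can also be sketched with collections.defaultdict, which removes the need for the .get(key, 0) call. The function name aggregate_counts and its key_fields and sum_field parameters are hypothetical names chosen for this illustration and are not part of the original answer; treat it as one possible variant rather than the definitive approach.

```python
from collections import defaultdict


def aggregate_counts(rows, key_fields=("regex", "age"), sum_field="count"):
    """Sum `sum_field` over all rows that share the same `key_fields` values.

    Hypothetical helper for illustration; the names are assumptions.
    """
    totals = defaultdict(int)  # missing keys start at 0
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        totals[key] += row[sum_field]
    # Rebuild one dict per distinct key, in first-seen order.
    return [
        {**dict(zip(key_fields, key)), sum_field: total}
        for key, total in totals.items()
    ]


data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

result = aggregate_counts(data)
```

Because dictionaries preserve insertion order in Python 3.7 and later, result lists each (regex, age) pair in the order it first appears in data, which matches the ordering of the expected output in the question.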
User contributions licensed under: CC BY-SA