I’m struggling to wrap my head around this one. I’ve got a list with multiple dictionaries that I would like to aggregate based on two values. Example code:
data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]
I’m trying to merge the dicts that share the same age and regex, summing their count values across all matching entries. Example output would be:
data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 244},
    {"regex": ".*nft-r.*", "age": 23, "count": 90},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]
I’d like to do this without pandas or other add-on modules; I’d prefer a solution from the standard library if at all possible.
Thanks!
Answer
Assuming you do not want to use any imports, you can first collect the data in a dictionary, aggregated_data, in which each key is a tuple of (regex, age) and each value is the running total of count for that pair. Once you have built this dictionary, you can rebuild the original list-of-dicts structure from it:
data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

# Maps (regex, age) -> summed count
aggregated_data = {}

for dictionary in data:
    key = (dictionary['regex'], dictionary['age'])
    # Start from 0 the first time a key is seen, then keep adding
    aggregated_data[key] = aggregated_data.get(key, 0) + dictionary['count']

# Rebuild the list of dicts from the aggregated totals
data = [{'regex': key[0], 'age': key[1], 'count': value} for key, value in aggregated_data.items()]
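Since the question allows the standard library, the same idea can also be sketched with collections.Counter, which removes the need for the .get(key, 0) default. This is only an alternative under that assumption, not a replacement for the import-free version above; the function name aggregate_counts is made up for illustration.

```python
from collections import Counter


def aggregate_counts(rows):
    """Sum 'count' over rows that share the same (regex, age) pair.

    `aggregate_counts` is a hypothetical helper name, not part of the
    original answer.
    """
    totals = Counter()
    for row in rows:
        # Missing keys default to 0 in a Counter, so += works directly
        totals[(row["regex"], row["age"])] += row["count"]
    # Unpack the tuple key back into the original dict layout
    return [
        {"regex": regex, "age": age, "count": count}
        for (regex, age), count in totals.items()
    ]


data = [
    {"regex": ".*ccc-r.*", "age": 44, "count": 224},
    {"regex": ".*nft-r.*", "age": 23, "count": 44},
    {"regex": ".*ccc-r.*", "age": 44, "count": 20},
    {"regex": ".*ccc-r.*", "age": 32, "count": 16},
    {"regex": ".*nft-r.*", "age": 23, "count": 46},
    {"regex": ".*zxy-r.*", "age": 16, "count": 55},
]

result = aggregate_counts(data)
```

On Python 3.7 and later, Counter keeps keys in first-seen order like a regular dict, so the result follows the order in which each (regex, age) pair first appears in the input.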