I am working on an interactive visualization of the World Happiness Report for the years 2015 through 2020. The data was split across six CSV files. Using pandas, I have successfully cleaned the data and concatenated the files into one big JSON file with the following format:
[
  {
    "Country": "Switzerland",
    "Year": 2015,
    "Happiness Rank": 1,
    "Happiness Score": 7.587000000000001
  },
  {
    "Country": "Iceland",
    "Year": 2015,
    "Happiness Rank": 2,
    "Happiness Score": 7.561
  },
  {
    "Country": "Switzerland",
    "Year": 2016,
    "Happiness Rank": 2,
    "Happiness Score": 7.5089999999999995
  },
  {
    "Country": "Iceland",
    "Year": 2016,
    "Happiness Rank": 3,
    "Happiness Score": 7.501
  },
  {
    "Country": "Switzerland",
    "Year": 2017,
    "Happiness Rank": 3,
    "Happiness Score": 7.49399995803833
  },
  {
    "Country": "Iceland",
    "Year": 2017,
    "Happiness Rank": 1,
    "Happiness Score": 7.801
  }
]
Now, I would like to programmatically restructure the JSON file so that it has the following format:
{
    "2015": {
        "Switzerland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.587000000000001
        },
        "Iceland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.561
        }
    },
    "2016": {
        "Switzerland": {
            "Happiness Rank": 2,
            "Happiness Score": 7.5089999999999995
        },
        "Iceland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.501
        }
    },
    "2017": {
        "Switzerland": {
            "Happiness Rank": 3,
            "Happiness Score": 7.49399995803833
        },
        "Iceland": {
            "Happiness Rank": 1,
            "Happiness Score": 7.801
        }
    }
}
It has to be done programmatically, since there are over 900 distinct (country, year) pairs. I want the JSON in this format because it makes the file more readable and makes it easier to select the appropriate data. For example, if I want the rank of Iceland in 2015, I can then do data["2015"]["Iceland"]["Happiness Rank"].
Does anyone know the easiest / most convenient way to do this in Python?
Answer
If data is your original list of dictionaries:
from itertools import groupby
from operator import itemgetter

def by_year(data):
    retain_keys = ("Happiness Rank", "Happiness Score")
    # groupby only merges consecutive records that share the same year
    for year, group in groupby(data, key=itemgetter("Year")):
        # Map each country to its retained fields; the year becomes a string key
        yield str(year), {d["Country"]: {k: d[k] for k in retain_keys} for d in group}

print(dict(by_year(data)))
Output:
{'2015': {'Switzerland': {'Happiness Rank': 1, 'Happiness Score': 7.587000000000001}, 'Iceland': {'Happiness Rank': 2, 'Happiness Score': 7.561}}, '2016': {'Switzerland': {'Happiness Rank': 2, 'Happiness Score': 7.5089999999999995}, 'Iceland': {'Happiness Rank': 3, 'Happiness Score': 7.501}}, '2017': {'Switzerland': {'Happiness Rank': 3, 'Happiness Score': 7.49399995803833}, 'Iceland': {'Happiness Rank': 1, 'Happiness Score': 7.801}}}
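This assumes that the dictionaries in data are already grouped together by year, because itertools.groupby only groups consecutive items that share the same key. If the records may be interleaved, sort them first with sorted(data, key=itemgetter("Year")).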
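If you would rather not depend on the input order at all, here is a minimal sketch of an alternative that builds the nested dictionary in a single pass with dict.setdefault. The function name regroup_by_year and the small records sample are hypothetical and only there for illustration; the field names are taken from the question.

```python
import json


def regroup_by_year(records):
    """Nest flat records as {year: {country: {field: value}}}."""
    retain_keys = ("Happiness Rank", "Happiness Score")
    nested = {}
    for record in records:
        # JSON object keys are always strings, so convert the year up front.
        year = str(record["Year"])
        # setdefault creates the inner dict the first time a year is seen,
        # so the input does not need to be sorted or grouped.
        nested.setdefault(year, {})[record["Country"]] = {
            key: record[key] for key in retain_keys
        }
    return nested


# Hypothetical sample, deliberately interleaved by year.
records = [
    {"Country": "Switzerland", "Year": 2015, "Happiness Rank": 1, "Happiness Score": 7.587},
    {"Country": "Iceland", "Year": 2016, "Happiness Rank": 3, "Happiness Score": 7.501},
    {"Country": "Iceland", "Year": 2015, "Happiness Rank": 2, "Happiness Score": 7.561},
]

nested = regroup_by_year(records)
print(nested["2015"]["Iceland"]["Happiness Rank"])

# Serialize the nested structure; indent only affects readability.
as_json = json.dumps(nested, indent=4)
```

Note that the years are looked up as strings ("2015", not 2015). That matches what you get back after loading the file with json.load, since JSON object keys are always strings.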