Skip to content
Advertisement

How to convert a 5-level dictionary into a DataFrame?

I have a dictionary with structure:

Level 1:

  • id (int)
  • username (str)
  • meta (contain a string of Kpi_info)

This is a dictionary:

dict = {'id': 206, 'username': 'hantran','meta': '{"kpi_info":
            {"2021" :{"1":{"revenue":"2000", "kpi":"2100","result":"0"}, "2":{"revenue":"2500", "kpi":"2000", "result":"1"}},
            "2022": {"1":{"revenue":"3000", "kpi":"2500","result":"1"}, "2":{"revenue":"2500", "kpi":"3000", "result":"0"}}}'
        }

My desire result is a DataFame like this:

id username Year Month revenue kpi result
206 hantran 2021 1 2000 2100 0
206 hantran 2021 2 2500 2000 1
206 hantran 2022 1 3000 2500 1
206 hantran 2022 2 2500 3000 0

Apparently, similar question has been discussed here. However, the solution only work for 3-level dictionary. I don’t know how to make it work for my 1-level dictionary with most of the needed information is in a string.

Advertisement

Answer

If the string in your dictionary is valid json, it can easily be converted into a dictionary:

from json import loads

d = {'id': 206, 'username': 'hantran', 'meta': '{"kpi_info": {"2021" :{"1":{"revenue":"2000", "kpi":"2100","result":"0"}, "2":{"revenue":"2500", "kpi":"2000", "result":"1"}}, "2022": {"1":{"revenue":"3000", "kpi":"2500","result":"1"}, "2":{"revenue":"2500", "kpi":"3000", "result":"0"}}}}'}

d['meta'] = loads(d['meta'])

However, the representation of the dictionary in your code is not a valid dictionary, as it is missing a closing }. There’s no easy way to deal with errors like these, so you should check if your actual data has this problem, or whether you should check the code you share more carefully.

Note that you shouldn’t call a dictionary dict, since doing so will shadow the actual dict type and you won’t be able to access that normally after doing so.

With the dictionary d it’s now fairly easy to construct a DataFrame as needed:

from pandas import DataFrame

df = DataFrame([
    {
        'id': d['id'], 'username': d['username'],
        'year': int(k1), 'month': int(k2),
        'revenue': d2['revenue'], 'kpi': d2['kpi'], 'result': d2['result']
    }
    for k1, d1 in d['meta']['kpi_info'].items()
    for k2, d2 in d1.items()
])

print(df)

This makes use of pandas ability to turn a list of dictionaries into a dataframe, using the keys of the dictionaries as column references.

Result:

    id username  year  month revenue   kpi result
0  206  hantran  2021      1    2000  2100      0
1  206  hantran  2021      2    2500  2000      1
2  206  hantran  2022      1    3000  2500      1
3  206  hantran  2022      2    2500  3000      0
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement