I have a dictionary with structure:
Level 1:
- id (int)
- username (str)
- meta (contain a string of Kpi_info)
This is a dictionary:
dict = {'id': 206, 'username': 'hantran','meta': '{"kpi_info": {"2021" :{"1":{"revenue":"2000", "kpi":"2100","result":"0"}, "2":{"revenue":"2500", "kpi":"2000", "result":"1"}}, "2022": {"1":{"revenue":"3000", "kpi":"2500","result":"1"}, "2":{"revenue":"2500", "kpi":"3000", "result":"0"}}}' }
My desire result is a DataFame like this:
id | username | Year | Month | revenue | kpi | result |
---|---|---|---|---|---|---|
206 | hantran | 2021 | 1 | 2000 | 2100 | 0 |
206 | hantran | 2021 | 2 | 2500 | 2000 | 1 |
206 | hantran | 2022 | 1 | 3000 | 2500 | 1 |
206 | hantran | 2022 | 2 | 2500 | 3000 | 0 |
Apparently, similar question has been discussed here. However, the solution only work for 3-level dictionary. I don’t know how to make it work for my 1-level dictionary with most of the needed information is in a string.
Advertisement
Answer
If the string in your dictionary is valid json, it can easily be converted into a dictionary:
from json import loads d = {'id': 206, 'username': 'hantran', 'meta': '{"kpi_info": {"2021" :{"1":{"revenue":"2000", "kpi":"2100","result":"0"}, "2":{"revenue":"2500", "kpi":"2000", "result":"1"}}, "2022": {"1":{"revenue":"3000", "kpi":"2500","result":"1"}, "2":{"revenue":"2500", "kpi":"3000", "result":"0"}}}}'} d['meta'] = loads(d['meta'])
However, the representation of the dictionary in your code is not a valid dictionary, as it is missing a closing }
. There’s no easy way to deal with errors like these, so you should check if your actual data has this problem, or whether you should check the code you share more carefully.
Note that you shouldn’t call a dictionary dict
, since doing so will shadow the actual dict
type and you won’t be able to access that normally after doing so.
With the dictionary d
it’s now fairly easy to construct a DataFrame as needed:
from pandas import DataFrame df = DataFrame([ { 'id': d['id'], 'username': d['username'], 'year': int(k1), 'month': int(k2), 'revenue': d2['revenue'], 'kpi': d2['kpi'], 'result': d2['result'] } for k1, d1 in d['meta']['kpi_info'].items() for k2, d2 in d1.items() ]) print(df)
This makes use of pandas
ability to turn a list of dictionaries into a dataframe, using the keys of the dictionaries as column references.
Result:
id username year month revenue kpi result 0 206 hantran 2021 1 2000 2100 0 1 206 hantran 2021 2 2500 2000 1 2 206 hantran 2022 1 3000 2500 1 3 206 hantran 2022 2 2500 3000 0