Problem
I am converting multiple nested dicts to dataframes. I have a slightly different dict that I haven’t been able to convert to a dataframe using my attempted solution. I am providing a shortened copy of my dict with dummy values as the reprex.
Reprex dict:
reprex_dict = {'metrics': [{'metric': 'DatasetCorrelationsMetric',
   'result': {'current': {'stats': {'pearson': {'target_prediction_correlation': None,
       'abs_max_features_correlation': 0.1},
      'cramer_v': {'target_prediction_correlation': None,
       'abs_max_features_correlation': None}}},
    'reference': {'stats': {'pearson': {'target_prediction_correlation': None,
       'abs_max_features_correlation': 0.7},
      'cramer_v': {'target_prediction_correlation': None,
       'abs_max_features_correlation': None}}}}}]}
My attempted solution
Code is based on similar dict wrangling problems that I had, but I am not sure how to apply it for this specific dict.
data = {}
for result in reprex_dict['metrics']:
    data[result['result']] = {
        **{f"ref_{key}": val for key, val in result['result']['reference'].items()},
        **{f"cur_{key}": val for key, val in result['result']['current'].items()}
    }
Expected dataframe format:
cur_pearson_target_prediction_correlation | cur_pearson_abs_max_features_correlation | cur_cramer_v_target_prediction_correlation |
---|---|---|
None | 0.1 | None |
Error message
I am currently getting this error too.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In [403], line 7
5 data = {}
6 for result in corr_matrix_dict['metrics']:
----> 7 data[result['result']] = {
8 **{f"ref_{key}": val for key, val in result['result']['reference']['stats'].items()},
9 **{f"cur_{key}": val for key, val in result['result']['current']['stats'].items()}
10 }
TypeError: unhashable type: 'dict'
Answer
As mentioned in the comments, the TypeError is due to result['result'] (which is a dictionary) not being usable as a key, because dictionary keys must be hashable. If you used something like result['metric'] instead, the error would no longer be raised, but I think the resulting structure would not be the outcome you want.
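To make the point above concrete, here is a minimal sketch that reproduces the error and then keys on the metric name instead. The `sample` dict is a hypothetical, trimmed-down stand-in for the reprex dict, not the asker's real data.

```python
# Hypothetical, trimmed-down stand-in for reprex_dict (assumption for illustration)
sample = {'metrics': [{
    'metric': 'DatasetCorrelationsMetric',
    'result': {
        'current': {'stats': {'pearson': {'abs_max_features_correlation': 0.1}}},
        'reference': {'stats': {'pearson': {'abs_max_features_correlation': 0.7}}},
    },
}]}

data = {}
error_message = None
for result in sample['metrics']:
    try:
        # result['result'] is a dict; dicts are mutable and therefore unhashable
        data[result['result']] = {}
    except TypeError as exc:
        error_message = str(exc)

    # A string such as the metric name is hashable, so it works as a key
    data[result['metric']] = {
        **{f"ref_{key}": val for key, val in result['result']['reference']['stats'].items()},
        **{f"cur_{key}": val for key, val in result['result']['current']['stats'].items()},
    }

print(error_message)
print(data)
```

The assignment now succeeds, but the values under ref_pearson and cur_pearson are still nested dicts, so this alone does not give one flat column per statistic.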
For flattening nested data, I often take a recursive approach. Below is a simplified version of my flattenObj function:
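def flattenDict(orig:dict, kList=[], kSep='_', rename={}):
    # Base case: a non-dict value is a leaf, returned with its key path
    if not isinstance(orig, dict): return [(kList, orig)]

    # dCt = number of dict-valued entries at this level
    tList, dCt = [], len([v for v in orig.values() if isinstance(v,dict)])
    for k, v in orig.items():
        # Skip the key of a nested dict when it is the only nested dict at this level
        kli = kList + ([] if isinstance(v,dict) and dCt==1 else [str(k)])
        tList += flattenDict(v, kli, None)

    # Recursive calls pass kSep=None and get the raw (key path, value) list back
    if not isinstance(kSep, str): return tList
    return {kSep.join([rename.get(k,k) for k in kl]):v for kl,v in tList}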
import pandas as pd
nrMap = {'current':'cur','reference':'ref'}
rows = [flattenDict(result, rename=nrMap) for result in reprex_dict['metrics']]
rowsDf = pd.DataFrame(rows)
rows
[{'metric': 'DatasetCorrelationsMetric',
  'cur_pearson_target_prediction_correlation': None,
  'cur_pearson_abs_max_features_correlation': 0.1,
  'cur_cramer_v_target_prediction_correlation': None,
  'cur_cramer_v_abs_max_features_correlation': None,
  'ref_pearson_target_prediction_correlation': None,
  'ref_pearson_abs_max_features_correlation': 0.7,
  'ref_cramer_v_target_prediction_correlation': None,
  'ref_cramer_v_abs_max_features_correlation': None}]
If you don’t want the metric column, you can either drop it or omit it by defining rows as
rows = [flattenDict(result['result'], rename=nrMap) for result in reprex_dict['metrics']]
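The column names above contain neither result nor stats because flattenDict leaves out the key of a nested dict when it is the only nested dict at its level. The sketch below repeats the function (with its indentation restored, which is an assumption about the original layout) and runs it on two small hypothetical inputs, `nested` and `clashing`, to show that rule and one side effect of it.

```python
def flattenDict(orig: dict, kList=[], kSep='_', rename={}):
    # Leaf value: return it together with the key path collected so far
    if not isinstance(orig, dict):
        return [(kList, orig)]

    # Count how many values at this level are themselves dicts
    tList, dCt = [], len([v for v in orig.values() if isinstance(v, dict)])
    for k, v in orig.items():
        # The key of a lone nested dict is left out of the path
        kli = kList + ([] if isinstance(v, dict) and dCt == 1 else [str(k)])
        tList += flattenDict(v, kli, None)

    # Recursive calls (kSep=None) return the raw list; the top call joins keys
    if not isinstance(kSep, str):
        return tList
    return {kSep.join([rename.get(k, k) for k in kl]): v for kl, v in tList}


# Hypothetical inputs, not taken from the question
nested = {'a': {'only': {'x': 1, 'y': 2}}, 'b': {'x': 3}}
flat = flattenDict(nested)
print(flat)  # 'only' is dropped because it is the single nested dict under 'a'

# Side effect: a skipped key can make two paths collapse into the same name
clashing = {'x': 1, 'wrap': {'x': 2}}
collided = flattenDict(clashing)
print(collided)  # the later value overwrites the earlier one
```

If your real data can contain a plain key next to a single nested dict that holds the same key, check for lost columns, or always keep the key by removing the dCt==1 condition.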