Problem
I am converting multiple nested dicts to dataframes. I have a slightly different dict that I haven’t been able to convert to a dataframe using my attempted solution. I am providing a shortened copy of my dict with dummy values as the reprex.
Reprex dict:
{'metrics': [{'metric': 'DatasetCorrelationsMetric', 'result': {'current': {'stats': {'pearson': {'target_prediction_correlation': None, 'abs_max_features_correlation': 0.1}, 'cramer_v': {'target_prediction_correlation': None, 'abs_max_features_correlation': None}}}, 'reference': {'stats': {'pearson': {'target_prediction_correlation': None, 'abs_max_features_correlation': 0.7}, 'cramer_v': {'target_prediction_correlation': None, 'abs_max_features_correlation': None}}}}}]}
My attempted solution
Code is based on similar dict wrangling problems that I had, but I am not sure how to apply it for this specific dict.
data = {} for result in reprex_dict['metrics']: data[result['result']] = { **{f"ref_{key}": val for key, val in result['result']['reference'].items()}, **{f"cur_{key}": val for key, val in result['result']['current'].items()} }
Expected dataframe format:
cur_pearson_target_prediction_correlation | cur_pearson_abs_max_features_correlation | cur_cramer_v_target_prediction_correlation |
---|---|---|
None | 0.1 | None |
Error message
I am currently getting this error too.
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In [403], line 7 5 data = {} 6 for result in corr_matrix_dict['metrics']: ----> 7 data[result['result']] = { 8 **{f"ref_{key}": val for key, val in result['result']['reference']['stats'].items()}, 9 **{f"cur_{key}": val for key, val in result['result']['current']['stats'].items()} 10 } TypeError: unhashable type: 'dict'
Advertisement
Answer
As mentioned in the comments, the TypeError
is due result['result']
(which is a dictionary) not being usable as a key. If you used some thing [like result['metric']
] then the error would no longer be raised, but I think that the resulting structure is not the outcome you want.
For flattening nested data, I often take a recursive approach. Below is a simplified version of my flattenObj
function:
def flattenDict(orig:dict, kList=[], kSep='_', rename={}): if not isinstance(orig, dict): return [(kList, orig)] tList, dCt = [], len([v for v in orig.values() if isinstance(v,dict)]) for k, v in orig.items(): kli = kList + ([] if isinstance(v,dict) and dCt==1 else [str(k)]) tList += flattenDict(v, kli, None) if not isinstance(kSep, str): return tList return {kSep.join([rename.get(k,k) for k in kl]):v for kl,v in tList}
# import pandas as pd nrMap = {'current':'cur','reference':'ref'} rows = [flattenDict(result, rename=nrMap) for result in reprex_dict['metrics']] rowsDf = pd.DataFrame(rows)
rows
[{'metric': 'DatasetCorrelationsMetric', 'cur_pearson_target_prediction_correlation': None, 'cur_pearson_abs_max_features_correlation': 0.1, 'cur_cramer_v_target_prediction_correlation': None, 'cur_cramer_v_abs_max_features_correlation': None, 'ref_pearson_target_prediction_correlation': None, 'ref_pearson_abs_max_features_correlation': 0.7, 'ref_cramer_v_target_prediction_correlation': None, 'ref_cramer_v_abs_max_features_correlation': None}]
If you don’t want the metric
column, you can either drop
it or omit it by defining rows
as
rows = [flattenDict(result['result'], rename=nrMap) for result in reprex_dict['metrics']]