Skip to content
Advertisement

Convert nested dict to dataframe, syntax error?

Problem

I am converting multiple nested dicts to dataframes. I have a slightly different dict that I haven’t been able to convert to a dataframe using my attempted solution. I am providing a shortened copy of my dict with dummy values as the reprex.

Reprex dict:

{'metrics': [{'metric': 'DatasetCorrelationsMetric',
   'result': {'current': {'stats': {'pearson': {'target_prediction_correlation': None,
       'abs_max_features_correlation': 0.1},
      'cramer_v': {'target_prediction_correlation': None,
       'abs_max_features_correlation': None}}},
    'reference': {'stats': {'pearson': {'target_prediction_correlation': None,
       'abs_max_features_correlation': 0.7},
      'cramer_v': {'target_prediction_correlation': None,
       'abs_max_features_correlation': None}}}}}]}

My attempted solution

Code is based on similar dict wrangling problems that I had, but I am not sure how to apply it for this specific dict.

data = {}
for result in reprex_dict['metrics']:
    data[result['result']] = {
        **{f"ref_{key}": val for key, val in result['result']['reference'].items()},
        **{f"cur_{key}": val for key, val in result['result']['current'].items()}
    }

Expected dataframe format:

cur_pearson_target_prediction_correlation cur_pearson_abs_max_features_correlation cur_cramer_v_target_prediction_correlation
None 0.1 None

Error message

I am currently getting this error too.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [403], line 7
      5 data = {}
      6 for result in corr_matrix_dict['metrics']:
----> 7     data[result['result']] = {
      8         **{f"ref_{key}": val for key, val in result['result']['reference']['stats'].items()},
      9         **{f"cur_{key}": val for key, val in result['result']['current']['stats'].items()}
     10     }

TypeError: unhashable type: 'dict'

Advertisement

Answer

As mentioned in the comments, the TypeError is due result['result'] (which is a dictionary) not being usable as a key. If you used some thing [like result['metric']] then the error would no longer be raised, but I think that the resulting structure is not the outcome you want.

For flattening nested data, I often take a recursive approach. Below is a simplified version of my flattenObj function:

def flattenDict(orig:dict, kList=[], kSep='_', rename={}):
    if not isinstance(orig, dict): return [(kList, orig)]
    tList, dCt = [], len([v for v in orig.values() if isinstance(v,dict)])
    for k, v in orig.items():
        kli = kList + ([] if isinstance(v,dict) and dCt==1 else [str(k)])
        tList += flattenDict(v, kli, None)
    if not isinstance(kSep, str): return tList
    return {kSep.join([rename.get(k,k) for k in kl]):v for kl,v in tList}

# import pandas as pd
nrMap = {'current':'cur','reference':'ref'}
rows = [flattenDict(result, rename=nrMap) for result in reprex_dict['metrics']]
rowsDf = pd.DataFrame(rows)
rows
[{'metric': 'DatasetCorrelationsMetric',
  'cur_pearson_target_prediction_correlation': None,
  'cur_pearson_abs_max_features_correlation': 0.1,
  'cur_cramer_v_target_prediction_correlation': None,
  'cur_cramer_v_abs_max_features_correlation': None,
  'ref_pearson_target_prediction_correlation': None,
  'ref_pearson_abs_max_features_correlation': 0.7,
  'ref_cramer_v_target_prediction_correlation': None,
  'ref_cramer_v_abs_max_features_correlation': None}]

rowsDf.T [Transposed to fit better]
opdf

If you don’t want the metric column, you can either drop it or omit it by defining rows as

rows = [flattenDict(result['result'], rename=nrMap) for result in reprex_dict['metrics']]
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement