I’m Looking for a generic way of turning a DataFrame to a nested dictionary
This is a sample data frame
name v1 v2 v3 0 A A1 A11 1 1 A A2 A12 2 2 B B1 B12 3 3 C C1 C11 4 4 B B2 B21 5 5 A A2 A21 6
The number of columns may differ and so does the column names.
like this :
{ 'A' : { 'A1' : { 'A11' : 1 } 'A2' : { 'A12' : 2 , 'A21' : 6 }} , 'B' : { 'B1' : { 'B12' : 3 } } , 'C' : { 'C1' : { 'C11' : 4}} }
What is best way to achieve this ?
closest I got was with the zip
function but haven’t managed to make it work for more then one level (two columns).
Advertisement
Answer
I don’t understand why there isn’t a B2
in your dict. I’m also not sure what you want to happen in the case of repeated column values (every one except the last, I mean.) Assuming the first is an oversight, we could use recursion:
def recur_dictify(frame): if len(frame.columns) == 1: if frame.values.size == 1: return frame.values[0][0] return frame.values.squeeze() grouped = frame.groupby(frame.columns[0]) d = {k: recur_dictify(g.ix[:,1:]) for k,g in grouped} return d
which produces
>>> df name v1 v2 v3 0 A A1 A11 1 1 A A2 A12 2 2 B B1 B12 3 3 C C1 C11 4 4 B B2 B21 5 5 A A2 A21 6 >>> pprint.pprint(recur_dictify(df)) {'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}}, 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}}, 'C': {'C1': {'C11': 4}}}
It might be simpler to use a non-pandas approach, though:
def retro_dictify(frame): d = {} for row in frame.values: here = d for elem in row[:-2]: if elem not in here: here[elem] = {} here = here[elem] here[row[-2]] = row[-1] return d