I’m looking for a generic way of turning a DataFrame into a nested dictionary.
This is a sample data frame:
      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6
The number of columns may differ, and so may the column names.
The desired output looks like this:
    {
        'A': {
            'A1': {'A11': 1},
            'A2': {'A12': 2, 'A21': 6}},
        'B': {
            'B1': {'B12': 3}},
        'C': {
            'C1': {'C11': 4}}
    }
What is the best way to achieve this? The closest I got was with the zip function, but I haven’t managed to make it work for more than one level (two columns).
Answer
I don’t understand why there isn’t a B2 in your dict. I’m also not sure what you want to happen in the case of repeated column values (in every column except the last, I mean). Assuming the first is an oversight, we could use recursion:
    def recur_dictify(frame):
        # Base case: one column left, so return its value(s) as the leaf.
        if len(frame.columns) == 1:
            if frame.values.size == 1:
                return frame.values[0][0]
            return frame.values.squeeze()
        # Group on the first column and recurse on the remaining columns.
        grouped = frame.groupby(frame.columns[0])
        d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
        return d
which produces
    >>> df
      name  v1   v2  v3
    0    A  A1  A11   1
    1    A  A2  A12   2
    2    B  B1  B12   3
    3    C  C1  C11   4
    4    B  B2  B21   5
    5    A  A2  A21   6
    >>> pprint.pprint(recur_dictify(df))
    {'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
     'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
     'C': {'C1': {'C11': 4}}}
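As for the repeated-values case raised earlier, the sketch below shows what the recursive version does when two rows share the same path through every key column. The frame `dup` is a made-up example, not data from the question. Because the base case falls through to `frame.values.squeeze()`, the leaf comes back as a NumPy array holding both values rather than a single scalar.

```python
import pandas as pd

def recur_dictify(frame):
    # Same recursion as in the answer; .iloc is used here because
    # .ix no longer exists in pandas 1.0 and later.
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0])
    return {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}

# Hypothetical frame: both rows share the same name/v1/v2 path.
dup = pd.DataFrame(
    [["A", "A1", "A11", 1], ["A", "A1", "A11", 7]],
    columns=["name", "v1", "v2", "v3"],
)

result = recur_dictify(dup)
leaf = result["A"]["A1"]["A11"]
print(type(leaf).__name__, leaf.tolist())
```

Whether an array is the right leaf for duplicates depends on what you want; a list, a sum, or an error might all be reasonable alternatives.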
It might be simpler to use a non-pandas approach, though:
    def retro_dictify(frame):
        d = {}
        for row in frame.values:
            # Walk down one dict level per key column, creating levels as needed.
            here = d
            for elem in row[:-2]:
                if elem not in here:
                    here[elem] = {}
                here = here[elem]
            # The second-to-last column is the innermost key, the last its value.
            here[row[-2]] = row[-1]
        return d
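To check that this loop really does not depend on pandas, here is a minimal sketch of the same idea on plain tuples. The name `nest_rows` and the hard-coded `rows` are illustrative and not part of the answer; with a real frame, `df.itertuples(index=False)` should yield rows of the same shape.

```python
def nest_rows(rows):
    """Build a nested dict from rows shaped (key1, ..., keyN, value)."""
    nested = {}
    for row in rows:
        # Split the row into outer keys, the innermost key, and the value.
        *keys, last_key, value = row
        here = nested
        for key in keys:
            # setdefault creates the level on first sight, then descends.
            here = here.setdefault(key, {})
        here[last_key] = value
    return nested

# The sample data from the question, written as plain tuples.
rows = [
    ("A", "A1", "A11", 1),
    ("A", "A2", "A12", 2),
    ("B", "B1", "B12", 3),
    ("C", "C1", "C11", 4),
    ("B", "B2", "B21", 5),
    ("A", "A2", "A21", 6),
]

result = nest_rows(rows)
print(result)
```

Note that if two rows share the full key path, the later value silently overwrites the earlier one, which is one possible answer to the repeated-values question.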