I have a pandas DataFrame containing one column with multiple JSON data items as list of dicts. I want to normalize the JSON column and duplicate the non-JSON columns:
JavaScript
x
12
12
1
# creating dataframe
2
df_actions = pd.DataFrame(columns=['id', 'actions'])
3
rows = [[12,json.loads('[{"type": "a","value": "17"},{"type": "b","value": "19"}]')],
4
[15, json.loads('[{"type": "a","value": "1"},{"type": "b","value": "3"},{"type": "c","value": "5"}]')]]
5
df_actions.loc[0] = rows[0]
6
df_actions.loc[1] = rows[1]
7
8
>>>df_actions
9
id actions
10
0 12 [{'type': 'a', 'value': '17'}, {'type': 'b', '...
11
1 15 [{'type': 'a', 'value': '1'}, {'type': 'b', 'v...
12
I want
JavaScript
1
8
1
>>>df_actions_parsed
2
id type value
3
12 a 17
4
12 b 19
5
15 a 1
6
15 b 3
7
15 c 5
8
I can normalize JSON data using:
JavaScript
1
2
1
pd.concat([pd.DataFrame(json_normalize(x)) for x in df_actions['actions']],ignore_index=True)
2
but I don’t know how to join that back to the id column of the original DataFrame.
Advertisement
Answer
You can use concat
with dict comprehension
with pop
for extract column, remove second level and join
to original:
JavaScript
1
5
1
df1 = (pd.concat({i: pd.DataFrame(x) for i, x in df_actions.pop('actions').items()})
2
.reset_index(level=1, drop=True)
3
.join(df_actions)
4
.reset_index(drop=True))
5
What is same as:
JavaScript
1
5
1
df1 = (pd.concat({i: json_normalize(x) for i, x in df_actions.pop('actions').items()})
2
.reset_index(level=1, drop=True)
3
.join(df_actions)
4
.reset_index(drop=True))
5
JavaScript
1
8
1
print (df1)
2
type value id
3
0 a 17 12
4
1 b 19 12
5
2 a 1 15
6
3 b 3 15
7
4 c 5 15
8
Another solution if performance is important:
JavaScript
1
11
11
1
L = [{**{'i':k, **y}} for k, v in df_actions.pop('actions').items() for y in v]
2
3
df_actions = df_actions.join(pd.DataFrame(L).set_index('i')).reset_index(drop=True)
4
print (df_actions)
5
id type value
6
0 12 a 17
7
1 12 b 19
8
2 15 a 1
9
3 15 b 3
10
4 15 c 5
11