I have a column in a DataFrame that contains JSON strings, each representing a list of dictionaries:
id Number Type Class Name datiles
0 292 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
1 293 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
I want to convert each row in datiles column to rows and columns and join them with the original data frame as shown on the sample below:
id Number Type Class Name did NId Att ..... .... label ........
0292 C 1 2 A {"id":"3","num":1} a1,b1,c1 null [{"title":"Dear","Info"{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}]
I have done this as I need but I don’t know how to join it with the original data frame since I don’t have a key between them:
df['datiles']=data['datiles'].apply(json.loads)
df2 = pd.DataFrame([])
for x in df['datiles'].values.tolist():
df2 = df2 .append(pd.DataFrame(x))
display(df2)
How can I split the column and join at the same time? I have tried to use json_normalize but I get this error
AttributeError: 'list' object has no attribute 'values'
Also, I have seen those posts but does not work, may that because of the list structure
How to convert python JSON rows to dataframe columns without looping
Pandas split column of lists into multiple columns
How to split a list of dictionaries into multiple columns keeping the same index?
Advertisement
Answer
You can use the index of your df and explicitly set it to your new DataFrame to join with, like that:
df['datiles'] = df['datiles'].apply(json.loads).apply(pd.DataFrame)
out = df.drop('datiles', axis=1).join(
pd.concat(df['datiles'].values, keys=df.index).droplevel(1))
Explanation
- The first line does a double
apply:json.loads(as you had figured out) andpd.DataFrame(that you had figured out too, but here we do it in anapplyinstead of a loop). - The second line concats all those
DataFramesinsidedf['datiles'], but uses the index ofdfitself as keys. The result is aMultiIndex, with possibly several rows for a given key (if the originaldatilesJSON string was a list of more than 1 element). In any case, we drop that second level. Thenjoindoes its usual thing (on indexes).
Example
The setup is a bit verbose for a SO answer (I wish we had an expand or fold macro), so I pasted it in pastebin.
The point is, the first datiles is a JSON list of two elements, just to exercise the logic above. Aside from that, it’s the same content as per the OP.
Output
id Number Type Class Name did NId Att
0 292 C 1 2 A {'id': '1', 'num': 1} a1,b1,c1 None
0 292 C 1 2 A {'id': '2', 'num': 1} a1,b1,c1 None
1 293 C 1 2 A {'id': '3', 'num': 1} a1,b1,c1 None
isnull number label codes
0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
1 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... []
rule
0 None
0 None
1 None