I have a column in a DataFrame
that contains JSON strings, each representing a list of dictionaries:
id Number Type Class Name datiles 0 292 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}] 1 293 C 1 2 A [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
I want to convert each row in datiles column to rows and columns and join them with the original data frame as shown on the sample below:
id Number Type Class Name did NId Att ..... .... label ........ 0292 C 1 2 A {"id":"3","num":1} a1,b1,c1 null [{"title":"Dear","Info"{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}]
I have done this as I need but I don’t know how to join it with the original data frame since I don’t have a key between them:
df['datiles']=data['datiles'].apply(json.loads) df2 = pd.DataFrame([]) for x in df['datiles'].values.tolist(): df2 = df2 .append(pd.DataFrame(x)) display(df2)
How can I split the column and join at the same time? I have tried to use json_normalize but I get this error
AttributeError: 'list' object has no attribute 'values'
Also, I have seen those posts but does not work, may that because of the list structure
How to convert python JSON rows to dataframe columns without looping
Pandas split column of lists into multiple columns
How to split a list of dictionaries into multiple columns keeping the same index?
Advertisement
Answer
You can use the index of your df
and explicitly set it to your new DataFrame
to join with, like that:
df['datiles'] = df['datiles'].apply(json.loads).apply(pd.DataFrame) out = df.drop('datiles', axis=1).join( pd.concat(df['datiles'].values, keys=df.index).droplevel(1))
Explanation
- The first line does a double
apply
:json.loads
(as you had figured out) andpd.DataFrame
(that you had figured out too, but here we do it in anapply
instead of a loop). - The second line concats all those
DataFrames
insidedf['datiles']
, but uses the index ofdf
itself as keys. The result is aMultiIndex
, with possibly several rows for a given key (if the originaldatiles
JSON string was a list of more than 1 element). In any case, we drop that second level. Thenjoin
does its usual thing (on indexes).
Example
The setup is a bit verbose for a SO answer (I wish we had an expand
or fold
macro), so I pasted it in pastebin.
The point is, the first datiles
is a JSON list of two elements, just to exercise the logic above. Aside from that, it’s the same content as per the OP.
Output
id Number Type Class Name did NId Att 0 292 C 1 2 A {'id': '1', 'num': 1} a1,b1,c1 None 0 292 C 1 2 A {'id': '2', 'num': 1} a1,b1,c1 None 1 293 C 1 2 A {'id': '3', 'num': 1} a1,b1,c1 None isnull number label codes 0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... [] 0 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... [] 1 False M90 [{'title': 'Dear', 'Info': {'Id': None, 'id2':... [] rule 0 None 0 None 1 None