Skip to content
Advertisement

list of dictionaries into data frame columns

I have a column in a DataFrame that contains JSON strings, each representing a list of dictionaries:

    id Number  Type  Class Name                                                                                                                                                                                                                                datiles
0  292      C     1      2    A  [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]
1  293      C     1      2    A  [{"did":{"id":"3","num":1},"NId":"a1,b1,c1","Att":null,"isnull":false,"number":"M90","label":[{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}],"codes":[],"rule":null}]

I want to convert each row in datiles column to rows and columns and join them with the original data frame as shown on the sample below:

       id     Number       Type     Class      Name            did                NId        Att  ..... .... label ........
      0292           C          1          2        A     {"id":"3","num":1}    a1,b1,c1    null    [{"title":"Dear","Info"{"Id":null,"id2":2,"Name":"x"}},{"title":"Dear","Info":{"Id":null,"id2":2,"Name":"x"}}]

I have done this as I need but I don’t know how to join it with the original data frame since I don’t have a key between them:

df['datiles']=data['datiles'].apply(json.loads)

df2 = pd.DataFrame([])

for x in df['datiles'].values.tolist():
    df2 = df2 .append(pd.DataFrame(x))
display(df2)

How can I split the column and join at the same time? I have tried to use json_normalize but I get this error

AttributeError: 'list' object has no attribute 'values'

Also, I have seen those posts but does not work, may that because of the list structure

How to convert python JSON rows to dataframe columns without looping

Pandas split column of lists into multiple columns

How to split a list of dictionaries into multiple columns keeping the same index?

Advertisement

Answer

You can use the index of your df and explicitly set it to your new DataFrame to join with, like that:

df['datiles'] = df['datiles'].apply(json.loads).apply(pd.DataFrame)
out = df.drop('datiles', axis=1).join(
    pd.concat(df['datiles'].values, keys=df.index).droplevel(1))

Explanation

  1. The first line does a double apply: json.loads (as you had figured out) and pd.DataFrame (that you had figured out too, but here we do it in an apply instead of a loop).
  2. The second line concats all those DataFrames inside df['datiles'], but uses the index of df itself as keys. The result is a MultiIndex, with possibly several rows for a given key (if the original datiles JSON string was a list of more than 1 element). In any case, we drop that second level. Then join does its usual thing (on indexes).

Example

The setup is a bit verbose for a SO answer (I wish we had an expand or fold macro), so I pasted it in pastebin.

The point is, the first datiles is a JSON list of two elements, just to exercise the logic above. Aside from that, it’s the same content as per the OP.

Output

    id Number  Type  Class Name                    did       NId   Att  
0  292      C     1      2    A  {'id': '1', 'num': 1}  a1,b1,c1  None   
0  292      C     1      2    A  {'id': '2', 'num': 1}  a1,b1,c1  None   
1  293      C     1      2    A  {'id': '3', 'num': 1}  a1,b1,c1  None   

   isnull number                                              label codes  
0   False    M90  [{'title': 'Dear', 'Info': {'Id': None, 'id2':...    []   
0   False    M90  [{'title': 'Dear', 'Info': {'Id': None, 'id2':...    []   
1   False    M90  [{'title': 'Dear', 'Info': {'Id': None, 'id2':...    []   

   rule  
0  None  
0  None  
1  None
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement