Expand selected keys in a JSON column of a pandas DataFrame

Question

I have a sample dataset with a JSON column, and I want to 'expand' (or 'explode') each value in that column while keeping only some of its keys as columns in the result. First I tried using json_normalize and iterating over each row (even though the last row has no data), but that requires knowing in advance how many rows I'm going to expand: I

Accepted Answer

Here are the steps you could follow.

(1) define df

    import pandas as pd

    df = pd.DataFrame(
        {'id':['AM','AN','AP'],
         'target':[130,60,180],
         'moves':[[{'date':'2022-08-01','amount':285.0,'name':'Cookie'},
                   {'name':'Rush','amount':10,'date':'2022-08-02','type':'song'}],
                  [{'amount':250.5,'date':'2022-08-01','source':{'data':'bing'}}],
                  []]})
    print(df)

       id  target                                                                                                                              moves
    0  AM     130  [{'date': '2022-08-01', 'amount': 285.0, 'name': 'Cookie'}, {'name': 'Rush', 'amount': 10, 'date': '2022-08-02', 'type': 'song'}]
    1  AN      60                                                              [{'amount': 250.5, 'date': '2022-08-01', 'source': {'data': 'bing'}}]
    2  AP     180                                                                                                                                 []

(2) explode the column 'moves'

    # one row per list element; the empty list becomes a single NaN row
    df1 = df.explode('moves', ignore_index=True)
    print(df1)

       id  target                                                                 moves
    0  AM     130             {'date': '2022-08-01', 'amount': 285.0, 'name': 'Cookie'}
    1  AM     130  {'name': 'Rush', 'amount': 10, 'date': '2022-08-02', 'type': 'song'}
    2  AN      60   {'amount': 250.5, 'date': '2022-08-01', 'source': {'data': 'bing'}}
    3  AP     180                                                                   NaN

(3) json_normalize the column 'moves'

    # each dict key becomes a column; nested keys are joined with '.'
    df2 = pd.json_normalize(df1['moves'])
    print(df2)

             date  amount    name  type source.data
    0  2022-08-01   285.0  Cookie   NaN         NaN
    1  2022-08-02    10.0    Rush  song         NaN
    2  2022-08-01   250.5     NaN   NaN        bing
    3         NaN     NaN     NaN   NaN         NaN

(4) concat the 2 df with only the relevant columns

    # df1 and df2 share the same 0..3 index, so they align row by row
    df3 = pd.concat([df1[['id', 'target']], df2[['date', 'amount', 'name']]], axis=1)
    print(df3)

       id  target        date  amount    name
    0  AM     130  2022-08-01   285.0  Cookie
    1  AM     130  2022-08-02    10.0    Rush
    2  AN      60  2022-08-01   250.5     NaN
    3  AP     180         NaN     NaN     NaN
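The four steps above can be folded into one reusable helper. This is only a sketch and not part of the original answer: the function name expand_selected_keys and its parameters are hypothetical. It makes two changes that are assumptions on my part. It selects the keys with reindex(columns=...) instead of df2[[...]], so a requested key that appears in no record gives a NaN column rather than a KeyError. It also swaps the NaN left by an empty list for an empty dict before calling json_normalize, so the result does not depend on how a given pandas version treats non-dict entries.

```python
import pandas as pd


def expand_selected_keys(df, column, keys):
    """Explode a list-of-dicts column and keep only the requested keys.

    Hypothetical helper: `column` holds lists of dicts, `keys` lists the
    dict keys to keep as columns in the result.
    """
    # One row per list element; an empty list becomes a single NaN row.
    exploded = df.explode(column, ignore_index=True)
    # Replace non-dict entries (the NaN from empty lists) with empty dicts.
    records = [m if isinstance(m, dict) else {} for m in exploded[column]]
    # reindex keeps only the requested keys and fills absent ones with NaN.
    selected = pd.json_normalize(records).reindex(columns=keys)
    # Both frames carry a fresh 0..n-1 index, so they align row by row.
    return pd.concat([exploded.drop(columns=column), selected], axis=1)


df = pd.DataFrame(
    {'id': ['AM', 'AN', 'AP'],
     'target': [130, 60, 180],
     'moves': [[{'date': '2022-08-01', 'amount': 285.0, 'name': 'Cookie'},
                {'name': 'Rush', 'amount': 10, 'date': '2022-08-02', 'type': 'song'}],
               [{'amount': 250.5, 'date': '2022-08-01', 'source': {'data': 'bing'}}],
               []]})

result = expand_selected_keys(df, 'moves', ['date', 'amount', 'name'])
print(result)
```

With the sample data from step (1), result should have the same rows and columns as df3. The helper keeps every column of the input other than the exploded one, so it does not need the id and target names spelled out.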

Answer