Skip to content
Advertisement

How to extract json from nested column to dataframe

I’m pulling stock data from TD Ameritrade API and I want to store it in a DataFrame.

From the API I get a nested JSON object and when I put it in a data frame I get 4 columns: Index, Candles, Empty, Symbol. However inside of candles is a dictionary that I want as separate columns in the dataframe (‘open’,’close’,…)

I’ve tried json_normalize and pd.io.json.json_normalize

neither gave me the desired result

import pandas as pd
import requests
from pandas.io.json import json_normalize

endpoint = r'https://api.tdameritrade.com/v1/marketdata/{}/pricehistory'.format('GOOG')

client_id = 'AMSAFI1234567'

payload = {'apikey':client_id,
       'periodType': 'day',
       'frequencyType': 'minute',
       'frequency' :'1',
       'period':'2',
       'endDate': '1556158524000',
       'startDate': '1554535854000',
       'needExtendedHoursData':'true'}

content = requests.get(url = endpoint, params = payload)

data = content.json()
print(data)

Output:

{'candles': [{'open': 1260.25, 'high': 1260.5, 'low': 1260.0, 'close': 1260.28, 
'volume': 2544, 'datetime': 1556029980000}, {'open': 1260.39, 'high': 1260.61, 
'low': 1260.3501, 'close': 1260.3501, 'volume': 1703, 'datetime': 
1556030040000}, {'open': 1260.35, 'high': 1260.59, 'low': 1260.07, 'close': 
1260.56, 'volume': 2156, 'datetime': 1556030100000}, {'open': 1260.56, 'high': 
1260.56, 'low': 1259.27, 'close': 1259.7, 'volume': 1320, 'datetime': 
1556030160000}, {'open': 1260.06, 'high': 1260.06, 'low': 1259.56, 'close': 
1259.56, 'volume': 800, 'datetime': 1556030220000},

….

'close': 1264.61, 'volume': 100, 'datetime': 1556146920000}, {'open': 1265.87, 
'high': 1266.0, 'low': 1265.87, 'close': 1266.0, 'volume': 232, 'datetime': 
1556147220000}], 'symbol': 'GOOG', 'empty': False}

Input:

pd.DataFrame(data)

Output:

Data frame with 4 columns (‘Index’, ‘Candles’, ‘Empty’, ‘Symbol’). The Candles column is a dictionary. I’m trying to get all the keys in the dictionaries as columns and the key values as rows in the dataframe

Advertisement

Answer

You’re using json_normalize 1 level too high. You’re wanting to normalize/flatten out the data under data['candles']:

I’d also be careful about posting api keys.

import pandas as pd
import requests
from pandas.io.json import json_normalize

endpoint = r'https://api.tdameritrade.com/v1/marketdata/{}/pricehistory'.format('GOOG')

client_id = 'XXXXXXXXXXX'

payload = {'apikey':client_id,
       'periodType': 'day',
       'frequencyType': 'minute',
       'frequency' :'1',
       'period':'2',
       'endDate': '1556158524000',
       'startDate': '1554535854000',
       'needExtendedHoursData':'true'}

content = requests.get(url = endpoint, params = payload)

data = content.json()
df = json_normalize(data['candles'])

Output:

print (df)
         close       datetime       high        low       open  volume
0    1267.0000  1556035860000  1267.8600  1267.0000  1267.8600    1450
1    1266.8500  1556035920000  1266.8500  1266.8500  1266.8500     100
2    1266.5300  1556035980000  1266.7300  1266.2400  1266.6750    1290
3    1267.1613  1556036040000  1267.1613  1266.5400  1266.5500    1190
4    1267.4150  1556036100000  1267.4150  1266.8800  1266.8800    1100
5    1267.4299  1556036160000  1267.4299  1267.4299  1267.4299     250
6    1267.4540  1556036220000  1268.1800  1267.4540  1267.8100    1650
7    1267.0800  1556036280000  1267.5100  1267.0800  1267.4900     900
8    1265.6850  1556036340000  1267.1210  1265.5300  1267.1210    4148
9    1265.4600  1556036400000  1265.9600  1265.1703  1265.8300    2290
10   1266.2774  1556036460000  1266.4800  1265.4050  1265.4050    3341
11   1266.4684  1556036520000  1266.4684  1266.3247  1266.3247    1134
12   1266.8550  1556036580000  1267.0500  1266.4600  1266.4600    1500
13   1267.2550  1556036640000  1267.3500  1266.6401  1267.0393    1619
14   1267.2400  1556036700000  1267.2450  1267.2400  1267.2450     230
15   1266.8000  1556036760000  1267.4400  1266.8000  1267.4400     940
16   1266.0992  1556036820000  1266.5270  1266.0992  1266.5270    1523
17   1266.2599  1556036880000  1266.2700  1266.2599  1266.2700     600
18   1265.8400  1556036940000  1266.2350  1265.6800  1265.8400    2165
19   1265.5400  1556037000000  1265.8600  1265.5000  1265.5300    1400
20   1265.9650  1556037060000  1265.9900  1265.1200  1265.4532    1550
21   1265.6300  1556037120000  1265.7750  1265.4300  1265.5929    1580
22   1265.4469  1556037180000  1265.5300  1265.1000  1265.5300    1071
23   1265.6600  1556037240000  1265.7100  1265.6313  1265.7100     650
24   1266.1850  1556037300000  1266.1950  1265.6257  1265.6257     930
25   1266.1400  1556037360000  1266.2500  1265.9400  1266.1300    1050
26   1266.4250  1556037420000  1266.5750  1266.3000  1266.3294    1130
27   1266.4800  1556037480000  1266.6500  1266.3500  1266.6500     900
28   1266.7400  1556037540000  1266.8300  1266.5700  1266.7100    1103
29   1266.8450  1556037600000  1266.8600  1266.8100  1266.8600     600
..         ...            ...        ...        ...        ...     ...
585  1256.0000  1556136000000  1256.0000  1256.0000  1256.0000  211625
586  1258.0000  1556136360000  1258.0000  1256.0000  1256.0000    1154
587  1260.7100  1556136420000  1260.7100  1260.0000  1260.0000     550
588  1262.9500  1556136540000  1262.9500  1262.9500  1262.9500     100
589  1265.2600  1556136600000  1265.2600  1262.9500  1262.9500    2103
590  1264.5000  1556136660000  1264.5000  1263.9700  1263.9700     486
591  1264.0000  1556136840000  1264.0000  1264.0000  1264.0000     100
592  1265.6100  1556136900000  1265.6100  1265.5000  1265.5000     300
593  1264.0600  1556136960000  1264.0600  1264.0600  1264.0600     100
594  1265.1800  1556137020000  1265.1800  1265.1800  1265.1800     100
595  1264.0000  1556137140000  1264.0000  1264.0000  1264.0000     192
596  1264.9000  1556137320000  1265.1400  1264.9000  1264.9000     537
597  1264.6500  1556137620000  1264.6500  1264.6500  1264.6500     500
598  1264.7500  1556137680000  1264.7500  1264.7500  1264.7500     243
599  1266.4900  1556137740000  1266.4900  1266.4900  1266.4900     124
600  1268.0000  1556138580000  1268.0000  1268.0000  1268.0000     100
601  1267.2900  1556138700000  1267.2900  1267.2900  1267.2900     100
602  1268.9800  1556138820000  1268.9800  1268.9800  1268.9800     100
603  1269.0700  1556139240000  1269.1200  1269.0700  1269.1200     200
604  1256.0000  1556139420000  1256.0000  1256.0000  1256.0000     118
605  1269.0900  1556139480000  1269.0900  1269.0900  1269.0900     100
606  1270.0000  1556139540000  1270.0000  1270.0000  1270.0000     200
607  1267.3800  1556141040000  1267.3800  1267.3800  1267.3800     100
608  1268.0000  1556141100000  1268.0000  1268.0000  1268.0000     150
609  1268.6600  1556141940000  1268.6600  1268.6600  1268.6600     100
610  1265.0000  1556143620000  1265.0000  1265.0000  1265.0000     200
611  1265.0000  1556143740000  1265.0000  1265.0000  1265.0000     100
612  1256.0000  1556146620000  1256.0000  1256.0000  1256.0000     136
613  1264.6100  1556146920000  1264.6100  1264.6100  1264.6100     100
614  1266.0000  1556147220000  1266.0000  1265.8700  1265.8700     232

[615 rows x 6 columns]
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement