Hello i am trying to convert a dictionary into a dataframe, containing results from a search on amazon (I am using an API.). I would like each product to be a row in the dataframe with the keys as column headers. However there is some keys in the beginning, that i am not interested in having in the table.
Below am i converting the JSON into a dictionary, which i would like to convert it into a dataframe.
JavaScript
x
13
13
1
from pandas.io.json import json_normalize
2
3
filename = api_result.json()
4
5
def convert_json_to_dict(filename):
6
with open(filename) as JSON:
7
json_dict = json.load(JSON)
8
return json_dict
9
10
11
def convert_dict_to_df(filename):
12
return pd.json_normalize(convert_json_to_dict(filename))
13
Here is part the data in the dictionary (2 out of 25 products total).
JavaScript
1
121
121
1
filename = {
2
'request_info': {
3
'credits_remaining': 72,
4
'credits_used': 28,
5
'credits_used_this_request': 1,
6
'success': True
7
},
8
'request_metadata': {
9
'amazon_url': 'https://www.amazon.com/s/?k=memory+cards&ref=nb_sb_noss_2',
10
'created_at': '2021-02-14T15:09:04.802Z',
11
'processed_at': '2021-02-14T15:09:11.003Z',
12
'timing': ['global_init - 0ms (total 0ms)',
13
'auth_apikey - 35ms (total 35ms)',
14
'auth_retrieve_plan - 20ms (total 56ms)',
15
'auth_retrieve_credit_usage - 22ms (total '
16
'79ms)',
17
'processing_invoking_worker - 31ms (total '
18
'111ms)',
19
'processing_execution_complete - 6202ms '
20
'(total 6313ms)',
21
'auth_credit_usage_reconcile - 81ms (total '
22
'6394ms)',
23
'global_end - 0ms (total 6394ms)'],
24
'total_time_taken': 6.2
25
},
26
'request_parameters': {
27
'amazon_domain': 'amazon.com',
28
'search_term': 'memory cards',
29
'type': 'search'
30
},
31
'search_results': [{
32
'asin': 'B08L26TYQ3',
33
'categories': [{
34
'id': 'search-alias=aps',
35
'name': 'All Departments'
36
}
37
],
38
'delivery': {
39
'price': {
40
'currency': 'USD',
41
'is_free': True,
42
'raw': 'FREE Shipping by Amazon',
43
'symbol': '$',
44
'value': 0
45
},
46
'tagline': 'Get it as soon as Tue, Feb 16'
47
},
48
'image': 'https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg',
49
'is_amazon_fresh': False,
50
'is_prime': True,
51
'is_whole_foods_market': False,
52
'link': 'https://www.amazon.com/dp/B08L26TYQ3',
53
'position': 1,
54
'price': {
55
'currency': 'USD',
56
'raw': '$29.99',
57
'symbol': '$',
58
'value': 29.99
59
},
60
'prices': [{
61
'currency': 'USD',
62
'raw': '$29.99',
63
'symbol': '$',
64
'value': 29.99
65
}
66
],
67
'rating': 4.3,
68
'ratings_total': 74,
69
'sponsored': True,
70
'title': 'Micro Center Premium 256GB SDXC Card Class 10 '
71
'SD Flash Memory Card UHS-I C10 U3 V30 4K UHD '
72
'Video R/W Speed up to 80 MB/s for Cameras '
73
'Computers Trail Cams (256GB)'
74
}, {
75
'asin': 'B08N46XMPH',
76
'categories': [{
77
'id': 'search-alias=aps',
78
'name': 'All Departments'
79
}
80
],
81
'delivery': {
82
'price': {
83
'currency': 'USD',
84
'is_free': True,
85
'raw': 'FREE Shipping on orders '
86
'over $25 shipped by Amazon',
87
'symbol': '$',
88
'value': 0
89
},
90
'tagline': 'Get it as soon as Tue, Feb 16'
91
},
92
'image': 'https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg',
93
'is_amazon_fresh': False,
94
'is_prime': True,
95
'is_whole_foods_market': False,
96
'link': 'https://www.amazon.com/dp/B08N46XMPH',
97
'position': 2,
98
'price': {
99
'currency': 'USD',
100
'raw': '$22.68',
101
'symbol': '$',
102
'value': 22.68
103
},
104
'prices': [{
105
'currency': 'USD',
106
'raw': '$22.68',
107
'symbol': '$',
108
'value': 22.68
109
}
110
],
111
'rating': 4.8,
112
'ratings_total': 16,
113
'sponsored': True,
114
'title': '256GB Micro SD Memory Card SD Memory Card/TF '
115
'Card Class 10 High Speed Card with Adapter for '
116
'Camera, Phone, Computer, Dash Came, '
117
'Surveillance,Drone'
118
}
119
]
120
}
121
The dataframe would look something like below, although with more columns:
JavaScript
1
4
1
search_term Position ASIN Categories Price Currency
2
1 memory cards 1 B08L26TYQ3 All Departments 29.99 USD
3
2 memory cards 2 B08N46XMPH All Departments 22.68 USD
4
I have already tried the answers from this question, but it did not work: Convert Python dict into a dataframe
Advertisement
Answer
json_normalize
is no longer imported frompandas.io.json
. It is now in the top-level namespace.- Update your pandas to the current version with pip or conda, depending on your environment.
- Most of the required information is in the
'search_results'
key, but'search_term'
is nested in'request_parameters'
, so that key must be set into alist
for themeta
parameter ofpandas.json_normalize
- The information in the
'prices'
column seems to overlap with existing data in other columns.- The column has been normalized below, but it could just be dropped, since there is no new information in it.
- Unneeded columns can be removed with
pandas.DataFrame.drop
, or usepandas.DataFrame.loc
to select only the needed columns, as shown below. - As per the timing analysis for this question,
df.join(pd.DataFrame(df.pop(col).values.tolist()))
is the fastest way to normalize a single leveldict
from a column and join it back to the main dataframe, but this answer shows how to deal with columns that are problematic (e.g. result in errors when trying.values.tolist()
).
JavaScript
1
31
31
1
import pandas as pd
2
3
# object returned from api
4
filename = api_result.json()
5
6
# begin by normalizing filename
7
main = pd.json_normalize(filename, record_path=['search_results'], meta=['request_parameters'])
8
9
# request_paramets is a column of dicts, which must be converted to individual columns for each key
10
x = main.join(pd.DataFrame(main.pop('request_parameters').values.tolist()))
11
12
# categories and prices are lists of dicts, which must be exploded into separate rows
13
x = x.apply(pd.Series.explode)
14
15
# convert the dicts in categories and prices to separate columns for each key
16
x = x.join(pd.DataFrame(x.pop('categories').values.tolist()))
17
x = x.join(pd.DataFrame(x.pop('prices').values.tolist()))
18
19
# display(x)
20
asin image is_amazon_fresh is_prime is_whole_foods_market link position rating ratings_total sponsored title delivery.price.currency delivery.price.is_free delivery.price.raw delivery.price.symbol delivery.price.value delivery.tagline price.currency price.raw price.symbol price.value amazon_domain search_term type id name currency raw symbol value
21
0 B08L26TYQ3 https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg False True False https://www.amazon.com/dp/B08L26TYQ3 1 4.3 74 True Micro Center Premium 256GB SDXC Card Class 10 SD Flash Memory Card UHS-I C10 U3 V30 4K UHD Video R/W Speed up to 80 MB/s for Cameras Computers Trail Cams (256GB) USD True FREE Shipping by Amazon $ 0 Get it as soon as Tue, Feb 16 USD $29.99 $ 29.99 amazon.com memory cards search search-alias=aps All Departments USD $29.99 $ 29.99
22
1 B08N46XMPH https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg False True False https://www.amazon.com/dp/B08N46XMPH 2 4.8 16 True 256GB Micro SD Memory Card SD Memory Card/TF Card Class 10 High Speed Card with Adapter for Camera, Phone, Computer, Dash Came, Surveillance,Drone USD True FREE Shipping on orders over $25 shipped by Amazon $ 0 Get it as soon as Tue, Feb 16 USD $22.68 $ 22.68 amazon.com memory cards search search-alias=aps All Departments USD $22.68 $ 22.68
23
24
# use loc to select only the desired columns and rename them as needed
25
final = x.loc[:, ['asin', 'search_term', 'position', 'name', 'price.currency', 'price.value']].rename(columns={'name': 'Categories', 'price.currency': 'currency', 'price.value': 'price'})
26
27
# display(final)
28
asin search_term position Categories currency price
29
0 B08L26TYQ3 memory cards 1 All Departments USD 29.99
30
1 B08N46XMPH memory cards 2 All Departments USD 22.68
31