Hello i am trying to convert a dictionary into a dataframe, containing results from a search on amazon (I am using an API.). I would like each product to be a row in the dataframe with the keys as column headers. However there is some keys in the beginning, that i am not interested in having in the table.
Below am i converting the JSON into a dictionary, which i would like to convert it into a dataframe.
from pandas.io.json import json_normalize filename = api_result.json() def convert_json_to_dict(filename): with open(filename) as JSON: json_dict = json.load(JSON) return json_dict def convert_dict_to_df(filename): return pd.json_normalize(convert_json_to_dict(filename))
Here is part the data in the dictionary (2 out of 25 products total).
filename = { 'request_info': { 'credits_remaining': 72, 'credits_used': 28, 'credits_used_this_request': 1, 'success': True }, 'request_metadata': { 'amazon_url': 'https://www.amazon.com/s/?k=memory+cards&ref=nb_sb_noss_2', 'created_at': '2021-02-14T15:09:04.802Z', 'processed_at': '2021-02-14T15:09:11.003Z', 'timing': ['global_init - 0ms (total 0ms)', 'auth_apikey - 35ms (total 35ms)', 'auth_retrieve_plan - 20ms (total 56ms)', 'auth_retrieve_credit_usage - 22ms (total ' '79ms)', 'processing_invoking_worker - 31ms (total ' '111ms)', 'processing_execution_complete - 6202ms ' '(total 6313ms)', 'auth_credit_usage_reconcile - 81ms (total ' '6394ms)', 'global_end - 0ms (total 6394ms)'], 'total_time_taken': 6.2 }, 'request_parameters': { 'amazon_domain': 'amazon.com', 'search_term': 'memory cards', 'type': 'search' }, 'search_results': [{ 'asin': 'B08L26TYQ3', 'categories': [{ 'id': 'search-alias=aps', 'name': 'All Departments' } ], 'delivery': { 'price': { 'currency': 'USD', 'is_free': True, 'raw': 'FREE Shipping by Amazon', 'symbol': '$', 'value': 0 }, 'tagline': 'Get it as soon as Tue, Feb 16' }, 'image': 'https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg', 'is_amazon_fresh': False, 'is_prime': True, 'is_whole_foods_market': False, 'link': 'https://www.amazon.com/dp/B08L26TYQ3', 'position': 1, 'price': { 'currency': 'USD', 'raw': '$29.99', 'symbol': '$', 'value': 29.99 }, 'prices': [{ 'currency': 'USD', 'raw': '$29.99', 'symbol': '$', 'value': 29.99 } ], 'rating': 4.3, 'ratings_total': 74, 'sponsored': True, 'title': 'Micro Center Premium 256GB SDXC Card Class 10 ' 'SD Flash Memory Card UHS-I C10 U3 V30 4K UHD ' 'Video R/W Speed up to 80 MB/s for Cameras ' 'Computers Trail Cams (256GB)' }, { 'asin': 'B08N46XMPH', 'categories': [{ 'id': 'search-alias=aps', 'name': 'All Departments' } ], 'delivery': { 'price': { 'currency': 'USD', 'is_free': True, 'raw': 'FREE Shipping on orders ' 'over $25 shipped by Amazon', 'symbol': '$', 'value': 0 }, 'tagline': 'Get it as soon as Tue, Feb 16' }, 'image': 'https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg', 'is_amazon_fresh': False, 'is_prime': True, 'is_whole_foods_market': False, 'link': 'https://www.amazon.com/dp/B08N46XMPH', 'position': 2, 'price': { 'currency': 'USD', 'raw': '$22.68', 'symbol': '$', 'value': 22.68 }, 'prices': [{ 'currency': 'USD', 'raw': '$22.68', 'symbol': '$', 'value': 22.68 } ], 'rating': 4.8, 'ratings_total': 16, 'sponsored': True, 'title': '256GB Micro SD Memory Card SD Memory Card/TF ' 'Card Class 10 High Speed Card with Adapter for ' 'Camera, Phone, Computer, Dash Came, ' 'Surveillance,Drone' } ] }
The dataframe would look something like below, although with more columns:
search_term Position ASIN Categories Price Currency 1 memory cards 1 B08L26TYQ3 All Departments 29.99 USD 2 memory cards 2 B08N46XMPH All Departments 22.68 USD
I have already tried the answers from this question, but it did not work: Convert Python dict into a dataframe
Advertisement
Answer
json_normalize
is no longer imported frompandas.io.json
. It is now in the top-level namespace.- Update your pandas to the current version with pip or conda, depending on your environment.
- Most of the required information is in the
'search_results'
key, but'search_term'
is nested in'request_parameters'
, so that key must be set into alist
for themeta
parameter ofpandas.json_normalize
- The information in the
'prices'
column seems to overlap with existing data in other columns.- The column has been normalized below, but it could just be dropped, since there is no new information in it.
- Unneeded columns can be removed with
pandas.DataFrame.drop
, or usepandas.DataFrame.loc
to select only the needed columns, as shown below. - As per the timing analysis for this question,
df.join(pd.DataFrame(df.pop(col).values.tolist()))
is the fastest way to normalize a single leveldict
from a column and join it back to the main dataframe, but this answer shows how to deal with columns that are problematic (e.g. result in errors when trying.values.tolist()
).
import pandas as pd # object returned from api filename = api_result.json() # begin by normalizing filename main = pd.json_normalize(filename, record_path=['search_results'], meta=['request_parameters']) # request_paramets is a column of dicts, which must be converted to individual columns for each key x = main.join(pd.DataFrame(main.pop('request_parameters').values.tolist())) # categories and prices are lists of dicts, which must be exploded into separate rows x = x.apply(pd.Series.explode) # convert the dicts in categories and prices to separate columns for each key x = x.join(pd.DataFrame(x.pop('categories').values.tolist())) x = x.join(pd.DataFrame(x.pop('prices').values.tolist())) # display(x) asin image is_amazon_fresh is_prime is_whole_foods_market link position rating ratings_total sponsored title delivery.price.currency delivery.price.is_free delivery.price.raw delivery.price.symbol delivery.price.value delivery.tagline price.currency price.raw price.symbol price.value amazon_domain search_term type id name currency raw symbol value 0 B08L26TYQ3 https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg False True False https://www.amazon.com/dp/B08L26TYQ3 1 4.3 74 True Micro Center Premium 256GB SDXC Card Class 10 SD Flash Memory Card UHS-I C10 U3 V30 4K UHD Video R/W Speed up to 80 MB/s for Cameras Computers Trail Cams (256GB) USD True FREE Shipping by Amazon $ 0 Get it as soon as Tue, Feb 16 USD $29.99 $ 29.99 amazon.com memory cards search search-alias=aps All Departments USD $29.99 $ 29.99 1 B08N46XMPH https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg False True False https://www.amazon.com/dp/B08N46XMPH 2 4.8 16 True 256GB Micro SD Memory Card SD Memory Card/TF Card Class 10 High Speed Card with Adapter for Camera, Phone, Computer, Dash Came, Surveillance,Drone USD True FREE Shipping on orders over $25 shipped by Amazon $ 0 Get it as soon as Tue, Feb 16 USD $22.68 $ 22.68 amazon.com memory cards search search-alias=aps All Departments USD $22.68 $ 22.68 # use loc to select only the desired columns and rename them as needed final = x.loc[:, ['asin', 'search_term', 'position', 'name', 'price.currency', 'price.value']].rename(columns={'name': 'Categories', 'price.currency': 'currency', 'price.value': 'price'}) # display(final) asin search_term position Categories currency price 0 B08L26TYQ3 memory cards 1 All Departments USD 29.99 1 B08N46XMPH memory cards 2 All Departments USD 22.68