Skip to content
Advertisement

Converting dictionary into dataframe

Hello i am trying to convert a dictionary into a dataframe, containing results from a search on amazon (I am using an API.). I would like each product to be a row in the dataframe with the keys as column headers. However there is some keys in the beginning, that i am not interested in having in the table.

Below am i converting the JSON into a dictionary, which i would like to convert it into a dataframe.

from pandas.io.json import json_normalize

filename = api_result.json()

def convert_json_to_dict(filename):
    with open(filename) as JSON:
        json_dict = json.load(JSON)
    return json_dict


def convert_dict_to_df(filename):
    return pd.json_normalize(convert_json_to_dict(filename))

Here is part the data in the dictionary (2 out of 25 products total).

filename = {
    'request_info': {
        'credits_remaining': 72,
        'credits_used': 28,
        'credits_used_this_request': 1,
        'success': True
    },
    'request_metadata': {
        'amazon_url': 'https://www.amazon.com/s/?k=memory+cards&ref=nb_sb_noss_2',
        'created_at': '2021-02-14T15:09:04.802Z',
        'processed_at': '2021-02-14T15:09:11.003Z',
        'timing': ['global_init - 0ms (total 0ms)',
            'auth_apikey - 35ms (total 35ms)',
            'auth_retrieve_plan - 20ms (total 56ms)',
            'auth_retrieve_credit_usage - 22ms (total '
            '79ms)',
            'processing_invoking_worker - 31ms (total '
            '111ms)',
            'processing_execution_complete - 6202ms '
            '(total 6313ms)',
            'auth_credit_usage_reconcile - 81ms (total '
            '6394ms)',
            'global_end - 0ms (total 6394ms)'],
        'total_time_taken': 6.2
    },
    'request_parameters': {
        'amazon_domain': 'amazon.com',
        'search_term': 'memory cards',
        'type': 'search'
    },
    'search_results': [{
            'asin': 'B08L26TYQ3',
            'categories': [{
                    'id': 'search-alias=aps',
                    'name': 'All Departments'
                }
            ],
            'delivery': {
                'price': {
                    'currency': 'USD',
                    'is_free': True,
                    'raw': 'FREE Shipping by Amazon',
                    'symbol': '$',
                    'value': 0
                },
                'tagline': 'Get it as soon as Tue, Feb 16'
            },
            'image': 'https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg',
            'is_amazon_fresh': False,
            'is_prime': True,
            'is_whole_foods_market': False,
            'link': 'https://www.amazon.com/dp/B08L26TYQ3',
            'position': 1,
            'price': {
                'currency': 'USD',
                'raw': '$29.99',
                'symbol': '$',
                'value': 29.99
            },
            'prices': [{
                    'currency': 'USD',
                    'raw': '$29.99',
                    'symbol': '$',
                    'value': 29.99
                }
            ],
            'rating': 4.3,
            'ratings_total': 74,
            'sponsored': True,
            'title': 'Micro Center Premium 256GB SDXC Card Class 10 '
            'SD Flash Memory Card UHS-I C10 U3 V30 4K UHD '
            'Video R/W Speed up to 80 MB/s for Cameras '
            'Computers Trail Cams (256GB)'
        }, {
            'asin': 'B08N46XMPH',
            'categories': [{
                    'id': 'search-alias=aps',
                    'name': 'All Departments'
                }
            ],
            'delivery': {
                'price': {
                    'currency': 'USD',
                    'is_free': True,
                    'raw': 'FREE Shipping on orders '
                    'over $25 shipped by Amazon',
                    'symbol': '$',
                    'value': 0
                },
                'tagline': 'Get it as soon as Tue, Feb 16'
            },
            'image': 'https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg',
            'is_amazon_fresh': False,
            'is_prime': True,
            'is_whole_foods_market': False,
            'link': 'https://www.amazon.com/dp/B08N46XMPH',
            'position': 2,
            'price': {
                'currency': 'USD',
                'raw': '$22.68',
                'symbol': '$',
                'value': 22.68
            },
            'prices': [{
                    'currency': 'USD',
                    'raw': '$22.68',
                    'symbol': '$',
                    'value': 22.68
                }
            ],
            'rating': 4.8,
            'ratings_total': 16,
            'sponsored': True,
            'title': '256GB Micro SD Memory Card SD Memory Card/TF '
            'Card Class 10 High Speed Card with Adapter for '
            'Camera, Phone, Computer, Dash Came, '
            'Surveillance,Drone'
        }
    ]
}

The dataframe would look something like below, although with more columns:

    search_term   Position   ASIN           Categories          Price   Currency
1   memory cards  1          B08L26TYQ3     All Departments     29.99   USD
2   memory cards  2          B08N46XMPH     All Departments     22.68   USD

I have already tried the answers from this question, but it did not work: Convert Python dict into a dataframe

Advertisement

Answer

  • json_normalize is no longer imported from pandas.io.json. It is now in the top-level namespace.
    • Update your pandas to the current version with pip or conda, depending on your environment.
  • Most of the required information is in the 'search_results' key, but 'search_term' is nested in 'request_parameters', so that key must be set into a list for the meta parameter of pandas.json_normalize
  • The information in the 'prices' column seems to overlap with existing data in other columns.
    • The column has been normalized below, but it could just be dropped, since there is no new information in it.
  • Unneeded columns can be removed with pandas.DataFrame.drop, or use pandas.DataFrame.loc to select only the needed columns, as shown below.
  • As per the timing analysis for this question, df.join(pd.DataFrame(df.pop(col).values.tolist())) is the fastest way to normalize a single level dict from a column and join it back to the main dataframe, but this answer shows how to deal with columns that are problematic (e.g. result in errors when trying .values.tolist()).
import pandas as pd

# object returned from api
filename = api_result.json()

# begin by normalizing filename
main = pd.json_normalize(filename, record_path=['search_results'], meta=['request_parameters'])

# request_paramets is a column of dicts, which must be converted to individual columns for each key
x = main.join(pd.DataFrame(main.pop('request_parameters').values.tolist()))

# categories and prices are lists of dicts, which must be exploded into separate rows
x = x.apply(pd.Series.explode)

# convert the dicts in categories and prices to separate columns for each key
x = x.join(pd.DataFrame(x.pop('categories').values.tolist()))
x = x.join(pd.DataFrame(x.pop('prices').values.tolist()))

# display(x)
         asin                                                           image  is_amazon_fresh  is_prime  is_whole_foods_market                                  link  position  rating  ratings_total  sponsored                                                                                                                                                              title delivery.price.currency  delivery.price.is_free                                  delivery.price.raw delivery.price.symbol  delivery.price.value               delivery.tagline price.currency price.raw price.symbol  price.value amazon_domain   search_term    type                id             name currency     raw symbol  value
0  B08L26TYQ3  https://m.media-amazon.com/images/I/71z86CNVZ3L._AC_UY218_.jpg            False      True                  False  https://www.amazon.com/dp/B08L26TYQ3         1     4.3             74       True  Micro Center Premium 256GB SDXC Card Class 10 SD Flash Memory Card UHS-I C10 U3 V30 4K UHD Video R/W Speed up to 80 MB/s for Cameras Computers Trail Cams (256GB)                     USD                    True                             FREE Shipping by Amazon                     $                     0  Get it as soon as Tue, Feb 16            USD    $29.99            $        29.99    amazon.com  memory cards  search  search-alias=aps  All Departments      USD  $29.99      $  29.99
1  B08N46XMPH  https://m.media-amazon.com/images/I/51AP3QhINtL._AC_UY218_.jpg            False      True                  False  https://www.amazon.com/dp/B08N46XMPH         2     4.8             16       True                 256GB Micro SD Memory Card SD Memory Card/TF Card Class 10 High Speed Card with Adapter for Camera, Phone, Computer, Dash Came, Surveillance,Drone                     USD                    True  FREE Shipping on orders over $25 shipped by Amazon                     $                     0  Get it as soon as Tue, Feb 16            USD    $22.68            $        22.68    amazon.com  memory cards  search  search-alias=aps  All Departments      USD  $22.68      $  22.68

# use loc to select only the desired columns and rename them as needed
final = x.loc[:, ['asin', 'search_term', 'position', 'name', 'price.currency', 'price.value']].rename(columns={'name': 'Categories', 'price.currency': 'currency', 'price.value': 'price'})

# display(final)
         asin   search_term  position       Categories currency  price
0  B08L26TYQ3  memory cards         1  All Departments      USD  29.99
1  B08N46XMPH  memory cards         2  All Departments      USD  22.68
Advertisement