I have the below code that works fully up until I set x=37. At this point, I receive the error TypeError: 'NoneType' object is not subscriptable on the variable t["vintage"]["wine"]["region"]["country"]["name"].
I have added another variable on which the same issue happens almost every time, so you may find the error there instead.
I think this is because one of the 25 results on that page does not have a country name assigned to it and therefore the variable is giving an error.
I think I need to add exception handling to each variable to cover this case. I have seen examples of adding these, except they seem to handle the request not finding a legitimate page rather than a problem with one of the variables, and I can't find guidance on adding them at the variable level.
# Import packages
import requests
import json
import pandas as pd
import time

x = 37

# Get request from the Vivino website
r = requests.get(
    "https://www.vivino.com/api/explore/explore",
    params={
        #"country_code": "FR",
        #"country_codes[]":"pt",
        "currency_code": "GBP",
        "grape_filter": "varietal",
        "min_rating": "1",
        "order_by": "price",
        "order": "asc",
        "page": x,
        "price_range_max": "100",
        "price_range_min": "25",
        "wine_type_ids[]": "1"
    },
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0"
    },
)

# Variables to scrape from the Vivino website
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["id"],
        t["vintage"]["wine"]["name"],
        t["vintage"]["statistics"]["ratings_average"],
        t["prices"][0]["amount"],
        t["vintage"]["wine"]["region"]["country"]["name"],
        t["vintage"]["wine"]["region"]["country"]["code"],
        t["vintage"]["wine"]["region"]["name"],
        t["vintage"]["wine"]["style"]["name"]
    )
    for t in r.json()["explore_vintage"]["matches"]
]

# Saving the results in a dataframe
dataframe = pd.DataFrame(
    results,
    columns=["Winery", "Vintage", "Wine ID", "Wine", "Rating", "Price",
             "Country", "CountryCode", "Region", "Style"]
)

# Output the dataframe
df_out = dataframe
df_out.to_csv("data.csv", index=False)
print("Complete -", x, "iterations")
Answer
The problem is that some values in the deeply nested dictionary are None where you expect another dictionary. A sample list of dictionaries demonstrating the problem:
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]
If you assume the key k3 exists in every dictionary in the list, a lookup such as
result = [t['k1']['k2']['k3'] for t in data]
raises TypeError: 'NoneType' object is not subscriptable. The TypeError arises in the second iteration of the loop, when t['k1']['k2'] evaluates to None and you then try to look up a key in it. You are effectively asking the program to execute None['k3'], which explains the error message you got.
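To see which entries trigger the error, here is a minimal sketch that reuses the sample data from this answer and wraps each lookup in try/except. The names values and bad_indices are only illustrative.

```python
# Sample data from the answer: the second entry has None where a dict is expected
data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

values = []       # successfully extracted values
bad_indices = []  # positions of entries where the lookup failed

for i, t in enumerate(data):
    try:
        values.append(t['k1']['k2']['k3'])
    except TypeError:
        # t['k1']['k2'] was None, so None['k3'] raised TypeError
        bad_indices.append(i)

print(values)       # ['value_i_want', 'value_i_want']
print(bad_indices)  # [1]
```

Running the same kind of loop over the matches on a failing page should point at the entries whose region or country is missing.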
To solve this issue (which is very common in data returned from API requests), you will need to wrap the lookup in a try/except block. You may find this helper function useful:
def try_to_get(d: dict, *args, default=None):
    try:
        # Walk down the nesting one key at a time
        for k in args:
            d = d[k]
        return d
    except (KeyError, TypeError):
        print(f'Cannot find the key {args}')
        return default
Using the helper function, we can write try_to_get(t, 'k1', 'k2', 'k3'). A well-formed dictionary is traversed down through the nesting and returns the value you want, while a problematic one triggers the except block and returns a default value (here, None).
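As a quick check of that behaviour, here is a self-contained sketch that runs a quiet variant of the helper over the sample data. Two things in it are my own assumptions rather than part of the helper above: the print call is dropped, and IndexError is added to the caught exceptions so that an empty list (for example a hypothetical "prices": []) also falls back to the default.

```python
def try_to_get(d, *keys, default=None):
    """Walk nested dicts/lists by keys; return default if any step fails."""
    try:
        for k in keys:
            d = d[k]
        return d
    except (KeyError, TypeError, IndexError):
        # IndexError is an addition in this sketch, to cover empty lists
        return default


data = [
    {'k1': {'k2': {'k3': 'value_i_want'}}},
    {'k1': {'k2': None}},
    {'k1': {'k2': {'k3': 'value_i_want'}}},
]

# Supply an explicit default instead of None
result = [try_to_get(t, 'k1', 'k2', 'k3', default='unknown') for t in data]
print(result)  # ['value_i_want', 'unknown', 'value_i_want']

# Integer keys work for lists too; an empty list falls back to the default
print(try_to_get({"prices": []}, "prices", 0, "amount"))  # None
```

Note that a value which is legitimately None at the final key is returned as None, not replaced by the default, because no exception is raised in that case.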
You can try to replace the list comprehension part in your code with this:
results = [
    (
        try_to_get(t, "vintage", "wine", "winery", "name"),
        try_to_get(t, "vintage", "year"),
        try_to_get(t, "vintage", "wine", "id"),
        try_to_get(t, "vintage", "wine", "name"),
        try_to_get(t, "vintage", "statistics", "ratings_average"),
        try_to_get(t, "prices", 0, "amount"),
        try_to_get(t, "vintage", "wine", "region", "country", "name"),
        try_to_get(t, "vintage", "wine", "region", "country", "code"),
        try_to_get(t, "vintage", "wine", "region", "name"),
        try_to_get(t, "vintage", "wine", "style", "name"),
    )
    for t in r.json()["explore_vintage"]["matches"]
]
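If you would rather keep column names and key paths side by side, one optional variation is to drive the extraction from a table. This is only a sketch: FIELDS is a hypothetical name covering a subset of the columns, the helper is a quiet variant that also catches IndexError, and the matches list is made-up placeholder data standing in for r.json()["explore_vintage"]["matches"], not real Vivino output.

```python
def try_to_get(d, *keys, default=None):
    """Quiet variant of the helper; also catches IndexError (an assumption)."""
    try:
        for k in keys:
            d = d[k]
        return d
    except (KeyError, TypeError, IndexError):
        return default


# Hypothetical column -> key-path table (a subset of the columns above)
FIELDS = {
    "Winery": ("vintage", "wine", "winery", "name"),
    "Price": ("prices", 0, "amount"),
    "Country": ("vintage", "wine", "region", "country", "name"),
}

# Made-up placeholder entries; the second one has no region
matches = [
    {"vintage": {"wine": {"winery": {"name": "Winery A"},
                          "region": {"country": {"name": "Country A"}}}},
     "prices": [{"amount": 30.0}]},
    {"vintage": {"wine": {"winery": {"name": "Winery B"}, "region": None}},
     "prices": [{"amount": 45.0}]},
]

# One dict per match; missing values become None instead of raising
rows = [{col: try_to_get(t, *path) for col, path in FIELDS.items()}
        for t in matches]
print(rows[1])  # {'Winery': 'Winery B', 'Price': 45.0, 'Country': None}
```

A list of dicts like rows can be passed straight to pd.DataFrame, which takes the column names from the keys, so the separate columns list is no longer needed.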