
BeautifulSoup Web Scraping to find values of a specific key within the result set

I am scraping a web page using Beautiful Soup:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://cooking.nytimes.com/recipes/1018849-classic-caprese-salad?action=click&module=Collection%20Page%20Recipe%20Card&region=46%20Ways%20to%20Do%20Justice%20to%20Summer%20Tomatoes&pgType=collection&rank=1")
c = r.content

soup = BeautifulSoup(c, "html.parser")
result = soup.find("script", {"type": "application/ld+json"})

print(type(result))

<class 'bs4.element.Tag'> , 1

print(len(result))

0

Here is what 'result' looks like (the output was posted as an image):

>>> print(result)

I am unable to access recipeIngredient (highlighted in the image) as a dictionary key. It raises a KeyError.

print(result['recipeIngredient'])

KeyError: 'recipeIngredient'

How can I do this? I want to extract this from 'result':

"recipeIngredient":["1 pound fresh, best-quality mozzarella (preferably buffalo milk)","4 medium heirloom tomatoes","1 bunch fresh basil, leaves only, some reserved for garnish","Flaky sea salt, such as Maldon","Coarsely ground black pepper","High-quality extra-virgin olive oil"]


Answer

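result is a bs4 Tag, not a dictionary. Indexing a Tag with a string looks up an HTML attribute of the tag (such as type), not a key in the JSON text it contains, which is why result['recipeIngredient'] raises a KeyError. Get the text inside the script tag with the .get_text() method, then parse it into a Python dictionary with json.loads: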

import json

import requests
from bs4 import BeautifulSoup

r = requests.get("https://cooking.nytimes.com/recipes/1018849-classic-caprese-salad?action=click&module=Collection%20Page%20Recipe%20Card&region=46%20Ways%20to%20Do%20Justice%20to%20Summer%20Tomatoes&pgType=collection&rank=1")
c = r.content

soup = BeautifulSoup(c, "html.parser")
# The recipe metadata is embedded as JSON inside this <script> tag.
result = soup.find("script", {"type": "application/ld+json"})
# Parse the tag's text into a Python dict before looking up keys.
data = json.loads(result.get_text())

print(data["recipeIngredient"])

Output:

['1 pound fresh, best-quality mozzarella (preferably buffalo milk)', '4 medium heirloom tomatoes', '1 bunch fresh basil, leaves only, some reserved for garnish', 'Flaky sea salt, such as Maldon', 'Coarsely ground black pepper', 'High-quality extra-virgin olive oil']
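
To make the difference between Tag attributes and JSON keys concrete, here is a minimal sketch that runs offline. SAMPLE_HTML is a made-up, trimmed stand-in for the recipe page, and find_recipe_ingredients is a hypothetical helper name, not part of Beautiful Soup. It assumes a page may contain several application/ld+json scripts, that a block may hold either one object or a list of objects, and it reads the script body through .string rather than .get_text(); it is one possible defensive variant of the approach above, not the only way to do it.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the recipe page (not the real markup).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe", "name": "Classic Caprese Salad",
 "recipeIngredient": ["4 medium heirloom tomatoes", "Flaky sea salt, such as Maldon"]}
</script>
</head><body></body></html>
"""


def find_recipe_ingredients(html):
    """Return the first recipeIngredient list found in any JSON-LD script, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all("script", {"type": "application/ld+json"}):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue  # skip empty or malformed blocks
        # A JSON-LD block may be a single object or a list of objects.
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and "recipeIngredient" in item:
                return item["recipeIngredient"]
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tag = soup.find("script", {"type": "application/ld+json"})

# Subscripting a Tag reads HTML attributes, not keys of the embedded JSON.
print(tag["type"])
print(find_recipe_ingredients(SAMPLE_HTML))
```

Returning None when nothing is found lets the caller decide how to handle a page without recipe metadata, instead of failing with an AttributeError on a missing script tag.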
User contributions licensed under: CC BY-SA