Beautiful Soup has problems with amazon.it

I'm trying to get the name and the price from an Amazon product page. This is the code:

import requests
from bs4 import BeautifulSoup

URL = "https://www.amazon.it/Nuovo-Apple-iPhone-SE-64GB/dp/B087616RMM/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone&qid=1597409499&s=electronics&sr=1-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNElPQVhUUTFTOTFOJmVuY3J5cHRlZElkPUEwMTA3ODI0M1RZVTc2MTdRM1A3SiZlbmNyeXB0ZWRBZElkPUEwODI4MDExMTdYMlhUOFlUTVY0TCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU="

URL2 = "https://www.amazon.it/intermittenze-della-morte-Jos%C3%A9-Saramago-ebook/dp/B019KBH3CC/ref=sr_1_1?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=saramago&qid=1597410061&sr=8-1"

headers = {"User-Agent": 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0'}



page = requests.get(URL2, headers=headers)
soup = BeautifulSoup(page.content, 'lxml')

title = soup.find(id='productTitle').get_text().strip()
price = soup.find(id='priceblock_ourprice').get_text().strip()

print(title)
print(price)

The problem is that it works with URL but not with URL2. How can I fix it? Thanks :)

Answer

Before calling get_text(), check that find() actually returned an element. It returns None when nothing matches, and only a found element has text to extract:

title = soup.find(id='productTitle')
if title:
    title = title.get_text().strip()

price = soup.find(id='priceblock_ourprice')
if price:
    price = price.get_text().strip()
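
To see why the check matters, here is a minimal sketch that runs against a small inline HTML string instead of the live site. The markup is made up for illustration and is not Amazon's real page. It assumes the URL2 layout simply lacks an element with id 'priceblock_ourprice', which is what the failure suggests but is not confirmed here.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a layout with a title but no 'priceblock_ourprice' element.
html = '<html><body><span id="productTitle">  Some ebook  </span></body></html>'
soup = BeautifulSoup(html, "html.parser")

title = soup.find(id="productTitle")
price = soup.find(id="priceblock_ourprice")  # nothing matches, so this is None

# Extract text only when the element was found.
title_text = title.get_text().strip() if title else None
price_text = price.get_text().strip() if price else None

print(title_text)
print(price_text)

# Without the check, calling get_text() on None raises AttributeError.
try:
    soup.find(id="priceblock_ourprice").get_text()
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

With the check in place the script keeps running and the missing value is simply None, which you can then handle however you like.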

Please note that Amazon has a few different page layouts, so if you want to write a generic crawler you will have to cover all of them.
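
One possible way to cover several layouts is to try a list of candidate ids and take the first one that matches. This is only a sketch: the helper name first_text and the id 'hypothetical-ebook-price' are invented for illustration, and only 'priceblock_ourprice' comes from the question. You would need to inspect each layout in the browser to find the ids it really uses.

```python
from bs4 import BeautifulSoup


def first_text(soup, candidate_ids):
    """Return the stripped text of the first element whose id matches, else None."""
    for element_id in candidate_ids:
        element = soup.find(id=element_id)
        if element is not None:
            return element.get_text().strip()
    return None


# Only 'priceblock_ourprice' is from the question; the second id is a
# hypothetical placeholder for whatever another layout actually uses.
PRICE_IDS = ["priceblock_ourprice", "hypothetical-ebook-price"]

# Made-up markup standing in for a page with a different layout.
html = '<div><span id="hypothetical-ebook-price"> 6,99 € </span></div>'
soup = BeautifulSoup(html, "html.parser")

price = first_text(soup, PRICE_IDS)
missing = first_text(soup, ["productTitle"])  # not present in this snippet

print(price)
print(missing)
```

Keeping the ids in a list means supporting a new layout is a one-line change instead of another if block.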

Answer