Attempting to web scrape: downloaded HTML is slightly different from the code on the live site

I’m new to web scraping and I’m trying to build a very basic stock tracker for the site pokemoncenter.com. When I visit an item's product page on the live site, the add to cart button is:

<button type="button" class="jsx-2748458255 product-add btn btn-secondary">Add to Cart</button>

When the item is out of stock the button is:

<button type="button" disabled="" class="jsx-2748458255 product-add btn btn-tertiary disabled">Out of Stock</button>

But whenever I try to scrape the site, regardless of whether the item is in stock or not, the button is:

<button class="jsx-2748458255 product-add btn btn-tertiary disabled" disabled="" type="button"></button>

So the scraped button always looks out of stock when I download the HTML with requests.get().

from bs4 import BeautifulSoup as soup
import requests

page_url = "https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in"

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}

req = requests.get(page_url, headers=headers)

page_soup = soup(req.text, "html.parser")

# Find the add to cart button inside the second product column
divs = page_soup.find_all("div", {"class": "jsx-829839431 product-col"})
button = divs[1].find("button", {"class": "jsx-2748458255"})

# Check whether the button carries the disabled attribute
# (safer than searching the stringified tag for the word "disabled")
if button.has_attr("disabled"):
    print("Out of Stock")
else:
    print("In Stock")
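As a quick offline check, the same detection logic can be run on the three button snippets above, with no network access involved. This is a minimal sketch, assuming only that bs4 is installed; the helper name button_state and its extra "not rendered" result are illustrative additions, not part of the original script or anything the site documents. The "not rendered" case relies on the observation that the button returned by requests.get() has an empty label, which holds for these snippets but is not guaranteed for every product page.

```python
from bs4 import BeautifulSoup

# Button markup copied from the question: live in stock, live out of stock,
# and what requests.get() receives for every product.
IN_STOCK = '<button type="button" class="jsx-2748458255 product-add btn btn-secondary">Add to Cart</button>'
OUT_OF_STOCK = '<button type="button" disabled="" class="jsx-2748458255 product-add btn btn-tertiary disabled">Out of Stock</button>'
SERVER_HTML = '<button class="jsx-2748458255 product-add btn btn-tertiary disabled" disabled="" type="button"></button>'


def button_state(html: str) -> str:
    """Classify the add to cart button found in an HTML fragment (hypothetical helper)."""
    # class_ matches any one of the button's space-separated classes
    button = BeautifulSoup(html, "html.parser").find("button", class_="product-add")
    if button is None:
        return "not found"
    # Assumption: an empty label means the page has not been filled in by JavaScript yet
    if not button.get_text(strip=True):
        return "not rendered"
    # disabled="" still counts as present, so check for the attribute itself
    return "out of stock" if button.has_attr("disabled") else "in stock"


for name, html in [("live, in stock", IN_STOCK),
                   ("live, out of stock", OUT_OF_STOCK),
                   ("requests.get()", SERVER_HTML)]:
    print(f"{name}: {button_state(html)}")
```

The two live snippets classify as in stock and out of stock, while the requests.get() version comes back as "not rendered". That suggests the parsing is not the problem; the difference lies in the HTML the server sends before any JavaScript runs.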

In stock example: https://www.pokemoncenter.com/product/701-00364/primal-groudon-poke-plush-17-3-4-in
Out of stock example: https://www.pokemoncenter.com/product/701-06558/gigantamax-pikachu-poke-plush-17-in

Answer

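As goalie1998 mentioned, the site could be using JavaScript to fill in parts of the page after the initial HTML arrives (for example, to reduce initial load time). requests does not run JavaScript, so it only ever sees the placeholder button. You could still scrape the site with Selenium, since it drives a real browser and sees the page as rendered.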
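A minimal Selenium sketch of that suggestion follows, assuming Selenium 4 and a local Chrome install that Selenium can drive. The CSS selector button.product-add, the 15 second timeout, and the helper names check_stock and stock_status are illustrative assumptions, not anything the site guarantees. The wait condition relies on the observation from the question that the pre-JavaScript button has an empty label, so it polls until a label appears before reading the button's enabled state.

```python
def stock_status(label: str, enabled: bool) -> str:
    """Map the rendered button's label and enabled flag to a status (hypothetical helper)."""
    if not label:
        # An empty label means we are still looking at the server-side placeholder
        return "Not rendered"
    return "In Stock" if enabled else "Out of Stock"


def check_stock(url: str, timeout: float = 15.0) -> str:
    """Load url in headless Chrome and report the add to cart button's state (hypothetical helper)."""
    # Imported here so stock_status() can be used without Selenium installed
    from selenium import webdriver
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def labelled_button(driver):
        # Return the button once JavaScript has filled in its label; returning
        # False tells WebDriverWait to keep polling
        button = driver.find_element(By.CSS_SELECTOR, "button.product-add")
        return button if button.text.strip() else False

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # NoSuchElementException is ignored by default; also ignore stale
        # elements, which can appear while the page re-renders
        button = WebDriverWait(
            driver, timeout, ignored_exceptions=[StaleElementReferenceException]
        ).until(labelled_button)
        return stock_status(button.text.strip(), button.is_enabled())
    finally:
        driver.quit()
```

Calling check_stock(page_url) with either product URL from the question should return one of the status strings, or raise a TimeoutException if no labelled button appears in time. Keeping stock_status separate from the browser code makes the decision logic easy to check without launching Chrome.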

Answer