Scraping prices from Sportchek search results with Beautiful Soup 4




I'm trying to scrape search results from Sportchek with BS4, specifically this page: “https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1”. I want to collect the prices of the shoes listed there so I can sort them, but I can't find a way to get the prices in the first place. In the HTML, the price element has the class product-price-text, but I can't extract anything from it. At this point, getting the price of even a single shoe would be fine. I just need help selecting elements by class in BS4, because nothing I try works. I've tried

print(soup.find_all("span", class_="product-price-text"))

but even that doesn't return anything. Please help.
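A quick way to check whether `class_` is the problem or the page is, is to run the same `find_all` call on a small HTML string. The two snippets below are made up for illustration and are not Sportchek's real markup: one contains a price `<span>`, and the other only has an empty container of the kind JavaScript would fill in later.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: roughly what the browser shows *after* JavaScript has run
rendered_html = """
<div class="product">
  <span class="product-price-text">$134.99</span>
</div>
"""

# Hypothetical markup: what a plain HTTP request might return *before* any JavaScript runs
raw_html = """
<div id="product-grid"></div>
"""

def find_prices(html):
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with the trailing underscore) is how BS4 matches the HTML "class" attribute
    return [span.get_text(strip=True)
            for span in soup.find_all("span", class_="product-price-text")]

print(find_prices(rendered_html))  # ['$134.99']
print(find_prices(raw_html))       # []
```

If the first call finds the price but your real page behaves like the second, the selector is fine and the prices simply are not in the HTML you downloaded.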

Answer

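The data is loaded dynamically via JavaScript, so the prices are not in the HTML that BS4 receives. Instead, you can use the requests module to load the data directly from the JSON API that the page calls: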

import json
import requests

url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&lastVisibleProductNumber=12&x1=ast-id-level-3&q1=men%3A%3Ashoes-footwear%3A%3Abasketball&preselectedCategoriesNumber=3&preselectedBrandsNumber=0&count=24"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}

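response = requests.get(url, headers=headers)
response.raise_for_status()  # fail with a clear HTTP error instead of a JSON decode error
data = response.json()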

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))

Prints:

332799300  83.97      Nike Unisex KD Trey 5 VII TB Basketball Shoes - Black/White/Volt - Black
333323940  180.0      Nike Men's Air Jordan 1 Zoom Air Comfort Basketball Shoes - Black/Chile Red-white-university Gold
333107663  134.99     Nike Men's Mamba Fury Basketball Shoes - Black/Smoke Grey/White
333003748  134.99     Nike Men's Lebron Witness IV Basketball Shoes - White
333003606  104.99     Nike Men's Kyrie Flytrap III Basketball Shoes - Black/Uni Red/Bright Crimson
333003543  94.99      Nike Men's Precision III Basketball Shoes - Black/White
333107554  94.99      Nike Men's Precision IV Basketball Shoes - Black/Mtlc Gold/Dk Smoke Grey
333107404  215.0      Nike Men's LeBron XVII Low Basketball Shoes - Black/White/Multicolor
333107617  119.99     Nike Men's KD Trey 5 VIII Basketball Shoes - Black/White-aurora Green/Smoke Grey
333166326  125.98     Nike Men's KD13 Basketball Shoes - Black/White-wolf Grey
333166731  138.98     Nike Men's LeBron XVII Low Basketball Shoes - Particle Grey/White-lt Smoke Grey-black
333183810  129.99     adidas Men's D.O.N 2 Basketball Shoes - Gold/Black/Gold
333206770  111.97     Under Armour Men's Embid Basketball Shoes - Red/White
333181809  165.0      Nike Men's Air Jordan React Elevation Basketball Shoes - Black/White-lt Smoke Grey-volt
333307276  104.99     adidas Men's Harden Stepback 2 Basketball Shoes - White/Blackwhite/Black
333017256  89.99      Under Armour Men's Jet Mid Sneaker - Black/Halo Grey
332912833  134.99     Nike Men's Zoom LeBron Witness IV Running Shoes - Black/Gym Red/University Red
332799162  79.88      Under Armour Men's Curry 7 "Quiet Eye" Basketball Shoes - Black - Black
333276525  119.99     Nike Men's Kyrie Flytrap IV Basketball Shoes - Black/White-metallic Silver
333106290  145.97     Nike Men's KD13 Basketball Shoes - Black/White/Wolf Grey
333181345  144.99     Nike Men's PG 4 TB Basketball Shoes - Black/White-pure Platinum
333241817  149.99     PUMA Men's Clyde All-Pro Basketball Shoes - Puma White/Blue Atolpuma White/Blue Atol
333186052  77.97      adidas Men's Harden Stepback Basketball Shoes - Black/Gold/White
333316063  245.0      Nike Men's Air Jordan 13 Retro Basketball Shoes - White/Blackwhite/Starfish-black
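Since the goal is to sort the shoes by price, here is a minimal sketch that sorts product records by their `price` field. It assumes each record is a dict with `code`, `price` and `title` keys, like the ones printed above. The three sample records are copied from that output rather than fetched live, and the `float()` call is only a precaution in case a price ever arrives as a string.

```python
def sort_by_price(products, descending=False):
    """Return a new list of product dicts ordered by price."""
    # float() guards against a price arriving as a string instead of a number
    return sorted(products, key=lambda p: float(p["price"]), reverse=descending)

# Sample records taken from the output above (not a live API call)
products = [
    {"code": "333003748", "price": 134.99, "title": "Nike Men's Lebron Witness IV Basketball Shoes - White"},
    {"code": "332799162", "price": 79.88, "title": "Under Armour Men's Curry 7 \"Quiet Eye\" Basketball Shoes - Black - Black"},
    {"code": "333316063", "price": 245.0, "title": "Nike Men's Air Jordan 13 Retro Basketball Shoes - White/Blackwhite/Starfish-black"},
]

for p in sort_by_price(products):
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
```

With the live response, the same function can be called as `sort_by_price(data["products"])`, or with `descending=True` to list the most expensive shoes first.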

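EDIT: To derive the API URL from your category page URL (the code reads the category id from the page source and inserts it into the API query):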

import re
import json
import requests

# your URL:
url = "https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1"

api_url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&x1=ast-id-level-3&q1={cat}&count=24"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}
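html_text = requests.get(url, headers=headers).text
# the category id is embedded in the page source as br_data.cat_id='...';
# (the dot is escaped so it only matches a literal ".")
match = re.search(r"br_data\.cat_id='(.*?)';", html_text)
if match is None:
    raise ValueError("br_data.cat_id not found in the page HTML")
cat = match.group(1)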

data = requests.get(api_url.format(cat=cat), headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
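The EDIT above does two things: it pulls the category id out of the page source with a regex, and it drops that id into the API's query string. The sketch below separates those steps so they can be tried without network access. The sample HTML line and the id `men::shoes-footwear::basketball` are assumptions (the id is inferred from the `q1` parameter in the first URL, not read from the live page), and the query parameters are limited to the ones in the EDIT's `api_url`. `urllib.parse.urlencode` percent-encodes the `::` separators as `%3A%3A`, the same form that appears in the first URL.

```python
import re
from urllib.parse import urlencode

API_BASE = "https://www.sportchek.ca/services/sportchek/search-and-promote/products"

# The dot is escaped so it only matches a literal "." in "br_data.cat_id"
CAT_ID_RE = re.compile(r"br_data\.cat_id='(.*?)';")

def extract_cat_id(html_text):
    """Return the category id embedded in the page source, or raise if it is missing."""
    match = CAT_ID_RE.search(html_text)
    if match is None:
        raise ValueError("br_data.cat_id not found; the page layout may have changed")
    return match.group(1)

def build_api_url(cat, page=1, count=24):
    """Build the search API URL used in the EDIT, with the query string percent-encoded."""
    params = {"page": page, "x1": "ast-id-level-3", "q1": cat, "count": count}
    return API_BASE + "?" + urlencode(params)

# Hypothetical page fragment; the real page presumably has this inside a <script> tag
sample_html = "<script>br_data.cat_id='men::shoes-footwear::basketball';</script>"

cat = extract_cat_id(sample_html)
print(cat)
print(build_api_url(cat))
```

Keeping the parameters in a dict also makes it easy to change `page` or `count` without editing a long URL string by hand.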


Source: stackoverflow