I’m trying to scrape search results from Sportchek with BS4, specifically this page: “https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1”. I want to collect the shoe prices and sort them, but first I need to extract the prices, and I can’t find a way to do that. In the HTML, the price element has the class product-price-text, but I can’t get anything out of it. At this point, even the price of a single shoe would be fine. I just need help scraping anything by class with BS4, because nothing I’ve tried works. I’ve tried
print(soup.find_all("span", class_="product-price-text"))
and even that doesn’t find anything, so please help.
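A quick way to tell whether this is a selector problem or a "the element isn't there" problem is to run the same class lookup against two pieces of HTML: one that contains the price span and one that doesn't. The sketch below uses the standard library's html.parser as a stand-in for BS4 (so it runs without extra packages). The helper names SpanClassFinder and find_class_spans are hypothetical, and both HTML strings are made-up samples, not real Sportchek markup.

```python
from html.parser import HTMLParser


class SpanClassFinder(HTMLParser):
    """Collect the text of every <span> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # > 0 while we are inside a matching <span>
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            # nested <span> inside a match: track it so its </span> is balanced
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def find_class_spans(html, target_class):
    """Return the stripped text of all spans carrying target_class."""
    parser = SpanClassFinder(target_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.matches]


# Made-up sample: markup after JavaScript has filled in the price.
rendered = '<div><span class="product-price-text">$134.99</span></div>'
# Made-up sample: markup before JavaScript runs (empty product container).
raw = '<div id="product-grid"></div>'

print(find_class_spans(rendered, "product-price-text"))
print(find_class_spans(raw, "product-price-text"))
```

If the lookup on the HTML you actually downloaded behaves like the second case, the selector is fine; the element simply isn't in that HTML, and neither BS4 nor any other parser will find it there.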
Answer
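The product data is loaded dynamically via JavaScript after the page loads, so the price elements are not in the HTML that requests downloads and BS4 parses. Instead, you can use the requests module to fetch the same data directly from the JSON endpoint the page calls: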
import json
import requests
# JSON endpoint that the page's JavaScript requests to fill in the product list
url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&lastVisibleProductNumber=12&x1=ast-id-level-3&q1=men%3A%3Ashoes-footwear%3A%3Abasketball&preselectedCategoriesNumber=3&preselectedBrandsNumber=0&count=24"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}
data = requests.get(url, headers=headers).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
# print code, price and title of each product in aligned columns
for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
Prints:
332799300 83.97 Nike Unisex KD Trey 5 VII TB Basketball Shoes - Black/White/Volt - Black
333323940 180.0 Nike Men's Air Jordan 1 Zoom Air Comfort Basketball Shoes - Black/Chile Red-white-university Gold
333107663 134.99 Nike Men's Mamba Fury Basketball Shoes - Black/Smoke Grey/White
333003748 134.99 Nike Men's Lebron Witness IV Basketball Shoes - White
333003606 104.99 Nike Men's Kyrie Flytrap III Basketball Shoes - Black/Uni Red/Bright Crimson
333003543 94.99 Nike Men's Precision III Basketball Shoes - Black/White
333107554 94.99 Nike Men's Precision IV Basketball Shoes - Black/Mtlc Gold/Dk Smoke Grey
333107404 215.0 Nike Men's LeBron XVII Low Basketball Shoes - Black/White/Multicolor
333107617 119.99 Nike Men's KD Trey 5 VIII Basketball Shoes - Black/White-aurora Green/Smoke Grey
333166326 125.98 Nike Men's KD13 Basketball Shoes - Black/White-wolf Grey
333166731 138.98 Nike Men's LeBron XVII Low Basketball Shoes - Particle Grey/White-lt Smoke Grey-black
333183810 129.99 adidas Men's D.O.N 2 Basketball Shoes - Gold/Black/Gold
333206770 111.97 Under Armour Men's Embid Basketball Shoes - Red/White
333181809 165.0 Nike Men's Air Jordan React Elevation Basketball Shoes - Black/White-lt Smoke Grey-volt
333307276 104.99 adidas Men's Harden Stepback 2 Basketball Shoes - White/Blackwhite/Black
333017256 89.99 Under Armour Men's Jet Mid Sneaker - Black/Halo Grey
332912833 134.99 Nike Men's Zoom LeBron Witness IV Running Shoes - Black/Gym Red/University Red
332799162 79.88 Under Armour Men's Curry 7 "Quiet Eye" Basketball Shoes - Black - Black
333276525 119.99 Nike Men's Kyrie Flytrap IV Basketball Shoes - Black/White-metallic Silver
333106290 145.97 Nike Men's KD13 Basketball Shoes - Black/White/Wolf Grey
333181345 144.99 Nike Men's PG 4 TB Basketball Shoes - Black/White-pure Platinum
333241817 149.99 PUMA Men's Clyde All-Pro Basketball Shoes - Puma White/Blue Atolpuma White/Blue Atol
333186052 77.97 adidas Men's Harden Stepback Basketball Shoes - Black/Gold/White
333316063 245.0 Nike Men's Air Jordan 13 Retro Basketball Shoes - White/Blackwhite/Starfish-black
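Since the goal is to sort the shoes by price, here is a minimal sketch of that step. It assumes each entry of data["products"] is a dict with "code", "price" and "title" keys (as the loop above reads them) and that "price" is a number or a numeric string, hence the float() call. The four sample rows are copied from the output above; sort_by_price is a hypothetical helper name.

```python
# A few rows copied from the output above, shaped the way the loop reads them.
products = [
    {"code": "332799300", "price": 83.97, "title": "Nike Unisex KD Trey 5 VII TB Basketball Shoes - Black/White/Volt - Black"},
    {"code": "333323940", "price": 180.0, "title": "Nike Men's Air Jordan 1 Zoom Air Comfort Basketball Shoes - Black/Chile Red-white-university Gold"},
    {"code": "332799162", "price": 79.88, "title": "Under Armour Men's Curry 7 \"Quiet Eye\" Basketball Shoes - Black - Black"},
    {"code": "333316063", "price": 245.0, "title": "Nike Men's Air Jordan 13 Retro Basketball Shoes - White/Blackwhite/Starfish-black"},
]


def sort_by_price(items, descending=False):
    """Return a new list of products ordered by price (cheapest first by default)."""
    return sorted(items, key=lambda p: float(p["price"]), reverse=descending)


for p in sort_by_price(products):
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
```

With the real response you would call sort_by_price(data["products"]); sorted() leaves the original list untouched, so you can keep the API order around if you need it.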
EDIT: To build the API URL from your category page URL instead of hardcoding it:
import re
import json
import requests
# your URL:
url = "https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1"
# API endpoint; {cat} is filled with the category id scraped from the page
api_url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&x1=ast-id-level-3&q1={cat}&count=24"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}
html_text = requests.get(url, headers=headers).text
# the page source sets br_data.cat_id='...'; grab the quoted value
match = re.search(r"br_data\.cat_id='(.*?)';", html_text)
if match is None:
    raise ValueError("br_data.cat_id not found in the page source")
cat = match.group(1)
data = requests.get(api_url.format(cat=cat), headers=headers).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
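The EDIT pastes the category id straight into the URL and only fetches page 1. Below is a sketch of two possible refinements: percent-encoding the query with urllib.parse.urlencode, and walking further pages. It rests on assumptions not confirmed by the answer: that the endpoint accepts page values beyond 1 with the same parameters, and that an empty "products" list means there are no more pages (max_pages is a safety cap in case that assumption is wrong). extract_cat_id, build_api_url, iter_products and the fetch_json parameter are hypothetical names; fetch_json is any callable that takes a URL and returns the parsed JSON, which keeps the paging logic independent of requests.

```python
import re
from urllib.parse import urlencode

API_BASE = "https://www.sportchek.ca/services/sportchek/search-and-promote/products"


def extract_cat_id(html_text):
    """Return the value assigned in br_data.cat_id='...', or None if absent."""
    match = re.search(r"br_data\.cat_id='(.*?)';", html_text)
    return match.group(1) if match else None


def build_api_url(cat, page=1, count=24):
    """Build the products URL with the same parameters as the EDIT, percent-encoded."""
    params = {"page": page, "x1": "ast-id-level-3", "q1": cat, "count": count}
    return API_BASE + "?" + urlencode(params)


def iter_products(cat, fetch_json, count=24, max_pages=50):
    """Yield products page by page.

    ASSUMPTION: a page whose "products" list is empty marks the end.
    """
    for page in range(1, max_pages + 1):
        products = fetch_json(build_api_url(cat, page, count)).get("products", [])
        if not products:
            break
        yield from products


print(build_api_url("men::shoes-footwear::basketball"))
```

If the scraped id were men::shoes-footwear::basketball, urlencode would turn it into men%3A%3Ashoes-footwear%3A%3Abasketball, the same q1 value as in the first snippet's hardcoded URL. With requests, fetch_json could be lambda u: requests.get(u, headers=headers).json().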