I’m trying to scrape search results from Sportchek with BS4, specifically this page: "https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1". I want to collect the prices of the shoes listed there and sort them, but first I need to get the prices, and I cannot find a way to do that. In the HTML, the class is product-price-text,
but I can’t get anything out of it. At this point, getting the price of even one shoe would be fine. I just need help scraping anything by class with BS4, because nothing I try works. I’ve tried
print(soup.find_all("span", class_="product-price-text"))
and even that won’t work so please help.
Answer
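The data is loaded dynamically via JavaScript, so the prices are not in the HTML that BeautifulSoup receives and find_all has nothing to match. You can use the requests
module to load the data directly from the JSON endpoint instead: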
import json
import requests

url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&lastVisibleProductNumber=12&x1=ast-id-level-3&q1=men%3A%3Ashoes-footwear%3A%3Abasketball&preselectedCategoriesNumber=3&preselectedBrandsNumber=0&count=24"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}

data = requests.get(url, headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
Prints:
332799300  83.97      Nike Unisex KD Trey 5 VII TB Basketball Shoes - Black/White/Volt - Black
333323940  180.0      Nike Men's Air Jordan 1 Zoom Air Comfort Basketball Shoes - Black/Chile Red-white-university Gold
333107663  134.99     Nike Men's Mamba Fury Basketball Shoes - Black/Smoke Grey/White
333003748  134.99     Nike Men's Lebron Witness IV Basketball Shoes - White
333003606  104.99     Nike Men's Kyrie Flytrap III Basketball Shoes - Black/Uni Red/Bright Crimson
333003543  94.99      Nike Men's Precision III Basketball Shoes - Black/White
333107554  94.99      Nike Men's Precision IV Basketball Shoes - Black/Mtlc Gold/Dk Smoke Grey
333107404  215.0      Nike Men's LeBron XVII Low Basketball Shoes - Black/White/Multicolor
333107617  119.99     Nike Men's KD Trey 5 VIII Basketball Shoes - Black/White-aurora Green/Smoke Grey
333166326  125.98     Nike Men's KD13 Basketball Shoes - Black/White-wolf Grey
333166731  138.98     Nike Men's LeBron XVII Low Basketball Shoes - Particle Grey/White-lt Smoke Grey-black
333183810  129.99     adidas Men's D.O.N 2 Basketball Shoes - Gold/Black/Gold
333206770  111.97     Under Armour Men's Embid Basketball Shoes - Red/White
333181809  165.0      Nike Men's Air Jordan React Elevation Basketball Shoes - Black/White-lt Smoke Grey-volt
333307276  104.99     adidas Men's Harden Stepback 2 Basketball Shoes - White/Blackwhite/Black
333017256  89.99      Under Armour Men's Jet Mid Sneaker - Black/Halo Grey
332912833  134.99     Nike Men's Zoom LeBron Witness IV Running Shoes - Black/Gym Red/University Red
332799162  79.88      Under Armour Men's Curry 7 "Quiet Eye" Basketball Shoes - Black - Black
333276525  119.99     Nike Men's Kyrie Flytrap IV Basketball Shoes - Black/White-metallic Silver
333106290  145.97     Nike Men's KD13 Basketball Shoes - Black/White/Wolf Grey
333181345  144.99     Nike Men's PG 4 TB Basketball Shoes - Black/White-pure Platinum
333241817  149.99     PUMA Men's Clyde All-Pro Basketball Shoes - Puma White/Blue Atolpuma White/Blue Atol
333186052  77.97      adidas Men's Harden Stepback Basketball Shoes - Black/Gold/White
333316063  245.0      Nike Men's Air Jordan 13 Retro Basketball Shoes - White/Blackwhite/Starfish-black
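The long query string in the API URL above may be easier to read and adjust if it is built from a dictionary. The sketch below does that with the standard library. The helper name build_api_url is hypothetical, and the idea that page and count control pagination is an assumption based only on the parameter names, not on any documented API.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer's URL, without the query string.
BASE = "https://www.sportchek.ca/services/sportchek/search-and-promote/products"


def build_api_url(page=1, count=24, category="men::shoes-footwear::basketball"):
    """Hypothetical helper: rebuild the answer's API URL from named parameters."""
    params = {
        "page": page,  # assumed to select the results page
        "lastVisibleProductNumber": 12,
        "x1": "ast-id-level-3",
        "q1": category,  # "::" is percent-encoded to "%3A%3A" by urlencode
        "preselectedCategoriesNumber": 3,
        "preselectedBrandsNumber": 0,
        "count": count,  # assumed to be the number of products per page
    }
    return BASE + "?" + urlencode(params)


print(build_api_url(page=1))
```

With the defaults this reproduces the URL used in the answer. Alternatively, the same dictionary can be passed to requests.get through its params argument, which does the encoding for you.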
EDIT: To build the API URL from the category page URL, extract the category ID (br_data.cat_id) from the page HTML:
import re
import json
import requests

# your URL:
url = "https://www.sportchek.ca/categories/men/footwear/basketball-shoes.html?page=1"

api_url = "https://www.sportchek.ca/services/sportchek/search-and-promote/products?page=1&x1=ast-id-level-3&q1={cat}&count=24"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0",
}

html_text = requests.get(url, headers=headers).text
cat = re.search(r"br_data.cat_id='(.*?)';", html_text).group(1)

data = requests.get(api_url.format(cat=cat), headers=headers).json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for p in data["products"]:
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
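Since the goal in the question is to sort the shoes by price, here is a minimal sketch of that step. It uses a few rows copied from the printed output in place of a live request; in practice you would pass data["products"] instead. The function name sort_by_price is hypothetical, and the exact types of the "code" and "price" fields in the real response are an assumption, so the price is converted with float() to be safe.

```python
# Sample rows copied from the printed output; replace with data["products"].
products = [
    {"code": "332799300", "price": 83.97,
     "title": "Nike Unisex KD Trey 5 VII TB Basketball Shoes - Black/White/Volt - Black"},
    {"code": "333323940", "price": 180.0,
     "title": "Nike Men's Air Jordan 1 Zoom Air Comfort Basketball Shoes - Black/Chile Red-white-university Gold"},
    {"code": "333107663", "price": 134.99,
     "title": "Nike Men's Mamba Fury Basketball Shoes - Black/Smoke Grey/White"},
    {"code": "333107404", "price": 215.0,
     "title": "Nike Men's LeBron XVII Low Basketball Shoes - Black/White/Multicolor"},
]


def sort_by_price(items, descending=False):
    """Hypothetical helper: return a new list ordered by the 'price' field."""
    # float() guards against the price arriving as a string (an assumption).
    return sorted(items, key=lambda p: float(p["price"]), reverse=descending)


for p in sort_by_price(products):
    print("{:<10} {:<10} {}".format(p["code"], p["price"], p["title"]))
```

Passing descending=True lists the most expensive shoes first; sorted() leaves the original list unchanged.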