I’m currently working on my first project experimenting with web scraping on python. I am attempting to retrieve price data from an amazon url but am having some issues.
url = 'https://www.amazon.ca/Nintendo-SwitchTM-Neon-Blue-Joy-E2-80-91ConTM-dp-B0BFJWCYTL/dp/B0BFJWCYTL/ref=dp_ob_title_vg' headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"} page = requests.get(url, headers=headers) soup1 = BeautifulSoup(page.content, "lxml") soup2 = BeautifulSoup(soup1.prettify(), "lxml") title = soup2.find(id='productTitle').get_text() price = soup2.find(id='corePriceDisplay_desktop_feature_div').get_text()
When I print the price variable, my output is a bit weird:
$394.00 $ 394 . 00
There’s alot of whitespace and the numbers are formatted in weird way with a alot of newline. How do I gather just the price so when I print it should just display $394.00?
I believe this can be solved with the span class but I could not figure it out.
Advertisement
Answer
As you can see below, searching for “corePriceDisplay_desktop_feature_div” is far to broad. Searching for span element with class=”a-offscreen” should fit your needs.
Try:
price = soup.find(“span”, {“class”: “a-offscreen”})