I'm currently working on my first project experimenting with web scraping on python. I am attempting to retrieve price data from an amazon url but am having some issues. When I print the price variable, my output is a bit weird: There's alot of whitespace and the numbers are formatted in weird way with a alot of newline. How do

Python

I’m currently working on my first project experimenting with web scraping on python. I am attempting to retrieve price data from an amazon url but am having some issues.

url = 'https://www.amazon.ca/Nintendo-SwitchTM-Neon-Blue-Joy-E2-80-91ConTM-dp-B0BFJWCYTL/dp/B0BFJWCYTL/ref=dp_ob_title_vg'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}

page = requests.get(url, headers=headers)

soup1 = BeautifulSoup(page.content, "lxml")

soup2 = BeautifulSoup(soup1.prettify(), "lxml")

title = soup2.find(id='productTitle').get_text()

price = soup2.find(id='corePriceDisplay_desktop_feature_div').get_text()

JavaScript
​x
 
url = 'https://www.amazon.ca/Nintendo-SwitchTM-Neon-Blue-Joy-E2-80-91ConTM-dp-B0BFJWCYTL/dp/B0BFJWCYTL/ref=dp_ob_title_vg'
​
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}
​
page = requests.get(url, headers=headers)
​
soup1 = BeautifulSoup(page.content, "lxml")
​
soup2 = BeautifulSoup(soup1.prettify(), "lxml")
​
title = soup2.find(id='productTitle').get_text()
​
price = soup2.find(id='corePriceDisplay_desktop_feature_div').get_text()
​

When I print the price variable, my output is a bit weird:

There’s alot of whitespace and the numbers are formatted in weird way with a alot of newline. How do I gather just the price so when I print it should just display $394.00?

I believe this can be solved with the span class but I could not figure it out.

Answer

As you can see below, searching for “corePriceDisplay_desktop_feature_div” is far to broad. Searching for span element with class=”a-offscreen” should fit your needs.

Try:

price = soup.find(“span”, {“class”: “a-offscreen”})

Amazon Web Scraping – retrieving price data

Advertisement

Answer