I’m currently working on my first project, experimenting with web scraping in Python. I am attempting to retrieve price data from an Amazon URL but am having some issues.
import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.ca/Nintendo-SwitchTM-Neon-Blue-Joy-E2-80-91ConTM-dp-B0BFJWCYTL/dp/B0BFJWCYTL/ref=dp_ob_title_vg'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}

page = requests.get(url, headers=headers)

soup1 = BeautifulSoup(page.content, "lxml")

soup2 = BeautifulSoup(soup1.prettify(), "lxml")

title = soup2.find(id='productTitle').get_text()

price = soup2.find(id='corePriceDisplay_desktop_feature_div').get_text()
When I print the price variable, my output is a bit weird:
$394.00



$


394

.



00

There’s a lot of whitespace, and the numbers are split across many lines. How do I gather just the price, so that printing it displays only $394.00?
I believe this can be solved with the span class but I could not figure it out.
Answer
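Searching for "corePriceDisplay_desktop_feature_div" is far too broad: that div wraps the whole price block, so get_text() returns the text of every nested element along with the whitespace between them. Searching for a span element with class="a-offscreen" should fit your needs.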
Try:
price = soup2.find("span", {"class": "a-offscreen"}).get_text(strip=True)
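To see why the narrower selector helps, here is a minimal sketch that runs offline. SAMPLE_HTML is a hypothetical stand-in for the price block (the live page's markup may differ and can change), and extract_price is an illustrative helper name, not part of Beautiful Soup. It uses the built-in "html.parser" so lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the price block; the live page's markup may differ.
SAMPLE_HTML = """
<div id="corePriceDisplay_desktop_feature_div">
  <span class="a-price">
    <span class="a-offscreen">$394.00</span>
    <span aria-hidden="true">
      <span class="a-price-symbol">$</span>
      <span class="a-price-whole">394<span class="a-price-decimal">.</span></span>
      <span class="a-price-fraction">00</span>
    </span>
  </span>
</div>
"""


def extract_price(html):
    """Return the text of the first span.a-offscreen in the price block, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Narrow to the price container first, then pick the single span that
    # holds the complete price string.
    container = soup.find(id="corePriceDisplay_desktop_feature_div")
    if container is None:
        return None
    span = container.find("span", class_="a-offscreen")
    if span is None:
        return None
    # strip=True removes the surrounding whitespace and newlines.
    return span.get_text(strip=True)


print(extract_price(SAMPLE_HTML))
```

Searching inside the price container rather than the whole page is a choice made here because the same class may appear on other prices elsewhere on the page. Returning None when an element is missing also avoids an AttributeError if the site serves a different layout or a bot-check page.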