Skip to content
Advertisement

Amazon Web Scraping – retrieving price data

I’m currently working on my first project experimenting with web scraping on python. I am attempting to retrieve price data from an amazon url but am having some issues.

url = 'https://www.amazon.ca/Nintendo-SwitchTM-Neon-Blue-Joy-E2-80-91ConTM-dp-B0BFJWCYTL/dp/B0BFJWCYTL/ref=dp_ob_title_vg'

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"}

page = requests.get(url, headers=headers)

soup1 = BeautifulSoup(page.content, "lxml")

soup2 = BeautifulSoup(soup1.prettify(), "lxml")

title = soup2.find(id='productTitle').get_text()

price = soup2.find(id='corePriceDisplay_desktop_feature_div').get_text()

When I print the price variable, my output is a bit weird:

$394.00
           


             $
            

             394
             
              .
             


             00

There’s alot of whitespace and the numbers are formatted in weird way with a alot of newline. How do I gather just the price so when I print it should just display $394.00?

I believe this can be solved with the span class but I could not figure it out.

Advertisement

Answer

As you can see below, searching for “corePriceDisplay_desktop_feature_div” is far to broad. Searching for span element with class=”a-offscreen” should fit your needs.

Try:

price = soup.find(“span”, {“class”: “a-offscreen”})

enter image description here

User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement