
Web scraping Amazon review percentages with bs4

So I'm trying to get the review percentage for each star rating on an Amazon product page.

This is the output I want to get:

Awesome Feedback: 72%
Good Feedback: 15%
Regular Feedback: 7%
Bad Feedback: 3%
Awful Feedback: 4%

And so far, this is the output I got:

Awesome Feedback: 72%
Traceback (most recent call last):
  File "c:\Users\Nana\Desktop\stuff\Python\Web Scraping\Amazon Smart Buyer\amazonR.py", line 34, in <module>
    bot()
  File "c:\Users\Nana\Desktop\stuff\Python\Web Scraping\Amazon Smart Buyer\amazonR.py", line 14, in __init__
    self.r()
  File "c:\Users\Nana\Desktop\stuff\Python\Web Scraping\Amazon Smart Buyer\amazonR.py", line 26, in r
    print(f'Good Feedback: {self.pd[1]}')
IndexError: list index out of range

As you can see, I've managed to get the Awesome Feedback line working, but not the other ones. The problem is that each percentage ends up in its own separate list, as you can see here:

['72%'],  ['15%'], ['7%'], ['3%'], ['4%']

I'm struggling quite a bit with this. If there is a way to collect the results of every iteration of the for loop into one list, please share it with me. Here is my code:

from bs4 import BeautifulSoup
from selenium import webdriver




class bot:
    def __init__(self):
        self.path = 'C:/Users/Nana/Desktop/stuff/Python/Web Scraping/chromedriver.exe'
        self.browser = webdriver.Chrome(self.path)
        self.browser.get('https://www.amazon.com/%D7%9E%D7%A7%D7%9C%D7%93%D7%AA-%D7%9E%D7%95%D7%90%D7%A8%D7%AA-%D7%91%D7%A6%D7%91%D7%A2%D7%99-%D7%95%D7%A2%D7%9B%D7%91%D7%A8-%D7%9C%D7%92%D7%99%D7%99%D7%9E%D7%99%D7%A0%D7%92/dp/B016Y2BVKA/ref=sr_1_1_sspa?dchild=1&keywords=keyboard&qid=1633809059&sr=8-1-spons&psc=1&smid=A3TJEO884AOUB3&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUFCRTg0S1dWNjRTQUMmZW5jcnlwdGVkSWQ9QTA2NTEwNzgzNFdKSVA5NEpQODRQJmVuY3J5cHRlZEFkSWQ9QTAwMjcwNDExUFJOUjA4U0pEWDlRJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ==')
        self.r()


    def r(self):
        self.soup = BeautifulSoup(self.browser.page_source, 'lxml')
        self.div5 = self.soup.find('div', id = 'reviewsMedley')
        self.tbody = self.div5.find('tbody')
        self.trs = self.tbody.find_all('tr')
        for self.tr in self.trs:
            self.precents = self.tr.find('td', class_ = 'a-text-right a-nowrap')
            self.pd = [self.precents.text.strip()]
            print(f'Awesome Feedback: {self.pd[0]}')
            print(f'Good Feedback: {self.pd[1]}')
            print(f'Regular Feedback: {self.pd[2]}')
            print(f'Bad Feedback: {self.pd[3]}')
            print(f'Awful Feedback: {self.pd[4]}')



        
bot()
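Why the traceback happens: inside the loop, the line self.pd = [self.precents.text.strip()] builds a brand-new one-element list on every iteration instead of adding to an existing one. So during the first iteration self.pd is just ['72%'], and self.pd[1] fails before the loop ever reaches the second row. The sketch below reproduces that without a browser; row_texts and snapshots are hypothetical names, and the percentages are taken from the expected output above.

```python
# Hypothetical stand-in for the stripped text of the percentage <td>
# in each <tr>; no browser or network access is involved here.
row_texts = ["72%", "15%", "7%", "3%", "4%"]

snapshots = []
for text in row_texts:
    pd = [text]  # a brand-new one-element list replaces the old one
    snapshots.append(pd)

print(snapshots)  # [['72%'], ['15%'], ['7%'], ['3%'], ['4%']]

first = snapshots[0]  # what pd holds during the first iteration
print(first[0])  # 72%
try:
    print(first[1])  # there is no second element to read
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```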


Answer

There are two options. #1: define pd as an empty list outside the loop, append each iteration's result to it, and print after the loop has finished. #2: build the whole list in one step with a CSS selector, as follows:
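A minimal sketch of option #1, assuming a small hypothetical HTML snippet (SAMPLE_HTML) that mimics the structure the question's code walks (div#reviewsMedley > tbody > tr > td.a-text-right.a-nowrap). The real Amazon markup is not shown here and may differ or change. It uses Python's built-in "html.parser" instead of "lxml" so no extra parser is needed, and the helper name collect_percentages is illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the review histogram table;
# the live page markup may differ.
SAMPLE_HTML = """
<div id="reviewsMedley">
  <table><tbody>
    <tr><td>5 star</td><td class="a-text-right a-nowrap"> 72% </td></tr>
    <tr><td>4 star</td><td class="a-text-right a-nowrap"> 15% </td></tr>
    <tr><td>3 star</td><td class="a-text-right a-nowrap"> 7% </td></tr>
    <tr><td>2 star</td><td class="a-text-right a-nowrap"> 3% </td></tr>
    <tr><td>1 star</td><td class="a-text-right a-nowrap"> 4% </td></tr>
  </tbody></table>
</div>
"""


def collect_percentages(html):
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("div", id="reviewsMedley").find("tbody")
    pd = []  # created once, before the loop
    for tr in tbody.find_all("tr"):
        cell = tr.find("td", class_="a-text-right a-nowrap")
        pd.append(cell.text.strip())  # grow the list instead of replacing it
    return pd


pd = collect_percentages(SAMPLE_HTML)

# Print only after the loop has finished, when every row is in the list.
print(f"Awesome Feedback: {pd[0]}")
print(f"Good Feedback: {pd[1]}")
print(f"Regular Feedback: {pd[2]}")
print(f"Bad Feedback: {pd[3]}")
print(f"Awful Feedback: {pd[4]}")
```

In the original bot class the same idea means writing self.pd = [] before the for loop, calling self.pd.append(...) inside it, and moving the five print calls out of the loop body.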

Example

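def r(self):
    self.soup = BeautifulSoup(self.browser.page_source, 'lxml')
    # One CSS selector matches the percentage cell of every row at once,
    # so self.pd is a single flat list such as ['72%', '15%', '7%', ...].
    self.pd = [x.text.strip() for x in self.soup.select('div#reviewsMedley tr td.a-text-right.a-nowrap')]
    # The list is complete here, so all five indexes exist.
    print(f'Awesome Feedback: {self.pd[0]}')
    print(f'Good Feedback: {self.pd[1]}')
    print(f'Regular Feedback: {self.pd[2]}')
    print(f'Bad Feedback: {self.pd[3]}')
    print(f'Awful Feedback: {self.pd[4]}')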
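The hard-coded indexes above still raise IndexError if a page exposes fewer than five rows. One possible variation (not part of the original answer) pairs a hypothetical LABELS list with the scraped values via zip, which stops at the shorter sequence. The pd values below are copied from the output shown further down, and format_feedback is an illustrative name.

```python
# Hypothetical values in the order the selector returns them (5 stars first);
# in the scraper this list would be self.pd.
pd = ["72%", "15%", "7%", "3%", "4%"]

LABELS = ["Awesome", "Good", "Regular", "Bad", "Awful"]


def format_feedback(percentages):
    # zip stops at the shorter sequence, so fewer than five scraped rows
    # yields fewer lines instead of raising IndexError.
    return [f"{label} Feedback: {pct}" for label, pct in zip(LABELS, percentages)]


for line in format_feedback(pd):
    print(line)
```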

Output:

Awesome Feedback: 72%
Good Feedback: 15%
Regular Feedback: 7%
Bad Feedback: 3%
Awful Feedback: 4%
User contributions licensed under: CC BY-SA