I am trying to scrape Amazon reviews for a product, but I cannot retrieve the rating text using Selenium. The same text is easily scraped using BeautifulSoup.
Here is my code using BeautifulSoup:
import requests
from bs4 import BeautifulSoup as soup

link='same link as mentioned above'
url=requests.get(link).content
bs=soup(url,'html.parser')
for i in bs.find_all('span',{'class':'a-icon-alt'}):
    print(i.text.split(' ')[0])
# Output: 4.3 5.0 1.0 5.0 2.0 4.0 1.0 5.0 5.0 5.0 5.0 5.0 5.0
Here is my code using Selenium:
import time
from selenium import webdriver
from bs4 import BeautifulSoup as soup
import requests

link='link to the above mentioned page'
driver=webdriver.Chrome()
driver.get(link)
for i in driver.find_elements_by_css_selector('.a-icon-alt'):
    print(i.text)
I cannot get the same results with Selenium: all I get are blank strings, one for each matching item on the page. I have also tried locating the elements by XPath and by class name, but neither returned the expected text.
Answer
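To get the review ratings, use WebDriverWait to wait for presence_of_all_elements_located(), and read get_attribute("innerHTML") instead of text. Selenium's text returns only text that is rendered as visible, so if these span elements are hidden by CSS (which would explain the blanks), text comes back empty while innerHTML still contains the rating string.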
Code:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

link='https://www.amazon.in/BenQ-inch-Bezel-Monitor-Built/product-reviews/B073NTCT4R/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber=39'
driver=webdriver.Chrome()
driver.get(link)
# Wait up to 10 seconds for the rating elements to be present in the DOM
elements=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,".a-icon-alt")))
for i in elements:
    # innerHTML is available even when the element is not visible
    print(i.get_attribute("innerHTML").split(' ')[0])
Output on console:
4.3
5.0
1.0
5.0
2.0
4.0
1.0
5.0
5.0
5.0
5.0
5.0
5.0
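The split(' ')[0] step in the answer assumes every element's innerHTML starts with a number followed by a space. As a minimal sketch, that step could be made a little more defensive with two helpers. The names parse_rating and ratings_from_elements are hypothetical and not part of Selenium, and the "4.3 out of 5 stars" wording is an assumption about the page text; only get_attribute("innerHTML") comes from the answer.

```python
from typing import Iterable, List, Optional


def parse_rating(raw: str) -> Optional[float]:
    """Return the leading number of a rating string, or None if there is none."""
    parts = raw.strip().split()
    if not parts:
        return None  # empty or whitespace-only string
    try:
        return float(parts[0])
    except ValueError:
        return None  # first token is not numeric


def ratings_from_elements(elements: Iterable) -> List[float]:
    """Collect ratings from objects that expose get_attribute(name).

    Selenium WebElement objects satisfy this; elements whose innerHTML
    does not start with a number are skipped.
    """
    ratings = []
    for element in elements:
        raw = element.get_attribute("innerHTML") or ""
        rating = parse_rating(raw)
        if rating is not None:
            ratings.append(rating)
    return ratings
```

With the answer's code, this would be called as ratings_from_elements(elements) after the WebDriverWait line, giving a list of floats instead of printed strings.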