The program runs fine, but it scrapes only one title.
I want it to scrape all the titles on the page. This is the page link: https://www.eurobike.com/en/index-exhibitors/exhibitors/?
import time
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1920x1080")
options.add_argument("--disable-extensions")

chrome_driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options
)

def supplyvan_scraper():
    with chrome_driver as driver:
        driver.implicitly_wait(15)
        URL = 'https://www.eurobike.com/en/index-exhibitors/exhibitors/?'
        driver.get(URL)
        time.sleep(3)

        # opt #1 visit first link, print the title uncomment to see
        # click the single link
        WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.card-exhibitor"))).click()
        time.sleep(2)

        # parse the h1 tag text
        title = driver.find_element(By.CSS_SELECTOR, 'h1.underlined').text
        print(title)

        driver.quit()


supplyvan_scraper()
Answer
The website is populated entirely by JavaScript. To display the listing at this URL you first have to accept the cookies, but clicking the cookie button isn't easy because the button sits under a shadow root (open).
Selenium's ordinary locators and WebDriverWait can't reach into a shadow root, so to get at the button you need to run a JavaScript querySelector against the shadow root.
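As a minimal sketch of the shadow-root idea, the lookup can be wrapped in a small polling helper so the click happens as soon as the button exists rather than after a fixed sleep. The helper name `click_in_shadow_root` and its `timeout` and `poll` parameters are hypothetical, chosen for illustration; the only driver call it relies on is `execute_script(script, *args)`, and the selectors in the usage comment are the ones used in this answer.

```python
import time

# JavaScript run in the page: arguments[0] is the CSS selector of the shadow
# host, arguments[1] the selector of the element inside its shadow root.
SHADOW_QUERY_JS = """
const host = document.querySelector(arguments[0]);
if (!host || !host.shadowRoot) { return null; }
return host.shadowRoot.querySelector(arguments[1]);
"""


def click_in_shadow_root(driver, host_css, inner_css, timeout=20.0, poll=0.5):
    """Poll until the element inside the shadow root exists, then click it.

    Hypothetical helper: returns True after clicking, or False if the
    element never appeared within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        element = driver.execute_script(SHADOW_QUERY_JS, host_css, inner_css)
        if element is not None:
            element.click()
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


# Usage with a real Selenium driver (not executed here):
# click_in_shadow_root(
#     driver,
#     'div#usercentrics-root',
#     'button[data-testid="uc-accept-all-button"]',
# )
```

The helper returns False instead of raising, so the caller can decide whether a missing cookie banner is an error or simply means the cookies were already accepted.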
Full Working code:
import time
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup


options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Keep Chrome open after the script ends so you can see what happened;
# comment this line out to let the browser close
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
URL = 'https://www.eurobike.com/en/index-exhibitors/exhibitors/?'
driver.get(URL)
time.sleep(5)
# Reach into the shadow root with JavaScript and accept the cookies
driver.execute_script('''return document.querySelector('div#usercentrics-root').shadowRoot.querySelector('button[data-testid="uc-accept-all-button"]')''').click()
# Collect every listing URL first, then request each one with the driver
links = []
for card in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.list__results > div > div > a'))):
    link = card.get_attribute('href')
    links.append(link)
for u in links:
    driver.get(u)
    time.sleep(5)
    # Parse the loaded page with bs4 to keep slow Selenium lookups to a minimum
    soup = BeautifulSoup(driver.page_source, 'lxml')
    title = soup.select_one('h1.underlined').get_text(strip=True)
    print(title)
Output:
ANGLE is
A&C Solutions
A&J International Co.,Ltd
(Taiwan Branch)
A-Pro Tech Co., LTD
A-Rim Ent. Co., Ltd.
Abbey Bike Tools
ABIMOTA
Associacao Nacional das Industrias
de Duas Rodas, Ferragens, Mobiliári
ABIMOTA
Associacao Nacional das Industrias
de Duas Rodas, Ferragens, Mobiliári
ABUS |August Bremicker Söhne KG
ABUS |August Bremicker Söhne KG
Accelerated Systems Inc. (ASI)
ACCORD ENTERPRISE CORP.
Acer Gadget Inc.
Acetrikes Industrial Co., Ltd.
ACT LAB LLC
ACTIA
Action Sports SRL
Activent 365 s.r.o.
ADAC e.V.
ADD-ONE
AddBike
AddRE-Mo
(Electric Bike Solutions GmbH)
ADFC e. V.
Adhestick Innovations Ltd. (Joe's No Flats)
ADViTEX GMBH
Äike
AER Electric Company Ltd
King Edward House
Aero Sensor Ltd
Aeroe Limited
Aforge Enterprise Co., Ltd
Agentura REPRO spol. s r.o.
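The output above lists a few exhibitors twice (ABIMOTA and ABUS). If those repeats come from the same href being collected more than once, which is an assumption and not something the output alone confirms, the link list could be de-duplicated before the second loop. The helper name `unique_links` and the example URLs below are hypothetical.

```python
def unique_links(hrefs):
    """Drop empty/None hrefs and repeated URLs, keeping first-seen order.

    dict.fromkeys preserves insertion order, so each page would still be
    visited in the order it appeared in the listing.
    """
    return list(dict.fromkeys(h for h in hrefs if h))


# Hypothetical hrefs, standing in for card.get_attribute('href') values
sample = [
    "https://example.com/exhibitor/a",
    "https://example.com/exhibitor/b",
    "https://example.com/exhibitor/a",
    None,
]
print(unique_links(sample))
```

In the full code this would replace the manual append loop, for example `links = unique_links(card.get_attribute('href') for card in cards)`, where `cards` is the list returned by the `WebDriverWait` call.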