The program runs fine, but it scrapes only one TITLE.
I want it to scrape all the titles on the page. This is the page link: https://www.eurobike.com/en/index-exhibitors/exhibitors/?
import time
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless")
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1920x1080")
options.add_argument("--disable-extensions")

chrome_driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)


def supplyvan_scraper():
    """Open the exhibitor listing, click the first exhibitor card, and print its title."""
    with chrome_driver as driver:
        driver.implicitly_wait(15)
        url = 'https://www.eurobike.com/en/index-exhibitors/exhibitors/?'
        driver.get(url)
        time.sleep(3)
        # Wait until the first exhibitor card is clickable, then open it.
        first_card = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, "div.card-exhibitor"))
        )
        first_card.click()
        time.sleep(2)
        # Read the detail page's <h1> heading and print it.
        title = driver.find_element(By.CSS_SELECTOR, 'h1.underlined').text
        print(title)
        driver.quit()


supplyvan_scraper()
Advertisement
Answer
The website is populated entirely by complex JavaScript. First of all, to display the listing from this URL, accepting the cookies is a must — but accepting and clicking the cookie button isn't an easy task, because the button sits inside an open shadow root.
Selenium and WebDriverWait can do nothing inside a shadow root, so to reach into it you need to run a JavaScript querySelector.
Full Working code:
import time
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Keep Chrome open after the script ends so you can watch what happened;
# comment this out to let the browser close.
options.add_experimental_option("detach", True)

driver = webdriver.Chrome(
    service=Service(ChromeDriverManager().install()),
    options=options,
)

URL = 'https://www.eurobike.com/en/index-exhibitors/exhibitors/?'
driver.get(URL)
time.sleep(5)

# The cookie banner lives inside an open shadow root, which Selenium's normal
# locators cannot reach -- pierce it with a JavaScript querySelector instead.
driver.execute_script(
    '''return document.querySelector('div#usercentrics-root').shadowRoot.querySelector('button[data-testid="uc-accept-all-button"]')'''
).click()

# Collect every exhibitor-card URL first, then visit each with its own request.
links = []
cards = WebDriverWait(driver, 20).until(
    EC.presence_of_all_elements_located(
        (By.CSS_SELECTOR, '.list__results > div > div > a')
    )
)
for card in cards:
    link = card.get_attribute('href')
    # Guard: get_attribute can return None; the original would have passed
    # None straight to driver.get() later.
    if link:
        links.append(link)

for u in links:
    driver.get(u)
    time.sleep(5)
    # Extract the title with bs4 -- cheaper and simpler than more Selenium lookups.
    soup = BeautifulSoup(driver.page_source, 'lxml')
    heading = soup.select_one('h1.underlined')
    # Guard: a detail page without the expected <h1> made the original crash
    # with AttributeError on .get_text(); skip such pages instead.
    if heading is not None:
        print(heading.get_text(strip=True))
Output:
ANGLE is A&C Solutions A&J International Co.,Ltd (Taiwan Branch) A-Pro Tech Co., LTD A-Rim Ent. Co., Ltd. Abbey Bike Tools ABIMOTA Associacao Nacional das Industrias de Duas Rodas, Ferragens, Mobiliári ABIMOTA Associacao Nacional das Industrias de Duas Rodas, Ferragens, Mobiliári ABUS |August Bremicker Söhne KG ABUS |August Bremicker Söhne KG Accelerated Systems Inc. (ASI) ACCORD ENTERPRISE CORP. Acer Gadget Inc. Acetrikes Industrial Co., Ltd. ACT LAB LLC ACTIA Action Sports SRL Activent 365 s.r.o. ADAC e.V. ADD-ONE AddBike AddRE-Mo (Electric Bike Solutions GmbH) ADFC e. V. Adhestick Innovations Ltd. (Joe's No Flats) ADViTEX GMBH Äike AER Electric Company Ltd King Edward House Aero Sensor Ltd Aeroe Limited Aforge Enterprise Co., Ltd Agentura REPRO spol. s r.o.