Python – Selenium – Scraping through multiple websites

Question

I am trying to build a web scraper with Python/Selenium that scrapes data from multiple websites and stores the data in an Excel sheet. The sites I want to scrape are the following: From all sites I want to scrape the “Omsättning”, “Volym” and “VWAP” values and store them …

Accepted Answer

If you want to run them one after another in a loop, you can use something like this:

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()  # Selenium 4.6+ can locate chromedriver itself

urlist = ['https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK',
          'https://www.ngm.se/marknaden/vardepapper?symbol=BTC%20ZERO%20SEK',
          'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20CARDANO%20SEK',
          'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20POLKADOT%20SEK',
          'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20SOLANA%20SEK',
          'https://www.ngm.se/marknaden/vardepapper?symbol=VALOUR%20UNISWAP%20SEK']

for i in urlist:
    driver.get(i)
    print(i)
    time.sleep(5)
    # The values live inside an iframe, so open the iframe's URL directly
    iframe = driver.find_element(By.XPATH, '//iframe').get_attribute("src")
    driver.get(iframe)
    # Wait for the <tbody> that follows the header row containing "Volym"
    element = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))
    # One value per line; these positions may differ between pages
    lines = element.text.split('\n')
    Omsaettning = lines[-4]
    volym = lines[-3]
    vwap = lines[-2]
    print(volym, vwap, Omsaettning)

driver.quit()
```

With this option you have to keep an eye on the list indices, as they may not be the same for all the URLs.

Alternatively, if you want to run all of them separately but simultaneously, you can use the pytest-xdist plugin (which you have to install separately). It runs pytest tests in parallel, so the scraping would need to be written as test functions.
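If you would rather not structure the scraper as pytest tests, Python's standard `concurrent.futures` module is another way to process the pages simultaneously (a different technique from pytest-xdist, swapped in here). This is a minimal sketch, assuming a hypothetical `scrape_one(url)` function that creates its own WebDriver, extracts the three values for one URL and quits that driver; `scrape_all` is likewise a hypothetical name. The stand-in scraper in the `__main__` block only exists so the sketch runs without a browser.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def scrape_all(urls: List[str], scrape_one: Callable[[str], T],
               max_workers: int = 3) -> Dict[str, T]:
    """Run scrape_one(url) for every URL concurrently and map each URL to its result.

    scrape_one is hypothetical: in a real scraper it would start its own
    WebDriver, load the page, read the values and quit, because one WebDriver
    instance should not be shared between threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so zip pairs each URL with its own result
        results = list(pool.map(scrape_one, urls))
    return dict(zip(urls, results))


if __name__ == "__main__":
    # Stand-in scraper: returns the symbol part of the URL instead of opening a browser
    def fake_scrape(url: str) -> str:
        return url.rsplit("=", 1)[-1]

    urls = [
        "https://www.ngm.se/marknaden/vardepapper?symbol=ETH%20ZERO%20SEK",
        "https://www.ngm.se/marknaden/vardepapper?symbol=BTC%20ZERO%20SEK",
    ]
    print(scrape_all(urls, fake_scrape, max_workers=2))
```

Threads are used rather than processes because each worker mostly waits on its browser; `max_workers` caps how many browsers run at once.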
But note that the more workers you run, the more system resources will be used.

If you don't want the browser window to be displayed, you can use the Chrome option `--headless` (in Selenium 4 the driver path is passed through a `Service` object):

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

opt = Options()
opt.add_argument('--headless')
driver = webdriver.Chrome(service=Service('path/to/chromedriver'), options=opt)  # your driver path
```

This will not open a visible browser window. However, I have noticed that in headless mode your code fails to find this element (it does work in headed mode):

```python
element = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
    (By.XPATH, '//div[@id="detailviewDiv"]//thead[.//span[contains(text(),"Volym")]]/following-sibling::tbody')))
```
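On the caveat that the list indices (`[-4]`, `[-3]`, `[-2]`) may not be the same for every URL: one less fragile option is to look the values up by their column label instead of by position. The helper `values_by_header` is hypothetical, and it assumes the header row and the data row each put one cell per line in the same order, which is an assumption about these pages rather than something confirmed here. It raises an error instead of silently returning the wrong number when that assumption breaks.

```python
from typing import Dict, Iterable


def values_by_header(header_text: str, body_text: str,
                     wanted: Iterable[str]) -> Dict[str, str]:
    """Pair header labels with cell values by position, then pick the wanted labels.

    Assumes (hypothetically) one cell per line in both texts, in the same order.
    Raises ValueError if the counts differ or a wanted label is missing.
    """
    wanted = list(wanted)  # allow generators, since the labels are used twice below
    headers = [h.strip() for h in header_text.strip().split("\n")]
    values = [v.strip() for v in body_text.strip().split("\n")]
    if len(headers) != len(values):
        raise ValueError(f"{len(headers)} headers but {len(values)} values")
    row = dict(zip(headers, values))
    missing = [w for w in wanted if w not in row]
    if missing:
        raise ValueError(f"labels not found: {missing}")
    return {w: row[w] for w in wanted}


if __name__ == "__main__":
    # Made-up placeholder texts, only to show the call shape
    header = "Col1\nOmsättning\nVolym\nVWAP\nCol5"
    body = "x\n100\n20\n5.0\ny"
    print(values_by_header(header, body, ["Omsättning", "Volym", "VWAP"]))
```

With Selenium, `header_text` could come from the `.text` of the `thead` matched by the XPath above and `body_text` from the `.text` of the `tbody` the loop already waits for.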


Answer