I’m interested in getting a better idea of what Scrapy can do. Here is a very simple Selenium script that interacts with a website, fills in some boxes, clicks some elements and downloads a file. Could this be replicated using Scrapy, i.e. could a Scrapy spider be written that does exactly the same thing?
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--window-size=1920,1080")
driver = webdriver.Chrome(options=options)

driver.get("https://www.ons.gov.uk/")
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.NAME, "q"))
).send_keys("Education and childcare")
click_button = driver.find_element_by_xpath('//*[@id="nav-search-submit"]').click()
click_button = driver.find_element_by_xpath('//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
click_button = driver.find_element_by_xpath('//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
click_button = driver.find_element_by_xpath('//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()
Answer
"selenium code be recreated using scrapy"
is also working fine with SeleniuRequest
which is superfast
than general selenium. You need scrapy project.It works as headless mode but always get screenshot for each step.
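If you don't have a Scrapy project yet, install the packages with pip install scrapy scrapy-selenium, create a project with scrapy startproject <project_name>, and place the spider below in the project's spiders/ folder.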
script:
import scrapy
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        # Let the scrapy-selenium middleware open the page in a real browser.
        yield SeleniumRequest(
            url='https://www.ons.gov.uk',
            callback=self.parse,
            wait_time=3,
            screenshot=True
        )

    def parse(self, response):
        # The middleware exposes the Selenium driver on the response.
        driver = response.meta['driver']
        driver.save_screenshot('screenshot.png')

        # Type the search term once the search box is clickable.
        WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.NAME, "q"))
        ).send_keys("Education and childcare")
        driver.save_screenshot('screenshot_1.png')

        # Submit the search, then click through to the download link.
        driver.find_element_by_xpath('//*[@id="nav-search-submit"]').click()
        driver.save_screenshot('screenshot_2.png')
        driver.find_element_by_xpath('//*[@id="results"]/div[1]/div[2]/div[1]/h3/a/span').click()
        driver.find_element_by_xpath('//*[@id="main"]/div[2]/div[1]/section/div/div[1]/div/div[2]/h3/a/span').click()
        driver.find_element_by_xpath('//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a').click()
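The last click in the original script triggers the file download through the browser. If you would rather have Scrapy save the file itself, a minimal sketch (not part of the answer above; the save_file callback and the fallback filename are assumptions) is to read the final link's href at the end of parse() and yield a normal Scrapy request for it, adding save_file as a method of the spider:

        # Instead of clicking the last link, grab its URL and let Scrapy's
        # downloader fetch the file.
        file_url = driver.find_element_by_xpath(
            '//*[@id="main"]/div[2]/div/div[1]/div[2]/p[2]/a'
        ).get_attribute('href')
        yield scrapy.Request(response.urljoin(file_url), callback=self.save_file)

    def save_file(self, response):
        # Write the downloaded bytes to disk; the fallback name is a placeholder.
        filename = response.url.split('/')[-1] or 'download.bin'
        with open(filename, 'wb') as f:
            f.write(response.body)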
settings.py file:
You have to add the following options to the settings.py file:
# Middleware
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

# Selenium
from shutil import which

SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']
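Once the middleware is enabled, run the spider from the project root (the directory containing scrapy.cfg) with scrapy crawl test, where test is the name defined in the spider.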
Output:
'downloader/response_status_count/200'
The screenshots taken at each step (screenshot.png, screenshot_1.png, screenshot_2.png) are saved in the project folder.