
Tag: scrapy

Could this selenium code be recreated using scrapy?

I’m interested in getting a better idea of what Scrapy can do. Here is a very simple Selenium script that interacts with a website, fills in some boxes, clicks some elements and downloads a file. Could this code be replicated using Scrapy, so that a Scrapy script does the exact same thing? Answer: …
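
In principle yes, as long as the form does not depend on JavaScript to render. Below is a minimal sketch of the idea, not the actual answer: the URL, form field names and the download link selector are all assumptions, since the original Selenium code is not shown in the excerpt.

import scrapy


class FormDownloadSpider(scrapy.Spider):
    name = "form_download"
    # Hypothetical page; replace with the URL the Selenium script opens.
    start_urls = ["https://example.com/search"]

    def parse(self, response):
        # "Fill in some boxes": FormRequest.from_response submits the page's
        # form with the given field values (field names are placeholders).
        yield scrapy.FormRequest.from_response(
            response,
            formdata={"query": "some value", "category": "reports"},
            callback=self.after_submit,
        )

    def after_submit(self, response):
        # "Click an element": in Scrapy this is just following the link's href.
        file_url = response.css("a.download::attr(href)").get()
        if file_url:
            yield response.follow(file_url, callback=self.save_file)

    def save_file(self, response):
        # Write the downloaded bytes to disk.
        with open("downloaded_file.pdf", "wb") as f:
            f.write(response.body)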

Run scrapy splash as a script

I am trying to run a Scrapy script with Splash, as I want to scrape a JavaScript-based webpage, but with no results. When I execute this script with the python command, I get this error: crochet._eventloop.TimeoutError. In addition, the print statement in the parse method is never executed, so I suspect something is wrong with SplashRequest. The code that I wrote in …
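
For reference, here is a minimal sketch of running a SplashRequest spider as a plain script with CrawlerProcess instead of crochet. It assumes a Splash instance is listening on localhost:8050 and passes the scrapy-splash middleware settings explicitly, since a standalone script has no project settings.py to pick them up from; the target URL is a placeholder.

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy_splash import SplashRequest


class JsSpider(scrapy.Spider):
    name = "js_spider"

    def start_requests(self):
        # wait gives the page a couple of seconds to render in Splash.
        yield SplashRequest("https://example.com", callback=self.parse,
                            args={"wait": 2})

    def parse(self, response):
        # If Splash is reachable and configured, this should print.
        print(response.css("title::text").get())


process = CrawlerProcess(settings={
    "SPLASH_URL": "http://localhost:8050",
    "DOWNLOADER_MIDDLEWARES": {
        "scrapy_splash.SplashCookiesMiddleware": 723,
        "scrapy_splash.SplashMiddleware": 725,
        "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
    },
    "SPIDER_MIDDLEWARES": {
        "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
    },
    "DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter",
})
process.crawl(JsSpider)
process.start()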

Python Scrapy -> Use a scrapy spider as a function

So I have the following Scrapy spider in spiders.py. But the key point is that I want to call this spider as a function from another file, instead of running scrapy crawl quotes in the console. Where can I read more on this, and is this possible at all? I checked the Scrapy documentation, but I didn’t find …
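
This is covered in the Scrapy docs under “Run Scrapy from a script”. A minimal sketch of wrapping the crawl in a callable function is below; the spider name QuotesSpider and the import path are assumptions based on the excerpt’s spiders.py and scrapy crawl quotes.

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from spiders import QuotesSpider  # hypothetical import of the spider in spiders.py


def run_quotes_spider():
    # Equivalent of `scrapy crawl quotes`, but callable from other code.
    # get_project_settings() only works when run from inside the project
    # (it needs scrapy.cfg); otherwise pass a settings dict directly.
    process = CrawlerProcess(get_project_settings())
    process.crawl(QuotesSpider)
    process.start()  # blocks until the crawl finishes


if __name__ == "__main__":
    run_quotes_spider()

Note that the Twisted reactor can only be started once per process, so this function cannot be called a second time in the same interpreter without hitting ReactorNotRestartable.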

passing table name to pipeline scrapy python

I have different spiders that scrape similar values, and I want to store the scraped values in different sqlite3 tables. I can do this by using a different pipeline for each spider but, since the only thing that changes is the table name, would it be possible to somehow pass the table name from the spider to the pipeline? This …
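
One common approach, sketched here with placeholder attribute, column and table names, is to put the table name on the spider as a class attribute and read it in the pipeline’s open_spider hook, so a single pipeline can serve every spider.

import sqlite3

import scrapy


class BooksSpider(scrapy.Spider):
    name = "books"
    table_name = "books"  # each spider just declares its own table


class SQLitePipeline:
    # Enable via ITEM_PIPELINES in settings.py as usual.

    def open_spider(self, spider):
        self.connection = sqlite3.connect("scraped.db")
        self.cursor = self.connection.cursor()
        # Read the table name from whichever spider is currently running.
        self.table = getattr(spider, "table_name", spider.name)
        self.cursor.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} (title TEXT, price TEXT)"
        )

    def process_item(self, item, spider):
        self.cursor.execute(
            f"INSERT INTO {self.table} (title, price) VALUES (?, ?)",
            (item.get("title"), item.get("price")),
        )
        return item

    def close_spider(self, spider):
        self.connection.commit()
        self.connection.close()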

What is this Scrapy error: ReactorNotRestartable?

I do not understand why my spider won’t run. I tested the CSS selector separately, so I do not think the problem is the parsing method. Traceback message: ReactorNotRestartable. Answer: with urls = “https://www.espn.com/college-football/team/_/id/52” and for url in urls:, you are going through the characters of urls; change it to a list. Also, you don’t have a parse_front function; if you just didn’t add it …
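
A sketch of the fix described in the answer; only the ESPN URL and the parse_front name come from the excerpt, the surrounding spider is an assumption. (ReactorNotRestartable itself usually means the crawl was started a second time in the same Python process, e.g. CrawlerProcess.start() called twice.)

import scrapy


class TeamSpider(scrapy.Spider):
    name = "teams"

    def start_requests(self):
        # A plain string is iterated character by character, so each "url"
        # would be a single letter; wrap the URL(s) in a list instead.
        urls = ["https://www.espn.com/college-football/team/_/id/52"]
        for url in urls:
            # parse_front must actually be defined, or use self.parse instead.
            yield scrapy.Request(url, callback=self.parse_front)

    def parse_front(self, response):
        # Placeholder callback; the question's CSS selector would go here.
        pass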

Why is Scrapy not following all rules / running all callbacks?

I have two spiders inheriting from a parent spider class as follows: The parse_tournament_page callback for the Rule in the first spider works fine. However, the second spider only runs the parse_tournament callback from the first Rule, despite the fact that its second Rule is the same as in the first spider and is operating on the same page. I’m clearly missing …
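
For context, a hedged sketch of the kind of rule layout being described (URLs and patterns are invented). One frequent cause of a second callback never firing is that both rules’ LinkExtractors match overlapping links, so the duplicate filter drops the second request for the same URL and its callback never runs; loosening the allow patterns, or checking the dupefilter debug log, is a reasonable first step.

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class TournamentSpider(CrawlSpider):
    name = "tournaments"
    # Hypothetical start URL, standing in for the parent spider class.
    start_urls = ["https://example.com/tournaments/"]

    rules = (
        # If these two extractors match the same links, only the first
        # request per URL survives the dupefilter.
        Rule(LinkExtractor(allow=r"/tournament/\d+$"),
             callback="parse_tournament", follow=True),
        Rule(LinkExtractor(allow=r"/tournament/\d+/results"),
             callback="parse_tournament_page"),
    )

    def parse_tournament(self, response):
        self.logger.info("tournament: %s", response.url)

    def parse_tournament_page(self, response):
        self.logger.info("tournament page: %s", response.url)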

Scrapy Python can't extract links with more stable xpath

I'm building a scraper for this website. I'm using Python and the Scrapy shell to extract the data that I want. The xpath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href. Using response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href') returns []. I tried using contains(@class, "sb-card-company") with the same result. Using other containers in the same way changed nothing. Using a different page also had no effect. Using …
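
Two things are worth checking here, sketched below with the class names taken from the question: a multi-class attribute is normally matched with contains() or a CSS class selector rather than an exact @class string, and if both still return [], the anchors are most likely injected by JavaScript and simply not present in the raw HTML Scrapy receives (view(response) in the shell shows what was actually downloaded).

# In the Scrapy shell:
# Match one class out of the multi-class attribute with contains().
links = response.xpath(
    '//a[contains(@class, "sb-card-company")]/@href'
).getall()

# Equivalent CSS selector, which handles multi-class attributes natively.
links_css = response.css("a.sb-card-company::attr(href)").getall()

# If both are empty, the links are probably built client-side and the page
# needs a JavaScript renderer (Splash, Playwright, etc.).
print(len(links), len(links_css))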

Downloading all JS files using Scrapy?

I am trying to crawl a website searching for all JS files in order to download them. I am new to Scrapy and I have found that I can use CrawlSpider, but it seems I have an issue with LinkExtractors, as my parser is …
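
A minimal sketch of one way to approach this: by default LinkExtractor only reads href attributes of <a>/<area> tags and (in recent Scrapy versions) skips script-like extensions, so it has to be told to look at <script src> and to drop the extension filter. The domain and start URL are placeholders.

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class JsFilesSpider(CrawlSpider):
    name = "js_files"
    # Placeholder domain; replace with the site being crawled.
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com/"]

    rules = (
        # Follow ordinary page links to cover the whole site.
        Rule(LinkExtractor(), follow=True),
        # Pull URLs out of <script src="..."> tags; deny_extensions=[] stops
        # the extractor from discarding .js links as ignored file types.
        Rule(
            LinkExtractor(tags=("script",), attrs=("src",), deny_extensions=[]),
            callback="save_js",
        ),
    )

    def save_js(self, response):
        # Save each JS file under its final path component.
        filename = response.url.split("/")[-1] or "unnamed.js"
        with open(filename, "wb") as f:
            f.write(response.body)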