Tag: scrapy

Python Scrapy -> Use a scrapy spider as a function

So I have the following Scrapy Spider in spiders.py, but the key aspect is that I want to call this spider as a function from another file, instead of running scrapy crawl quotes in the console. Where can I read more on this, or find out whether this is possible at all? I checked through the Scrapy documentation, but I didn't find…
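One common way to do this, sketched below, is Scrapy's CrawlerProcess, which starts the crawl from ordinary Python code instead of the scrapy crawl command. The QuotesSpider import path is an assumption standing in for whatever spider actually lives in spiders.py.

# run_spider.py -- a minimal sketch; the import path and spider name are illustrative
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from spiders import QuotesSpider  # hypothetical: adjust to your project layout


def run_quotes_spider():
    # CrawlerProcess starts the Twisted reactor, runs the crawl, and stops it.
    process = CrawlerProcess(get_project_settings())
    process.crawl(QuotesSpider)
    process.start()  # blocks until the crawl finishes


if __name__ == "__main__":
    run_quotes_spider()

Note that the reactor can only be started once per process, so call this once; repeated calls are what trigger the ReactorNotRestartable error discussed further down.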

passing table name to pipeline scrapy python

I have different spiders that scrape similar values, and I want to store the scraped values in different sqlite3 tables. I can do this by using a different pipeline for each spider, but since the only thing that changes is the table name, would it be possible to somehow pass the table name from the spider to the pipeline? This…
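One way this is commonly handled, sketched here, is to declare the table name as a class attribute on each spider and read it inside a single shared pipeline. The table_name attribute and the field1/field2 columns are assumptions, not names from the question.

# pipelines.py -- a sketch assuming each spider sets a `table_name` attribute
import sqlite3


class SQLitePipeline:
    def open_spider(self, spider):
        # Fall back to the spider's name if no table_name was declared.
        self.table = getattr(spider, "table_name", spider.name)
        self.conn = sqlite3.connect("scraped.db")
        # Table names cannot be bound as SQL parameters, so only interpolate
        # names you control (class attributes, never user input).
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} (field1 TEXT, field2 TEXT)"
        )

    def process_item(self, item, spider):
        self.conn.execute(
            f"INSERT INTO {self.table} (field1, field2) VALUES (?, ?)",
            (item.get("field1"), item.get("field2")),
        )
        return item

    def close_spider(self, spider):
        self.conn.commit()
        self.conn.close()

Each spider then only needs something like table_name = "quotes_table" next to its name attribute, and the one pipeline serves all of them.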

What is this Scrapy error: ReactorNotRestartable?

I do not understand why my spider won't run. I tested the CSS selector separately, so I do not think it is the parsing method. Traceback message: ReactorNotRestartable. Answer: with urls = "https://www.espn.com/college-football/team/_/id/52" and for url in urls:, you're iterating over the characters of the urls string, so change it to a list. Also, you don't have a parse_front function; if you just didn't add it…
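The answer's fix, sketched below on the assumption that the original code used a bare string and was missing the callback: wrap the URL in a list so the loop yields whole URLs, and define the parse_front method that start_requests points at. The spider name and the parsing logic are placeholders.

import scrapy


class EspnSpider(scrapy.Spider):
    name = "espn"  # illustrative name

    def start_requests(self):
        # A list, not a bare string -- iterating over a string yields characters.
        urls = ["https://www.espn.com/college-football/team/_/id/52"]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse_front)

    def parse_front(self, response):
        # Placeholder parsing; the real selectors depend on the page.
        for href in response.css("a::attr(href)").getall():
            yield {"link": href}

As for ReactorNotRestartable itself, it is raised when the Twisted reactor is started a second time in the same process, for example by calling CrawlerProcess.start() more than once.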

Why is Scrapy not following all rules / running all callbacks?

I have two spiders inheriting from a parent spider class as follows: the parse_tournament_page callback for the Rule in the first spider works fine. However, the second spider only runs the parse_tournament callback from the first Rule, despite the fact that the second Rule is the same as in the first spider and is operating on the same page. I'm clearly missing…
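For reference, a minimal sketch of the shape being described: a CrawlSpider with two Rules, each with its own callback. The URL patterns and names here are illustrative, not the poster's.

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class TournamentSpider(CrawlSpider):
    name = "tournaments"
    start_urls = ["https://example.com/tournaments"]  # placeholder

    rules = (
        # A link is handled by the first Rule that matches it,
        # in the order the Rules are listed here.
        Rule(LinkExtractor(allow=r"/tournament/\d+$"),
             callback="parse_tournament", follow=True),
        Rule(LinkExtractor(allow=r"/tournament/\d+/page/\d+"),
             callback="parse_tournament_page", follow=True),
    )

    def parse_tournament(self, response):
        yield {"tournament_url": response.url}

    def parse_tournament_page(self, response):
        yield {"page_url": response.url}

Two things worth checking in a situation like this: a link that matches more than one Rule is only handled by the first matching Rule, and Scrapy's duplicate-request filter drops URLs that have already been requested, so a later Rule never fires for them.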

Scrapy can’t find items

I am currently still learning Scrapy and trying to work with pipelines and ItemLoader. However, I currently have the problem that the spider reports that Item.py does not exist. What exactly am I doing wrong, and why am I not getting any data from the spider into my pipeline? Running the spider without importing the items works fine. The Pipeline…
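A sketch of the usual fix, assuming the Scrapy project package is called myproject and its items.py defines a MyItem class with title and price fields (all names here are illustrative): import the item class through the project package rather than a bare Item.py, then feed it to the ItemLoader.

# myproject/spiders/example.py -- sketch; names and selectors are placeholders
import scrapy
from scrapy.loader import ItemLoader

from myproject.items import MyItem  # absolute import via the project package


class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["https://example.com"]  # placeholder

    def parse(self, response):
        loader = ItemLoader(item=MyItem(), response=response)
        loader.add_css("title", "h1::text")
        loader.add_value("price", "0.00")
        yield loader.load_item()

Once the item class resolves this way, the loaded items reach the pipeline as usual.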
