So I have the following Scrapy spider in spiders.py:
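import scrapy


class TwitchSpider(scrapy.Spider):
    name = "clips"

    def start_requests(self):
        urls = [
            'https://www.twitch.tv/wilbursoot/clips?filter=clips&range=7d'
        ]
        # start_requests must yield Request objects; building the list alone schedules nothing
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for clip in response.css('.tw-tower'):
            yield {
                'title': clip.css('::text').get()
            }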
But the key point is that I want to run this spider by calling a function from another file, instead of running scrapy crawl clips in the console. Where can I read more about this, or is it even possible? I looked through the Scrapy documentation but didn't find much.
Answer
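Run the spider from a script such as main.py. Start it from inside the project directory (where scrapy.cfg lives) so that get_project_settings() can find your project settings: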
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    # crawl() accepts a spider class or the spider's `name` attribute ("clips"), not the class name
    spider = 'clips'
    settings = get_project_settings()
    # change/update settings:
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()  # blocks until the crawl finishes