I want to run Scrapy from a single script. I want to load all the settings from settings.py, but I would also like to be able to change some of them:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())

# What I'm missing here is a way to set or override one or two of the settings.

# 'testspider' is the name of one of the spiders of the project.
process.crawl('testspider', domain='scrapinghub.com')
process.start()  # the script will block here until the crawling is finished
I wasn’t able to use this. I tried the following:
settings = scrapy.settings.Settings()
settings.set('RETRY_TIMES', 10)
but it didn’t work.
Note: I’m using the latest version of Scrapy.
Answer
One way to override some settings is to set custom_settings, a class attribute of the spider, from the script.
So I imported the spider’s class and then overrode its custom_settings:
from testspiders.spiders.followall import FollowAllSpider

FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}
So this is the whole script:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider

# Override the setting on the spider class before the crawler is created.
FollowAllSpider.custom_settings = {'RETRY_TIMES': 10}
process = CrawlerProcess(get_project_settings())

# 'testspider' is looked up by spider name; it must be the name of
# FollowAllSpider for the override above to apply.
process.crawl('testspider', domain='scrapinghub.com')
process.start()  # the script will block here until the crawling is finished