I'm trying to create a new spider by running scrapy genspider -t crawl newspider "example.com". I run this in my recently created spider project directory, C:Usersdonikbo_guigui_project. As a result I get an error message:
File "C:Usersdonikbo_guigui_projectgui_projectspidersrequisites.py", line 6, in <module>
    from gui_project.gui_project.updated_kw import translated_kw_dicts
ModuleNotFoundError: No module named 'gui_project.gui_project'
This error message refers to a different spider that I previously created in requisites.py that is called
class RequisitesSpider(CrawlSpider):
    name = 'requisites'
I can't understand why the genspider command even touches this old spider in requisites.py and then refuses to create a new one. requisites.py has the import statements below. They raise no error when I run the spider that the error refers to, yet when I want to create a new spider, the gui_project.gui_project module suddenly cannot be found:
from bs4 import BeautifulSoup
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from langdetect import detect
import re
from gui_project.gui_project.updated_kw import translated_kw_dicts
from urllib.parse import urlparse
If I comment out the line from gui_project.gui_project.updated_kw import translated_kw_dicts and run scrapy genspider -t crawl newspider "example.com" again, my new spider is created successfully. The same happens when I try to run a third spider in the same project: it is also stopped by the error in the requisites.py spider, although the spiders are not interconnected in any way and each has a different name. The cfg and settings files have not been moved.
Any ideas what is causing this?
Answer
When you try to create a new spider, scrapy genspider checks whether a spider with that name already exists. To do this, it uses an instance of SpiderLoader. SpiderLoader imports and caches all of the spiders in a project when it is initialized. Since one of those imports raises an error, the command fails.
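
To illustrate why one broken module can block an unrelated command, here is a minimal sketch that imitates the "import every spider module up front" behaviour using only the standard library. It is not Scrapy's actual SpiderLoader code. The package name demo_project and the helpers build_demo_project, load_all_spider_modules and run_demo are hypothetical, made up for this example.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def build_demo_project(root: Path) -> None:
    """Create a throwaway package with one healthy and one broken spider module."""
    package_dir = root / "demo_project"
    spiders_dir = package_dir / "spiders"
    spiders_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    (spiders_dir / "__init__.py").write_text("")
    (spiders_dir / "good.py").write_text("name = 'good'\n")
    # Hypothetical stand-in for requisites.py: the package name is nested
    # one level too deep, so this import cannot be resolved.
    (spiders_dir / "broken.py").write_text(
        "from demo_project.demo_project.updated_kw import translated_kw_dicts\n"
        "name = 'broken'\n"
    )


def load_all_spider_modules(package_name: str) -> list:
    """Import every module in the package, the way an eager loader would."""
    package = importlib.import_module(package_name)
    loaded = []
    for info in pkgutil.iter_modules(package.__path__):
        # A failing import in any single module propagates to the caller.
        module = importlib.import_module(f"{package_name}.{info.name}")
        loaded.append(module.__name__)
    return loaded


def run_demo() -> str:
    """Return the error message raised while loading, or a success note."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_demo_project(root)
        importlib.invalidate_caches()
        sys.path.insert(0, str(root))
        try:
            load_all_spider_modules("demo_project.spiders")
        except ModuleNotFoundError as exc:
            return str(exc)
        finally:
            sys.path.remove(str(root))
            # Forget the demo modules so repeated runs start clean.
            for name in [n for n in sys.modules if n.startswith("demo_project")]:
                del sys.modules[name]
    return "all modules imported"
```

Calling run_demo() returns the message of the ModuleNotFoundError raised by broken.py, even though good.py is perfectly fine. In the question's project, assuming the directory that holds scrapy.cfg is the one on the import path, changing the import to from gui_project.updated_kw import translated_kw_dicts would be the likely fix, but that depends on the actual layout.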