Scrapy spider shows errors of another unrelated spider in the same project

I'm trying to create a new spider by running scrapy genspider -t crawl newspider "example.com" in my recently created Scrapy project directory, C:\Users\donik\bo_gui\gui_project. Instead of a new spider, I get this error message:

  File "C:\Users\donik\bo_gui\gui_project\gui_project\spiders\requisites.py", line 6, in <module>
    from gui_project.gui_project.updated_kw import translated_kw_dicts
ModuleNotFoundError: No module named 'gui_project.gui_project'

The error message refers to a different spider that I created earlier in requisites.py, which is defined as:

class RequisitesSpider(CrawlSpider):
    name = 'requisites'

I can't understand why the genspider command cares about the old spider in requisites.py at all, let alone why it refuses to create a new spider because of it. requisites.py has the import statements below. They do not raise any error when I run the spider that the error refers to, yet when I want to create a new spider, the gui_project.gui_project module suddenly cannot be found:

from bs4 import BeautifulSoup
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from langdetect import detect
import re
from gui_project.gui_project.updated_kw import translated_kw_dicts
from urllib.parse import urlparse

If I comment out the line from gui_project.gui_project.updated_kw import translated_kw_dicts and run scrapy genspider -t crawl newspider "example.com" again, my new spider is created successfully. The same happens when I try to run a third spider in the same project: it is also stopped by the error in requisites.py, although the spiders are not connected in any way and each has a different name. The cfg and settings files have not been moved.

Any ideas what is causing this?

Answer

When you create a new spider, scrapy genspider first checks whether a spider with that name already exists.
To do this, it uses an instance of SpiderLoader.

SpiderLoader imports and caches every spider in the project when it is instantiated.
Since one of those imports raises an error, the command fails, even though the new spider has nothing to do with requisites.py.
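
To make the mechanism concrete, here is a minimal sketch that uses only the standard library. It is not Scrapy's actual code: load_all_modules is a hypothetical stand-in for the scan that SpiderLoader performs, and the demo_spiders package and demo_project import path are made up for the demonstration. It shows that importing every module of a package stops at the first module whose imports are broken, regardless of which module you actually wanted.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def load_all_modules(package_name):
    """Import every submodule of a package (hypothetical stand-in for a spider loader)."""
    package = importlib.import_module(package_name)
    loaded = [package]
    for _, name, _ in pkgutil.iter_modules(package.__path__):
        # An exception raised while importing any one module aborts the whole scan.
        loaded.append(importlib.import_module(f"{package_name}.{name}"))
    return loaded


def build_demo_package(root):
    """Create a throwaway package with one healthy and one broken spider module."""
    spiders = Path(root) / "demo_spiders"
    spiders.mkdir()
    (spiders / "__init__.py").write_text("")
    (spiders / "healthy.py").write_text("name = 'healthy'\n")
    # Made-up broken import, standing in for line 6 of requisites.py.
    (spiders / "requisites.py").write_text(
        "from demo_project.demo_project.updated_kw import translated_kw_dicts\n"
    )


def try_load():
    """Return the import error message, or None if every module loaded."""
    with tempfile.TemporaryDirectory() as root:
        build_demo_package(root)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            load_all_modules("demo_spiders")
        except ModuleNotFoundError as exc:
            return str(exc)
        finally:
            # Undo the side effects so the demo leaves the interpreter clean.
            sys.path.remove(root)
            for mod in [m for m in sys.modules if m.startswith("demo_spiders")]:
                del sys.modules[mod]
    return None


error = try_load()
print(error)
```

In the project from the question, the likely fix is to correct the import itself. Assuming updated_kw.py sits inside the inner gui_project package (next to settings.py), the line would read from gui_project.updated_kw import translated_kw_dicts, because Scrapy resolves imports relative to the directory that contains scrapy.cfg.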

User contributions licensed under: CC BY-SA