I am getting the error below while running my Scrapy project. I tried everything suggested on Stack Overflow, but nothing has solved the problem. Feel free to ask for more information. I look forward to any help. Answer: Make sure your scrapy.cfg file has the same default and project name as your spider crawler name inside the spiders folder. I tried changing
Tag: scrapy
XPath Selector to get IMDB release Date
I am practicing using XPath selectors, and it seems to be very difficult to extract the release date from this website. I can get to the div with class="txt-block", but not past that. I am trying to get the date underneath it, e.g. “18 July 2008 (USA)”: https://www.imdb.com/title/tt0468569/?ref_=adv_li_tt I can get up to this part. But I cannot get the
Scrapy 404 Error – FormRequest redirecting problem on BrickSeek website
I am currently trying to log in to BrickSeek’s website using the FormRequest method, but I am unable to log in successfully. I keep getting a 404 error when using the scrapy crawl command. It seems to me that Scrapy is redirecting the page incorrectly. I also noticed that my login and password are entered in the redirected webpage, which is weird. The first
How to get url and row id from database before scraping to use it in pipeline to store data?
I’m trying to make a spider that gets some outdated URLs from a database, parses them, and updates the data in the database. I need to get the URLs to scrape and the IDs to use in the pipeline that saves the scraped data. I made this code, but I don’t know why Scrapy changes the order of the scraped links; it looks like it’s random, so my
How can I get the text clean with scrapy shell
I’m trying the following command in the Scrapy shell, which returns this result: The thing is, I want to extract only the word “Ajax” that is between the <strong> tags. Answer: You need to add the <strong> tag to your selector
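As a rough illustration of the answer above, the sketch below narrows a selector to the <strong> element. The HTML snippet is made up, since the question's markup is not shown, and the standard library's xml.etree.ElementTree stands in for Scrapy's selectors so the example runs without Scrapy installed. In the Scrapy shell, the comparable call would be something like response.xpath('//strong/text()').get().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the original page's HTML is not shown in the question.
html = "<div><p>Club: <strong>Ajax</strong> (Amsterdam)</p></div>"

root = ET.fromstring(html)

# Selecting the <p> alone yields only the text before its first child tag.
paragraph_text = root.find(".//p").text

# Adding the <strong> tag to the path targets just the wanted word.
strong_text = root.find(".//strong").text

print(repr(paragraph_text))
print(repr(strong_text))
```

The point is only that the path has to end at the tag that directly contains the text you want.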
Python, extract XHR response data from website
I am trying to extract some data from https://www.barchart.com/stocks/signals/top-bottom/top?viewName=main. I am able to extract data from normal HTML using the XPath method; however, I noticed that this website gets its data from a network request. I have found the location of the data I want (the table from the Barchart website), which is shown in the picture below. Picture
Pyinstaller error on scrapy?
I am using Scrapy by importing it. I built the Python file using PyInstaller. After building it, I ran the file ./new.py, but this error pops up: Answer: You did not use PyInstaller properly when you built your stand-alone program. Here is a short, layman’s description of how PyInstaller works: PyInstaller bundles the Python interpreter, necessary DLLs (for Windows), your project’s
Scrapy: populate items with item loaders over multiple pages
I’m trying to crawl and scrape multiple pages, given multiple URLs. I am testing with Wikipedia, and to make it easier I just used the same XPath selector for each page, but I eventually want to use many different XPath selectors unique to each page, so each page has its own separate parsePage method. This code works perfectly when I
Replacing characters in Scrapy item
I’m trying to scrape an e-commerce website using Scrapy. For the price tag, I want to remove the “$”, but my current code does not work. What is the appropriate method to remove characters when using Scrapy? Answer: extract() returns a list; you can use extract_first() to get a single value: Or, you can use the .re()
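Once a single string has been pulled out of the selector, the cleanup itself is plain Python. Here is a minimal sketch, assuming the price arrives as a string such as "$1,299.99"; the clean_price helper and the sample value are hypothetical and not taken from the original question.

```python
import re


def clean_price(raw):
    """Strip the currency symbol and thousands separators, return a float."""
    # Keep only digits and the decimal point, e.g. "$1,299.99" becomes "1299.99".
    digits = re.sub(r"[^\d.]", "", raw)
    return float(digits)


# Hypothetical value, standing in for what extract_first() might return.
raw_price = " $1,299.99 "
price = clean_price(raw_price)
print(price)
```

Calling a string method on the list returned by extract() is a common reason such code fails, which is why the answer suggests getting a single value first.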
__init__() got an unexpected keyword argument ‘_job’
I am trying to use scrapyd with Scrapy. When I use the code below, it works fine. But when I use it with Selenium, it doesn’t. My spider never runs. In jobs it gets listed under finished, and in the error log I see exceptions.TypeError: __init__() got an unexpected keyword argument ‘_job’. Here is the full error log. What do
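One likely cause, assuming the spider defines its own __init__ (for example to start Selenium) without accepting extra keyword arguments: scrapyd passes a _job argument to the spider, and a constructor that does not accept it raises this TypeError. The sketch below reproduces the idea with a stand-in base class so it runs without Scrapy. BaseSpider, BrokenSpider and FixedSpider are hypothetical names, and the original spider code is not shown in the question.

```python
class BaseSpider:
    """Stand-in for scrapy.Spider, which stores extra keyword arguments as attributes."""

    def __init__(self, name=None, **kwargs):
        self.name = name
        self.__dict__.update(kwargs)


class BrokenSpider(BaseSpider):
    def __init__(self):
        # Accepts no extra arguments, so a `_job` keyword is rejected.
        super().__init__(name="broken")
        self.driver = None  # placeholder for where a Selenium driver would be created


class FixedSpider(BaseSpider):
    def __init__(self, *args, **kwargs):
        # Forward everything, including `_job`, to the base class.
        super().__init__(*args, **kwargs)
        self.driver = None  # placeholder for where a Selenium driver would be created


try:
    BrokenSpider(_job="abc123")
    broken_failed = False
except TypeError:
    broken_failed = True

spider = FixedSpider(name="fixed", _job="abc123")
print(broken_failed, spider._job)
```

In a real spider the same pattern applies: accept *args and **kwargs in __init__ and pass them on to super().__init__().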