I am getting the error below while running my Scrapy project. I tried everything suggested on Stack Overflow, but nothing has solved the problem. Feel free to ask for more information. I look forward to any help. Answer: Make sure your scrapy.cfg file has the same default and project name as your spider crawle…
Tag: scrapy
XPath Selector to get IMDB release Date
I am practicing with XPath selectors, and I am finding it very difficult to extract the release date from this website. I can get to the div class=’txt-block’, but not past that. I am trying to get the date underneath it, e.g. “18 July 2008 (USA)”. https://www.imdb.com/title/tt04685…
Scrapy 404 Error – FormRequest redirecting problem on BrickSeek website
I am trying to log in to BrickSeek’s website using the FormRequest method, but I cannot log in successfully. I keep getting a 404 error when I run the scrapy crawl command. It seems to me that Scrapy is redirecting the page incorrectly. I also noticed that my login and password are inputted in th…
How to get url and row id from database before scraping to use it in pipeline to store data?
I’m trying to make a spider that fetches outdated URLs from a database, parses them, and updates the data in the database. I need to get the URLs to scrape and the row IDs to use in the pipeline that saves the scraped data. I wrote this code, but I don’t know why Scrapy changes the order of the scraped links; it looks like it’s rand…
How can I get the text clean with scrapy shell
I’m trying the following command in the Scrapy shell, which returns this result: The thing is, I want to extract only the word “Ajax” that is between the <strong> tags. Answer: You need to add the <strong> tag to your selector.
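The suggestion in the answer can be sketched roughly as follows. The shell output is not shown in this excerpt, so the markup below is a hypothetical stand-in, and the standard library's ElementTree is used in place of Scrapy's selectors so that the snippet runs on its own. In the Scrapy shell, the corresponding call would be along the lines of response.xpath("//strong/text()").get().

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page structure is not shown in the question.
html = "<div><p>Club: <strong>Ajax</strong></p></div>"

root = ET.fromstring(html)

# Target the <strong> element itself rather than its parent,
# so only the text inside the tag is returned.
strong = root.find(".//strong")
team = strong.text.strip() if strong is not None and strong.text else None

print(team)
```

Selecting the parent element instead would also return the surrounding text, which is why the selector has to go one level deeper.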
Python, extract XHR response data from website
I am trying to extract some data from https://www.barchart.com/stocks/signals/top-bottom/top?viewName=main. I am able to extract data from normal HTML using the XPath method; however, I noticed that this website loads its data through a network request. I have found the location of the data I want (the table from…
Pyinstaller error on scrapy?
I am importing and using Scrapy in my script. I built the Python file using PyInstaller. After building it, I ran the file ./new.py, but this error appears: Answer: You did not use PyInstaller properly when you built your stand-alone program. Here is a short, layman’s description of how PyInstaller works: Pyinstaller b…
Scrapy: populate items with item loaders over multiple pages
I’m trying to crawl and scrape multiple pages, given multiple URLs. I am testing with Wikipedia, and to make it easier I used the same XPath selector for each page, but I eventually want to use many different XPath selectors unique to each page, so each page has its own separate parsePage method. T…
Replacing characters in Scrapy item
I’m trying to scrape an e-commerce website using Scrapy. For the price tag, I want to remove the “$”, but my current code does not work. What is the appropriate method to remove characters when using Scrapy? Answer: extract() returns a list; you can use extract_first() to get a sin…
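A minimal sketch of the cleanup step, assuming the price has already been pulled out as a single string with extract_first(). The helper name clean_price and the sample value are hypothetical, since the original code is not shown. Because extract() returns a list, string methods such as replace() have to be applied to one element, not to the list itself.

```python
from typing import Optional


def clean_price(raw: Optional[str]) -> Optional[str]:
    """Strip the currency symbol and surrounding whitespace from a price string."""
    if raw is None:
        # extract_first() returns None when the selector matches nothing.
        return None
    return raw.replace("$", "").strip()


# Hypothetical value, as extract_first() might return it for a price element.
raw_price = " $1,299.00 "
print(clean_price(raw_price))
```

The None check keeps the helper from raising an AttributeError on pages where the price element is missing.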
__init__() got an unexpected keyword argument ‘_job’
I am trying to use scrapyd with Scrapy. When I use the code below, it works fine, but when I use it with Selenium, it doesn’t. My spider never runs. In jobs it gets listed under finished, and in the error log I see exceptions.TypeError: __init__() got an unexpected keyword argument ‘_job’. He…
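One common cause of this error is a spider whose __init__ declares a fixed signature: scrapyd passes the job ID to the spider as a _job keyword argument, so a constructor that does not accept extra keyword arguments raises a TypeError. The sketch below shows the usual fix under that assumption. The spider code is not shown in the question, so SeleniumSpider is hypothetical, and the small Spider class is only a stand-in for scrapy.Spider so that the snippet runs without Scrapy installed.

```python
class Spider:
    """Stand-in for scrapy.Spider so the sketch runs without Scrapy installed."""

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Like scrapy.Spider, keep extra keyword arguments as attributes.
        self.__dict__.update(kwargs)


class SeleniumSpider(Spider):
    """Hypothetical spider; the original spider code is not shown."""

    name = "selenium_spider"

    def __init__(self, *args, **kwargs):
        # Accept and forward unknown keyword arguments such as _job
        # instead of declaring a fixed signature.
        super().__init__(*args, **kwargs)
        # The Selenium driver would be created here (omitted in this sketch).
        self.driver = None


# scrapyd effectively starts the spider with an extra _job argument.
spider = SeleniumSpider(_job="abc123")
print(spider._job)
```

In a real project the class would inherit from scrapy.Spider directly; the important part is that __init__ takes **kwargs and hands them to the parent constructor.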