Tag: scrapy

scrapy not running ModuleNotFoundError: No module named ‘scraper.settings’

I am getting the error below while running my Scrapy project. I have tried everything suggested on Stack Overflow, but nothing has solved the problem. Feel free to ask for more information. Looking forward to any help. Answer Make sure your scrapy.cfg file has the same default and project name as your spider crawler name inside the spiders folder. I tried changing
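As a rough sketch of what this error usually points at: Scrapy reads the settings module path from the [settings] section of scrapy.cfg, so "No module named 'scraper.settings'" suggests that path does not match an importable package. The file contents below are an assumed example (the asker's scrapy.cfg is not shown), and settings_package is a hypothetical helper, not part of Scrapy.

```python
import configparser

# Assumed scrapy.cfg contents; the real file from the question is not shown.
SCRAPY_CFG = """
[settings]
default = scraper.settings

[deploy]
project = scraper
"""


def settings_package(cfg_text):
    """Return the package that the [settings] default entry expects to import."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    module_path = parser.get("settings", "default")  # e.g. "scraper.settings"
    return module_path.rsplit(".", 1)[0]             # drop the ".settings" suffix


package = settings_package(SCRAPY_CFG)
print(package)  # scraper
```

If the printed name differs from the folder that actually contains settings.py (the folder sitting next to scrapy.cfg), renaming one to match the other is the first thing to try.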

XPath Selector to get IMDB release Date

python scrapy web-scraping xpath

I am practicing with XPath selectors, and it seems to be very difficult to extract the release date from this website. I can get to the div class=’txt-block’, but not past that. I am trying to get the date underneath it, e.g. “18 July 2008 (USA)” https://www.imdb.com/title/tt0468569/?ref_=adv_li_tt I can get up to this part. But I cannot get the

Scrapy 404 Error – FormRequest redirecting problem on BrickSeek website

python scrapy

I am currently trying to log in to BrickSeek’s website using the FormRequest method, but I am unable to log in successfully. I keep getting a 404 error when using the scrapy crawl command. It seems to me that Scrapy is redirecting the page incorrectly. I also noticed that my login and password are entered into the redirected webpage, which is weird. The first

How to get url and row id from database before scraping to use it in pipeline to store data?

python python-3.x scrapy

I’m trying to make a spider that gets some outdated URLs from a database, parses them, and updates the data in the database. I need to get the URLs to scrape and the row ids to use in the pipeline that saves the scraped data. I wrote this code, but I don’t know why Scrapy changes the order of the scraped links. It looks random, so my

How can I get the text clean with scrapy shell

python scrapy web-scraping

I’m trying the following command in the Scrapy shell, which returns this result: The thing is, I want to extract only the word “Ajax” that is between the <strong> tags. Answer You need to add the <strong> tag to your selector
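To illustrate the idea in the answer, here is a small sketch that uses the standard library's ElementTree as a stand-in for a Scrapy selector. The markup is assumed (the original command and page source are not shown), so treat the element names and class as hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed markup; the real page from the question is not shown.
SNIPPET = '<div class="club"><span>Team: <strong>Ajax</strong> (NL)</span></div>'

root = ET.fromstring(SNIPPET)

# Selecting the outer element and joining its text pulls in everything.
all_text = "".join(root.itertext())

# Adding the <strong> step to the path narrows the match to that tag only.
strong_text = root.find(".//strong").text

print(all_text)     # Team: Ajax (NL)
print(strong_text)  # Ajax

# In the Scrapy shell the same narrowing would look roughly like this
# (hypothetical path, adjust to the real page):
#   response.xpath("//div[@class='club']//strong/text()").get()
```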

Python, extract XHR response data from website

ajax python scrapy web-scraping xmlhttprequest

I am trying to extract some data from https://www.barchart.com/stocks/signals/top-bottom/top?viewName=main. I am able to extract data from normal HTML using the XPath method; however, I noticed that this website loads its data through a network (XHR) request. I have found where the data I want is located (the table from the Barchart website), which is shown in the picture below.

Pyinstaller error on scrapy?

pyinstaller python scrapy

I am importing and using Scrapy in my script. I built the Python file with PyInstaller. After building it, I ran the file ./new.py, but this error pops up: Answer You did not use PyInstaller properly when you built your stand-alone program. Here is a short, layman’s description of how PyInstaller works: PyInstaller bundles the Python interpreter, necessary DLLs (for Windows), your project’s

Scrapy: populate items with item loaders over multiple pages

python scrapy

I’m trying to crawl and scrape multiple pages, given multiple URLs. I am testing with Wikipedia, and to make it easier I just used the same XPath selector for each page, but I eventually want to use many different XPath selectors unique to each page, so each page has its own separate parsePage method. This code works perfectly when I

Replacing characters in Scrapy item

python scrapy web-scraping

I’m trying to scrape a commerce website using Scrapy. For the price tag, I want to remove the “$”, but my current code does not work. What is the appropriate method to remove characters when using Scrapy? Answer extract() returns a list; you can use extract_first() to get a single value: Or, you can use the .re()
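The answer's code is not included in this excerpt, so the following is only a plain-Python sketch of the point it makes. The list below is a stand-in for what extract() returns, the price value is made up, and the regular expression is one assumed way to keep just the numeric part.

```python
import re

# Stand-in for what a selector's extract() returns: a list of strings.
extracted = ["$1,299.00"]

# A list has no .replace(), so calling it on the extract() result fails.
print(hasattr(extracted, "replace"))  # False

# Equivalent of extract_first(): take one string, or None when nothing matched.
first = extracted[0] if extracted else None
price_text = first.replace("$", "") if first is not None else None

# Regex alternative, similar in spirit to a selector's .re():
# keep only digits, commas, and dots.
match = re.search(r"[\d.,]+", first or "")
price_from_regex = match.group(0) if match else None

print(price_text)        # 1,299.00
print(price_from_regex)  # 1,299.00
```

Either route works once a single string is in hand; the regex version also tolerates other currency symbols or surrounding whitespace.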

__init__() got an unexpected keyword argument ‘_job’

python python-2.7 scrapy scrapyd selenium

I am trying to use scrapyd with Scrapy. When I use the code below, it works fine. But when I use it with Selenium, it doesn’t. My spider never runs. In jobs it gets listed under finished, and in the error log I see exceptions.TypeError: __init__() got an unexpected keyword argument ‘_job’. Here is the full error log What do