Tag: scrapy

Scrapy not sending post request

I was trying to make a POST request to a URL, but Scrapy isn’t sending it, and I am not getting the correct response. Below is my code. Answer: You have a typo in your code here:

How can I use scrapy middlewares to call a mail function?

python scrapy scrapy-middleware web-scraping

I have 15 spiders, and each spider has its own content to send by mail. Each spider also has its own spider_closed method that starts the mail sender, but they are all the same. At some point the spider count will reach 100, and I don’t want to repeat the same functions again and again. Because of that, I try to use

Using Scrapy to add up numbers across several pages

python scrapy

I am using Scrapy to go from page to page and collect numbers that are on a page. The pages are all similar in the way that I can use the same function to parse them. Simple enough, but I don’t need each individual number on the pages, or even each number total from each page. I just need the

Scrapy spider shows errors of another unrelated spider in the same project

python scrapy

I’m trying to create a new spider by running scrapy genspider -t crawl newspider "example.com". This is run in my recently created spider project directory C:Usersdonikbo_guigui_project. As a result I get an error message: This error message refers to a different spider that I previously created in requisites.py that is called I can’t understand why the genspider command is even bothered

Scrapy run crawl after another

python scrapy web-crawler

I’m quite new to web scraping. I’m trying to crawl a novel reader website to get the novel info and chapter content, so I created two spiders: one to fetch the novel information and another to fetch the chapter content. After that, I created a collector to collect and process all of the data

scrapy/regex get json_object from html

python regex scrapy web-crawler

I’m crawling reviews from a website with Scrapy in Python and want to get all the reviews from the following part of the raw HTML as a dictionary. Getting window.cj.listings is no problem, but I can’t seem to get window.cj.app_data out with a regex. The following code works for getting the listings. But I get nothing from window.cj.app_data, when I

scrapy css selector returning None then finds value

css html python scrapy

So basically I am adding this portion to my code and I have no clue what’s going on. This is the link I am using: https://www.digikey.com/products/en?keywords=ID82C55 All in the same process: first, my CSS selector returns None; then it finds a couple of the HTML elements and returns some of them; then it finds the last element. So this is causing my

invalid xpath in scrapy (python)

python scrapy web-scraping

Hello, I’m trying to build a crawler using Scrapy. My crawler code is: but when I run the command scrapy crawl shopspider -o info.csv to see the output, I find information about only the first product, not all the products on the page. So I removed the numbers between [ ] in the XPath, for example

scrapy internal links + pipeline and mongodb collection relationships

mongodb python relationship scrapy

I have been watching videos and reading articles about how Scrapy works with Python and inserts into MongoDB. Two questions came up, and either I am not googling with the correct keywords or I just couldn’t find the answer. Anyway, let me take this tutorial site, https://blog.scrapinghub.com, as an example for scraping blog posts. I know we can get things like

Is there anything wrong with my CSS selection in this web scraping code?

python scrapy web-scraping

My CSS selectors response.css('div.jhfizC') and ('a[itemprop="url"]') match 97 items on the web page, but my code is only scraping 35 items. Where is the fault? Here is my code: Answer: At the end of the URL, set length to 90 instead of 30; a length of 30 means 30 items per page.
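
The fix suggested in the answer above, changing the length value at the end of the URL, can be sketched with the standard library alone. This is only an illustration: the question's real URL is not shown, so the example.com URL, the start parameter, and the helper name with_query_param are all hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, name, value):
    """Return `url` with the query parameter `name` set to `value`."""
    parts = urlsplit(url)
    # Keep the existing parameters and overwrite (or add) the requested one.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[name] = str(value)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical listing URL; the one from the question is not available.
url = "https://example.com/listing?start=0&length=30"
new_url = with_query_param(url, "length", 90)
print(new_url)
```

The rewritten URL would then be used as the spider's start URL in place of the original one.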