
Using Scrapy to add up numbers across several pages

I am using Scrapy to go from page to page and collect the numbers on each page. The pages are all structured the same way, so I can parse them with the same function. Simple enough, but I don't need each individual number, or even each page's total. I only need the sum of all the numbers across all the pages I visit. The Scrapy documentation describes using cb_kwargs to pass arguments to callbacks, and this is what I have so far.


I cut out everything irrelevant to the question to make my code clearer. I think using a for loop to add up the numbers is fine, but how do I pass that total on to the next page (if there is one) and then export it with the rest of the data at the end?


Answer

I don't see the need to pass data from one request to another. The most straightforward approach I can think of is as follows:

  • For each page, you compute that page's count and yield it as an item
  • You create an item pipeline that keeps track of the total count
  • When the crawl finishes (the pipeline's close_spider method is called), the pipeline holds the total count and you write it to a file, database, …

Your spider would look something like this:


For the item pipeline you can use logic like this:

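A minimal sketch of such a pipeline, assuming the items carry a numeric "count" field as in a per-page spider and that the total should end up in a JSON file. The class name and the output path total.json are hypothetical; Scrapy calls open_spider once at the start, process_item for every item, and close_spider once when the crawl ends.

```python
import json


class TotalCountPipeline:
    """Sums the per-page "count" field and writes the grand total when the crawl ends."""

    # Hypothetical output path; a database write would go in close_spider instead.
    output_path = "total.json"

    def open_spider(self, spider):
        # Called once when the spider is opened: start from zero.
        self.total = 0

    def process_item(self, item, spider):
        # Add this page's subtotal and return the item unchanged,
        # so later pipelines and feed exports still receive it.
        self.total += item["count"]
        return item

    def close_spider(self, spider):
        # Called once when the spider is closed: persist the grand total.
        with open(self.output_path, "w", encoding="utf-8") as f:
            json.dump({"total": self.total}, f)
```

To enable it, register the class in the ITEM_PIPELINES setting, for example {"myproject.pipelines.TotalCountPipeline": 300}, where "myproject" stands in for your project's module name. Running the spider with scrapy crawl and -o still exports the per-page items alongside the total file.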
User contributions licensed under: CC BY-SA