
In recursive scraping with Scrapy, how do I extract info from multiple nodes on a parent URL together with each node's child URL?

The parent URL contains multiple nodes (quotes), and each node links to a child URL (author info). I am having trouble linking each quote to its author info, which I suspect is due to the asynchronous nature of Scrapy.

How can I fix this issue? Here's the code so far. I added # <--- comments to make the relevant lines easy to spot.


Please note that, in order to allow duplicate requests, I added DUPEFILTER_CLASS = 'scrapy.dupefilters.BaseDupeFilter' to settings.py.

This is the output I am currently getting:


Thanks in advance!


Answer

Here is a minimal working solution. Both types of pagination work, and I use the meta keyword to pass the quote item from one response to the next.
