
When using a Python script to run a Scrapy crawler, data is scraped successfully, but the output file is empty (0 KB)

#Scrapy News Crawler


#Defining a function to set the headers and the link from which to start scraping


#Iterating over the headline links and getting the headline details and date/time


#Python script (separate file)


Answer

  1. Instead of running your spider with cmdline.execute, you can run it with CrawlerProcess; see the "Common Practices" page of the Scrapy documentation. main.py below is an example.
  2. You only need to declare the headers once instead of setting them for every request.
  3. You're getting a lot of 403 responses, so you should add a download delay (the DOWNLOAD_DELAY setting) to avoid getting banned.
  4. You can use feed exports (the FEEDS setting) to write the CSV file.
  5. It's possible that the process is interrupted before the CSV file is fully written, but that's only a guess.
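Point 5 is only a guess, but the mechanism behind it is easy to demonstrate: Python buffers file writes in memory, so rows written with `csv.writer` may not reach the disk until the file is flushed or closed. If the process exits abruptly before that, the file can be left at 0 KB. This standard-library sketch (the file name `news.csv` is hypothetical) measures the file size before and after closing:

```python
import csv
import os
import tempfile


def sizes_around_close(rows):
    """Return the CSV file's on-disk size in bytes before and after closing it."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical output file name, created in a throwaway directory.
        path = os.path.join(tmp, "news.csv")
        f = open(path, "w", newline="", encoding="utf-8")
        try:
            csv.writer(f).writerows(rows)
            # The rows are still in Python's in-memory write buffer,
            # so the file on disk is still empty at this point.
            before = os.path.getsize(path)
        finally:
            # Closing (or calling f.flush()) pushes the buffered data to disk.
            f.close()
        after = os.path.getsize(path)
    return before, after


if __name__ == "__main__":
    print(sizes_around_close([["headline", "date"], ["Example headline", "2021-01-01"]]))
```

The first number is 0 and the second is the full file size. With larger outputs part of the data would already be on disk, but whatever is still buffered when the process dies is lost. Scrapy's feed exports close the output file for you when the spider closes normally, which is one reason point 4 helps.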

Here's a working example. I tested it with 'CLOSESPIDER_ITEMCOUNT': 10, so give it some time when you run it.

spider.py:


main.py:

User contributions licensed under: CC BY-SA