Downloading all JS files using Scrapy?


I am trying to crawl a website to find and download all of its JS files. I am new to Scrapy, and I found that I can use CrawlSpider, but I seem to have an issue with LinkExtractor, as my parse callback is never executed.

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class JSDownloader(CrawlSpider):
    name = 'jsdownloader'
    allowed_domains = ['']
    start_urls = ['']

    rules = (
        Rule(LinkExtractor(allow=(r'\.js', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.logger.info('JS File %s', response.url)
        item = scrapy.Item()
        # Process item here
        yield item


I found that LinkExtractor has `tags` and `attrs` parameters, and their defaults only cover the `a` and `area` tags (and the `href` attribute). See the LinkExtractor documentation.

So the solution is to add the `script` tag and the `src` attribute:

Rule(LinkExtractor(tags=('a', 'area', 'script'), attrs=('href', 'src')), callback='parse_item'),

Source: stackoverflow