
Tag: scrapy

XMLFeedSpider not Producing an Output CSV

Having an issue with XMLFeedSpider. I can get the parsing to work in the scrapy shell, so it seems something is going on with either the request or how the spider is being invoked. Whether I add a start_requests() method or not, I seem to get the same error. No output_file.csv is produced after running the spider. I am able to get

Scrapy extracting entire HTML element instead of following link

I’m trying to access or follow every link that appears for commercial contractors from this website: https://lslbc.louisiana.gov/contractor-search/search-type-contractor/ and then extract the emails from the sites each link leads to, but when I run this script, Scrapy follows the base URL with the entire HTML element appended to the end of the base URL instead of following only the link at

Trying to add multiple yields into a single json file using Scrapy

I am trying to figure out whether my Scrapy tool is correctly hitting the product_link for the request callback – ‘yield scrapy.Request(product_link, callback=self.parse_new_item)’. product_link should be ‘https://www.antaira.com/products/10-100Mbps/LNX-500A’, but I have not been able to confirm whether my program jumps into the next step, so that I can retrieve the correct yield return. Thank you! Answer You have a

Following links and crawling them

I was trying to make a crawler that follows links. With this code I was able to get the links, but the part about entering the links and getting the information I need was not working, so a friend helped me come up with this code. It gets the JSON with the page items, but at loop number 230

When using python script to run scrapy crawler, data is scraped successfully but the output file shows no data in it and is of 0 kb

#Scrapy News Crawler #defining function to set headers and setting link from where to start scraping #Iterating headline links and getting headline details and date/time #Python script (separate file) Answer Instead of running your spider with cmdline.execute you can run it with CrawlerProcess; read about common practices. You can see main.py as an example. You can declare the headers
