I want to run Scrapy from a single script, loading all settings from settings.py but with the ability to override some of them. I wasn’t able to do this. I tried the following, but it didn’t work. Note: I’m using the latest version of Scrapy. Answer So in order to override some settings,
Tag: scrapy
GtkWarning: could not open display
I am trying to run a spider on a VPS (using scrapyjs, which uses python-gtk2). On running the spider I am getting the error. How do I run this in a headless setup? Answer First of all, you didn’t specify whether you have a desktop environment (or an X server) installed on your server. Regardless of that, you can achieve a headless setup
Scrapy: Passing item between methods
Suppose I have a Bookitem; I need to add information to it in both the parse phase and the detail phase. Using the code as is would lead to an undefined item in the detail phase. How can I pass the item to the detail callback? detail(self, response, item) doesn’t seem to work. Answer There is an argument named meta for Request: then in function
How to run a Python script inside a Rails application on Heroku?
I have a Rails application hosted on Heroku. I also wrote a web scraper using Scrapy in Python. I need to run the Python script from the Rails application on Heroku. I will explain with an example: the user inputs the URL to scrape in my Rails app. Then the Rails app gives control to the Python script to scrape data, which
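The question is truncated, but one common pattern (not confirmed by the snippet) is to expose the scraper as a standalone command-line script that the Rails app shells out to, passing the URL as an argument and reading JSON back from stdout. A minimal sketch of the Python side; `scrape` is a hypothetical placeholder for the real Scrapy invocation:

```python
import json
import sys

def scrape(url):
    # placeholder: real code would run the Scrapy spider against `url`
    # and collect its items; here we just echo back a stub result
    return {"url": url, "status": "scraped"}

if __name__ == "__main__":
    # Rails would call e.g.: `python scraper.py http://example.com`
    print(json.dumps(scrape(sys.argv[1])))
```

On the Rails side this would be invoked with a subprocess call and the JSON output parsed; on Heroku both a Ruby and a Python buildpack would need to be configured.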
Recording the total time taken for running a spider in scrapy
I am using Scrapy to scrape a site. I wrote a spider, fetched all the items from the page, and saved them to a CSV file. Now I want to record the total execution time Scrapy took to run the spider. After the spider execution is completed, when we are at the terminal it will
generate python regex at runtime to match numbers from ‘n’ to infinity
I am using Scrapy to crawl a website and extract data from it. Scrapy uses regex-based rules to decide whether a page should be parsed or a link followed. I am implementing a resume feature for my spider, so it can continue crawling from the last visited page. For this, I get the last followed link
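The pattern the title asks for can be built programmatically. A stdlib-only sketch (the helper name `regex_ge` is hypothetical) that compiles a regex matching every integer greater than or equal to n: the number itself, same-length numbers that exceed it at some digit, and any number with more digits:

```python
import re

def regex_ge(n: int) -> "re.Pattern":
    """Build a regex matching decimal integers >= n (hypothetical helper)."""
    s = str(n)
    parts = [re.escape(s)]  # the number n itself
    # same digit count, but strictly larger at position i
    for i, ch in enumerate(s):
        d = int(ch)
        if d < 9:
            prefix = re.escape(s[:i])
            parts.append(f"{prefix}[{d + 1}-9][0-9]{{{len(s) - i - 1}}}")
    # any number with more digits than n (no leading zero)
    parts.append(f"[1-9][0-9]{{{len(s)},}}")
    return re.compile("^(?:%s)$" % "|".join(parts))
```

For example, `regex_ge(42)` produces a pattern matching "42", "43", "50", "100", and so on, but not "41" or "9"; the compiled pattern can then be plugged into the spider's link-extraction rules.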