Tag: scrapy

Scrapy Python can’t extract links with more stable xpath

I’m building a scraper for this website. I’m using Python and the Scrapy shell to extract the data that I want. The XPath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href Using response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href') returns []. I tried using contains(@class,"sb-card-company") with the same result. Using other containers in the same way changed nothing. Using a different page also had no effect. Using

Unable to send requests in the right way after replacing redirected url with original one using middleware

middleware python python-3.x scrapy web-scraping

I’ve created a script using Scrapy to fetch some fields from a webpage. The URL of the landing page and the URLs of the inner pages get redirected very often, so I created a middleware to handle that redirection. However, when I came across this post, I understood that I need to return a request in process_request() after replacing the redirected

How to click all spans with the same text using Selenium?

python scrapy selenium

I tried the code, but only the first element was clicked, so I edited it like this, but it won’t work. The website is http://211.21.63.217:81/ticket_online.aspx How do I click all of the spans whose text is 08/28? Answer: See, when you did this: you must have got a "web element is not iterable" error. That is because you are using find_element; try to use
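The answer excerpt points at the difference between find_element, which returns a single element, and find_elements, which returns a list. Below is a minimal sketch of clicking every match, assuming a Selenium 4 style driver. The helper name click_all is hypothetical, and the locator strategy is passed as the plain string "xpath" (the value of By.XPATH) so the snippet needs no import of its own.

```python
def click_all(driver, xpath):
    """Click every element matching `xpath` and return how many were clicked."""
    # find_elements (plural) returns a list, possibly empty;
    # find_element (singular) returns only the first match.
    elements = driver.find_elements("xpath", xpath)
    for element in elements:
        element.click()
    return len(elements)
```

A call might look like click_all(driver, "//span[text()='08/28']"), where the XPath is an assumption about the page's markup. If a click reloads or re-renders the page, the remaining elements may go stale and would need to be located again before the next click.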

Cannot access array element in Python coming from XPath

python scrapy xpath

I am trying to cycle through an array in Python: Somehow this does not work, as it always accesses the first element. Looking into the XPaths manually, I get the following: Accessing the element directly (note 8=7 due to zero-based indexing), I am able to get it. Just cycling through it does not work out. It cycles correctly n
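One thing worth checking in a case like this is the off-by-one between the two index conventions: positional predicates in XPath count from 1, while Python lists count from 0. The sketch below uses the standard library's xml.etree.ElementTree rather than Scrapy's selectors, on made-up data; the element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with ten <item> elements numbered 1..10.
root = ET.fromstring(
    "<items>" + "".join(f"<item>{n}</item>" for n in range(1, 11)) + "</items>"
)

# XPath positional predicates are 1-based: item[8] is the eighth element.
eighth_by_xpath = root.find("./item[8]").text

# Python lists are 0-based: index 7 is the eighth element.
items = root.findall("./item")
eighth_by_index = items[7].text

# Iterating over the list visits each element once, in document order.
all_texts = [item.text for item in items]
```

In Scrapy, a similar symptom can also come from using an absolute // path inside a loop over selectors, where .// is needed to stay relative to the current node. Whether that applies here cannot be told from the excerpt.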

How do I include part of my code into ‘yield’?

nested-loops python scrapy yield

Thank you for your time! Each product sometimes has more than one model. I got the ‘name’ and ‘price’ of the respective models within a single product via a for loop. But how do I ‘transfer’ these details to the ‘yield’ section along with the other variables of that same product? Below is my attempt, but I am not getting
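One common way to handle this is to move the yield inside the inner loop, so each model produces its own item that also carries the product-level fields. This is a plain-Python sketch under assumed data: the function name parse_product, the dictionary keys, and the sample product are all hypothetical.

```python
def parse_product(product):
    """Yield one item per model, each carrying the shared product fields."""
    # Fields that are the same for every model of this product.
    shared = {"product": product["title"], "brand": product["brand"]}
    for model in product["models"]:
        # Merge the shared fields with the per-model fields and yield
        # inside the inner loop, so every model produces its own item.
        yield {**shared, "name": model["name"], "price": model["price"]}


sample = {
    "title": "Desk lamp",
    "brand": "Acme",
    "models": [
        {"name": "Small", "price": 19.99},
        {"name": "Large", "price": 29.99},
    ],
}
items = list(parse_product(sample))
```

A Scrapy parse callback is also a generator, so the same pattern should carry over: build the shared fields once per product, then yield inside the loop over models.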

Install Scrapy on Windows Server 2019, running in a Docker container

anaconda docker python scrapy windows-server-2019

I want to install Scrapy on Windows Server 2019, running in a Docker container (please see here and here for the history of my installation). On my local Windows 10 machine I can run my Scrapy commands like so in Windows PowerShell (after simply starting Docker Desktop): scrapy crawl myscraper -o allobjects.json in folder C:\scrapy\my1stscraper. For Windows Server as recommended

Downloading all JS files using Scrapy?

python scrapy

I am trying to crawl a website, searching for all JS files to download them. I am new to Scrapy and I have found that I can use CrawlSpider, but it seems I have an issue with LinkExtractors, as my parser is not executed. Answer: I found that LinkExtractor has tags and attrs parameters where the defaults are for ‘a’ and

Scrapy spider: Download all images from img src

python scrapy web-crawler

I scraped some links from a website using a Scrapy spider, but I got a NoneType value. Each li can hold any number of image links, and I download them in a loop. This is my HTML code: I just want to get all the links inside the li, like this: Answer: Try this; to extract all the images, use
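Since the HTML and the suggested selector are missing from the excerpt, here is a hedged illustration of the goal: collecting every img src found inside an li. It uses the standard library's html.parser instead of Scrapy's selectors, and the class name LiImageCollector and the sample markup are assumptions.

```python
from html.parser import HTMLParser


class LiImageCollector(HTMLParser):
    """Collect the src of every <img> that appears inside an <li>."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0  # how many <li> elements are currently open
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "img" and self.li_depth > 0:
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a src instead of storing None
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


# Hypothetical markup standing in for the HTML missing from the excerpt.
html = """
<ul>
  <li><img src="/img/a.jpg"><img src="/img/b.jpg"></li>
  <li><img alt="no source"></li>
  <li><img src="/img/c.jpg"></li>
</ul>
<img src="/img/outside.jpg">
"""

collector = LiImageCollector()
collector.feed(html)
image_links = collector.sources
```

Inside a Scrapy callback, a roughly equivalent query would be response.xpath("//li//img/@src").getall(), which returns a list (empty when nothing matches), whereas get() returns None when there is no match.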

Why does LinkExtractor skip a link?

hyperlink python scrapy web-scraping

I am scraping some pages and am trying to use the LinkExtractor to get the URLs from the response. In general that is going quite OK, but the LinkExtractor is not able to extract the relative link to a PDF file that is found at line 111 of the HTML. I have tried a lot, but haven’t been able to
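One possible cause, which the excerpt does not confirm, is that LinkExtractor's default deny_extensions list includes pdf, so such links are filtered out unless that parameter is overridden. Independently of that, a relative link has to be resolved against the page URL before it can be requested. The sketch below shows that resolution with the standard library's urljoin; the page URL and hrefs are made up.

```python
from urllib.parse import urljoin

# Hypothetical page URL and relative hrefs; neither comes from the question.
page_url = "https://example.com/reports/2020/index.html"
hrefs = ["files/summary.pdf", "../archive/old.pdf", "/downloads/full.pdf"]

# urljoin resolves each href against the page it was found on.
absolute_links = [urljoin(page_url, href) for href in hrefs]

# Keep only links whose URL ends in .pdf (case-insensitive).
pdf_links = [link for link in absolute_links if link.lower().endswith(".pdf")]
```

Inside a Scrapy callback, response.urljoin(href) performs the same resolution against the response's own URL.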

Scrapy – Request Payload format and types for AJAX based websites

ajax python scrapy

I am trying to scrape noon.com. Here is the product that I am interested in scraping: https://www.noon.com/uae-en/face-and-beard-wash-multicolour-80ml/N22130693A/p?o=f7adb85c3296590b. I am able to get all the information about the product except the Ratings/Review data. The issue is that the website loads the Ratings data through the API link https://www.noon.com/_svc/reviews/fetch/v1/product-reviews/list, which uses the POST request method. I tried including headers and an appropriate payload in the
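A frequent stumbling block with endpoints like this is the body format: an API that expects JSON will usually reject a form-encoded payload. The sketch below builds (but does not send) a POST request with a JSON body using only the standard library. The payload field names are hypothetical and would need to be copied from the real request shown in the browser's network tab.

```python
import json
from urllib.request import Request

API_URL = "https://www.noon.com/_svc/reviews/fetch/v1/product-reviews/list"

# Hypothetical payload: the real field names must be copied from the
# request shown in the browser's network tab.
payload = {"sku": "N22130693A", "page": 1}

# Serialize the payload as JSON bytes and declare that in Content-Type.
body = json.dumps(payload).encode("utf-8")
request = Request(
    API_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# The request is only constructed here, not sent.
```

In Scrapy, scrapy.http.JsonRequest covers the same need: it serializes a data argument to JSON and sets the content type, so it may be a simpler fit than building the body by hand.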