I’m building a scraper for this website. I’m using Python and the Scrapy shell to extract the data that I want. The XPath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href Using response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href') returns []. I tried using contains(@class, "sb-card-company") with the same result. Using other containers in the same way changed nothing. Using a different page also had no effect. Using
Tag: scrapy
Unable to send requests in the right way after replacing redirected URL with original one using middleware
I’ve created a script using Scrapy to fetch some fields from a webpage. The URL of the landing page and the URLs of the inner pages get redirected very often, so I created a middleware to handle that redirection. However, when I came across this post, I understood that I need to return the request in process_request() after replacing the redirected
How to click all spans with the same text using Selenium?
I tried the code, but only the first element was clicked, so I edited it like this, but it won’t work. The website is http://211.21.63.217:81/ticket_online.aspx How do I click all of the spans whose text is 08/28? Answer See, when you did this: you must have got a ‘web element is not iterable’ error. That is because you are using find_element; try to use
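A minimal sketch of the distinction the answer points at, assuming Selenium 4: find_element returns a single element, while find_elements returns a list that can be looped over. The helper name click_all_spans is hypothetical. The locator strategy is passed as the plain string "xpath", which is the value of By.XPATH, so the function itself needs no Selenium import.

```python
def click_all_spans(driver, text):
    """Click every <span> whose text equals `text`; return how many were clicked.

    `driver` is assumed to be a Selenium 4 WebDriver that is already on the page.
    """
    # find_elements (plural) returns a list; find_element returns one element,
    # which is why looping over its result fails.
    spans = driver.find_elements("xpath", f"//span[text()='{text}']")
    for span in spans:
        span.click()
    return len(spans)
```

It would be called as click_all_spans(driver, "08/28"). If a click reloads or re-renders the page, the remaining elements may go stale and would need to be located again before clicking.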
Cannot access array element in Python coming from XPath
I am trying to cycle through an array in Python: Somehow this does not work, as it always accesses the first element. Looking into the XPaths manually, I get the following: Accessing the element directly (note 8=7 due to 0), I am able to get it. Just cycling through it does not work out. It cycles correctly n
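The "8=7 due to 0" remark can be illustrated with a small sketch: XPath positions start at 1, while Python list indexes start at 0. The markup below is hypothetical and uses the standard library's ElementTree rather than the asker's actual selectors, which are not shown in the excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with ten sibling rows, standing in for the real page.
root = ET.fromstring(
    "<table>" + "".join(f"<row>item {i}</row>" for i in range(1, 11)) + "</table>"
)

# XPath positions start at 1, Python list indexes start at 0,
# so row[8] in XPath is the same element as rows[7] in Python.
eighth_by_xpath = root.find("./row[8]")
rows = root.findall("./row")
eighth_by_index = rows[7]

# Looping over the list visits each element in turn; inside the loop,
# work with the loop variable rather than querying from the root again.
texts = [row.text for row in rows]
```

One common cause of a loop that keeps returning the first element, though the excerpt does not confirm it applies here, is running the same absolute query inside the loop body instead of using the loop variable.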
How do I include part of my code into ‘yield’?
Thank you for your time! Each product sometimes has more than one model. I got the ‘name’ and ‘price’ of the respective models within a single product via a for loop. But how do I ‘transfer’ these details to the ‘yield’ section along with the other variables of that same product? Below is my attempt, but I am not getting
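One possible way to carry the per-model details into the yielded item is to collect them in a list inside the loop and yield once afterwards. This is only a sketch: parse_product, the field names, and the sample data are hypothetical, and a plain dict stands in for the Scrapy response.

```python
def parse_product(product):
    """Yield one item per product, with all of its models nested inside."""
    models = []
    for model in product["models"]:
        # Collect each model's details instead of overwriting a single variable.
        models.append({"name": model["name"], "price": model["price"]})

    # Yield once, after the loop, so the item carries every model.
    yield {
        "title": product["title"],
        "url": product["url"],
        "models": models,
    }


# Hypothetical sample data for illustration only.
sample = {
    "title": "Widget",
    "url": "https://example.com/widget",
    "models": [
        {"name": "S", "price": "9.99"},
        {"name": "L", "price": "14.99"},
    ],
}
items = list(parse_product(sample))
```

If one row per model is preferred instead, the yield could move inside the loop and repeat the product-level fields on each row.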
Install Scrapy on Windows Server 2019, running in a Docker container
I want to install Scrapy on Windows Server 2019, running in a Docker container (please see here and here for the history of my installation). On my local Windows 10 machine I can run my Scrapy commands like so in Windows PowerShell (after simply starting Docker Desktop): scrapy crawl myscraper -o allobjects.json in folder C:scrapymy1stscraper For Windows Server as recommended
Downloading all JS files using Scrapy?
I am trying to crawl a website, searching for all JS files to download them. I am new to Scrapy and I have found that I can use CrawlSpider, but it seems I have an issue with LinkExtractors, as my parser is not executed. Answer I found that LinkExtractor has tags and attrs parameters where the defaults are for ‘a’ and
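The idea behind the answer is that JS files are referenced by the src attribute of script tags, not by the href of anchors. The sketch below shows that extraction with the standard library only; ScriptSrcParser, the sample HTML, and the URLs are hypothetical. In Scrapy itself, the tags and attrs parameters mentioned in the answer are the place to point the extractor at script and src instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Resolve relative paths against the page URL.
            self.urls.append(urljoin(self.base_url, src))


# Hypothetical page: two external scripts, one inline script, one anchor.
html = """
<html><head>
<script src="/static/app.js"></script>
<script src="https://cdn.example.com/lib.js"></script>
<script>console.log("inline");</script>
</head><body><a href="/about">About</a></body></html>
"""

parser = ScriptSrcParser("https://example.com/index.html")
parser.feed(html)
js_urls = parser.urls
```

Inline scripts have no src and are skipped, and the anchor is ignored because only script tags are inspected.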
Scrapy spider: Download all images from img src
I scraped some links from a website using a Scrapy spider, but I got a None value. Each li contains any number of image links, which I download in a loop. This is my HTML code I just want to get all the links inside li like this Answer Try this; to extract all the images use
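A small sketch of collecting every image link under the li elements, rather than a single value that may come back as None. The markup is hypothetical, since the real HTML is not shown in the excerpt, and ElementTree is used so the example runs without Scrapy.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the real page's HTML is not shown in the excerpt.
snippet = """
<ul>
  <li><img src="/img/a.jpg" /><img src="/img/b.jpg" /></li>
  <li><img src="/img/c.jpg" /></li>
  <li>no image here</li>
</ul>
"""

root = ET.fromstring(snippet)

# .//li//img matches every <img> nested anywhere under any <li>.
image_urls = [img.get("src") for img in root.findall(".//li//img")]
```

In a Scrapy callback, a comparable query would be response.xpath('//li//img/@src').getall(), which returns a list of all matches, whereas .get() returns only the first match or None.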
Why does LinkExtractor skip a link?
I am scraping some pages and am trying to use the LinkExtractor to get the URLs from the response. In general that is going quite OK, but the LinkExtractor is not able to extract the relative link to a PDF file that is found at line 111 of the HTML. I have tried a lot, but haven’t been able to
Scrapy – Request Payload format and types for AJAX based websites
I am trying to scrape noon.com. Here is the product which I am interested in scraping: https://www.noon.com/uae-en/face-and-beard-wash-multicolour-80ml/N22130693A/p?o=f7adb85c3296590b. I am able to get all the information about the product except the Ratings/Review data. The issue here is that the website loads the Ratings data through the API link https://www.noon.com/_svc/reviews/fetch/v1/product-reviews/list, which is basically a POST request. I tried including headers and an appropriate payload in the
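For a JSON-based AJAX endpoint, the payload usually has to be serialized to JSON and sent as the request body with a matching Content-Type header. The sketch below builds such a POST request with the standard library without sending it. The payload keys (sku, page) are hypothetical and would need to be copied from the browser's network tab; only the endpoint URL and the product code come from the question.

```python
import json
import urllib.request

API_URL = "https://www.noon.com/_svc/reviews/fetch/v1/product-reviews/list"

# Hypothetical payload: the real field names are not given in the excerpt.
payload = {"sku": "N22130693A", "page": 1}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),  # the body must be bytes
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Sending is left out on purpose: urllib.request.urlopen(request) would do it.
```

In Scrapy, the same shape can be expressed as scrapy.Request(url, method="POST", body=json.dumps(payload), headers={...}). A frequent mistake is passing the payload as a Python dict or as form data when the endpoint expects a JSON string.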