Tag: scrapy

Scrapy Python can't extract links with more stable XPath

I'm building a scraper for this website. I'm using Python and the Scrapy shell to extract the data that I want. The XPath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href Using response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href') returns []. I tried using …
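An exact `@class="..."` comparison only matches when the attribute string in the served HTML is character-for-character identical, so a different class order or an extra class yields an empty result. The following is a minimal sketch of the more tolerant idea, matching on a single class token. It uses only the standard library rather than Scrapy, and the sample HTML and the names `ClassLinkParser` and `hrefs_with_class` are made up for illustration. In Scrapy the same idea is commonly written as `response.css('a.sb-card-company::attr(href)').getall()`.

```python
from html.parser import HTMLParser


class ClassLinkParser(HTMLParser):
    """Collect href values of <a> tags that carry a given class token."""

    def __init__(self, token):
        super().__init__()
        self.token = token
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Split on whitespace so class order and extra classes do not matter.
        classes = (attr_map.get("class") or "").split()
        if self.token in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def hrefs_with_class(html, token):
    """Hypothetical helper: return hrefs of links whose class list has token."""
    parser = ClassLinkParser(token)
    parser.feed(html)
    return parser.hrefs


# Invented sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<a class="sb-card sb-card-company site-1x1 with-hover" href="/company/a">A</a>
<a class="with-hover  sb-card-company sb-card" href="/company/b">B</a>
<a class="sb-card sb-card-other" href="/other">C</a>
"""

links = hrefs_with_class(SAMPLE_HTML, "sb-card-company")
```

If the token match also returns nothing on the real site, the links may be rendered by JavaScript and absent from the raw response, which this sketch does not address.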

How to click all spans with the same text using Selenium?

I tried the code, but only the first element was clicked, so I edited it like this, but it won't work. The website is http://211.21.63.217:81/ticket_online.aspx How can I click all of the spans whose text is 08/28? Answer: When you did this, you must have got a "web element is not iterable" error. That is because you are using find_…

Cannot access array element in Python coming from XPath

I am trying to cycle through an array in Python: Somehow this does not work, as it always accesses the first element. Looking into the XPaths manually, I get the following: Accessing the element directly (note that 8 becomes 7 because indexing starts at 0), I am able to get it. Just cycling through it does not work. It cycles correc…

Downloading all JS files using Scrapy?

I am trying to crawl a website, searching for all JS files to download them. I am new to Scrapy and have found that I can use CrawlSpider, but it seems I have an issue with LinkExtractor, as my parser is not executed. Answer: I found that LinkExtractor has tags and attrs parameters, where the defaults are for …
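The point about `tags` and `attrs` is that a link extractor only sees the tag/attribute pairs it is told to look at, and script files live in `<script src=...>` rather than in `<a href=...>`. A minimal standard-library sketch of that selection step follows. It is not Scrapy's implementation, and the sample page, base URL, and class name `ScriptSrcParser` are assumptions for illustration. In Scrapy, the corresponding configuration would presumably be `LinkExtractor(tags=("script",), attrs=("src",))`, possibly with `deny_extensions` adjusted as well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptSrcParser(HTMLParser):
    """Collect absolute URLs of external .js files referenced by <script src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # inline script, nothing to download
        absolute = urljoin(self.base_url, src)
        # Check the path only, so query strings like ?v=3 do not interfere.
        if urlparse(absolute).path.endswith(".js"):
            self.urls.append(absolute)


# Invented sample page and base URL.
SAMPLE_PAGE = """
<script src="/static/app.js?v=3"></script>
<script src="vendor/lib.min.js"></script>
<script>console.log("inline");</script>
<a href="/page.html">page</a>
"""

parser = ScriptSrcParser("https://example.com/section/index.html")
parser.feed(SAMPLE_PAGE)
js_urls = parser.urls
```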

Scrapy spider: Download all images from img src

I scraped some links from a website, and I'm using a Scrapy spider for the scraping. But I got a None value. Each li can contain any number of image links, which I download in a loop. This is my HTML code: I just want to get all the links inside li, like this. Answer: Try this to extract all the images; use
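As a rough illustration of "all image links inside li", here is a minimal standard-library sketch that collects every `img` `src` nested anywhere inside a list item. The sample HTML and the class name `LiImageParser` are invented, and this is not the answer's original code, which is cut off above. In Scrapy, the usual equivalent is `response.css("li img::attr(src)").getall()`. Note that `.get()` returns `None` when nothing matches, which is one way to end up with a None value.

```python
from html.parser import HTMLParser


class LiImageParser(HTMLParser):
    """Collect src values of <img> tags that appear inside any <li>."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0  # how many <li> elements are currently open
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "img" and self.li_depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


# Invented sample markup; the logo sits outside any <li> and is ignored.
SAMPLE_LIST = """
<ul>
  <li><img src="/img/1.jpg"><img src="/img/2.jpg"></li>
  <li><a href="#"><img src="/img/3.jpg" alt=""></a></li>
</ul>
<img src="/img/logo.png">
"""

parser = LiImageParser()
parser.feed(SAMPLE_LIST)
image_srcs = parser.srcs
```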

Why does LinkExtractor skip a link?

I am scraping some pages and am trying to use the LinkExtractor to get the URLs from the response. In general that works quite well, but the LinkExtractor is not able to extract the relative link to a PDF file that is found at line 111 of the HTML. I have tried a lot, but haven't been able to