
Scrapy (Python) can't extract links with a more stable XPath

I'm building a scraper for this website. I'm using Python and the Scrapy shell to extract the data I want. The XPath would be: //a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href

Using response.xpath('//a[@class="sb-card sb-card-company site-1x1 with-hover"]/@href') returns []

I tried contains(@class, "sb-card-company") with the same result. Targeting other containers the same way changed nothing, and trying a different page had no effect either. Using hard nodes instead worked, but I'm curious about what I did wrong.


Answer

It's not a problem with your XPath. The content is loaded dynamically, so the elements you are selecting are not in the HTML that Scrapy downloads.
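One way to confirm this is to run the same class-based selection against the raw HTML and against the HTML the browser shows after rendering. The following is a minimal sketch using only the standard library; the two HTML strings are made-up stand-ins (not the real site's markup), and `static_hrefs` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect href values of <a> tags whose class list contains a given token."""

    def __init__(self, class_token):
        super().__init__()
        self.class_token = class_token
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.class_token in classes and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def static_hrefs(html, class_token):
    """Return matching hrefs found in the given HTML string."""
    parser = AnchorCollector(class_token)
    parser.feed(html)
    return parser.hrefs


# Assumed example: what a downloader might receive before any JavaScript runs.
raw_html = '<html><body><div id="app"></div><script src="/bundle.js"></script></body></html>'

# Assumed example: what the browser inspector shows after rendering.
rendered_html = (
    '<html><body><a class="sb-card sb-card-company site-1x1 with-hover" '
    'href="/company/a">A</a></body></html>'
)

print(static_hrefs(raw_html, "sb-card-company"))       # nothing to match in the raw page
print(static_hrefs(rendered_html, "sb-card-company"))  # the link exists only after rendering
```

In the Scrapy shell, the equivalent quick check is whether `"sb-card-company" in response.text` is true; if it is not, no XPath against that response can match.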

The links are in a JSON file that the page loads separately, so you can read them from that file instead.
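A minimal sketch of that approach in Python, under stated assumptions: the JSON endpoint and its layout are not given here, so the sample payload and the `url` key are hypothetical. Find the real request in your browser's network tab and adjust the key name to match. `extract_links` is an illustrative helper, not part of Scrapy.

```python
import json


def extract_links(payload, key="url"):
    """Collect every string stored under `key` anywhere in a JSON document."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            for name, value in node.items():
                if name == key and isinstance(value, str):
                    found.append(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(json.loads(payload))
    return found


# Hypothetical payload; the real file's structure will differ.
sample = json.dumps(
    {
        "results": [
            {"name": "Company A", "url": "/company/a"},
            {"name": "Company B", "url": "/company/b"},
        ]
    }
)

print(extract_links(sample))
```

In the Scrapy shell you would call `fetch()` with the JSON URL you found and then pass `response.text` to the helper. In a spider, the same call goes inside the callback that handles the JSON response.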

User contributions licensed under: CC BY-SA