
Python Selenium – how to get all urls on a page that only load the link after clicking on the div?

I’m trying to scrape the results from this page https://www.zapimoveis.com.br/aluguel/apartamentos/sp+sao-paulo+zona-sul+itaim-bibi/ using Selenium, but I got stuck on obtaining the URL of each result. Each card’s URL does not seem to be stored in an <a> element, and apparently it is not stored anywhere in the inner HTML of each div.

The only way to obtain the address is by clicking on the div, which opens a new tab. Currently, I’m using Selenium to click on each one, copy the address, and then close the tab. Not only is this a much more complex and time-consuming process, but making that many requests to the website could also trigger a captcha.

Is there a way to obtain the URLs of all offers on this page without this clicking process? I tried using the inspect tool in Chrome but couldn’t figure out what JS (or whatever else) is responsible for this behavior.

Thanks!


Answer

I checked out the site and it looks like each card-container has a data-id that can be used to access the listing. The link for this card:

<div data-id="2593637292" class="card-container js-listing-card">{THE HTML FOR THAT CARD}</div>

would be https://www.zapimoveis.com.br/imovel/2593637292.
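
As a rough sketch of that idea, the snippet below pulls every data-id out of the page HTML and builds the listing URLs from it. It assumes the cards keep the card-container class and the /imovel/<id> URL pattern shown above; the names BASE_URL, CardIdParser and listing_urls are made up for this example and are not part of Selenium or the site.

```python
from html.parser import HTMLParser

# URL pattern taken from the example above; assumed to hold for every card.
BASE_URL = "https://www.zapimoveis.com.br/imovel/"


class CardIdParser(HTMLParser):
    """Collects the data-id of every element whose class list includes 'card-container'."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # The class attribute may be missing or empty, so fall back to "".
        classes = (attr_map.get("class") or "").split()
        data_id = attr_map.get("data-id")
        if "card-container" in classes and data_id:
            self.ids.append(data_id)


def listing_urls(page_html):
    """Return one listing URL per card found in the given HTML string."""
    parser = CardIdParser()
    parser.feed(page_html)
    return [BASE_URL + listing_id for listing_id in parser.ids]
```

With Selenium you could call listing_urls(driver.page_source) once the results have loaded, so no clicking or extra tabs are needed. If the site changes its class names or URL scheme, the class check and BASE_URL would need adjusting.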

User contributions licensed under: CC BY-SA