
Scrapy extracting entire HTML element instead of following link

I'm trying to follow every link listed for commercial contractors on this website: https://lslbc.louisiana.gov/contractor-search/search-type-contractor/ and then extract the email addresses from the sites those links lead to. However, when I run this script, Scrapy requests the base URL with the entire HTML element appended to the end of it, instead of following only the link inside that element.

Does anyone know how I can get the desired result or what I’m doing wrong?

Here’s the code that I have so far:


Which returns:

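The symptom described above is consistent with passing the serialized `<a>` element (for example the string returned by calling `.get()` on the `a` selector itself) to `response.follow` or `response.urljoin`, instead of the element's `href` value. The question's code isn't shown, so that is an assumption. The stdlib sketch below reproduces the effect with `urllib.parse.urljoin` and then joins only the `href`; the link markup and the `/contractor/12345` path are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Page URL from the question; the <a> markup is a hypothetical example.
BASE_URL = "https://lslbc.louisiana.gov/contractor-search/search-type-contractor/"
ELEMENT_HTML = '<a href="/contractor/12345">Example Contractor LLC</a>'


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.hrefs.append(href)


def extract_hrefs(html):
    """Return the href values of all <a> tags in an HTML fragment."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# Wrong: the whole serialized tag is treated as a relative path
# and appended to the base URL.
broken_url = urljoin(BASE_URL, ELEMENT_HTML)

# Right: join only the href value of each link.
fixed_urls = [urljoin(BASE_URL, href) for href in extract_hrefs(ELEMENT_HTML)]

print(broken_url)
print(fixed_urls)
```

In Scrapy itself the equivalent fix is to select the attribute, e.g. `response.css("a::attr(href)").getall()` or `sel.attrib["href"]`, or to pass the `<a>` Selector object directly to `response.follow`, which reads its `href` for you.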


Answer

The webpage has its own built-in search. When you search with commercial contractors selected, the results are loaded dynamically by JavaScript from an API that returns JSON in response to a GET request. That's why you can't get the desired data from the static HTML DOM that Scrapy downloads.
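Working against such an API usually means building the GET URL yourself and parsing the JSON body instead of the HTML. The sketch below assumes a hypothetical endpoint (`API_URL`), hypothetical query parameters (`classification`, `page`) and a hypothetical response shape (`results`, `name`, `website`); the real names have to be copied from the XHR request visible in the browser's DevTools Network tab.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL of the real XHR request.
API_URL = "https://example.invalid/api/contractor-search"


def build_search_url(classification, page=1):
    """Build the GET URL for one page of results (parameter names are hypothetical)."""
    query = urlencode({"classification": classification, "page": page})
    return f"{API_URL}?{query}"


def parse_results(body):
    """Return (name, website) pairs from a JSON body.

    The "results", "name" and "website" keys are assumptions for illustration.
    """
    data = json.loads(body)
    return [(item.get("name"), item.get("website")) for item in data.get("results", [])]


# A made-up response body with the assumed shape.
sample_body = json.dumps({
    "results": [
        {"name": "Example Builders LLC", "website": "https://example.com"},
        {"name": "No Site Inc", "website": None},
    ]
})

print(build_search_url("commercial"))
print(parse_results(sample_body))
```

Inside a spider, the URL would go into a `scrapy.Request`, and the callback would parse `response.text` with `json.loads` (or use `response.json()` on Scrapy 2.2+) before following each website.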

Full working code as an example:

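For the last step in the question, pulling email addresses out of each contractor's site, one simple option is a regular-expression pass over the downloaded HTML, which also catches addresses inside `mailto:` links. This is a sketch with a deliberately loose pattern, not a complete email validator; the `extract_emails` name is hypothetical.

```python
import re

# Loose pattern: good enough for scraping, but it will miss some valid
# addresses and can match a few false positives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html):
    """Return unique email addresses found in the HTML, in first-seen order."""
    seen = {}
    for match in EMAIL_RE.findall(html):
        # Lower-case so the same address with different casing is kept once.
        seen.setdefault(match.lower(), None)
    return list(seen)


page = (
    '<a href="mailto:Info@Example.com?subject=Hi">Email us</a> '
    "or write to sales@example.org, also info@example.com"
)
print(extract_emails(page))
```

In a spider, this would run on `response.text` inside the callback that handles each contractor's website.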