
How to extract a table from a website without specifying the web browser in Python

I’m trying to automate data extraction from the ASX website (https://www.asxenergy.com.au/futures_nz) into my database by writing a Python web-scraping script and deploying it in Azure Databricks. The script currently works in Visual Studio Code, but when I run it in Databricks it crashes with the error below.

JavaScript

I believe I need to simplify my code so that it obtains the table without specifying a web browser.

My sample code is below:

JavaScript

I tried the code below instead, using only the requests package, but it failed because it couldn’t find the div with class ‘market-dataset’.

JavaScript

Can anyone please help me?


Answer

This page uses JavaScript to load the table from https://www.asxenergy.com.au/futures_nz/dataset

The server checks whether the request is an AJAX/XHR request, so the request needs this header:

JavaScript
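
As a rough illustration of the header idea, the sketch below builds a request that marks itself as XHR. It assumes the server looks for the conventional `X-Requested-With: XMLHttpRequest` header, which this page does not confirm. The helper names `build_request` and `fetch_html` are hypothetical, and the standard library's `urllib` is used only so the sketch runs without extra packages.

```python
import urllib.request

URL = "https://www.asxenergy.com.au/futures_nz/dataset"

# Assumption: the server recognises the conventional XHR marker header.
HEADERS = {"X-Requested-With": "XMLHttpRequest"}


def build_request(url=URL):
    """Build (but do not send) a GET request that carries the XHR header."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch_html(url=URL, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

With the requests package, the same dictionary can be passed as `requests.get(URL, headers=HEADERS)`. No browser or driver is involved either way, which is what matters when running on a cluster.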

But your findAll("div",href=True, ... tries to find <div href="...">, and this page doesn’t have one, so I search for a normal <div> with class="market-dataset" instead.
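
The difference between the two searches can be seen on a small fragment. The HTML below is a made-up stand-in for the real response, not the actual ASX markup, and the built-in `html.parser` backend is an assumption chosen to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real response body.
html = """
<div class="market-dataset"><table><tr><td>cell</td></tr></table></div>
<div class="other"><a href="/futures_nz">link</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True matches only <div> tags that themselves have an href attribute.
with_href = soup.find_all("div", href=True)

# class_ filters on the CSS class of the <div>.
by_class = soup.find_all("div", class_="market-dataset")

print(len(with_href))
print(len(by_class))
```

The `href=True` filter applies to the `<div>` itself, so the `<a href=...>` nested inside the second `<div>` does not count as a match.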


Minimal working code.

JavaScript

Result:

JavaScript
User contributions licensed under: CC BY-SA