
Accessing the dataLayer (JS variable) when scraping with python

I’m using Beautiful Soup to scrape a webpage. I want to access the dataLayer (a JavaScript variable) that is present on this page. How can I retrieve it using Python?


Answer

You can parse it from the page source with the help of re and json.loads: use re to find the script tag that contains the dataLayer, cut the JSON out of that tag's text, and load it with json.loads.
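A minimal sketch of that approach, assuming the page assigns a JSON literal such as var dataLayer = [...]; inside a script tag. The HTML string, the ScriptCollector class and extract_datalayer are hypothetical names for illustration. To keep the sketch dependency-free it gathers script text with the standard library's html.parser instead of BeautifulSoup; with BeautifulSoup the equivalent lookup would be soup.find("script", string=re.compile(r"dataLayer\s*=")), whose .string gives the script text.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical page source; in practice this is the HTML you downloaded.
HTML = """
<html><head>
<script>window.analytics = true;</script>
<script>
  var dataLayer = [{"pageType": "product", "sku": "12345", "price": 19.99}];
</script>
</head><body><p>Hello</p></body></html>
"""


class ScriptCollector(HTMLParser):
    """Collects the text of every <script> element (stdlib stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so append them.
        if self.in_script:
            self.scripts[-1] += data


def extract_datalayer(html):
    parser = ScriptCollector()
    parser.feed(html)
    # A regex picks out the script tag that assigns dataLayer.
    assignment = re.compile(r"dataLayer\s*=\s*")
    for script in parser.scripts:
        match = assignment.search(script)
        if match:
            # str.find locates the JSON array: from the first "[" after the
            # assignment up to and including the "]" of the closing "];".
            start = script.find("[", match.end())
            end = script.find("];", start) + 1
            return json.loads(script[start:end])
    raise ValueError("no dataLayer assignment found")


data = extract_datalayer(HTML)
print(data[0]["sku"])
```

The slicing assumes the array ends at the first "];" after the assignment; a page that only calls dataLayer.push({...}) instead of assigning a literal would need a different start marker.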


Running it gives you the dataLayer contents as ordinary Python objects: a JSON array becomes a list and each JSON object a dict.

You could also use a regex, but in this case using str.find to locate the end of the data is sufficient.
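For comparison, a sketch of the regex variant, assuming the same var dataLayer = [...]; assignment; the SCRIPT text and the DATALAYER_RE name are hypothetical. The capture group replaces the two str.find calls.

```python
import json
import re

# Hypothetical script text, e.g. the body of the script tag that holds the dataLayer.
SCRIPT = 'var dataLayer = [{"pageType": "product", "sku": "12345"}];\nga("send", "pageview");'

# Capture everything from "[" up to the first "];" after the assignment.
# re.DOTALL lets the JSON span several lines; the non-greedy ".*?" stops at
# the first "];", the same limitation the str.find version has.
DATALAYER_RE = re.compile(r"dataLayer\s*=\s*(\[.*?\]);", re.DOTALL)

match = DATALAYER_RE.search(SCRIPT)
data = json.loads(match.group(1)) if match else None
print(data)
```

Neither version is a real JavaScript parser; both rely on the dataLayer being a plain JSON literal that ends with "];".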
