Web scraping content of ::before using BeautifulSoup?

I am quite new to Python and have tried scraping some websites. A few of them worked well, but I have now stumbled upon one that is giving me a hard time. The URL I'm using is https://www.drankdozijn.nl/groep/rum. I'm trying to get all product titles and URLs from this page, but since there is a ::before in the HTML code, I am unable to scrape it. Any help would be much appreciated! This is the code I have so far:


Answer

The product details for that site are not part of the HTML that is returned; the page loads them as JSON from a different URL. That JSON can be requested directly, as follows:
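
The answer's original code is not preserved here, so the following is only a minimal sketch of the idea, not the exact code. It assumes a hypothetical endpoint (`API_URL`), a hypothetical `page` query parameter numbered from 1, and hypothetical JSON keys (`description`, `alias`); the real values have to be read from the browser's network tools. It uses the standard library's `urllib` to stay self-contained.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace it with the URL seen in the browser's network tab.
API_URL = "https://example.com/api/products"


def fetch_page(page, url=API_URL):
    """Request one page of results and return the decoded JSON."""
    # The "page" query parameter is an assumption about how the API paginates.
    query = urllib.parse.urlencode({"page": page})
    request = urllib.request.Request(
        f"{url}?{query}",
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def extract_rows(data):
    """Pull (title, url) pairs out of one page of JSON.

    The "description" and "alias" keys are assumed field names.
    """
    return [(item.get("description"), item.get("alias")) for item in data]


def collect_rows(pages=10, fetch=fetch_page):
    """Gather rows from the first `pages` pages using the given fetch function."""
    rows = []
    for page in range(1, pages + 1):
        rows.extend(extract_rows(fetch(page)))
    return rows


if __name__ == "__main__":
    for title, link in collect_rows():
        print(title, link)
```

Passing the fetch function in as a parameter makes it possible to try the parsing logic on saved JSON before sending any real requests.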


This gets the first 10 pages of details, starting:

Excel screenshot

I recommend you print(data) to have a look at all of the information that is available.

The URL was found by using the browser’s network tools to watch the requests made while the page loaded. An alternative approach would be to use something like Selenium to fully render the HTML, but this would be slower and more resource-intensive.

openpyxl is used to create the output spreadsheet. You could modify the column widths and appearance of the Excel output if needed.
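
As a rough illustration of that kind of tweak, here is a small sketch that writes rows to a workbook, makes the header bold, and sets the column widths. The sheet name, header labels, widths, and output filename are all arbitrary choices for the example, not values from the original code.

```python
from openpyxl import Workbook
from openpyxl.styles import Font


def build_workbook(rows):
    """Create a workbook with a bold header row and one row per (title, url) pair."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Products"  # arbitrary sheet name for this example

    ws.append(["Title", "URL"])
    for cell in ws[1]:  # ws[1] is the first row of the sheet
        cell.font = Font(bold=True)

    for row in rows:
        ws.append(list(row))

    # Widths are in openpyxl's character-based units; these values are arbitrary.
    ws.column_dimensions["A"].width = 40
    ws.column_dimensions["B"].width = 60
    return wb


if __name__ == "__main__":
    wb = build_workbook([("Example rum", "https://example.com/example-rum")])
    wb.save("output.xlsx")
```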

User contributions licensed under: CC BY-SA