Beautiful Soup findAll() finds half of them

I’m trying to scrape information on office prices in France, and I successfully wrote the code to scrape all the information I needed.

However, I quickly noticed that something was wrong with the number of results: my script was returning only half of the occurrences present on each page of the website.

Here’s what the basic code looks like:


As suggested in “Beautiful Soup findAll doesn’t find them all”, I’m already using html.parser, and I’ve tried other parsers too, without success.

I still don’t understand why it picks up only the listings in the first half of the page, even though the HTML clearly contains all of them.
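A quick way to test this is to count the listing elements in the raw HTML that your script actually downloads. The browser’s inspector shows the page after JavaScript has run, so it can contain elements that never appear in the downloaded response. The sketch below uses the standard library’s `html.parser.HTMLParser`; the `listing` class name and the sample markup are hypothetical stand-ins for the real page.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute includes `class_name`."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(page: str, class_name: str) -> int:
    """Return how many elements in `page` carry the CSS class `class_name`."""
    counter = ClassCounter(class_name)
    counter.feed(page)
    counter.close()
    return counter.count


# Hypothetical raw HTML as downloaded, before any JavaScript runs:
# only two listings are present as real elements.
raw_html = """
<div class="listing card">Office A</div>
<div class="listing card">Office B</div>
<script>/* remaining listings would be built from JSON here */</script>
"""

print(count_class(raw_html, "listing"))
```

If the count is already too low at this stage, switching Beautiful Soup parsers cannot help, because the missing elements are not in the downloaded HTML at all.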


Answer

The data you see on the page is stored as JSON. You can use the json module to extract it.
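A minimal sketch of that idea using only the standard library, assuming the listings are embedded in a `<script>` tag as an assignment of the form `window.__DATA__ = {...};`. The variable name `window.__DATA__`, the sample HTML, and the `ads`/`title`/`price` keys are all hypothetical, since the real page’s structure is not shown here.

```python
import json
import re

# Made-up page source: only one listing is rendered as HTML, while the
# full list lives in a JSON object assigned inside a <script> tag.
# The variable name and the "ads"/"title"/"price" keys are hypothetical.
html = """
<html><body>
  <div class="listing">Bureau Paris 8e</div>
  <script>
    window.__DATA__ = {"ads": [
      {"title": "Bureau Paris 8e", "price": 4500},
      {"title": "Bureau Lyon 2e", "price": 2100}
    ]};
  </script>
</body></html>
"""


def extract_embedded_json(page: str, var_name: str) -> dict:
    """Return the JSON object assigned to `var_name` in an inline script.

    Assumes an assignment of the form `var_name = {...};`. The non-greedy
    match stops at the first `}` followed by `;`, so it would break if a
    string inside the JSON itself contained "};".
    """
    pattern = re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;"
    match = re.search(pattern, page, re.DOTALL)
    if match is None:
        raise ValueError(f"{var_name} not found in page")
    return json.loads(match.group(1))


data = extract_embedded_json(html, "window.__DATA__")
for ad in data["ads"]:
    print(ad["title"], ad["price"])
```

With a real page, `html` would be the text you downloaded (for example `requests.get(url).text`). A regular expression is used here instead of Beautiful Soup only to keep the sketch dependency-free; you could just as well locate the `<script>` tag with Beautiful Soup and run the same pattern on its `.string`.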

For example:


Prints:

User contributions licensed under: CC BY-SA