Python – Beautiful Soup – extract text between <div> and <sup>

HTML content

I have a webpage to parse. The HTML code is in the figure.

I need to extract the price, which is simple text:

<div class="price">
"212,25 € "
<sup>HT</sup>

This is the only “price” class on the page. So I call the find() method:

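from bs4 import BeautifulSoup
from requests import get

# url, headers and params are defined elsewhere
soup = BeautifulSoup(get(url, headers=headers, params=params).content, 'lxml')
container = soup.find('div', class_='side-content')  # Find the container (find_all would return a list)
cost = container.find('div', {'class': 'price'})  # Find the div with the price class
cost_value = cost.next_sibling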

cost is None. I have tried both the .next_sibling and .text attributes, but because find() returns None, I get an exception. How can I fix it?

Answer

I have resolved it. The problem was that the data is generated by JavaScript, so static parsing methods don't work on it. I tried several solutions (including Selenium and capturing the results of the XHR requests).
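To illustrate why find() returns None here, below is a minimal sketch that runs the same lookup on two inline HTML strings. Both strings are hypothetical: RENDERED_HTML imitates what the browser shows after JavaScript has run, and RAW_HTML imitates what a plain HTTP request might return. The helper name extract_price is also made up for this example, and it uses the built-in html.parser instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup as seen in the browser after JavaScript has run.
RENDERED_HTML = """
<div class="side-content">
  <div class="price">
    212,25 €
    <sup>HT</sup>
  </div>
</div>
"""

# Hypothetical markup as returned by a plain HTTP request:
# the price div has not been generated yet.
RAW_HTML = '<div class="side-content"><div id="app"></div></div>'


def extract_price(html):
    """Return the text directly inside div.price, or None if the div is absent."""
    soup = BeautifulSoup(html, "html.parser")
    cost = soup.find("div", class_="price")
    if cost is None:
        # The element is not in the static HTML, so there is nothing to parse.
        return None
    # The price is a text node *inside* the div (a child, not a sibling),
    # so take the first direct string child rather than next_sibling.
    text = cost.find(string=True, recursive=False)
    return text.strip() if text else None


print(extract_price(RENDERED_HTML))
print(extract_price(RAW_HTML))
```

With the rendered markup the function returns the price string; with the raw markup it returns None, which matches the behaviour described in the question. Checking for None before touching .text or .next_sibling avoids the exception.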

Finally, inside my parsed data I found a static URL linking to a separate web page where this JavaScript code is executed and whose output can be parsed with static methods.
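The search for such a URL could look roughly like the sketch below. It is not the exact code I used: the sample markup, the example.com addresses, the choice of a and iframe tags, and the helper name find_candidate_urls are all assumptions made for illustration. It only collects the links from the statically parsed HTML and resolves them against the page URL, so you can inspect them and fetch the promising one with a normal request.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical static HTML: no price, but it references other pages.
SAMPLE_HTML = """
<div class="side-content">
  <iframe src="/offer/123/price"></iframe>
  <a href="details.html">More details</a>
</div>
"""

# Hypothetical address of the page being parsed.
PAGE_URL = "https://example.com/catalog/item"


def find_candidate_urls(html, base_url):
    """Collect absolute URLs from a/iframe tags found in static HTML."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all(["a", "iframe"]):
        # Links use href, frames use src; skip tags that have neither.
        link = tag.get("href") or tag.get("src")
        if link:
            # Resolve relative links against the page they were found on.
            urls.append(urljoin(base_url, link))
    return urls


for candidate in find_candidate_urls(SAMPLE_HTML, PAGE_URL):
    print(candidate)
```

Each candidate can then be requested with get() and passed to BeautifulSoup exactly as in the question, to check whether the price div is present in that page's static HTML.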

The video "Python Web Scraping Tutorial: scraping dynamic JavaScript/Ajax websites with Beautiful Soup" explains a similar solution.

User contributions licensed under: CC BY-SA