
Web scraping an image with highlighted text

I am scraping this URL, which shows a newspaper image with highlighted words. My goal is to retrieve all the words highlighted in red. Inspecting the page shows that they carry the class image-overlay hit-rect ng-star-inserted, and the title attribute of those elements is what must be extracted:

[screenshot of the inspected element]

I am using the following code snippet with BeautifulSoup:

JavaScript

However, I get [] as a result!

For this specific page, I expect a list of length 17 containing all the highlighted words, i.e., the values of the title attribute seen in the inspector, as follows:

JavaScript

Is BeautifulSoup the right tool for extracting information from dynamic content?

Cheers,


Answer

The data you’re looking for is loaded from an external URL via JavaScript, so it is not present in the HTML that BeautifulSoup receives. To get the data, you can use the following example:

JavaScript

Prints:

JavaScript
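
To illustrate why the original attempt returned [], here is a minimal sketch. It uses only Python's standard-library html.parser as a stand-in for BeautifulSoup, so it runs without extra packages. Both HTML strings are made-up assumptions, not the real page: STATIC_HTML imitates what a plain HTTP request might return before any JavaScript runs (the app-root tag and main.js name are hypothetical), and RENDERED_HTML imitates what the browser inspector shows afterwards. The words "alpha" and "beta" are placeholders as well.

```python
from html.parser import HTMLParser


class HitRectTitles(HTMLParser):
    """Collect the title attribute of every element whose class list includes 'hit-rect'."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if "hit-rect" in classes and attr.get("title"):
            self.titles.append(attr["title"])


def extract_titles(html):
    """Return the titles of all hit-rect elements found in the given HTML string."""
    parser = HitRectTitles()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical markup as served: the overlay elements do not exist yet.
STATIC_HTML = (
    '<html><body><app-root></app-root>'
    '<script src="main.js"></script></body></html>'
)

# Hypothetical markup after the browser has run the page's JavaScript.
RENDERED_HTML = """<html><body><app-root>
<div class="image-overlay hit-rect ng-star-inserted" title="alpha"></div>
<div class="image-overlay hit-rect ng-star-inserted" title="beta"></div>
</app-root></body></html>"""

print(extract_titles(STATIC_HTML))    # []
print(extract_titles(RENDERED_HTML))  # ['alpha', 'beta']
```

The selector logic is the same in both calls; only the input differs. That is why the fix is to request the data from the URL the page itself loads it from, rather than to change the parsing code.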
User contributions licensed under: CC BY-SA