
Webscraping Dynamic Website to Pull Recent News Article URLs

I am attempting to pull investing news articles from a dynamic website using Python. I have tried a couple of tutorials that worked for static websites, but I have had issues pulling the URL to a specific article. The code I am working with is as follows:

JavaScript

Which gets me a list of the links within the page in an array format:

JavaScript

But I cannot find the links to the articles themselves. When I inspect the source code, this is what I see:

JavaScript

I am relatively new to HTML, but that h2 section towards the end leads me to believe that the site is dynamic, which is where I am stuck. Any help would be appreciated. My ideal output for this question is to get the title of the article, the source (in this case “Institutional Investor”), a preview of the article (the first couple of lines or so), and the URL for the article into a dataframe that can be sent to me each morning, to save the time I would otherwise spend manually pulling news. I have put together the rest of the project, apart from the news pull for sites such as Institutional Investor that are not covered by the API I am using.

I am open to any and all new methods, if necessary or recommended. Thank you in advance!


Answer

Yes, it is dynamic. You could use Selenium to let the page render first, then pull out the HTML like you’d normally do with a static site. Or, it’s all there in their API (I think even the full article is in there too, but I just pulled out what you asked for):

JavaScript

Output:

JavaScript
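
For reference, here is a minimal sketch of the second approach: taking a JSON response and flattening it into rows with the title, source, preview, and URL. The site's actual endpoint and field names are not shown here, so the sample payload, the keys (`items`, `title`, `source`, `summary`, `url`), the `example.com` base URL, and the helper name `extract_articles` are all assumptions for illustration. Adjust them to whatever the real response contains.

```python
import json
from urllib.parse import urljoin

# Hypothetical payload: the real API's structure is not shown in this post,
# so every key below is an assumption for illustration only.
SAMPLE_RESPONSE = """
{
  "items": [
    {
      "title": "  Example headline  ",
      "source": "Institutional Investor",
      "summary": "First couple of lines of the article...",
      "url": "/article/example-headline"
    }
  ]
}
"""


def extract_articles(payload, base_url=""):
    """Flatten a (hypothetical) article listing into one dict per article."""
    rows = []
    for item in payload.get("items", []):
        rows.append({
            "title": item.get("title", "").strip(),
            "source": item.get("source", ""),
            # Keep only the start of the summary as a short preview.
            "preview": item.get("summary", "")[:200],
            # Resolve relative links against the site's base URL.
            "url": urljoin(base_url, item.get("url", "")),
        })
    return rows


# Placeholder base URL; substitute the real site's address.
rows = extract_articles(json.loads(SAMPLE_RESPONSE), base_url="https://www.example.com")
for row in rows:
    print(row)
```

In practice the payload would come from something like `requests.get(api_url).json()` instead of the sample string, and `pandas.DataFrame(rows)` turns the list of dicts into the dataframe you described.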
User contributions licensed under: CC BY-SA