
How to get all external links found on a page using BeautifulSoup?

I’m reading the book Web Scraping with Python, which has the following function to retrieve the external links found on a page:

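The following is a minimal sketch of a regex-based filter of this kind. It approximates the general approach for illustration and is not the book's exact listing. It assumes the bs4 package is installed; the function name get_external_links and the sample HTML are hypothetical. It keeps hrefs that start with "http" or "www" and do not contain exclude_url.

```python
import re

from bs4 import BeautifulSoup


def get_external_links(soup, exclude_url):
    """Return unique hrefs that start with "http" or "www" and do not contain exclude_url.

    Hypothetical helper approximating a regex-based approach.
    """
    # Negative lookahead: after the "http"/"www" prefix, no position may start exclude_url.
    pattern = re.compile("^(http|www)((?!" + re.escape(exclude_url) + ").)*$")
    external_links = []
    # BeautifulSoup accepts a compiled regex as an attribute filter.
    for link in soup.find_all("a", href=pattern):
        href = link["href"]
        if href not in external_links:  # de-duplicate while preserving page order
            external_links.append(href)
    return external_links


html = """
<a href="http://www.oreilly.com/about">About</a>
<a href="http://example.org/">Elsewhere</a>
<a href="/local">Local</a>
<a href="http://example.org/">Elsewhere again</a>
"""
soup = BeautifulSoup(html, "html.parser")
print(get_external_links(soup, "www.oreilly.com"))  # ['http://example.org/']
```

Note that exclude_url is passed as a bare host name here; the relative link "/local" is ignored because it starts with neither "http" nor "www".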

The problem is that it does not work as expected. When I run it on the URL http://www.oreilly.com, it returns this:


Output:


Question:

Why are the first 16 or 17 entries treated as “external links”? They belong to the same domain as http://www.oreilly.com.
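One likely cause, assuming the function relies on a negative-lookahead pattern of the form ^(http|www)((?!excludeUrl).)*$: the lookahead is only checked after "http" or "www" has already been consumed. If the full URL http://www.oreilly.com is passed as excludeUrl, a same-site link such as http://www.oreilly.com/about leaves the remainder "://www.oreilly.com/about", which no longer contains the full string, so the link slips through as external. Passing only the host name avoids this. The helper looks_external below is hypothetical and uses plain re so it runs without a network connection.

```python
import re


def looks_external(href, exclude_url):
    """Hypothetical check mirroring a ^(http|www)((?!exclude).)*$ filter on one href."""
    pattern = re.compile("^(http|www)((?!" + re.escape(exclude_url) + ").)*$")
    return pattern.search(href) is not None


same_site = "http://www.oreilly.com/about"

# "http" is consumed by the prefix group, so the lookahead only scans
# "://www.oreilly.com/about", which never contains "http://www.oreilly.com".
print(looks_external(same_site, "http://www.oreilly.com"))  # True (wrongly flagged)

# With only the host name, the lookahead finds "www.oreilly.com" and the match fails.
print(looks_external(same_site, "www.oreilly.com"))  # False

# Genuinely external links still pass with the host-name form.
print(looks_external("http://example.org/", "www.oreilly.com"))  # True
```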


Answer

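A more robust approach, sketched here under stated assumptions rather than as the accepted answer's code, is to resolve each href against the page URL with urllib.parse and compare host names instead of matching substrings. The helpers split_links and _site_host are hypothetical, and the sample hrefs are made up. The sketch treats www.oreilly.com, oreilly.com, and subdomains such as shop.oreilly.com as the same site, which is a design choice you may want to change.

```python
from urllib.parse import urljoin, urlparse


def _site_host(url):
    """Lower-cased host name without a leading "www." (hypothetical normalization)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def split_links(page_url, hrefs):
    """Split hrefs into (internal, external) lists by comparing host names with page_url."""
    base = _site_host(page_url)
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links such as "/about"
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar non-web links
        host = _site_host(absolute)
        # Assumption: subdomains (e.g. shop.oreilly.com) count as the same site.
        same_site = host == base or host.endswith("." + base)
        bucket = internal if same_site else external
        if absolute not in bucket:  # de-duplicate while preserving order
            bucket.append(absolute)
    return internal, external


internal, external = split_links(
    "http://www.oreilly.com",
    ["/about", "https://www.oreilly.com/free/", "http://shop.oreilly.com/",
     "https://example.org/page", "mailto:someone@example.org"],
)
print(external)  # ['https://example.org/page']
```

With BeautifulSoup, the hrefs can be collected with [a["href"] for a in soup.find_all("a", href=True)] and passed in as the second argument. Comparing parsed host names also handles http vs https and relative links, which substring matching on the full URL does not.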
User contributions licensed under: CC BY-SA