How to recover http link from a tag

I am trying to recover web links from an RSS page. I am using Python 3, requests, and BeautifulSoup4 on a Windows 10 system. My code is as follows:

JavaScript

This prints out as follows:

JavaScript

Individual items in Articles are of the following form:

JavaScript

The problem is with

JavaScript

as it is not captured in the appropriate form i.e.

JavaScript

When I open the same link (the RSS feed above) in my browser (Firefox), the link tags are shown correctly:

JavaScript

I am guessing the problem lies in using html.parser for an XML page. If I need an XML parser instead, which one should I use with Python 3? The code will run on a Raspberry Pi, but I am developing it on Windows 10.

Thanks in advance for a solution!

Answer

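html.parser treats <link> as an HTML void element, so the <link></link> tag is converted into <link/> and the URL ends up as a text node right after it. You need to use .next_sibling on the link tag to get the URL. The code will look something like this: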

JavaScript
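As a rough illustration of the .next_sibling approach, here is a minimal sketch. The feed fragment, the example.com URLs, and the helper name `extract_items` are made up for this example and are not taken from the question's feed. It assumes each item contains `title`, `link`, and `pubDate` elements, and note that html.parser lowercases tag names, so `pubDate` has to be looked up as `pubdate`.

```python
from bs4 import BeautifulSoup

# Made-up RSS fragment standing in for the real feed (assumption).
SAMPLE_RSS = (
    "<rss><channel>"
    "<item><title>First post</title>"
    "<link>https://example.com/first</link>"
    "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>"
    "<item><title>Second post</title>"
    "<link>https://example.com/second</link>"
    "<pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate></item>"
    "</channel></rss>"
)


def extract_items(rss_text):
    """Hypothetical helper: pull title, link, and date out of each <item>."""
    soup = BeautifulSoup(rss_text, "html.parser")
    results = []
    for item in soup.find_all("item"):
        # html.parser closes <link> immediately, so the URL is the text
        # node that follows the (now empty) tag.
        url = item.find("link").next_sibling
        results.append({
            "title": item.find("title").get_text(strip=True),
            "link": str(url).strip() if url is not None else "",
            # Tag names are lowercased by html.parser: pubDate -> pubdate.
            "pub": item.find("pubdate").get_text(strip=True),
        })
    return results


if __name__ == "__main__":
    for entry in extract_items(SAMPLE_RSS):
        print(entry["title"], entry["link"], entry["pub"])
```

With a live feed, the string passed to `extract_items` would be the body of the requests response instead of `SAMPLE_RSS`.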

Also, if you want just the Title and Pub values without the surrounding tags, use .text on each of them.
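On the question of which XML parser to use: one option that needs no extra install on either Windows or a Raspberry Pi is the standard library's xml.etree.ElementTree, which keeps the URL inside the <link> element. The sketch below is only an illustration under that assumption. The sample feed and the helper name `extract_links` are invented for the example. (BeautifulSoup can also parse XML via `BeautifulSoup(text, "xml")`, but that feature relies on the lxml package being installed.)

```python
import xml.etree.ElementTree as ET

# Made-up RSS fragment used only for illustration (assumption).
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
    </item>
  </channel>
</rss>
"""


def extract_links(rss_text):
    """Hypothetical helper: return the <link> text of every <item>."""
    root = ET.fromstring(rss_text)
    links = []
    for item in root.iter("item"):
        # findtext returns the element's text, or the default if it is absent.
        links.append(item.findtext("link", default="").strip())
    return links


if __name__ == "__main__":
    print(extract_links(SAMPLE_FEED))
```

Because a real XML parser has no notion of HTML void elements, no .next_sibling workaround is needed here. Only the links inside <item> elements are returned, so the channel-level link is skipped.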

User contributions licensed under: CC BY-SA