
Equivalent to innerHTML when using lxml.html to parse HTML

I’m working on a script using lxml.html to parse web pages. I have done a fair bit of BeautifulSoup in my time but am now experimenting with lxml due to its speed.

I would like to know the most sensible way in the library to do the equivalent of JavaScript’s innerHTML, that is, to retrieve or set the complete contents of a tag.


The innerHTML is therefore:


I can do it using hacks (converting to a string, regexes, etc.), but I’m assuming there is a correct way to do this with the library that I am missing due to unfamiliarity. Thanks for any help.

EDIT: Thanks to pobk for showing me the way on this so quickly and effectively. For anyone trying the same, here is what I ended up with:


Note that the lxml.html parser will fix up the unclosed tag, so the serialised output can differ from the source markup. Beware if this is a problem.


Answer

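You can get the children of an ElementTree node by iterating over it or by calling iterchildren(); getchildren() still works but is deprecated in lxml. iterdescendants() also works, but only for flat content, because it yields nested elements as well and their markup would then be serialised twice: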


This can be shorthanded as follows:

User contributions licensed under: CC BY-SA