Equivalent to InnerHTML when using lxml.html to parse HTML

Question

I'm working on a script that uses lxml.html to parse web pages. I have done a fair bit of BeautifulSoup work in my time, but I'm now experimenting with lxml because of its speed. I would like to know the most sensible way in the library to do the equivalent of JavaScript's innerHTML, that is, to retrieve or set the HTML contents of a tag.

Accepted Answer

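You can get the children of an ElementTree node using the getchildren() or iterdescendants() methods of the root node. getchildren() returns only the direct children (and is deprecated in current lxml in favour of iterating over the element itself), while iterdescendants() walks every level below the node:

    >>> from io import StringIO
    >>> from lxml import etree
    >>> t = etree.parse(StringIO("""<body>
    ... <h1>A title</h1>
    ... <p>Some text</p>
    ... </body>"""))
    >>> root = t.getroot()
    >>> for child in root.iterdescendants():
    ...     print(etree.tostring(child, encoding="unicode"))
    ...
    <h1>A title</h1>

    <p>Some text</p>

This can be shortened to a one-liner:

    print(''.join(etree.tostring(child, encoding="unicode") for child in root.iterdescendants()))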


Answer