
Tag: lxml

Python lxml – get index of tag’s text

I have an XML file with a format similar to DOCX, i.e.: I need to get the index of BIG_TEXT in the source XML, like: I could start a new search from the position of the current index + len(text), but is there another way? An element's text may be a single character, w for example. A plain string search will find an index of w, but not necessarily the index of that tag's text.
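A minimal sketch of one alternative to repeated string searches, assuming UTF-8 input, no comments or CDATA in the source, no unescaped `>` inside attribute values, and element text stored verbatim (no entities such as `&amp;`). The function name `text_offsets` is hypothetical. Because start tags appear in the source in the same order that `root.iter()` visits elements, the sketch walks every start tag in turn and records where each element's text begins, so a one-character text like `w` is never confused with the `w` in a tag name such as `<w:t>`.

```python
import re
from lxml import etree

def text_offsets(xml_bytes):
    """Hypothetical helper: return (tag, text, offset) for each element with
    text, where offset is the character index of the text in the source."""
    source = xml_bytes.decode("utf-8")
    root = etree.fromstring(xml_bytes)
    results = []
    cursor = 0
    for el in root.iter():
        if not isinstance(el.tag, str):
            continue  # skip comments and processing instructions
        # Rebuild the tag as written in the source, e.g. "w:t" for a DOCX run.
        local = etree.QName(el).localname
        qname = f"{el.prefix}:{local}" if el.prefix else local
        # The next matching start tag after the cursor belongs to this element,
        # because start tags appear in document (pre-order) order.
        pattern = re.compile("<" + re.escape(qname) + r"[\s/>]")
        match = pattern.search(source, cursor)
        if match is None:
            raise ValueError(f"start tag of {qname!r} not found")
        tag_end = source.index(">", match.end() - 1)
        cursor = tag_end + 1
        if el.text:
            # Assumption: text is stored verbatim, so its length in the
            # source equals len(el.text).
            results.append((el.tag, el.text, cursor))
            cursor += len(el.text)
    return results
```

Tail text (for example the `w` in `<a>x</a>w<b>w</b>`) is skipped naturally, since the cursor only ever jumps to the start tag of the next element.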

How to properly escape single and double quotes

I have an lxml etree HTMLParser object that I’m using to build XPaths, so that I can assert the XPaths, the attributes of the matched elements, and the text of the tag. I ran into a problem when the tag’s text contains single quotes (‘) or double quotes (“), and I’ve exhausted all my options. Here’s a sam…
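XPath 1.0 string literals have no escape character, so a common workaround is to pick whichever quote the text does not contain, or fall back to `concat()` when it contains both. A minimal sketch; the helper name `xpath_literal` is hypothetical. lxml's `xpath()` also accepts XPath variables as keyword arguments, which avoids quoting entirely.

```python
from lxml import etree

def xpath_literal(s):
    """Hypothetical helper: quote s as an XPath 1.0 string literal."""
    if "'" not in s:
        return f"'{s}'"
    if '"' not in s:
        return f'"{s}"'
    # Both quote kinds present: split on single quotes and rejoin the
    # pieces with "'" literals inside concat().
    parts = s.split("'")
    return "concat(" + ", \"'\", ".join(f"'{p}'" for p in parts) + ")"

parser = etree.HTMLParser()
root = etree.fromstring("<html><body><p>He said \"it's\"</p></body></html>", parser)
text = 'He said "it\'s"'

# Option 1: build the expression with a safely quoted literal.
by_literal = root.xpath(f"//p[text()={xpath_literal(text)}]")
# Option 2: pass the text as an XPath variable ($t) and skip quoting.
by_variable = root.xpath("//p[text()=$t]", t=text)
```

The variable form is usually the more robust choice when the expression is generated from arbitrary text.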

Selecting attribute values from lxml

I want to use an XPath expression to get the value of an attribute. I expected the following to work, but it gives an error: Am I wrong to expect this to work? Answer: find and findall only implement a subset of XPath. Their presence is meant to provide compatibility with other ElementTree implementations (l…
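The question's own snippet is not shown here, so this is a minimal sketch on a small made-up document of the distinction the answer draws: `xpath()` evaluates full XPath 1.0 and can return attribute values directly, while `find()`/`findall()` use the limited ElementPath syntax, which can filter on attributes but only returns elements, so the value is read with `.get()`.

```python
from lxml import etree

root = etree.fromstring(
    '<root><item id="1" name="first"/><item id="2" name="second"/></root>'
)

# Full XPath: selecting @name returns the attribute values as strings.
names = root.xpath("//item/@name")

# ElementPath: select the elements, then read the attribute with .get().
names2 = [el.get("name") for el in root.findall(".//item")]

# Attribute predicates are part of the supported ElementPath subset.
second = root.find(".//item[@id='2']").get("name")
```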

Equivalent to InnerHTML when using lxml.html to parse HTML

I’m working on a script that uses lxml.html to parse web pages. I have done a fair bit of work with BeautifulSoup in my time, but am now experimenting with lxml because of its speed. I would like to know the most sensible way in the library to do the equivalent of JavaScript’s innerHTML – that is, to r…