Assume I have the following two example pieces of HTML:
<p>This is some text: <b>ABCD12345</b></p>
<p><b>Name:</b> John Doe</p>
I’m able to separate the <b> and non-<b> parts, but I also want to determine whether the <b> part is at the start or at the end of the text (in other words, whether it has text before or after it). How can I do that?
I’m using Python (with lxml) if it matters (I don’t think it really does).
Answer
This XPath,
not(/p/b/following-sibling::text())
will return true iff there are no text nodes following b in p, as in your first case:
<p>This is some text: <b>ABCD12345</b></p>
This XPath,
not(/p/b/preceding-sibling::text())
will return true iff there are no text nodes preceding b in p, as in your second case:
<p><b>Name:</b> John Doe</p>
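Since you mentioned lxml, here is a minimal sketch of how the two not() expressions could be evaluated from Python. It assumes each snippet is parsed on its own with etree.fromstring, so that p is the root element; the variable names are only illustrative.

```python
from lxml import etree

# Parse each example on its own, so <p> is the document's root element.
first = etree.fromstring("<p>This is some text: <b>ABCD12345</b></p>")
second = etree.fromstring("<p><b>Name:</b> John Doe</p>")

# The two XPath expressions from the answer, unchanged.
NO_TEXT_AFTER = "not(/p/b/following-sibling::text())"
NO_TEXT_BEFORE = "not(/p/b/preceding-sibling::text())"

# lxml returns a Python bool for XPath expressions that evaluate to a boolean.
first_no_after = first.xpath(NO_TEXT_AFTER)     # b is at the end
first_no_before = first.xpath(NO_TEXT_BEFORE)   # text precedes b
second_no_after = second.xpath(NO_TEXT_AFTER)   # text follows b
second_no_before = second.xpath(NO_TEXT_BEFORE) # b is at the start

print(first_no_after, first_no_before)    # True False
print(second_no_after, second_no_before)  # False True
```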
If it’s not the absence but the presence of text before/after the b element that’s of interest, you can change the not() to boolean() in those XPath expressions.
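As a rough illustration of the boolean() variant, the two checks could be combined into one small helper. The function name b_position and its return labels are made up for this sketch, and it assumes the snippet is a single p element containing one b.

```python
from lxml import etree


def b_position(html):
    """Hypothetical helper: classify where <b> sits relative to the text in <p>."""
    root = etree.fromstring(html)
    # boolean() is true when at least one matching text node exists.
    has_before = root.xpath("boolean(/p/b/preceding-sibling::text())")
    has_after = root.xpath("boolean(/p/b/following-sibling::text())")
    if has_before and has_after:
        return "middle"
    if has_before:
        return "end"    # text only before b, so b comes last
    if has_after:
        return "start"  # text only after b, so b comes first
    return "alone"      # no sibling text at all


print(b_position("<p>This is some text: <b>ABCD12345</b></p>"))  # end
print(b_position("<p><b>Name:</b> John Doe</p>"))                # start
```

Note that whitespace-only text nodes (for example from pretty-printed markup) also count as text here. If those should be ignored, text()[normalize-space()] can be used in place of text() in both expressions.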