Assume I have the following two example pieces of HTML:
<p>This is some text: <b>ABCD12345</b></p>
<p><b>Name:</b> John Doe</p>
I’m able to separate the <b> and non-<b> parts, but I also want to know how to determine whether the <b> part is at the start or at the end of the text (in other words, whether it has text before or after it). How can I do that?
I’m using Python (with lxml) if it matters (I don’t think it really does).
Answer
This XPath,
not(/p/b/following-sibling::text())
will return true iff there are no text nodes following b in p, as in your first case:
<p>This is some text: <b>ABCD12345</b></p>
This XPath,
not(/p/b/preceding-sibling::text())
will return true iff there are no text nodes preceding b in p, as in your second case:
<p><b>Name:</b> John Doe</p>
If it’s not the absence but the presence of text before/after the b element that’s of interest, you can change the not() to boolean() in those XPath expressions.
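Since you mentioned lxml, here is a minimal sketch of how the two checks could be combined in Python. It assumes each fragment parses as well-formed XML with a single b child inside p. The function name b_position and its return labels are made up for illustration, and the XPath expressions are written relative to the b element rather than as the absolute /p/b paths above.

```python
from lxml import etree


def b_position(fragment):
    """Report where the <b> element sits relative to the text in its parent <p>.

    Hypothetical helper: the name and the returned labels are illustrative only.
    """
    p = etree.fromstring(fragment)
    b = p.find("b")  # assumes exactly one <b> directly under <p>

    # Same tests as in the answer, in their boolean() form and relative to <b>.
    has_text_before = b.xpath("boolean(preceding-sibling::text())")
    has_text_after = b.xpath("boolean(following-sibling::text())")

    if has_text_before and has_text_after:
        return "middle"
    if has_text_before:
        return "end"
    if has_text_after:
        return "start"
    return "only"


print(b_position("<p>This is some text: <b>ABCD12345</b></p>"))  # end
print(b_position("<p><b>Name:</b> John Doe</p>"))  # start
```

Note that whitespace-only text nodes also count as text here. If that matters for your data, adding a predicate such as preceding-sibling::text()[normalize-space()] should ignore them.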