I’m trying to parse a complex XML document, and XPath isn’t behaving the way I thought it would. Here’s my sample XML:
```xml
<project>
  <samples>
    <sample>show my balance</sample>
    <sample>show me the <subsample value='USD'>money</subsample>today</sample>
  </samples>
</project>
```
Here’s my Python code:
```python
from lxml import etree

somenode = "<project><samples><sample>show my balance</sample><sample>show me the <subsample value='USD'>money</subsample>today</sample></samples></project>"
somenode_etree = etree.fromstring(somenode)

for x in somenode_etree.iterfind(".//sample"):
    print(etree.tostring(x))
```
I get the output:
```
b'<sample>show my balance</sample><sample>show me the <subsample value="USD">money</subsample>today</sample></samples></project>'
b'<sample>show me the <subsample value="USD">money</subsample>today</sample></samples></project>'
```
when I expected:
```
show my balance
show me the <subsample value="USD">money</subsample>today
```
What am I doing wrong?
Answer
This XPath expression returns the text nodes and child elements of every sample, in document order, as expected:
```python
>>> result = somenode_etree.xpath(".//sample/text() | .//sample/*")
>>> result
['show my balance', 'show me the ', <Element subsample at 0x7f0516cfa288>, 'today']
```
To print the matched nodes the way the OP asked:
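```python
for x in somenode_etree.xpath(".//sample/text() | .//sample/*[node()]"):
    if isinstance(x, etree._Element):
        # Serialize the element itself, without its tail text ("today")
        print(etree.tostring(x, method='xml', with_tail=False).decode('UTF-8'))
    else:
        # Text nodes come back as plain strings
        print(x)
```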
Result:
```
show my balance
show me the 
<subsample value="USD">money</subsample>
today
```
The with_tail=False argument prevents the element's tail text (here, "today") from being appended to its serialization.
Or, to print only the text of each matched element:
```python
>>> for x in somenode_etree.xpath(".//sample/text() | .//sample/*"):
...     if isinstance(x, etree._Element):
...         print(x.text)
...     else:
...         print(x)
...
show my balance
show me the 
money
today
```