I’m trying to parse a complex XML document, and XPath isn’t behaving the way I thought it would. Here’s my sample XML:
<project>
  <samples>
    <sample>show my balance</sample>
    <sample>show me the <subsample value='USD'>money</subsample>today</sample>
  </samples>
</project>
Here’s my Python code:
from lxml import etree

somenode = "<project><samples><sample>show my balance</sample><sample>show me the <subsample value='USD'>money</subsample>today</sample></samples></project>"

somenode_etree = etree.fromstring(somenode)

for x in somenode_etree.iterfind(".//sample"):
    print(etree.tostring(x))
I get the output:
b'<sample>show my balance</sample><sample>show me the <subsample value="USD">money</subsample>today</sample></samples></project>'
b'<sample>show me the <subsample value="USD">money</subsample>today</sample></samples></project>'
when I expected:
show my balance
show me the <subsample value="USD">money</subsample>today
What am I doing wrong?
Answer
This XPath selects both the text nodes and the child elements of each sample, in document order:
>>> result = somenode_etree.xpath(".//sample/text() | .//sample/*")
>>> result
['show my balance', 'show me the ', <Element subsample at 0x7f0516cfa288>, 'today']
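The list above mixes two kinds of results: plain strings for text nodes and element objects for child elements. As a small sketch of how to tell them apart, the strings that lxml's xpath() returns by default carry is_text, is_tail and getparent(), which show that 'today' is stored as the tail of subsample rather than as text of sample. The variable names root, results and described are only illustrative.

```python
from lxml import etree

somenode = (
    "<project><samples><sample>show my balance</sample>"
    "<sample>show me the <subsample value='USD'>money</subsample>today</sample>"
    "</samples></project>"
)
root = etree.fromstring(somenode)
results = root.xpath(".//sample/text() | .//sample/*")

described = []
for node in results:
    if isinstance(node, etree._Element):
        # Child element matched by .//sample/*
        described.append(("element", node.tag))
    elif node.is_tail:
        # Text that follows a child element's closing tag
        described.append(("tail of " + node.getparent().tag, str(node)))
    else:
        # Text that sits directly after the parent's opening tag
        described.append(("text of " + node.getparent().tag, str(node)))

for kind, value in described:
    print(kind, repr(value))
```

This relies on lxml's default smart_strings behaviour; if xpath results are produced with smart strings disabled, the strings will not have these attributes.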
To print the found nodes as the OP requested:
for x in somenode_etree.xpath(".//sample/text() | .//sample/*"):
    if isinstance(x, etree._Element):
        # Serialize the child element without its trailing (tail) text
        print(etree.tostring(x, method='xml', with_tail=False).decode('UTF-8'))
    else:
        # Text nodes come back as plain strings
        print(x)
Result
show my balance
show me the 
<subsample value="USD">money</subsample>
today
The with_tail=False argument prevents the element's tail text (here, "today") from being appended to the serialized element.
Or, to print only the text of each child element instead of its markup:
>>> for x in somenode_etree.xpath(".//sample/text() | .//sample/*"):
...     if type(x) == etree._Element:
...         print(x.text)
...     else:
...         print(x)
...
show my balance
show me the 
money
today