How do I save an Element Tree to a list based on an attribute in a child tag using Python’s LXML module?


I have an XML document that I need to parse. I’m using Python 3.8 and the lxml module.

The XML contains Title elements, each with its own child elements, as in the example below. I need to find only the Titles that have a “change” event and keep those Title elements in a list. I would like to keep all of the child tags of each matching Title so I can extract the data that I need.

Here is my XML example:

'''
<root>
    <Title ref="111111">
        <Events>
            <Event type="change"/>
        </Events>
        <tag1>John</tag1>
        <tag2>A.</tag2>
        <tag3>Smith</tag3>
    </Title>
        <Title ref="222222">
        <Events>
            <Event type="cancel"/>
        </Events>
        <tag1>Bob</tag1>
        <tag2>B.</tag2>
        <tag3>Hope</tag3>
    </Title>
        <Title ref="333333">
        <Events>
            <Event type="change"/>
        </Events>
        <tag1>Julie</tag1>
        <tag2>A.</tag2>
        <tag3>Moore</tag3>
    </Title>
        <Title ref="444444">
        <Events>
            <Event type="cancel"/>
        </Events>
        <tag1>First</tag1>
        <tag2>M</tag2>
        <tag3>Last</tag3>
    </Title>
</root>
'''

I’ve tried the findall() function, but it seems to return only the “Event” tag, not the “Title” tag and all of its children. I get the same result with xpath().

Answer

If txt holds the XML snippet from the question, you can extract the <Title> elements that contain an <Event type="change"> like this:

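from lxml import etree, html

root = etree.fromstring(txt)

# Select every <Title> that has an <Event type="change"> somewhere below it.
# The [...] predicate only filters; the <Title> elements themselves are returned.
for title in root.xpath('.//Title[.//Event[@type="change"]]'):
    # html.tostring() serializes as HTML, so the empty <Event/> prints as <Event ...></Event>.
    print(html.tostring(title).decode('utf-8'))
    print('-' * 80)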

Prints:

<Title ref="111111">
        <Events>
            <Event type="change"></Event>
        </Events>
        <tag1>John</tag1>
        <tag2>A.</tag2>
        <tag3>Smith</tag3>
    </Title>
        
--------------------------------------------------------------------------------
<Title ref="333333">
        <Events>
            <Event type="change"></Event>
        </Events>
        <tag1>Julie</tag1>
        <tag2>A.</tag2>
        <tag3>Moore</tag3>
    </Title>
        
--------------------------------------------------------------------------------
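
Since the question asks for the matching Titles in a list, here is a minimal sketch of that step. xpath() already returns a list of elements, so the result can be kept as is and read later. The XML is a shortened copy of the example from the question, and the dictionary keys ("first", "middle", "last") are made-up names for illustration only; the question does not say what tag1 to tag3 mean.

```python
from lxml import etree

# Shortened copy of the XML from the question.
txt = """
<root>
    <Title ref="111111">
        <Events><Event type="change"/></Events>
        <tag1>John</tag1><tag2>A.</tag2><tag3>Smith</tag3>
    </Title>
    <Title ref="222222">
        <Events><Event type="cancel"/></Events>
        <tag1>Bob</tag1><tag2>B.</tag2><tag3>Hope</tag3>
    </Title>
    <Title ref="333333">
        <Events><Event type="change"/></Events>
        <tag1>Julie</tag1><tag2>A.</tag2><tag3>Moore</tag3>
    </Title>
</root>
"""

root = etree.fromstring(txt)

# The last step of the path is Title, so Title elements are returned.
# The [...] predicate only decides which Titles are kept.
changed_titles = root.xpath('.//Title[.//Event[@type="change"]]')

# Each list item is a complete element, so its attributes and children
# are still available. The key names below are hypothetical.
records = [
    {
        "ref": title.get("ref"),
        "first": title.findtext("tag1"),
        "middle": title.findtext("tag2"),
        "last": title.findtext("tag3"),
    }
    for title in changed_titles
]

for record in records:
    print(record)
```

This should print one dictionary for Title 111111 and one for Title 333333. A path that ends in Event, such as './/Event[@type="change"]', returns the Event elements instead, which would explain the result described in the question; moving the Event test into a predicate on Title avoids that.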


Source: stackoverflow