Edit: others have responded showing xslt as a better solution for the simple problem I have posted here. I have deleted my answer for now.
I’ve been through about a dozen StackOverflow posts trying to understand how to import an XML document that has namespaces, modify it, and then write it without changing the namespaces. I discovered a few things that weren’t clear or had conflicting information. Having finally got it to work I want to record what I learned hoping it helps someone else equally confused. I will put the question here and the answer in a response.
The question: given the sample XML data in the Python docs how do I navigate the tree without having to explicitly include the name-space URIs in the xpaths for findall and write it back out with the namespace prefixes preserved. The example code in the doc does not give the full solution.
Here is the XML data:
<?xml version="1.0"?> <actors xmlns:fictional="http://characters.example.com" xmlns="http://people.example.com"> <actor> <name>John Cleese</name> <fictional:character>Lancelot</fictional:character> <fictional:character>Archie Leach</fictional:character> </actor> <actor> <name>Eric Idle</name> <fictional:character>Sir Robin</fictional:character> <fictional:character>Gunther</fictional:character> <fictional:character>Commander Clement</fictional:character> </actor> </actors>
The desired output is just to add “Sir ” in front of the actors’ names like this:
<actors xmlns="http://people.example.com" xmlns:fictional="http://characters.example.com"> <actor> <name>Sir John Cleese</name> <fictional:character>Lancelot</fictional:character> <fictional:character>Archie Leach</fictional:character> </actor> <actor> <name>Sir Eric Idle</name> <fictional:character>Sir Robin</fictional:character> <fictional:character>Gunther</fictional:character> <fictional:character>Commander Clement</fictional:character> </actor> </actors>
Advertisement
Answer
When parsing XML with lxml, the original namespace prefixes are preserved. The following code does what you want. Note the use of {*}
as a namespace wildcard.
from lxml import etree tree = etree.parse("data1.xml") for name in tree.findall(".//{*}name"): name.text = "Sir " + name.text