I have an XML file that I need to open and make some changes to. One of those changes is to remove the namespace and prefix, and then save the result to another file. Here is the XML:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
  <provider>some data</provider>
  <language>en-GB</language>
</package>
```
I can make the other changes I need, but I can't find out how to remove the namespace and prefix. This is the resulting XML I need:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<package>
  <provider>some data</provider>
  <language>en-GB</language>
</package>
```
And here is my script which will open and parse the xml and save it:
```python
from lxml import etree

metadata = '/Users/user1/Desktop/Python/metadata.xml'

# Drop whitespace-only text nodes so pretty_print can re-indent the output.
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()

tree.write('/Users/user1/Desktop/Python/done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')
```
So how can I add code to my script that removes the namespace and prefix?
Answer
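Replace each element's tag with its local name, as Uku Loskit suggests. In addition, call lxml.objectify.deannotate with cleanup_namespaces=True so that the namespace declarations that are no longer used are removed from the output.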
```python
from lxml import etree, objectify

metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()

####
for elem in root.iter():
    if not isinstance(elem.tag, str):
        continue  # guard for comments and processing instructions
    i = elem.tag.find('}')
    if i >= 0:
        elem.tag = elem.tag[i+1:]  # drop the "{namespace-uri}" part
# Remove the xmlns declarations that no element uses any more.
objectify.deannotate(root, cleanup_namespaces=True)
####

tree.write('/Users/user1/Desktop/Python/done.xml',
           pretty_print=True, xml_declaration=True, encoding='UTF-8')
```
Note: for comments (and processing instructions), the tag attribute is a function rather than a string, so the loop includes a guard that skips those nodes.
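A variation on the same idea, offered as a sketch rather than as part of the original answer: lxml's `etree.QName(elem).localname` can do the `'}'` splitting, and `etree.cleanup_namespaces` removes the unused declarations without going through `objectify`. The function name `strip_namespaces` is hypothetical, and the sample input is the XML from the question held in memory (with a comment added to exercise the guard) instead of the file paths. To use it on a file, call it on `tree.getroot()` before `tree.write(...)`.

```python
from lxml import etree

# Sample input: the XML from the question, plus a comment to test the guard.
SOURCE = b"""<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer">
  <!-- a comment -->
  <provider>some data</provider>
  <language>en-GB</language>
</package>"""


def strip_namespaces(root):
    """Hypothetical helper: remove namespace URIs from tags, then drop unused xmlns."""
    for elem in root.iter():
        # Comments and processing instructions have a function as .tag, not a str.
        if not isinstance(elem.tag, str):
            continue
        # QName splits "{uri}local" into its parts; keep only the local name.
        elem.tag = etree.QName(elem).localname
    # Remove namespace declarations no longer used by any element or attribute.
    etree.cleanup_namespaces(root)
    return root


parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(SOURCE, parser)
strip_namespaces(root)
result = etree.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8")
print(result.decode("utf-8"))
```

Using `isinstance(elem.tag, str)` as the guard is equivalent to the `hasattr(..., 'find')` check in the answer; it just states the intent more directly.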