I have an xml file I need to open and make some changes to, one of those changes is to remove the namespace and prefix and then save to another file. Here is the xml:
<?xml version='1.0' encoding='UTF-8'?> <package xmlns="http://apple.com/itunes/importer"> <provider>some data</provider> <language>en-GB</language> </package>
I can make the other changes I need, but can’t find out how to remove the namespace and prefix. This is the reusklt xml I need:
<?xml version='1.0' encoding='UTF-8'?> <package> <provider>some data</provider> <language>en-GB</language> </package>
And here is my script which will open and parse the xml and save it:
metadata = '/Users/user1/Desktop/Python/metadata.xml' from lxml import etree parser = etree.XMLParser(remove_blank_text=True) open(metadata) tree = etree.parse(metadata, parser) root = tree.getroot() tree.write('/Users/user1/Desktop/Python/done.xml', pretty_print = True, xml_declaration = True, encoding = 'UTF-8')
So how would I add code in my script which will remove the namespace and prefix?
Advertisement
Answer
Replace tag as Uku Loskit suggests. In addition to that, use lxml.objectify.deannotate.
from lxml import etree, objectify metadata = '/Users/user1/Desktop/Python/metadata.xml' parser = etree.XMLParser(remove_blank_text=True) tree = etree.parse(metadata, parser) root = tree.getroot() #### for elem in root.getiterator(): if not hasattr(elem.tag, 'find'): continue # guard for Comment tags i = elem.tag.find('}') if i >= 0: elem.tag = elem.tag[i+1:] objectify.deannotate(root, cleanup_namespaces=True) #### tree.write('/Users/user1/Desktop/Python/done.xml', pretty_print=True, xml_declaration=True, encoding='UTF-8')
Note: Some tags like Comment
return a function when accessing tag
attribute. added a guard for that.