Extracting XML Attributes

Question

I have an XML file with several thousand records in it in the form of: How can I convert this into a CSV or tab-delimited file? I know I can hard-code it in Python using re.compile() statements, but there has to be something easier, and more portable among different XML file layouts. I've found a couple of threads here about attribs,

Accepted Answer

I (also) wouldn't use BeautifulSoup for this, and though I like lxml, that's an extra install. If you don't want to bother, this is simple enough to do with the standard library's ElementTree module.

Something like:

    import sys
    import xml.etree.ElementTree as ET

    tree = ET.parse('test.xml')
    root = tree.getroot()
    rs = list(root)  # child elements; getchildren() was removed in Python 3.9
    keys = rs[0].attrib.keys()

    # Header row: attribute names taken from the first record
    for a in keys:
        sys.stdout.write(a)
        sys.stdout.write('\t')
    sys.stdout.write('\n')

    # One tab-delimited row per record
    for r in rs:
        assert keys == r.attrib.keys()
        for k in keys:
            sys.stdout.write(r.attrib[k])
            sys.stdout.write('\t')
        sys.stdout.write('\n')

will, from Python 3, produce:

    zip     m_name  current city            cust_ID         l_name  f_name
    00010   OfThe   1       Fairbanks       B123456@Y1996   Jungle  George
    03010   P       1       Yellow River    Q975697@Z2000   Freely  I
    07008           0       Fallen Arches   M7803@J2323     Jungle  Jim

Note that with Python 2.7, the order of the attributes will be different. If you want them output in a specific order, you should sort or reorder the list "keys".

The assert checks that all rows have the same attributes. If you actually have missing or different attributes in the elements, then you'll have to remove it and add some code to deal with the differences and supply defaults for missing values.

(In your sample data, you have a null value (m_name=""), rather than a missing value. You might want to check that this case is handled OK by the consumer of this output, or else add some more special handling for this case.)
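If the records do not all carry the same attributes, one possible way to handle it is to collect the union of attribute names first and let csv.DictWriter fill in a default for anything missing. This is only a sketch: the function name attributes_to_tsv, the "N/A" default, and the sample XML (including its element names) are made up for illustration and are not from the original question.

```python
import csv
import io
import xml.etree.ElementTree as ET


def attributes_to_tsv(xml_text, default=""):
    """Return tab-delimited text built from the attributes of the root's children."""
    root = ET.fromstring(xml_text)
    records = list(root)

    # Union of all attribute names, sorted so the column order is stable.
    columns = sorted({name for rec in records for name in rec.attrib})

    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=columns,
        restval=default,      # value written when a record lacks an attribute
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow(rec.attrib)
    return out.getvalue()


# Hypothetical sample data; the second record has no "city" attribute.
SAMPLE = """<customers>
  <customer cust_ID="A1" f_name="George" city="Fairbanks"/>
  <customer cust_ID="B2" f_name="Jim"/>
</customers>"""

if __name__ == "__main__":
    print(attributes_to_tsv(SAMPLE, default="N/A"), end="")
```

Sorting the column names is just one choice; passing an explicit list of names as fieldnames would give any other fixed order. An attribute that is present but empty (like m_name="") is written as an empty field, while a missing one gets the default, so the two cases stay distinguishable in the output.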

Answer