
Extracting XML Attributes

I have an XML file with several thousand records, each in the form of:


How can I convert this into a CSV or tab-delimited file? I know I can hard-code it in Python using re.compile() statements, but there has to be something easier and more portable across different XML file layouts.

I’ve found a couple of threads here about attributes ("Beautifulsoup unable to extract data using attrs=class" and "Extracting an attribute value with beautifulsoup"), and they have gotten me almost there with:


What’s the next (last?) step?


Answer

I (also) wouldn’t use BeautifulSoup for this. Though I like lxml, it’s an extra install, and if you don’t want to bother, this is simple enough to do with the standard library’s ElementTree module (xml.etree.ElementTree).
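For illustration only: ElementTree exposes each element's attributes as a plain dict through its attrib property, which does most of the work here. The snippet below is a minimal sketch, not the answer's original code. The element name "record" and the attributes id and m_type are hypothetical placeholders (only m_name comes from the question), and the XML is inlined as a string so the example runs on its own; for a real file you would call ET.parse(path).getroot() instead of ET.fromstring().

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: "record", "id" and "m_type" are placeholders;
# only m_name appears in the original question.
SAMPLE = """<records>
  <record id="1" m_name="alpha" m_type="x"/>
  <record id="2" m_name="" m_type="y"/>
</records>"""

root = ET.fromstring(SAMPLE)

# Each element's attributes are already a dict via .attrib,
# so no regular expressions are needed to pull them out.
rows = [dict(elem.attrib) for elem in root.iter("record")]

for row in rows:
    print(row)
```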

Something like:


will, under Python 3, produce:


Note that with Python 2.7 the order of the attributes will be different, since dicts there do not preserve insertion order. If you want the columns output in a specific order, sort or otherwise reorder the list "keys".
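One way to get a specific, reproducible column order is to sort the attribute names once and then look each value up by name. This is a minimal sketch under the same hypothetical assumptions as before (elements named "record", made-up attributes other than m_name); the function name records_to_delimited is also hypothetical. It writes tab-delimited text with the standard csv module; pass delimiter="," to get a CSV file instead.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data; the attributes are deliberately listed out of order
# to show that the output column order does not depend on the source order.
SAMPLE = """<records>
  <record m_type="x" id="1" m_name="alpha"/>
  <record m_type="y" id="2" m_name=""/>
</records>"""


def records_to_delimited(xml_text, tag, delimiter="\t"):
    """Write the attributes of every <tag> element as delimited text.

    The column order comes from sorting the first element's attribute
    names, so it is the same regardless of dict or parser ordering.
    A row missing one of those attributes raises KeyError.
    """
    elems = list(ET.fromstring(xml_text).iter(tag))
    keys = sorted(elems[0].attrib) if elems else []
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(keys)  # header row
    for elem in elems:
        writer.writerow(elem.attrib[k] for k in keys)
    return out.getvalue()


print(records_to_delimited(SAMPLE, "record"))
```

To write to disk rather than a string, open the output file with open(path, "w", newline="") and hand that to csv.writer in place of the StringIO buffer.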

The assert checks that all rows have the same attributes. If some elements actually have missing or different attributes, you’ll have to remove it and add code that deals with the differences and supplies defaults for missing values. (In your sample data, you have a null value, m_name="", rather than a missing value. You might want to check that the consumer of this output handles that case correctly, or else add special handling for it.)
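If some elements do lack attributes, one option (a sketch, not the answer's code) is to drop the assert, build the header from the union of all attribute names, and let csv.DictWriter fill the gaps: its restval argument is the value written for any key a row is missing. The "NA" default, the function name records_to_csv, and the sample data are hypothetical. This keeps the distinction mentioned above: a missing attribute becomes "NA", while a present but empty one such as m_name="" stays an empty field.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical data: the second record has no m_type attribute at all,
# while the third has an empty (null) m_name.
SAMPLE = """<records>
  <record id="1" m_name="alpha" m_type="x"/>
  <record id="2" m_name="beta"/>
  <record id="3" m_name="" m_type="z"/>
</records>"""


def records_to_csv(xml_text, tag, missing="NA"):
    rows = [elem.attrib for elem in ET.fromstring(xml_text).iter(tag)]
    # Union of attribute names across all rows, in first-seen order,
    # instead of asserting that every row has the same attributes.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    # restval is written for attributes a row lacks; an attribute that is
    # present but empty is written as an empty field.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval=missing,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(records_to_csv(SAMPLE, "record"))
```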
