Skip to content
Advertisement

Listing path and data from a xml file to store in a dataframe

Here is a xml file :

JavaScript

I want to save in a dataframe : 1) the path and 2) the text of the elements corresponding to the path. To do this dataframe, I am thinking to do a dictionary to store both. So first I would like to get a dictionary like that (where I have the values associated to the corresonding path).

JavaScript

Like that I just have to use the function df=pd.DataFrame() to create a dataframe that I can export in a excel sheet. I have already a part for the listing of the path, however I can not get text from those paths. I do not get how the lxml library works. I tried the function .text() and text_content() but I have an error.

Here is my code :

JavaScript

Thank you for the help !

Advertisement

Answer

Here is an example of traversing XML tree. For this purpose recursive function will be needed. Fortunately lxml provides all functionality for this.

JavaScript

Output:

JavaScript

Please note, the dictionary d contains lists as values. That’s because elements can be repeated in XML and otherwise last value will override previous one. If that’s not the case for your particular XML, use regular dict instead of defaultdict d = {} and use assignment instead of appending d[tree.getelementpath(el)] = el.text.

The same when reading from file:

JavaScript
Advertisement