I am trying to extract the following values from an XML file:
Name, Mode, LEVELS, Group, Type
and then build a pandas DataFrame from them. The problem I'm having so far is that I cannot get the <Name>ALICE</Name> values, and the output DataFrame's format is different from what I need.
Here are some of the posts I used when building my read_xml function:
- https://www.geeksforgeeks.org/xml-parsing-python/
- Extracting text from XML using python
- How do I parse XML in Python?
Here is the example XML file format:
<?xml version="1.0"?>
<Body>
  <DocType>List</DocType>
  <DocVersion>1</DocVersion>
  <LIST>
    <Name>ALICE</Name>
    <Variable>
      <Mode>Hole</Mode>
      <LEVELS>1</LEVELS>
      <Group>11</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
    <Variable>
      <Mode>BEWEL</Mode>
      <LEVELS>2</LEVELS>
      <Group>22</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
    <Name>WONDERLAND</Name>
    <Variable>
      <Mode>Mole</Mode>
      <LEVELS>1</LEVELS>
      <Group>11</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
    <Variable>
      <Mode>Matrix</Mode>
      <LEVELS>6</LEVELS>
      <Group>66</Group>
      <Type>0</Type>
      <Paint />
    </Variable>
  </LIST>
</Body>
I built the following function:
import xml.etree.ElementTree as ET
import pandas as pd

xml_file = r"C:xml.xml"

def read_xml(xml_file):
    etree = ET.parse(xml_file)
    root = etree.getroot()
    items = []
    # Iterates over every direct child of <LIST>: both <Name> and <Variable>
    for item in root.findall('./LIST/'):
        values = {}
        for it in item:
            #print(it)
            values[it.tag] = it.text
        items.append(values)
    columns = ['Name','Mode', 'LEVELS','Group','Type']
    df = pd.DataFrame(items, columns = columns)
    return df

print(read_xml(xml_file))
This gives me the following output:
  Name    Mode LEVELS Group Type
0  NaN     NaN    NaN   NaN  NaN
1  NaN    Hole      1    11    0
2  NaN   BEWEL      2    22    0
3  NaN     NaN    NaN   NaN  NaN
4  NaN    Mole      1    11    0
5  NaN  Matrix      6    66    0
The expected output is:
         NAME    MODE LEVELS Group Type
1       ALICE    Hole      1    11    0
2       ALICE   BEWEL     11    22    0
3  WONDERLAND    MOLE      1    11    0
4  WONDERLAND  MATRIX      6    66    0
How can I get the expected output?
Thanks!
Answer
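The Name elements are siblings of the Variable elements and have no children, so the inner loop finds nothing to store for them. Inside the loop, if the tag is Name, save its text in a variable and skip to the next element; for each Variable that follows, add that saved name to the values dictionary: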
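# cElementTree was removed in Python 3.9; ElementTree is the supported module
import xml.etree.ElementTree as ET
import pandas as pd

def read_xml(xml_file):
    etree = ET.parse(xml_file)
    root = etree.getroot()
    items = []
    name = None  # last <Name> seen; stays None if a <Variable> comes first
    # 'LIST/*' selects every direct child of <LIST> (same as the 'LIST/' shorthand)
    for item in root.findall('LIST/*'):
        values = {}
        if (item.tag == 'Name'):
            # Remember the name and move on; it applies to the Variables that follow
            name = item.text
            continue
        for it in item:
            values[it.tag] = it.text
        values['Name'] = name
        items.append(values)
    columns = ['Name','Mode', 'LEVELS','Group','Type']
    df = pd.DataFrame(items, columns = columns)
    return df

xml_file = 'xml.xml'
print(read_xml(xml_file))

Output:

         Name    Mode LEVELS Group Type
0       ALICE    Hole      1    11    0
1       ALICE   BEWEL      2    22    0
2  WONDERLAND    Mole      1    11    0
3  WONDERLAND  Matrix      6    66    0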
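
To try the "remember the last Name" idea without a file on disk or pandas installed, here is a minimal standard-library sketch. The names SAMPLE_XML, FIELDS and rows_from_xml are hypothetical helpers introduced only for this illustration, and the XML is a shortened copy of the sample from the question. It is one possible way to write it, not the only one.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document from the question (assumption:
# the real file has the same <Body>/<LIST> layout).
SAMPLE_XML = """<Body>
  <LIST>
    <Name>ALICE</Name>
    <Variable><Mode>Hole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
    <Variable><Mode>BEWEL</Mode><LEVELS>2</LEVELS><Group>22</Group><Type>0</Type><Paint /></Variable>
    <Name>WONDERLAND</Name>
    <Variable><Mode>Mole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
  </LIST>
</Body>"""

# Child tags to copy out of each <Variable>; <Paint /> is deliberately skipped.
FIELDS = ("Mode", "LEVELS", "Group", "Type")


def rows_from_xml(xml_text):
    """Return one dict per <Variable>, tagged with the most recent <Name>."""
    root = ET.fromstring(xml_text)
    rows = []
    current_name = None  # stays None until the first <Name> is seen
    for child in root.iterfind("LIST/*"):
        if child.tag == "Name":
            # A <Name> applies to every <Variable> until the next <Name>.
            current_name = child.text
        elif child.tag == "Variable":
            row = {"Name": current_name}
            for field in FIELDS:
                # findtext returns None when the child tag is missing
                row[field] = child.findtext(field)
            rows.append(row)
    return rows


rows = rows_from_xml(SAMPLE_XML)
for row in rows:
    print(row)
```

If pandas is available, passing the result to pd.DataFrame(rows, columns=['Name', *FIELDS]) should give the same kind of table as above. All values come back as strings, so LEVELS, Group and Type would need an explicit conversion if numeric columns are wanted.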