I am trying to extract the following values from an XML file:
Name, Mode, LEVELS, Group and Type, and then build a pandas DataFrame from them. I have two problems so far: I cannot get the <Name>ALICE</Name> values, and the DataFrame output is not in the format I need.
Here are some posts I used when I built my read_xml function:
- https://www.geeksforgeeks.org/xml-parsing-python/
- Extracting text from XML using python
- How do I parse XML in Python?
Here is an example of the XML file format:
<?xml version="1.0"?>
<Body>
    <DocType>List</DocType>
    <DocVersion>1</DocVersion>
    <LIST>
            <Name>ALICE</Name>
            <Variable>
                <Mode>Hole</Mode>
                <LEVELS>1</LEVELS>
                <Group>11</Group>
                <Type>0</Type>
                <Paint />
            </Variable>
            <Variable>
                <Mode>BEWEL</Mode>
                <LEVELS>2</LEVELS>
                <Group>22</Group>
                <Type>0</Type>
                <Paint />
            </Variable>
            <Name>WONDERLAND</Name>
            <Variable>
                <Mode>Mole</Mode>
                <LEVELS>1</LEVELS>
                <Group>11</Group>
                <Type>0</Type>
                <Paint />
            </Variable>
            <Variable>
                <Mode>Matrix</Mode>
                <LEVELS>6</LEVELS>
                <Group>66</Group>
                <Type>0</Type>
                <Paint />
            </Variable>
    </LIST>
</Body>
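In this layout, each <Name> is a sibling of the <Variable> elements that follow it, not their parent. One way to confirm this is to list the direct children of <LIST>. The sketch below embeds a trimmed copy of the sample as a string. SAMPLE is a hypothetical name, and it assumes the real file has the same structure.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (assumption: same structure as the real file)
SAMPLE = """<Body>
  <LIST>
    <Name>ALICE</Name>
    <Variable><Mode>Hole</Mode></Variable>
    <Variable><Mode>BEWEL</Mode></Variable>
    <Name>WONDERLAND</Name>
    <Variable><Mode>Mole</Mode></Variable>
  </LIST>
</Body>"""

root = ET.fromstring(SAMPLE)
# Direct children of <LIST>, in document order: Name and Variable are siblings
child_tags = [child.tag for child in root.find("LIST")]
print(child_tags)  # ['Name', 'Variable', 'Variable', 'Name', 'Variable']
```

A Name therefore has to be remembered while walking the list; it cannot be read from inside a Variable.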
I built the following function:
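import xml.etree.ElementTree as ET
import pandas as pd

xml_file = r"C:xml.xml"

def read_xml(xml_file):
    etree = ET.parse(xml_file)
    root = etree.getroot()
    items = []
    # './LIST/*' selects every direct child of <LIST>: both <Name> and <Variable>
    for item in root.findall('./LIST/*'):
        values = {}
        # Collect the child elements of each item (a <Name> element has no children)
        for it in item:
            values[it.tag] = it.text
        items.append(values)
    columns = ['Name', 'Mode', 'LEVELS', 'Group', 'Type']
    df = pd.DataFrame(items, columns=columns)
    return df

print(read_xml(xml_file))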
This gives me the following output:
  Name    Mode LEVELS Group Type
0  NaN     NaN    NaN   NaN  NaN
1  NaN    Hole      1    11    0
2  NaN   BEWEL      2    22    0
3  NaN     NaN    NaN   NaN  NaN
4  NaN    Mole      1    11    0
5  NaN  Matrix      6    66    0
The expected output is:
         NAME    MODE LEVELS Group Type
1       ALICE    Hole      1    11    0
2       ALICE   BEWEL      2    22    0
3  WONDERLAND    MOLE      1    11    0
4  WONDERLAND  MATRIX      6    66    0
How can I get the expected output?
Thx!
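The all-NaN rows in the first output come from the <Name> elements. The inner loop for it in item only looks at an element's children, and <Name> has none, because its value is the element's own text. The small sketch below shows the difference using two hand-written elements that are assumed to match the sample. children_as_dict is a hypothetical helper that mirrors that inner loop.

```python
import xml.etree.ElementTree as ET

name_elem = ET.fromstring("<Name>ALICE</Name>")
variable_elem = ET.fromstring(
    "<Variable><Mode>Hole</Mode><LEVELS>1</LEVELS></Variable>"
)

def children_as_dict(elem):
    # Same idea as the inner loop in read_xml: one entry per child element
    return {child.tag: child.text for child in elem}

print(children_as_dict(name_elem))      # {}  (becomes an all-NaN row)
print(children_as_dict(variable_elem))  # {'Mode': 'Hole', 'LEVELS': '1'}
print(name_elem.text)                   # ALICE
```

No dictionary ever receives a 'Name' key, so the Name column is also NaN in every row.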
Answer
Inside the loop, if the tag is Name, store its text in a variable and move on to the next element. For every other element, add that stored name to the values dictionary:
# xml.etree.cElementTree was removed in Python 3.9; ElementTree uses the C accelerator automatically
import xml.etree.ElementTree as ET
import pandas as pd

def read_xml(xml_file):
    etree = ET.parse(xml_file)
    root = etree.getroot()
    items = []
    name = None  # last <Name> seen; applies to the <Variable> elements that follow it
    for item in root.findall('LIST/*'):
        if item.tag == 'Name':
            name = item.text
            continue
        values = {}
        for it in item:
            values[it.tag] = it.text
        values['Name'] = name
        items.append(values)
    columns = ['Name', 'Mode', 'LEVELS', 'Group', 'Type']
    df = pd.DataFrame(items, columns=columns)
    return df

xml_file = 'xml.xml'
print(read_xml(xml_file))
         Name    Mode LEVELS Group Type
0       ALICE    Hole      1    11    0
1       ALICE   BEWEL      2    22    0
2  WONDERLAND    Mole      1    11    0
3  WONDERLAND  Matrix      6    66    0
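The sketch below applies the same carry-the-last-Name idea but uses only the standard library and reads from a string, so it can be tried without a file or pandas. rows_from_xml and COLUMNS are hypothetical names. This variant also differs from the answer in two ways: it raises an error if a <Variable> appears before any <Name>, and it keeps only the wanted columns, so <Paint /> is dropped.

```python
import xml.etree.ElementTree as ET

COLUMNS = ["Name", "Mode", "LEVELS", "Group", "Type"]

def rows_from_xml(xml_text):
    """Flatten <LIST> into one dict per <Variable>, tagged with the preceding <Name>."""
    root = ET.fromstring(xml_text)
    rows = []
    current_name = None
    for child in root.findall("./LIST/*"):
        if child.tag == "Name":
            # Remember the name; it applies to every <Variable> until the next <Name>
            current_name = child.text
        elif child.tag == "Variable":
            if current_name is None:
                raise ValueError("<Variable> found before any <Name>")
            row = {"Name": current_name}
            # Keep only the wanted columns; <Paint /> and other extras are skipped
            for field in child:
                if field.tag in COLUMNS:
                    row[field.tag] = field.text
            rows.append(row)
    return rows

sample = """<Body><LIST>
<Name>ALICE</Name>
<Variable><Mode>Hole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
<Variable><Mode>BEWEL</Mode><LEVELS>2</LEVELS><Group>22</Group><Type>0</Type><Paint /></Variable>
<Name>WONDERLAND</Name>
<Variable><Mode>Mole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
</LIST></Body>"""

for row in rows_from_xml(sample):
    print(row)
```

With pandas installed, pd.DataFrame(rows_from_xml(text), columns=COLUMNS) builds the table. To read from a file instead, pass the contents of the file as text. The values stay strings, so columns such as LEVELS, Group and Type need converting (for example with astype(int)) before any numeric work.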
