I am trying to extract the following values from an XML file:
Name, Mode, LEVELS, Group, Type
and then build a pandas DataFrame from them. The problem I have so far is that I cannot get the <Name>ALICE</Name>
values, and the output DataFrame is not in the format I need.
Here are some posts I used when I built my read_xml function:
- https://www.geeksforgeeks.org/xml-parsing-python/
- Extracting text from XML using python
- How do I parse XML in Python?
Here is the example XML file format:

    <?xml version="1.0"?>
    <Body>
      <DocType>List</DocType>
      <DocVersion>1</DocVersion>
      <LIST>
        <Name>ALICE</Name>
        <Variable>
          <Mode>Hole</Mode>
          <LEVELS>1</LEVELS>
          <Group>11</Group>
          <Type>0</Type>
          <Paint />
        </Variable>
        <Variable>
          <Mode>BEWEL</Mode>
          <LEVELS>2</LEVELS>
          <Group>22</Group>
          <Type>0</Type>
          <Paint />
        </Variable>

        <Name>WONDERLAND</Name>
        <Variable>
          <Mode>Mole</Mode>
          <LEVELS>1</LEVELS>
          <Group>11</Group>
          <Type>0</Type>
          <Paint />
        </Variable>
        <Variable>
          <Mode>Matrix</Mode>
          <LEVELS>6</LEVELS>
          <Group>66</Group>
          <Type>0</Type>
          <Paint />
        </Variable>
      </LIST>
    </Body>

I built the following function:

    import xml.etree.ElementTree as ET
    import pandas as pd

    xml_file = r"C:xml.xml"

    def read_xml(xml_file):

        etree = ET.parse(xml_file)
        root = etree.getroot()
        items = []
        for item in root.findall('./LIST/'):
            values = {}
            for it in item:
                #print(it)
                values[it.tag] = it.text
            items.append(values)

        columns = ['Name','Mode', 'LEVELS','Group','Type']
        df = pd.DataFrame(items, columns = columns)

        return df


    print(read_xml(xml_file))

which gives me this output:

      Name    Mode LEVELS Group Type
    0  NaN     NaN    NaN   NaN  NaN
    1  NaN    Hole      1    11    0
    2  NaN   BEWEL      2    22    0
    3  NaN     NaN    NaN   NaN  NaN
    4  NaN    Mole      1    11    0
    5  NaN  Matrix      6    66    0

The expected output:

             NAME    MODE  LEVELS  Group  Type
    1       ALICE    Hole       1     11     0
    2       ALICE   BEWEL      11     22     0
    3  WONDERLAND    MOLE       1     11     0
    4  WONDERLAND  MATRIX       6     66     0

How can I get the expected output?
Thanks!
Answer
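If the tag is Name, store its text in a variable inside the loop and skip to the next element. For every following Variable element, add that stored name to the values dictionary: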

    import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
    import pandas as pd

    def read_xml(xml_file):

        etree = ET.parse(xml_file)
        root = etree.getroot()
        items = []
        name = None  # most recent <Name> seen so far
        # 'LIST/*' selects every direct child of <LIST> (same as the trailing-slash form 'LIST/')
        for item in root.findall('LIST/*'):
            values = {}
            if (item.tag == 'Name'):
                # remember the name and move on; it applies to the <Variable> elements that follow
                name = item.text
                continue
            for it in item:
                values[it.tag] = it.text
            values['Name'] = name
            items.append(values)

        columns = ['Name','Mode', 'LEVELS','Group','Type']
        df = pd.DataFrame(items, columns = columns)

        return df

    xml_file = 'xml.xml'
    print(read_xml(xml_file))

Output:

             Name    Mode LEVELS Group Type
    0       ALICE    Hole      1    11    0
    1       ALICE   BEWEL      2    22    0
    2  WONDERLAND    Mole      1    11    0
    3  WONDERLAND  Matrix      6    66    0
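
The same "remember the last Name" idea can be tried without a file on disk. The following is only a minimal sketch, assuming the XML keeps the flat layout from the question (each Name is followed by the Variable elements it belongs to). The names `XML_TEXT`, `COLUMNS` and `rows_from_xml` are made up for this illustration, and the XML is a shortened copy of the question's sample.

```python
import xml.etree.ElementTree as ET

# Shortened inline copy of the question's XML (assumption: same flat layout).
XML_TEXT = """<Body>
  <LIST>
    <Name>ALICE</Name>
    <Variable><Mode>Hole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
    <Name>WONDERLAND</Name>
    <Variable><Mode>Mole</Mode><LEVELS>1</LEVELS><Group>11</Group><Type>0</Type><Paint /></Variable>
  </LIST>
</Body>
"""

COLUMNS = ["Name", "Mode", "LEVELS", "Group", "Type"]


def rows_from_xml(xml_text):
    """Return one dict per <Variable>, tagged with the preceding <Name>."""
    root = ET.fromstring(xml_text)
    rows = []
    current_name = None  # last <Name> seen while walking the children of <LIST>
    for child in root.findall("LIST/*"):
        if child.tag == "Name":
            current_name = child.text
        elif child.tag == "Variable":
            row = {"Name": current_name}
            # Pick only the wanted child tags; findtext returns None if a tag is missing.
            for col in COLUMNS[1:]:
                row[col] = child.findtext(col)
            rows.append(row)
    return rows


rows = rows_from_xml(XML_TEXT)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows, columns=COLUMNS)` to get the frame. Note that every value is still a string at this point, so LEVELS, Group and Type would need an explicit conversion if numbers are required.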