The XML:
    <?xml version="1.0"?>
    <pages>
      <page>
        <url>http://example.com/Labs</url>
        <title>Labs</title>
        <subpages>
          <page>
            <url>http://example.com/Labs/Email</url>
            <title>Email</title>
            <subpages>
              <page/>
              <url>http://example.com/Labs/Email/How_to</url>
              <title>How-To</title>
            </subpages>
          </page>
          <page>
            <url>http://example.com/Labs/Social</url>
            <title>Social</title>
          </page>
        </subpages>
      </page>
      <page>
        <url>http://example.com/Tests</url>
        <title>Tests</title>
        <subpages>
          <page>
            <url>http://example.com/Tests/Email</url>
            <title>Email</title>
            <subpages>
              <page/>
              <url>http://example.com/Tests/Email/How_to</url>
              <title>How-To</title>
            </subpages>
          </page>
          <page>
            <url>http://example.com/Tests/Social</url>
            <title>Social</title>
          </page>
        </subpages>
      </page>
    </pages>
The code:
    # rexml is the XML string read from a URL
    from xml.etree import ElementTree as ET

    tree = ET.fromstring(rexml)
    # Visit every <page> element at any nesting level
    for node in tree.iter('page'):
        for url in node.iterfind('url'):
            print(url.text)
        for title in node.iterfind('title'):
            print(title.text)
        print('-' * 30)
The output:
    http://example.com/article1
    Article1
    ------------------------------
    http://example.com/article1/subarticle1
    SubArticle1
    ------------------------------
    http://example.com/article2
    Article2
    ------------------------------
    http://example.com/article3
    Article3
    ------------------------------
The XML represents the tree-like structure of a sitemap.
I have been up and down the docs and Google all day and can't figure out how to get the node depth of each entry.
I tried counting the children of each container, but that only works for the first parent; after that it breaks because I can't figure out how to reset the count. That is probably just a hackish idea anyway.
The desired output:
    0
    http://example.com/article1
    Article1
    ------------------------------
    1
    http://example.com/article1/subarticle1
    SubArticle1
    ------------------------------
    0
    http://example.com/article2
    Article2
    ------------------------------
    0
    http://example.com/article3
    Article3
    ------------------------------
Answer
Used lxml.html.
    import lxml.html

    rexml = ...  # the XML string from the question

    def depth(node):
        # Count the elements on the path from node up to the root, node included
        d = 0
        while node is not None:
            d += 1
            node = node.getparent()
        return d

    tree = lxml.html.fromstring(rexml)
    for node in tree.iter('page'):
        print(depth(node))
        for url in node.iterfind('url'):
            print(url.text)
        for title in node.iterfind('title'):
            print(title.text)
        print('-' * 30)
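The depth() function above counts every ancestor of a node, so its values are not zero-based like the desired output in the question. As an alternative, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree. It is not part of the original answer. ElementTree elements have no getparent(), so the depth is passed down through a recursive walk instead. The names walk_pages and SAMPLE_XML are hypothetical, and the sample is a shortened version of the sitemap from the question.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample data, shortened from the sitemap in the question.
SAMPLE_XML = """<?xml version="1.0"?>
<pages>
  <page>
    <url>http://example.com/Labs</url>
    <title>Labs</title>
    <subpages>
      <page>
        <url>http://example.com/Labs/Email</url>
        <title>Email</title>
      </page>
    </subpages>
  </page>
  <page>
    <url>http://example.com/Tests</url>
    <title>Tests</title>
  </page>
</pages>"""


def walk_pages(element, depth=0):
    """Yield (depth, url, title) for every <page> below element, in document order."""
    for child in element:
        if child.tag == "page":
            yield depth, child.findtext("url"), child.findtext("title")
            # Pages nested under this one are one level deeper.
            yield from walk_pages(child, depth + 1)
        else:
            # Containers such as <subpages> do not add a level of their own.
            yield from walk_pages(child, depth)


root = ET.fromstring(SAMPLE_XML)
for depth, url, title in walk_pages(root):
    print(depth)
    print(url)
    print(title)
    print("-" * 30)
```

Because the depth travels as a function argument, there is no counter to reset when the walk returns from a subtree. An empty `<page/>` element, like the ones in the question's XML, would be reported with None for both url and title.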