Using "getElementsByTagName" to get a tag in Python

My XML file is:

<list>
  <ProfileDefinition>
    <string name="ID">nCGhwaZNpy6</string>
    <string name="name">02.11.2013 Scott Mobile</string>
    <decimal name="AccountID">10954</decimal>
    <decimal name="TimeZoneID">-600</decimal>
  </ProfileDefinition><ProfileDefinition>
    <string name="ID">9JsG57bRUu6</string>
    <string name="name">Huggies US-EN & CA-EN Test Town Responsive - Prod</string>
    <decimal name="AccountID">10954</decimal>
    <decimal name="TimeZoneID">-600</decimal>
  </ProfileDefinition><ProfileDefinition>
    <string name="ID">I3CJQ4gDkK6</string>
    <string name="name">Huggies US-EN Brand Desktop - Prod</string>
    <decimal name="AccountID">10954</decimal>
    <decimal name="TimeZoneID">-600</decimal></ProfileDefinition>


My code is:

import urllib2
from xml.dom.minidom import parseString

theurl = 'https://ws.webtrends.com/v2/ReportService/profiles/?format=xml'

pagehandle = urllib2.urlopen(theurl)
data = pagehandle.read()
pagehandle.close()

dom = parseString(data)

xmlTag = dom.getElementsByTagName('string name="ID"')[0].toxml()
xmlData = xmlTag.replace('<string name="ID">', '').replace('</string>', '')

print xmlTag
print xmlData


I want to get the value of the element with the tag name 'string name="ID"',

but I get this error:

Traceback (most recent call last):
  File "C:UsersVaibhavDesktopWebtrendstest.py", line 43, in <module>
    xmlTag = dom.getElementsByTagName('string name="ID"')[0].toxml()
IndexError: list index out of range


If I replace

dom.getElementsByTagName('string name="ID"')[0].toxml()

with

dom.getElementsByTagName('string')[0].toxml()

the output is "nCGhwaZNpy6", since that is the first element of the list. But the second element is "02.11.2013 Scott Mobile", which also ends up in the list, and I don't want it.

There are two kinds of string tag, one with name="ID" and one with name="name". How can I access only the string tags with name="ID"?

Answer

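string name="ID" is not a tag name; only string is the tag name. getElementsByTagName matches on the tag name alone, so no element matches and the returned list is empty, which is why [0] raises IndexError.

You have to compare the value of the name attribute for each string tag.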

....
dom = parseString(data)
for s in dom.getElementsByTagName('string'):
    if s.getAttribute('name') == 'ID':
        print s.childNodes[0].data
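
For anyone who wants to try this without calling the web service, here is a minimal self-contained sketch of the same attribute check. It assumes Python 3 and uses an abbreviated, well-formed copy of the XML from the question (the closing </list> is added and the & is escaped); SAMPLE_XML and profile_ids are illustrative names, not part of the original code.

```python
from xml.dom.minidom import parseString

# Abbreviated copy of the question's XML, made well-formed for the example
# (closing </list> added, "&" escaped as "&amp;"). This is an assumption.
SAMPLE_XML = """<list>
  <ProfileDefinition>
    <string name="ID">nCGhwaZNpy6</string>
    <string name="name">02.11.2013 Scott Mobile</string>
    <decimal name="AccountID">10954</decimal>
  </ProfileDefinition>
  <ProfileDefinition>
    <string name="ID">9JsG57bRUu6</string>
    <string name="name">Huggies US-EN &amp; CA-EN Test Town Responsive - Prod</string>
    <decimal name="AccountID">10954</decimal>
  </ProfileDefinition>
</list>"""


def profile_ids(xml_text):
    """Return the text of every <string> element whose name attribute is "ID"."""
    dom = parseString(xml_text)
    ids = []
    for node in dom.getElementsByTagName("string"):
        # Filter on the attribute value; skip elements with no text child.
        if node.getAttribute("name") == "ID" and node.firstChild is not None:
            ids.append(node.firstChild.data)
    return ids


dom = parseString(SAMPLE_XML)
# No element is literally named 'string name="ID"', so this list is empty
# and indexing it with [0] is what raises IndexError in the question.
no_match = dom.getElementsByTagName('string name="ID"')
print(len(no_match))
print(profile_ids(SAMPLE_XML))
```

Collecting the values into a list, rather than printing inside the loop, makes it easy to reuse the IDs in later requests.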


I recommend using lxml or BeautifulSoup.

The following is equivalent code using lxml.

import lxml.html
dom = lxml.html.fromstring(data)
for s in dom.cssselect('string[name=ID]'):
    print s.text
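
If installing a third-party library is not an option, the standard library's xml.etree.ElementTree may be enough: it supports a limited XPath subset that can express the same attribute filter. This is a swapped-in alternative, not the lxml code above. A minimal sketch, assuming Python 3 and an abbreviated, well-formed copy of the question's XML; SAMPLE_XML and ids_with_elementtree are illustrative names.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the question's XML, made well-formed for the example
# (closing </list> added). This is an assumption, not the live response.
SAMPLE_XML = """<list>
  <ProfileDefinition>
    <string name="ID">nCGhwaZNpy6</string>
    <string name="name">02.11.2013 Scott Mobile</string>
  </ProfileDefinition>
  <ProfileDefinition>
    <string name="ID">9JsG57bRUu6</string>
    <string name="name">Huggies US-EN Brand Desktop - Prod</string>
  </ProfileDefinition>
</list>"""


def ids_with_elementtree(xml_text):
    """Return the text of each <string name="ID"> element, in document order."""
    root = ET.fromstring(xml_text)
    # ".//string[@name='ID']" selects descendant <string> elements whose
    # name attribute equals "ID".
    return [el.text for el in root.findall(".//string[@name='ID']")]


print(ids_with_elementtree(SAMPLE_XML))
```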

