My XML file is:
<list>
  <ProfileDefinition>
    <string name="ID">nCGhwaZNpy6</string>
    <string name="name">02.11.2013 Scott Mobile</string>
    <decimal name="AccountID">10954</decimal>
    <decimal name="TimeZoneID">-600</decimal>
  </ProfileDefinition>
  <ProfileDefinition>
    <string name="ID">9JsG57bRUu6</string>
    <string name="name">Huggies US-EN & CA-EN Test Town Responsive - Prod</string>
    <decimal name="AccountID">10954</decimal>
    <decimal name="TimeZoneID">-600</decimal>
  </ProfileDefinition>
  <ProfileDefinition>
    <string name="ID">I3CJQ4gDkK6</string>
    <string name="name">Huggies US-EN Brand Desktop - Prod</string>
    <decimal name="AccountID">10954</decimal>
    <decimal name="TimeZoneID">-600</decimal>
  </ProfileDefinition>
My code is:
import urllib2

theurl = 'https://ws.webtrends.com/v2/ReportService/profiles/?format=xml'
pagehandle = urllib2.urlopen(theurl)

##########################################################################
from xml.dom.minidom import parseString

file = pagehandle
data = file.read()
file.close()
dom = parseString(data)
xmlTag = dom.getElementsByTagName('string name="ID"')[0].toxml()
xmlData = xmlTag.replace('<string name="ID">', '').replace('</string>', '')
print xmlTag
print xmlData
I want to get the value of the element with tag name 'string name="ID"',
but I get this error:
Traceback (most recent call last):
  File "C:UsersVaibhavDesktopWebtrendstest.py", line 43, in <module>
    xmlTag = dom.getElementsByTagName('string name="ID"')[0].toxml()
IndexError: list index out of range
If I replace
dom.getElementsByTagName('string name="ID"')[0].toxml()
with
dom.getElementsByTagName('string')[0].toxml()
the output is
"nCGhwaZNpy6"
since it's the first element of that list, but the second element is
"02.11.2013 Scott Mobile"
which also ends up in the list, and I don't want that.
There are two kinds of string tag, one with name="ID" and one with name="name". How can I access only the string tags with name="ID"?
Answer
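'string name="ID"' is not a tag name. The tag name is just 'string'; name="ID" is an attribute of that tag.
getElementsByTagName matches on the tag name only, so it finds nothing and indexing [0] raises the IndexError.
You have to compare the value of the name attribute for each string tag.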
....
dom = parseString(data)
for s in dom.getElementsByTagName('string'):
    if s.getAttribute('name') == 'ID':
        print s.childNodes[0].data
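For anyone who wants to try this without calling the web service, here is a minimal self-contained sketch of the same attribute check. It is written for Python 3 (so print is a function), and the SAMPLE_XML string and the get_profile_ids helper are made up for illustration: the sample is a shortened copy of the XML in the question with a closing </list> added so that it parses.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: a shortened copy of the question's XML,
# with a closing </list> added so that it is well-formed.
SAMPLE_XML = """<list>
  <ProfileDefinition>
    <string name="ID">nCGhwaZNpy6</string>
    <string name="name">02.11.2013 Scott Mobile</string>
    <decimal name="AccountID">10954</decimal>
  </ProfileDefinition>
  <ProfileDefinition>
    <string name="ID">I3CJQ4gDkK6</string>
    <string name="name">Huggies US-EN Brand Desktop - Prod</string>
    <decimal name="AccountID">10954</decimal>
  </ProfileDefinition>
</list>"""


def get_profile_ids(xml_text):
    """Return the text of every <string> element whose name attribute is "ID"."""
    dom = parseString(xml_text)
    ids = []
    for node in dom.getElementsByTagName('string'):
        # Skip elements with another name attribute and empty elements.
        if node.getAttribute('name') == 'ID' and node.firstChild is not None:
            ids.append(node.firstChild.data)
    return ids


dom = parseString(SAMPLE_XML)
# No element is literally named 'string name="ID"', so nothing matches here.
no_match = dom.getElementsByTagName('string name="ID"')
profile_ids = get_profile_ids(SAMPLE_XML)

print(len(no_match))
print(profile_ids)
```

The empty result from the first lookup is what makes [0] fail in the question's code. The firstChild check is an extra guard for empty <string/> elements, which the loop above does not need for this data.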
I recommend using lxml or BeautifulSoup.
The following is equivalent code using lxml:
import lxml.html

dom = lxml.html.fromstring(data)
for s in dom.cssselect('string[name=ID]'):
    print s.text
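If installing a third-party library is not an option, the same attribute filter can probably be expressed with the standard library's xml.etree.ElementTree, which supports a small XPath subset including [@attribute='value'] predicates. This is a different module from the ones used above, shown only as a sketch: it assumes Python 3, and the SAMPLE_XML string is a made-up, shortened copy of the question's XML with a closing </list> added.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a shortened, well-formed copy of the question's XML.
SAMPLE_XML = """<list>
  <ProfileDefinition>
    <string name="ID">nCGhwaZNpy6</string>
    <string name="name">02.11.2013 Scott Mobile</string>
  </ProfileDefinition>
  <ProfileDefinition>
    <string name="ID">I3CJQ4gDkK6</string>
    <string name="name">Huggies US-EN Brand Desktop - Prod</string>
  </ProfileDefinition>
</list>"""

root = ET.fromstring(SAMPLE_XML)

# Select every <string> descendant whose name attribute equals "ID".
id_values = [el.text for el in root.findall(".//string[@name='ID']")]

print(id_values)
```

Here the predicate does the filtering that the explicit getAttribute comparison does in the minidom version.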