
XML parsing issue in Python using ElementTree

I need to parse a SOAP response and convert it to a text file. I am trying to extract the values as detailed below, using ElementTree in Python.

I have the XML response below, which I need to parse:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:tmf854="tmf854.v1" xmlns:alu="alu.v1">
  <soapenv:Header>
    <tmf854:header>
      <tmf854:activityName>query</tmf854:activityName>
      <tmf854:msgName>queryResponse</tmf854:msgName>
      <tmf854:msgType>RESPONSE</tmf854:msgType>
      <tmf854:senderURI>https:/destinationhost:8443/tmf854/services</tmf854:senderURI>
      <tmf854:destinationURI>https://localhost:8443</tmf854:destinationURI>
      <tmf854:activityStatus>SUCCESS</tmf854:activityStatus>
      <tmf854:correlationId>1</tmf854:correlationId>
      <tmf854:communicationPattern>MultipleBatchResponse</tmf854:communicationPattern>
      <tmf854:communicationStyle>RPC</tmf854:communicationStyle>
      <tmf854:requestedBatchSize>1500</tmf854:requestedBatchSize>
      <tmf854:batchSequenceNumber>1</tmf854:batchSequenceNumber>
      <tmf854:batchSequenceEndOfReply>true</tmf854:batchSequenceEndOfReply>
      <tmf854:iteratorReferenceURI>http://9195985371165397084</tmf854:iteratorReferenceURI>
      <tmf854:timestamp>20220915222121.472+0530</tmf854:timestamp>
    </tmf854:header>
  </soapenv:Header>
  <soapenv:Body>
    <queryResponse xmlns="alu.v1">
      <queryObjectData>
        <queryObject>
          <name>
            <tmf854:mdNm>AMS</tmf854:mdNm>
            <tmf854:meNm>CHEERLAVANCHA_281743</tmf854:meNm>
            <tmf854:ptpNm>/type=NE/CHEERLAVANCHA_281743</tmf854:ptpNm>
          </name>
          <vendorExtensions>
            <package>
              <NameAndStringValue>
                <tmf854:name>hubSubtendedStatus</tmf854:name>
                <tmf854:value>NONE</tmf854:value>
              </NameAndStringValue>
              <NameAndStringValue>
                <tmf854:name>productAndRelease</tmf854:name>
                <tmf854:value>DF.6.1</tmf854:value>
              </NameAndStringValue>
              <NameAndStringValue>
                <tmf854:name>adminUserName</tmf854:name>
                <tmf854:value>isadmin</tmf854:value>
              </NameAndStringValue>
              <NameAndStringValue>
           </package>
          </vendorExtensions>
        </queryObject>
      </queryObjectData>
     </queryResponse>
 </soapenv:Body>
</soapenv:Envelope>

I need to use the code snippet below:

parser = ElementTree.parse("response.txt")
root = parser.getroot()
inventoryObjectData = root.find(".//{alu.v1}queryObjectData")
for inventoryObject in inventoryObjectData:
    for device in inventoryObject:
        if (device.tag.split("}")[1]) == "me":
            vendorExtensionsNames = []
            vendorExtensionsValues = []
            if device.find(".//{tmf854.v1}mdNm") is not None:
                mdnm = device.find(".//{tmf854.v1}mdNm").text
            if device.find(".//{tmf854.v1}meNm") is not None:
                menm = device.find(".//{tmf854.v1}meNm").text
            if device.find(".//{tmf854.v1}userLabel") is not None:
                userlabel = device.find(".//{tmf854.v1}userLabel").text
            if device.find(".//{tmf854.v1}resourceState") is not None:
                resourcestate = device.find(".//{tmf854.v1}resourceState").text
            if device.find(".//{tmf854.v1}location") is not None:
                location = device.find(".//{tmf854.v1}location").text
            if device.find(".//{tmf854.v1}manufacturer") is not None:
                manufacturer = device.find(".//{tmf854.v1}manufacturer").text
            if device.find(".//{tmf854.v1}productName") is not None:
                productname = device.find(".//{tmf854.v1}productName").text
            if device.find(".//{tmf854.v1}version") is not None:
                version = device.find(".//{tmf854.v1}version").text
            vendorExtensions = device.find("vendorExtensions")
            vendorExtensionsNamesElements = vendorExtensions.findall(".//{tmf854.v1}name")
            for i in vendorExtensionsNamesElements:
                vendorExtensionsNames.append(i.text.strip())
            vendorExtensionsValuesElements = vendorExtensions.findall(".//{tmf854.v1}value")
            for i in vendorExtensionsValuesElements:
                vendorExtensionsValues.append(str(i.text or "").strip())

            alu = ""
            for i in vendorExtensions:
                if i.attrib:
                    if alu == "":
                        alu = i.attrib.get("{alu.v1}name")
                    else:
                        alu = alu + "|" + i.attrib.get("{alu.v1}name")

The issue is that the line below is not able to find the "vendorExtensions" element. Please help.

vendorExtensions = device.find("vendorExtensions")

I have also tried the following:

vendorExtensions = device.find(".//queryObject/vendorExtensions")

Answer

Your document declares a default namespace of alu.v1:

<queryResponse xmlns="alu.v1">
...
</queryResponse>

Any element inside queryResponse that has no explicit namespace prefix is in the alu.v1 namespace (a default namespace applies to elements, not to attributes). You need to qualify the element name accordingly:

vendorExtensions = device.find("{alu.v1}vendorExtensions")
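
As an alternative to writing the namespace URI inline, ElementTree's find, findall and iterfind accept a prefix-to-URI mapping as a second argument. The following is a minimal sketch, assuming a trimmed-down stand-in for the sample response; the `xml_text` string and the `ns` dictionary are illustrative and not part of the original code.

```python
from xml.etree import ElementTree

# Trimmed-down stand-in for the sample response (illustrative only).
xml_text = """
<queryResponse xmlns="alu.v1" xmlns:tmf854="tmf854.v1">
  <queryObjectData>
    <queryObject>
      <name>
        <tmf854:meNm>CHEERLAVANCHA_281743</tmf854:meNm>
      </name>
      <vendorExtensions>
        <package/>
      </vendorExtensions>
    </queryObject>
  </queryObjectData>
</queryResponse>
"""

# The prefixes are our own choice; only the URIs must match the document.
ns = {"alu": "alu.v1", "tmf854": "tmf854.v1"}

root = ElementTree.fromstring(xml_text)
queryObject = root.find(".//alu:queryObject", ns)

unqualified = queryObject.find("vendorExtensions")       # no namespace given
clark = queryObject.find("{alu.v1}vendorExtensions")     # {uri}name form
prefixed = queryObject.find("alu:vendorExtensions", ns)  # prefix plus mapping

print("unqualified lookup found something:", unqualified is not None)
print("qualified lookups agree:", clark is prefixed)
```

Both qualified forms resolve to the same element; the mapping form is mostly a readability choice when many lookups share the same namespaces.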

While the above is a real problem with your code that needs to be corrected (the Wikipedia entry on XML namespaces may be useful reading if you’re unfamiliar with how namespaces work), there are also some logic problems with your code.

Let’s drop the big list of conditionals from the code and see if it’s actually doing what we think it’s doing. If we run this:

from xml.etree import ElementTree

parser = ElementTree.parse("data.xml")
root = parser.getroot()
queryObjectData = root.find(".//{alu.v1}queryObjectData")
for queryObject in queryObjectData:
    for device in queryObject:
        print(device.tag)

Then using your sample data (once it has been corrected to be syntactically valid), we see as output:

{alu.v1}name
{alu.v1}vendorExtensions

Your search for the {alu.v1}vendorExtensions element will never succeed because the thing on which you’re trying to search (the device variable) is the thing you’re trying to find.
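
This follows from how find works: it looks at an element's children (or all descendants, with a .// prefix), never at the element itself. A small sketch with a stand-in document, where `xml_text` is an illustrative fragment rather than the full response:

```python
from xml.etree import ElementTree

# Illustrative fragment shaped like one queryObject from the sample data.
xml_text = """
<queryObject xmlns="alu.v1">
  <name/>
  <vendorExtensions>
    <package/>
  </vendorExtensions>
</queryObject>
"""

queryObject = ElementTree.fromstring(xml_text)

for device in queryObject:
    # Searching from each child: neither <name> nor <vendorExtensions>
    # has a vendorExtensions element among its own children.
    found = device.find("{alu.v1}vendorExtensions")
    print(device.tag, "->", found)

# Searching from the parent finds it as a direct child.
print(queryObject.find("{alu.v1}vendorExtensions") is not None)
```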

Additionally, the conditional in your loop…

if (device.tag.split("}")[1]) == "me":

…will never match (there is no element in the entire document for which tag.split("}")[1] == "me" is True).
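
One way to check a claim like this is to collect every local tag name that occurs in the document. The sketch below uses a shortened stand-in for the sample data (the `xml_text` string is illustrative); with the real file you would parse it with ElementTree.parse instead.

```python
from xml.etree import ElementTree

# Shortened stand-in for the sample response (illustrative only).
xml_text = """
<queryResponse xmlns="alu.v1" xmlns:tmf854="tmf854.v1">
  <queryObjectData>
    <queryObject>
      <name>
        <tmf854:mdNm>AMS</tmf854:mdNm>
        <tmf854:meNm>CHEERLAVANCHA_281743</tmf854:meNm>
      </name>
      <vendorExtensions>
        <package/>
      </vendorExtensions>
    </queryObject>
  </queryObjectData>
</queryResponse>
"""

root = ElementTree.fromstring(xml_text)

# root.iter() walks the root and every descendant in document order;
# splitting on "}" drops the "{namespace}" part of each tag.
local_names = {el.tag.split("}")[-1] for el in root.iter()}

print(sorted(local_names))
print("me" in local_names)
```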

I’m not entirely clear what you’re trying to do, but here are some thoughts:

  • Given your example data, you probably don’t want that for device in inventoryObject: loop
  • We can drastically simplify your code by replacing that long block of conditionals with a list of attributes in which we are interested and then a for loop to extract them.
  • Rather than assigning a bunch of individual variables, we can build up a dictionary with the data from the queryObject

That might look like:

from xml.etree import ElementTree
import json

attributeNames = [
    "mdNm",
    "meNm",
    "userLabel",
    "resourceState",
    "location",
    "manufacturer",
    "productName",
    "version",
]

parser = ElementTree.parse("data.xml")
root = parser.getroot()
queryObjectData = root.find(".//{alu.v1}queryObjectData")
for queryObject in queryObjectData:
    device = {}

    for name in attributeNames:
        if (value := queryObject.find(f".//{{tmf854.v1}}{name}")) is not None:
            device[name] = value.text

    vendorExtensions = queryObject.find("{alu.v1}vendorExtensions")
    extensionMap = {}

    for extension in vendorExtensions.findall(".//{alu.v1}NameAndStringValue"):
        extname = extension.find("{tmf854.v1}name").text
        extvalue = extension.find("{tmf854.v1}value").text
        extensionMap[extname] = extvalue

    device["vendorExtensions"] = extensionMap

    print(json.dumps(device, indent=2))

Given your example data, this outputs:

{
  "mdNm": "AMS",
  "meNm": "CHEERLAVANCHA_281743",
  "vendorExtensions": {
    "hubSubtendedStatus": "NONE",
    "productAndRelease": "DF.6.1",
    "adminUserName": "isadmin"
  }
}
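
Since the question mentions converting the response to a text file, one possible follow-up is to flatten each device dictionary into a delimited line. This is only a sketch: the pipe-delimited layout, the column selection, the output location, and the `format_device` helper are all assumptions, as the question does not specify the target format.

```python
import os
import tempfile


def format_device(device, columns, sep="|"):
    """Flatten one device dictionary into a single delimited line.

    Hypothetical helper: missing columns become empty fields, and the
    vendor extensions are appended as comma-separated name=value pairs.
    """
    fields = [str(device.get(col) or "") for col in columns]
    extensions = device.get("vendorExtensions", {})
    fields.append(",".join(f"{k}={v}" for k, v in extensions.items()))
    return sep.join(fields)


# Same shape as the dictionary printed above.
device = {
    "mdNm": "AMS",
    "meNm": "CHEERLAVANCHA_281743",
    "vendorExtensions": {
        "hubSubtendedStatus": "NONE",
        "productAndRelease": "DF.6.1",
        "adminUserName": "isadmin",
    },
}

columns = ["mdNm", "meNm", "userLabel"]  # assumed column selection
line = format_device(device, columns)

# Assumed output location; adjust the path to wherever the file should go.
out_path = os.path.join(tempfile.gettempdir(), "devices.txt")
with open(out_path, "w", encoding="utf-8") as fh:
    fh.write(line + "\n")

print(line)
```

In the loop above, you would call such a helper once per queryObject and write one line per device.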

An alternate approach, in which we just transform each queryObject into a dictionary, might look like this:

from xml.etree import ElementTree
import json


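def localName(ele):
    # Strip the "{namespace}" part from a qualified tag name.
    return ele.tag.split("}")[-1]


def etree_to_dict(t):
    # Leaf element: return its text (an empty string if there is none).
    if len(t) == 0:
        return (t.text or "").strip()

    d = {}
    for child in t:
        if localName(child) == "NameAndStringValue":
            # Collapse a name/value pair into a single dictionary entry.
            name, value = [(x.text or "").strip() for x in child]
            d[name] = value
        else:
            # Recurse into this child only, not into all of its siblings.
            d[localName(child)] = etree_to_dict(child)
    return d


parser = ElementTree.parse("data.xml")
root = parser.getroot()
queryObjectData = root.find(".//{alu.v1}queryObjectData")
# Compare against None explicitly; truth-testing an Element is deprecated.
if queryObjectData is None:
    queryObjectData = []
for queryObject in queryObjectData:
    d = etree_to_dict(queryObject)
    print(json.dumps(d, indent=2))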

This will output:

{
  "name": {
    "mdNm": "AMS",
    "meNm": "CHEERLAVANCHA_281743",
    "ptpNm": "/type=NE/CHEERLAVANCHA_281743"
  },
  "vendorExtensions": {
    "package": {
      "hubSubtendedStatus": "NONE",
      "productAndRelease": "DF.6.1",
      "adminUserName": "isadmin"
    }
  }
}

That may or may not be appropriate depending on the structure of your real data and exactly what you’re trying to accomplish.

User contributions licensed under: CC BY-SA