Extract XML Attribute-Python

I am new to Python & trying to extract XML attributes. Below is the code that I tried.

import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <countryCode>RO</countryCode>
         <vatNumber>43097749</vatNumber>
         <requestDate>2022-07-12+02:00</requestDate>
         <valid>true</valid>
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
         <address>MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''
tree = ET.ElementTree(ET.fromstring(a))
root = tree.getroot()

for cust in root.findall('Body/checkVatResponse'):
    name = cust.find('name').text
    print(name)

I wanted to extract ‘name’ and ‘address’ from XML. But when I run the above code nothing is printed. What is my mistake?

Regards, Mayank Pande

Answer

Namespaces dawg, namespaces! You can be damn sure that when Jay-Z rapped about having 99 problems, having to deal with XML with namespaces was definitely one of them!

See "Parsing XML with Namespaces" in the Python documentation for xml.etree.ElementTree.

The Body tag is in the http://schemas.xmlsoap.org/soap/envelope/ namespace (bound to the soap prefix). checkVatResponse declares urn:ec.europa.eu:taxud:vies:services:checkVat:types as its default namespace, and name and address inherit that namespace from their parent. A bare path like 'Body/checkVatResponse' only matches elements that have no namespace, so findall returns an empty list and your loop body never runs.
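To see this for yourself, you can print the tag of every element. This is a small sketch using a trimmed copy of the XML from the question (the variable names `xml_text` and `tags` are just for illustration); ElementTree stores each tag as `{namespace-uri}localname`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question (only one child kept).
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''

root = ET.fromstring(xml_text)

# Each tag carries its namespace URI in curly braces in front of the local name.
tags = [elem.tag for elem in root.iter()]
for tag in tags:
    print(tag)

# The un-namespaced path from the question matches nothing.
no_match = root.findall('Body/checkVatResponse')
print(no_match)
```

The printed tags all start with a `{...}` part, which is why the plain 'Body' and 'checkVatResponse' names never match.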

So, you can explicitly search for an element including its namespace, like so:

root.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body/{urn:ec.europa.eu:taxud:vies:services:checkVat:types}checkVatResponse')
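If the full URIs make the path hard to read, find and findall also accept a prefix-to-URI mapping as their second argument. A minimal sketch, assuming a trimmed copy of the XML from the question; the prefix names `soap` and `vies` in the dict are my own choice and do not have to match the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question.
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <countryCode>RO</countryCode>
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''

# Prefixes are local to this dict (illustrative names); only the URIs must match the XML.
ns = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "vies": "urn:ec.europa.eu:taxud:vies:services:checkVat:types",
}

root = ET.fromstring(xml_text)

# Every step of the path needs a prefix, including the children of checkVatResponse.
names = [
    resp.find("vies:name", ns).text
    for resp in root.findall("soap:Body/vies:checkVatResponse", ns)
]
print(names)
```

This is equivalent to spelling out the `{uri}tag` form, just shorter to type and easier to reuse across several lookups.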

Or you can ignore the namespace with the {*} wildcard (supported in Python 3.8 and later):

root.findall('{*}Body/{*}checkVatResponse')

Try this:

import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <countryCode>RO</countryCode>
         <vatNumber>43097749</vatNumber>
         <requestDate>2022-07-12+02:00</requestDate>
         <valid>true</valid>
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
         <address>MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''
root = ET.fromstring(a)  # fromstring already returns the root element

for cust in root.findall('{*}Body/{*}checkVatResponse'):
    name = cust.find('{*}name').text
    print(name)
    address = cust.find('{*}address').text
    print(address)

Output:

ROHLIG SUUS LOGISTICS ROMANIA S.R.L.
MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1
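One more thing to watch for: `.find(...).text` raises AttributeError when the element is missing, because find returns None. A possible variation, sketched here under the assumption that you are on Python 3.8+ (for the {*} wildcard), uses findtext with a default instead. The `fax` element is hypothetical and only there to show the fallback; it does not appear in the response from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question.
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
      <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
         <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
         <address>MUNICIPIUL  BUCUREŞTI, SECTOR 1
BLD. ION MIHALACHE Nr. 15-17
Et. 1</address>
      </checkVatResponse>
   </soap:Body>
</soap:Envelope>'''

root = ET.fromstring(xml_text)
resp = root.find('{*}Body/{*}checkVatResponse')

# findtext returns the element's text, or the default if nothing matches.
name = resp.findtext('{*}name', default='')
address = resp.findtext('{*}address', default='')
fax = resp.findtext('{*}fax', default='n/a')  # hypothetical element, not in the XML

# The address spans several lines; split it if you need the parts separately.
address_lines = [line.strip() for line in address.splitlines()]

print(name)
print(address_lines)
print(fax)
```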
User contributions licensed under: CC BY-SA