I am new to Python and am trying to extract element values from XML. Below is the code that I tried.
import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
      <countryCode>RO</countryCode>
      <vatNumber>43097749</vatNumber>
      <requestDate>2022-07-12+02:00</requestDate>
      <valid>true</valid>
      <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
      <address>MUNICIPIUL BUCUREŞTI, SECTOR 1 BLD. ION MIHALACHE Nr. 15-17 Et. 1</address>
    </checkVatResponse>
  </soap:Body>
</soap:Envelope>'''

tree = ET.ElementTree(ET.fromstring(a))
root = tree.getroot()

for cust in root.findall('Body/checkVatResponse'):
    name = cust.find('name').text
    print(name)
I wanted to extract ‘name’ and ‘address’ from XML. But when I run the above code nothing is printed. What is my mistake?
Regards, Mayank Pande
Answer
Namespaces dawg, namespaces! You can be damn sure that when Jay-Z rapped about having 99 problems, having to deal with XML with namespaces was definitely one of them!
See "Parsing XML with Namespaces" in the xml.etree.ElementTree documentation.
For the Body tag, the namespace is http://schemas.xmlsoap.org/soap/envelope/. For checkVatResponse it is urn:ec.europa.eu:taxud:vies:services:checkVat:types, and name and address are in that same namespace, which they inherit from their parent, checkVatResponse. Your paths 'Body/checkVatResponse' and 'name' look for elements with no namespace, so they match nothing and the loop body never runs.
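To see this for yourself, you can print the tag of every element after parsing. The snippet below is a minimal sketch that uses a trimmed-down copy of the XML from the question; the variable names `xml_text` and `tags` are just illustrative choices.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the response from the question (assumption: the
# omitted elements do not matter for this demonstration).
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
      <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
    </checkVatResponse>
  </soap:Body>
</soap:Envelope>'''

root = ET.fromstring(xml_text)

# ElementTree stores every tag as "{namespace-uri}localname",
# so the namespace is part of the name you have to search for.
tags = [element.tag for element in root.iter()]
for tag in tags:
    print(tag)
```

Each printed tag starts with a namespace URI in curly braces, which is why a bare 'Body' or 'name' never matches.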
So, you can explicitly search for an element including its namespace, like so:
root.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body/{urn:ec.europa.eu:taxud:vies:services:checkVat:types}checkVatResponse')
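If the full URIs make the path hard to read, `find`, `findall` and `findtext` also accept a dictionary that maps prefixes to namespace URIs. This is a rough sketch on a shortened copy of the XML; the prefix `vat` and the names `ns`, `xml_text` and `response` are my own choices, and only the URIs have to match the document.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response from the question.
xml_text = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
      <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
      <address>MUNICIPIUL BUCUREŞTI, SECTOR 1 BLD. ION MIHALACHE Nr. 15-17 Et. 1</address>
    </checkVatResponse>
  </soap:Body>
</soap:Envelope>'''

# The prefixes are local aliases; they need not match those used in the XML.
ns = {
    'soap': 'http://schemas.xmlsoap.org/soap/envelope/',
    'vat': 'urn:ec.europa.eu:taxud:vies:services:checkVat:types',
}

root = ET.fromstring(xml_text)
response = root.find('soap:Body/vat:checkVatResponse', ns)

# findtext returns the text of the first matching child (or None).
name = response.findtext('vat:name', namespaces=ns)
address = response.findtext('vat:address', namespaces=ns)
print(name)
print(address)
```

Unlike a wildcard search, this only matches elements in the namespaces you name, which is safer if another namespace could also contain a name element.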
Or you can ignore it with the wildcard character (supported since Python 3.8):
root.findall('{*}Body/{*}checkVatResponse')
Try this:
import xml.etree.ElementTree as ET

a = '''<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <checkVatResponse xmlns="urn:ec.europa.eu:taxud:vies:services:checkVat:types">
      <countryCode>RO</countryCode>
      <vatNumber>43097749</vatNumber>
      <requestDate>2022-07-12+02:00</requestDate>
      <valid>true</valid>
      <name>ROHLIG SUUS LOGISTICS ROMANIA S.R.L.</name>
      <address>MUNICIPIUL BUCUREŞTI, SECTOR 1 BLD. ION MIHALACHE Nr. 15-17 Et. 1</address>
    </checkVatResponse>
  </soap:Body>
</soap:Envelope>'''

tree = ET.ElementTree(ET.fromstring(a))
root = tree.getroot()

# {*} matches any namespace, so each path step finds the element by local name.
for cust in root.findall('{*}Body/{*}checkVatResponse'):
    name = cust.find('{*}name').text
    print(name)
    address = cust.find('{*}address').text
    print(address)
Output:
ROHLIG SUUS LOGISTICS ROMANIA S.R.L.
MUNICIPIUL BUCUREŞTI, SECTOR 1 BLD. ION MIHALACHE Nr. 15-17 Et. 1