I am trying to extract the year from multiple XML files. Originally, the XML files looked like this:
```xml
<Return xmlns="http://www.irs.gov/efile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.irs.gov/efile" returnVersion="2018v3.2">
  <ReturnHeader binaryAttachmentCnt="0">
    <!-- -->
    <TaxPeriodEndDt>2019-09-30</TaxPeriodEndDt>
    <!-- -->
  </ReturnHeader>
  <ReturnData documentCnt="12">
    <!-- -->
  </ReturnData>
</Return>
```
I used:

```python
ns = {'irs': 'http://www.irs.gov/efile'}  # "irs" prefix -> default namespace of <Return>
year = root.find('.//irs:TaxPeriodEndDt', ns).text[:4]
```
That worked well. But in some XML files the tag is `TaxPeriodEndDate` instead:
```xml
<Return xmlns="http://www.irs.gov/efile" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.irs.gov/efile" returnVersion="2018v3.2">
  <ReturnHeader binaryAttachmentCnt="0">
    <!-- -->
    <TaxPeriodEndDate>2012-09-30</TaxPeriodEndDate>
    <!-- -->
  </ReturnHeader>
  <ReturnData documentCnt="12">
    <!-- -->
  </ReturnData>
</Return>
```
I tried changing the code to:

```python
year = root.find('.//irs:TaxPeriodEndDt|.//irs:TaxPeriodEndDate', ns).text[:4]
```
It did not work: there was no error message, but no output either. Any suggestions are highly appreciated. Thank you.
Answer
ElementTree's support for XPath is very limited. The union operator (`|`) doesn't appear to work, and other options, like using the `self::` axis or `name()`/`local-name()` in a predicate, aren't supported.
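To see this for yourself, here is a minimal sketch using a trimmed-down copy of the second sample (assuming `ns` maps `irs` to `http://www.irs.gov/efile`, as in the question). Each path works on its own, but the union silently matches nothing, so `find()` returns `None` rather than raising an error.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the second sample file (the one using TaxPeriodEndDate)
XML = """<Return xmlns="http://www.irs.gov/efile" returnVersion="2018v3.2">
  <ReturnHeader binaryAttachmentCnt="0">
    <TaxPeriodEndDate>2012-09-30</TaxPeriodEndDate>
  </ReturnHeader>
</Return>"""

root = ET.fromstring(XML)
ns = {"irs": "http://www.irs.gov/efile"}

# A single path is within ElementTree's supported XPath subset
single = root.find(".//irs:TaxPeriodEndDate", ns)
print(single.text)  # 2012-09-30

# "|" is not treated as a union: the path matches no element, so find() returns None
union = root.find(".//irs:TaxPeriodEndDt|.//irs:TaxPeriodEndDate", ns)
print(union)  # None
```

Calling `.text` on that `None` is what would then raise `AttributeError`.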
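I think your best bet is to use `try`/`except`. Since `find()` returns `None` when nothing matches, accessing `.text` on the result raises an `AttributeError`, which you can catch to fall back to the other tag name: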
```python
try:
    year = root.find(".//irs:TaxPeriodEndDt", ns).text[:4]
except AttributeError:
    # find() returned None, so try the alternative tag name
    year = root.find(".//irs:TaxPeriodEndDate", ns).text[:4]
```
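If more tag spellings turn up, the same fallback idea can be written as a loop over candidate paths instead of nested `try`/`except` blocks. This is a minimal sketch; `find_first`, `extract_year`, `YEAR_PATHS` and the sample strings are hypothetical names for illustration. Note the explicit `is not None` check: an ElementTree element with no children tests as false, so writing `root.find(a, ns) or root.find(b, ns)` would wrongly skip a match like `<TaxPeriodEndDt>`.

```python
import xml.etree.ElementTree as ET

NS = {"irs": "http://www.irs.gov/efile"}

# Hypothetical list of candidate paths, tried in order (both spellings from the question)
YEAR_PATHS = (".//irs:TaxPeriodEndDt", ".//irs:TaxPeriodEndDate")


def find_first(root, paths, namespaces):
    """Return the first element matched by any path in `paths`, or None.

    Hypothetical helper: ElementTree has no XPath union, so each path is tried in turn.
    """
    for path in paths:
        elem = root.find(path, namespaces)
        # Compare with None explicitly: a childless Element is falsy
        if elem is not None:
            return elem
    return None


def extract_year(xml_text):
    root = ET.fromstring(xml_text)
    elem = find_first(root, YEAR_PATHS, NS)
    # Return None instead of crashing when neither tag (or no text) is present
    return elem.text[:4] if elem is not None and elem.text else None


with_dt = """<Return xmlns="http://www.irs.gov/efile"><ReturnHeader>
<TaxPeriodEndDt>2019-09-30</TaxPeriodEndDt></ReturnHeader></Return>"""
with_date = """<Return xmlns="http://www.irs.gov/efile"><ReturnHeader>
<TaxPeriodEndDate>2012-09-30</TaxPeriodEndDate></ReturnHeader></Return>"""

print(extract_year(with_dt))    # 2019
print(extract_year(with_date))  # 2012
```

For files on disk, `ET.parse(path).getroot()` gives the same kind of `root` as `ET.fromstring()` does here.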
If you can switch to lxml, your original attempt with the union operator will work with a few small changes (mainly, use `xpath()` instead of `find()` and pass the `namespaces` keyword argument):
```python
# xpath() returns a list of matching elements in document order; [0] takes the first
year = root.xpath(".//irs:TaxPeriodEndDt|.//irs:TaxPeriodEndDate", namespaces=ns)[0].text[:4]
```
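If installing lxml isn't an option, the union's behaviour (first match in document order, whichever of the two tags it is) can be approximated in plain ElementTree by walking the tree with `iter()` and comparing each element against the fully qualified `{namespace}tag` names. A sketch, assuming both tags live in the `http://www.irs.gov/efile` namespace; `first_of_tags` and `WANTED` are hypothetical names.

```python
import xml.etree.ElementTree as ET

IRS = "http://www.irs.gov/efile"

# ElementTree stores namespaced tags in "{uri}localname" form
WANTED = {f"{{{IRS}}}TaxPeriodEndDt", f"{{{IRS}}}TaxPeriodEndDate"}


def first_of_tags(root, tags):
    """Return the first element (in document order) whose tag is in `tags`, or None."""
    # root.iter() yields root and all its descendants depth-first, in document order
    return next((elem for elem in root.iter() if elem.tag in tags), None)


xml_text = """<Return xmlns="http://www.irs.gov/efile"><ReturnHeader>
<TaxPeriodEndDate>2012-09-30</TaxPeriodEndDate></ReturnHeader></Return>"""

root = ET.fromstring(xml_text)
elem = first_of_tags(root, WANTED)
year = elem.text[:4] if elem is not None else None
print(year)  # 2012
```

Unlike the lxml version, this returns `None` instead of raising `IndexError` when neither tag is present.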