No DTD validation and XInclude resolution when using Saxon C HE with Python

I have a question about the Saxon C HE version for Python. After installing it successfully, I ran some example XSLT transformations, and they all worked.

However, when I parse an XML file, no DTD validation is performed and the XIncludes are not resolved. I have tried many things but have not been able to solve the problem. I hope someone can point out and explain my mistake.

Below is an example that should deliberately fail DTD validation, because the DTD declares no element named FOU. When I run the script, it creates a Result.xml file that still contains both the erroneous FOU element and the unresolved XInclude.

I am aware that this is easy to do with lxml, but I would like to know how it works with the Saxon parser.

XML Master:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <FOU Id="A-1">
        <BAR Name="Test-Bar-1"/>
        <BAR Name="Test-Bar-2"/>
        <BAR Name="Test-Bar-3"/>
    </FOU>
    <TUTU Id="TU-1">
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml" xpointer="xpointer(/node()/node()/*)"/>
    </TUTU>
</TEST>

XML Include:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <TUTU Id="TU-1">
        <TITI Name="Titi-1"/>
        <TITI Name="Titi-2"/>
        <TITI Name="Titi-3"/>
    </TUTU>
</TEST>

DTD:

<!ELEMENT TEST  (FOO+ , TUTU+)>
<!ELEMENT FOO   (BAR+)>
<!ELEMENT BAR   ANY>
<!ELEMENT TUTU  (TITI+)>
<!ELEMENT TITI  ANY>
<!-- Attribute -->
<!ATTLIST TEST
>
<!ATTLIST FOO
    Id      ID    #REQUIRED
>
<!ATTLIST BAR
    Name        CDATA #IMPLIED
>
<!ATTLIST TUTU
    Id      ID    #REQUIRED
>
<!ATTLIST TITI 
    Name        CDATA #IMPLIED
>

Python Script:

import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    xdmAtomicval = proc.make_boolean_value(False)
    xsltproc = proc.new_xslt_processor()
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    
    xsltproc.set_source(xdm_node=document)
    xsltproc.set_output_file("Result.xml")
    xsltproc.compile_stylesheet(stylesheet_file="styl.xslt")
    xsltproc.transform_to_file(stylesheet_file="styl.xslt")
    
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)

Answer

You should be able to set the xi and dtd configuration properties to “on”.

proc.set_configuration_property("xi", "on")
proc.set_configuration_property("dtd", "on")

However, the only way I could get it to work was to remove the xpointer attribute from the xi:include element. I didn’t have time to research why it fails with the xpointer in place.
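
For reference, the xpointer in the question, xpointer(/node()/node()/*), is meant to pull only the three TITI elements of Include.xml into TUTU. The sketch below is not Saxon code: it uses Python's standard xml.etree.ElementTree on an inline copy of Include.xml (DOCTYPE omitted), and the path ./*/* is my approximation of that expression for this particular file. The helper name xpointer_targets is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Inline copy of Include.xml from the question (DOCTYPE left out for brevity).
INCLUDE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<TEST>
    <TUTU Id="TU-1">
        <TITI Name="Titi-1"/>
        <TITI Name="Titi-2"/>
        <TITI Name="Titi-3"/>
    </TUTU>
</TEST>
"""


def xpointer_targets(xml_bytes):
    """Return the elements that /node()/node()/* would select in this file."""
    root = ET.fromstring(xml_bytes)  # root is the TEST element
    return root.findall("./*/*")     # element grandchildren of TEST


selected = xpointer_targets(INCLUDE_XML)
for elem in selected:
    print(elem.tag, elem.attrib)
```

Without the xpointer, the whole TEST element of Include.xml would be included inside TUTU, which the DTD in the question does not allow, so dropping the attribute changes the result and is only a workaround.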

It also doesn’t appear that parse_xml() does any validation or XInclude resolution, but both did happen on the transform. Because the deliberately invalid FOU element is then caught by the validation, set the dtd property to “off” or “recover” if you want Result.xml to be written.

Here’s the modified version of your Python script that I used to test:

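import os
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    # Resolve relative file names against the current working directory.
    proc.set_cwd(os.getcwd())
    # Turn on XInclude processing and DTD validation.
    proc.set_configuration_property("xi", "on")
    proc.set_configuration_property("dtd", "on")

    # In my test, parse_xml() neither validated nor resolved the XInclude.
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)

    # Validation and XInclude resolution took effect on the transform.
    # Use "off" or "recover" for "dtd" above to get Result.xml written.
    xsltproc = proc.new_xslt30_processor()
    xsltproc.transform_to_file(source_file="Master.xml", stylesheet_file="styl.xslt", output_file="Result.xml")

    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)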
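
To confirm whether the XInclude was actually resolved in the output, one option is to look for leftover xi:include elements. This is a minimal sketch using only Python's standard library, not Saxon; the function name count_unresolved_includes and the two inline sample strings are made up for illustration, and what Result.xml really contains depends on styl.xslt, which isn't shown in the question.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"


def count_unresolved_includes(xml_text):
    """Count xi:include elements still present in an XML document."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, including the root element itself.
    return sum(1 for _ in root.iter(f"{{{XI_NS}}}include"))


# Hypothetical fragments: one with the include left in place, one resolved.
unresolved = (
    '<TUTU Id="TU-1">'
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml"/>'
    '</TUTU>'
)
resolved = '<TUTU Id="TU-1"><TITI Name="Titi-1"/></TUTU>'

print(count_unresolved_includes(unresolved))
print(count_unresolved_includes(resolved))
```

For the real file, the same check can be run on ET.parse("Result.xml").getroot() instead of an inline string.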
User contributions licensed under: CC BY-SA