I have a question about the Saxon C HE version for Python. After the successful installation I tried some examples where I executed XSLT transformations. These all worked.
However, when I parse an XML file, no DTD validation is performed during parsing and the XIncludes are not resolved. I have tried many things but have not been able to solve this problem. I hope someone can point out and explain my error.
Attached is an example that should deliberately fail DTD validation, because the DTD declares no element named FOU. When I run the script, it creates a Result.xml file in which the erroneous FOU element is still present and the XInclude is not resolved.
I am aware that this is easy to do with lxml, but I would like to know how it works with the Saxon parser.
XML Master:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <FOU Id="A-1">
        <BAR Name="Test-Bar-1"/>
        <BAR Name="Test-Bar-2"/>
        <BAR Name="Test-Bar-3"/>
    </FOU>
    <TUTU Id="TU-1">
        <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="Include.xml" xpointer="xpointer(/node()/node()/*)"/>
    </TUTU>
</TEST>
XML Include:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE TEST SYSTEM "Test.dtd">
<TEST>
    <TUTU Id="TU-1">
        <TITI Name="Titi-1"/>
        <TITI Name="Titi-2"/>
        <TITI Name="Titi-3"/>
    </TUTU>
</TEST>
DTD:
<!ELEMENT TEST (FOO+ , TUTU+)>
<!ELEMENT FOO (BAR+)>
<!ELEMENT BAR ANY>
<!ELEMENT TUTU (TITI+)>
<!ELEMENT TITI ANY>
<!-- Attribute -->
<!ATTLIST TEST >
<!ATTLIST FOO Id ID #REQUIRED >
<!ATTLIST BAR Name CDATA #IMPLIED >
<!ATTLIST TUTU Id ID #REQUIRED >
<!ATTLIST TITI Name CDATA #IMPLIED >
Python Script:
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    xdmAtomicval = proc.make_boolean_value(False)
    xsltproc = proc.new_xslt_processor()
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    xsltproc.set_source(xdm_node=document)
    xsltproc.set_output_file("Result.xml")
    xsltproc.compile_stylesheet(stylesheet_file="styl.xslt")
    xsltproc.transform_to_file(stylesheet_file="styl.xslt")
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)
Answer
You should be able to set the xi and dtd configuration properties to “on”.
proc.set_configuration_property("xi", "on")
proc.set_configuration_property("dtd", "on")
However, the only way I could get it to work was to remove the xpointer attribute from the xi:include element. I didn’t have time to research why it fails with the xpointer in place.
It also doesn’t appear that parse_xml() does any validation or XInclude resolution, but both did happen during the transform (set the dtd property to “off” or “recover” to get Result.xml).
Here’s the modified version of your Python that I used to test…
import os
import saxonc

with saxonc.PySaxonProcessor(license=False) as proc:
    print(proc.version)
    proc.set_cwd(os.getcwd())
    proc.set_configuration_property("xi", "on")
    proc.set_configuration_property("dtd", "on")
    document = proc.parse_xml(xml_file_name='Master.xml')
    print(document)
    xsltproc = proc.new_xslt30_processor()
    xsltproc.transform_to_file(source_file="Master.xml", stylesheet_file="styl.xslt", output_file="Result.xml")
    documentRes = proc.parse_xml(xml_file_name='Result.xml')
    print(documentRes)
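If you want to confirm what actually ended up in Result.xml, a quick check with the Python standard library may help. This is only a sketch and does not use Saxon at all: summarize_result is a hypothetical helper name, and it simply counts leftover xi:include elements and lists the element names present, so you can see whether the include was resolved and whether FOU survived.

```python
import xml.etree.ElementTree as ET

# Namespace of the XInclude elements used in Master.xml
XI_NS = "http://www.w3.org/2001/XInclude"


def summarize_result(xml_text):
    """Hypothetical helper: return (unresolved_include_count, sorted element names).

    Accepts the XML as str or bytes. It does not validate against the DTD;
    it only inspects which elements are present in the parsed tree.
    """
    root = ET.fromstring(xml_text)
    # Any xi:include element still in the tree was not resolved
    unresolved = sum(1 for _ in root.iter(f"{{{XI_NS}}}include"))
    # Distinct element names, including the root
    names = sorted({el.tag for el in root.iter()})
    return unresolved, names
```

After running the transform you could call it as summarize_result(open("Result.xml", "rb").read()). A count of 0 together with TITI in the list of names would suggest the include was expanded, while FOU in the list means the invalid element made it through, which is expected when the dtd property is “off” or “recover”.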