Overview
I write programs to make art using XQuery and XSLT. The XQuery generates a domain-specific XML output, which the XSLT transforms to SVG for rendering. To support this work I have created extensive function libraries for geometric operations, tiling, curve generation, random number generation, image and colour manipulations and much more. These libraries are written in XQuery.
However, there are several reasons to want the same functions available in XSLT:
- To increase the abstraction level of the domain-specific XML
- To share libraries more widely
- To experiment with interactive art using Saxon-JS (Saxon)
Keeping the domain-specific XML more abstract and pushing the details of rendering to the XSLT makes it easier to experiment with alternative presentations and designs. For example, a stroke can be rendered with a different "brush", turning a simple line into a more complex set of SVG objects. These brushes can be quite complex, involving extensive geometric calculations and randomizations. The requirement here is simply to be able to import the XQuery functions: translation to XSLT is not required. There are non-standard and rather trivial mechanisms to do this, although with a large collection of function libraries that is still a fair amount of repetitive work. The standard way to make XQuery functions available in XSLT involves extensive and tedious binding definitions.
Part of the goal of the function libraries is to share a variety of useful capabilities with the XML community. The fact is that there are many more people using XSLT in the world than XQuery. To the extent the collection of function libraries is useful, and worth sharing, it is more useful and shared more broadly as XSLT rather than XQuery. Importing into XSLT might be enough, except many processors do not support that, or support it only as a paid option. Making the libraries available to XSLT users who are unable or unwilling to pay the extra requires translation.
Finally, I wanted to experiment with interactive art in the browser without having to recode everything in a language I'd prefer not to use, and Saxon-JS looked to provide a vehicle for this. However, Saxon-JS only supports pure XSLT applications: packages that import XQuery modules are not supported. Translation of the XQuery libraries to XSLT packages is therefore a prerequisite.
There is also a fundamental requirement here: I am lazy and impatient. I don't want to have to type the same thing over and over again. I have a lot of existing code and I don't want to stop making art for months just to convert it all over. I want a system or tool that is easy to implement and simplifies the process enough that I will stick with it going forwards while meeting my other goals. There is ongoing development of the libraries as well as the creation of new libraries, and that development happens in XQuery. By the same token, I am not too concerned about perfecting the tool for every case: I am happy to have the tool do most of the work and finish the job in Emacs.
What follows is a description of the various approaches I took to making my XQuery functions available in XSLT: first using bindings to import them as-is, and then performing conversions into XSLT. I am not claiming all these approaches were good approaches. Indeed: simple string parsing is not a great approach at all, even if it proved a surprisingly effective one.
Calling XQuery Functions from XSLT
The truly easy way to make XQuery functions available in XSLT is decidedly non-standard: the saxon:import-query instruction (which requires Saxon-PE or Saxon-EE) or its equivalent (e.g. MarkLogic's xdmp:import-module).
Generating a binding for an XQuery module is a fairly simple bit of scripting.
This wrapper transform is given the location of the XQuery module and the preferred XSLT prefix to use[1], and generates a small stub stylesheet by plucking out the module namespace. That stub stylesheet can then be imported, providing access to the functions and variables in the XQuery library.
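As a rough illustration of what such a generated stub might contain, here is a minimal sketch. The module location geometry.xqy, the namespace http://example.com/geometry, and the prefix geom are hypothetical placeholders, not names from the actual libraries.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:geom="http://example.com/geometry"
    exclude-result-prefixes="saxon geom">

  <!-- Hypothetical module location and namespace.
       This declaration needs Saxon-PE or Saxon-EE. -->
  <saxon:import-query href="geometry.xqy"
                      namespace="http://example.com/geometry"/>

</xsl:stylesheet>
```

A stylesheet that pulls in this stub with xsl:import should then be able to call the module's functions through the geom prefix.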
Creating the binding transform in a standard way is a little more involved. The stub transform binds the results of the function load-xquery-module() to a variable. This is a map with information about the public functions and variables. The script then needs to generate XSL functions that use this information to call the appropriate function or return the appropriate variable value. Figure 3 shows what the bindings look like, once generated.
The variables are not actually needed here and could be collapsed into the body of the function. I chose to keep the mechanics of function lookup out of the function bodies themselves.
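A minimal sketch of bindings of this kind follows. It is an illustration under assumptions rather than the generated output itself: the module namespace, the file name geometry.xqy, the function geom:distance, and the variable geom:ORIGIN are all hypothetical. It relies only on the documented result of load-xquery-module(), a map whose functions entry is keyed by QName and then by arity, and whose variables entry is keyed by QName.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:geom="http://example.com/geometry"
    exclude-result-prefixes="xs geom">

  <!-- Load the (hypothetical) module once; the result is a map with
       "functions" and "variables" entries. -->
  <xsl:variable name="geom:module" as="map(*)"
      select="load-xquery-module('http://example.com/geometry',
                map { 'location-hints': 'geometry.xqy' })"/>

  <!-- Function lookup is kept out of the function body:
       QName first, then arity (2). -->
  <xsl:variable name="geom:distance-fn" as="function(*)"
      select="$geom:module?functions(
                QName('http://example.com/geometry', 'distance'))?2"/>

  <!-- XSLT wrapper with the same name and signature as the XQuery function. -->
  <xsl:function name="geom:distance" as="xs:double">
    <xsl:param name="p" as="xs:double*"/>
    <xsl:param name="q" as="xs:double*"/>
    <xsl:sequence select="$geom:distance-fn($p, $q)"/>
  </xsl:function>

  <!-- A public variable of the module, re-exposed under the same name. -->
  <xsl:variable name="geom:ORIGIN" as="xs:double*"
      select="$geom:module?variables(
                QName('http://example.com/geometry', 'ORIGIN'))"/>

</xsl:stylesheet>
```

Every wrapper needs the parameter names and types spelled out, which is why the generator has to discover them.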
This is a substantially more involved bit of scripting. To accomplish it, there are several problems we need to address:
- List of functions and variables
- Parameter names
- Type information for parameters, function return values, and variables
- Set of namespaces needed
I have taken two distinct approaches to solving these problems, which I'll get to in section “Approaches”.
Translating XQuery Function Modules to XSLT
There are a number of papers on translating XSLT to XQuery. Bettentrupp06 and Fokoue05 direct most of their energy to determining the way to translate the template selection mechanism of XSLT, which is obviously of no concern when translating function libraries in the other direction. Laga06 gives a nice side-by-side summary of how XSLT 2.0 constructs can be mapped into XQuery 1.0. It uses templates applied using some kind of ad hoc implementation.
When mapping from XSLT to XQuery there are issues about replicating the template matching paradigm, issues about grouping constructs (for XQuery 1.0 as the target, at least), and issues around contextual features of XSLT like tunnel parameters and rules around precedence. Managing the syntax of XSLT is not problematic: it is XML and therefore easy to parse. The XPath expressions are subsets of XQuery expressions.
Going the other way is more syntactically challenging: XQuery has a fairly large and complex grammar, with a few subtleties around the use of non-reserved keywords and white space. My problem is more constrained than a general translation problem as well: I only wish to translate function libraries. Papers here are thin on the ground. Lechner02 is based on working drafts of the XQuery 1.0 specification and of XSLT 2.0, although the same principles hold. The approach is to do a complete translation of XQuery into XSLT using a mapping from an intermediate abstract syntax tree. A great deal of the effort centers around the problems of assigning nodes to variables, which is not an issue in XSLT 3.0. I also found some very incomplete references to papers but not, in fact, the papers themselves.
The target here is XSLT 3.0 packages. For the most part, therefore, we can wrap the body of the XQuery function inside the select attribute of an xsl:sequence instruction to define the XSLT implementation. There are some XQuery constructs in the body that need to be translated, as we'll see in section “Difficulties”, but it is a good first step.
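To make the wrapping step concrete, here is a minimal sketch using a made-up function. The package name, the namespace, and this:midpoint are hypothetical and chosen only because the body is already a valid XPath 3.1 expression.

```xslt
<!--
  Hypothetical XQuery source:

    declare function this:midpoint($a as xs:double, $b as xs:double) as xs:double
    {
      ($a + $b) div 2
    };
-->
<xsl:package name="http://example.com/interval"
    package-version="1.0"
    version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:this="http://example.com/interval"
    exclude-result-prefixes="xs this">

  <!-- visibility="public" exposes the function to packages that use this one. -->
  <xsl:function name="this:midpoint" as="xs:double" visibility="public">
    <xsl:param name="a" as="xs:double"/>
    <xsl:param name="b" as="xs:double"/>
    <!-- The XQuery function body, copied unchanged into the select attribute. -->
    <xsl:sequence select="($a + $b) div 2"/>
  </xsl:function>

</xsl:package>
```

The signature maps onto xsl:function and xsl:param one for one; only the body needs any care.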
This kind of conversion needs everything we needed for the generated function bindings, plus a few additional items:
- Function annotations indicating caching hints or other key information
- List of imported modules
- Function and variable bodies
Approaches
Take 1: String parsing + extension functions
The initial idea was to chop up the module text with string functions to pull out the key pieces of information. This has the advantage of being simple and works reasonably well. For example, Figure 5 shows a function that parses the namespace prefixes and URIs from the module imports in the text of an XQuery module.
This function gathers up text from the query prologue and chops it up into blocks starting with import module namespace, pulling out the namespace and prefix from the rest of the line. A similar process can be used to gather additional namespace prefixes and URIs from the namespace declarations.
Similarly, Figure 6 shows a function that parses function names, bodies, and parameter names from the text of an XQuery module. If the target is simply the function bindings, the function bodies are not relevant and can be ignored. We use the declare function keywords as the boundaries for tokenize to chop up the module text into function declaration segments and then the $ marker of variable names to parse out the parameter list. None of this is completely guaranteed. It will fail if there are comments containing the key phrase or syntax. In practice, it works pretty well.
Notice what we don't have parsed out here: type information. While the type information is there in the text, certainly, simple string parsing is not going to cut it here. Consider a type such as function (map(xs:string,map(xs:string,xs:integer*))) as map(xs:string,xs:integer)*. Good luck with that! Such nested type signatures require a different approach.
That different approach is to use introspection on the function definition itself, that is, using code to find out information about the code itself. Introspection needs a handle to the function item and a function that returns the return type and individual parameter types from that function item.
I have used two different approaches to get my hands on the function item. The first uses load-xquery-module() as in the XSLT stub module, and reads the function out of the map there. The other uses function-lookup() inside a constructed query that imports the necessary module and is invoked via saxon:xquery(). The first is more standard, and with Saxon requires Saxon-EE. The second also works in Saxon-PE.
Fine, so now we have a function item in our hands, so what? We still need to be able to get the return type and the parameter types. Depending on the processor, there may already be an extension function for this. For Saxon, I implemented a couple of extension functions using the Saxon Java extension interface. The function ext:return-type() returns a string giving the sequence type of the function item. The function ext:parameter-type() returns a string for the nth parameter of the function item. These are not terribly complicated functions. Here, for example, is the guts of ext:return-type():
With these functions in hand, we can now march through the list of functions that have been extracted, and emit XSL functions with the appropriate set of declarations.
Variables' type information turned out to be harder to find: it may well be possible, but I couldn't quite figure out where the information was in the Java classes. Instead, I make a basic guess based on the actual type of the variable and rely on post-editing to fix it if necessary. This was sufficient for my needs as the number of public variables is relatively small, as is the range of types used, so in practice there isn't a lot of editing required.
To summarize, here is how various issues are handled using this approach:
Issue | How it is handled |
---|---|
List of functions and variables | Parsed from blocks separated by relevant keywords |
Parameter names | Parsed from function declaration using the dollar sign and space as delimiters |
Type information | Extension functions to get type from function items, plus guessing based on variable values |
Imported modules | Parsed from blocks separated by relevant keywords |
Namespaces | Parsed from blocks separated by relevant keywords |
Ordering within output | Partial order from parse plus manual editing |
Function and variable bodies | Parsed from function declaration plus manual editing |
Module and function comments | Manual editing |
All of this works reasonably well in getting the overall framework of the XSLT module with all the functions and variables declared. There are a number of issues with it.
First, the text extraction isn't completely reliable. For most modules most of the time it did all right, but it got confused in the case of commented out code. A preprocessing step to remove comments could take care of that. Second, while I could keep the order of variables consistent with the original, and the order of functions likewise, there was no easy way to keep them consistent with each other as they are chopped out separately. It was also not easy to preserve the function and variable documentation. Adding the documentation and reordering things became a tedious manual chore.
More sophisticated string processing and additional logic could handle a lot of these cases. Still, simple string processing is always going to run afoul of edge cases when dealing with a complex recursive language. Fortunately, I found a better way that was still easy enough to implement in a day or so.[2]
Take 2: XQDoc + transform
Long had I intended to create linked HTML documentation for my XQuery modules, to make them more usable for the community. XQDoc is a commonly used tool for generating the documentation in a literate programming fashion from the XQuery modules themselves. As I updated my internal comments and started producing the documentation, it quickly became clear that the XQDoc intermediate format had already done a lot of the heavy lifting of pulling out the necessary information for generating the XSLT package. Furthermore, since there was a real parser behind it, it was much more reliable. All that was required was an appropriate stylesheet.
The XQDoc XML output for a library module has a handful of top-level elements. The xqdoc:module element includes children giving the module namespace URI and prefix and the top comment. From this the stylesheet can render the top comment as an XML comment and start defining the root xsl:package element.
The xqdoc:imports element includes a series of xqdoc:import elements. Each of these includes an attribute for the namespace prefix and a child containing the namespace URI. These imports provide namespace declarations for the XSLT xsl:package and can also be rendered out as xsl:use-package elements.
The xqdoc:namespaces element includes a series of xqdoc:namespace elements. Attributes on these elements define a set of additional namespace prefixes and URIs to add to the xsl:package element. With some specific exceptions, I also add the namespace prefixes to the exclude-result-prefixes attribute in the target stylesheet.
The two other top-level elements that matter in the XQDoc output are xqdoc:functions and xqdoc:variables. Although functions are separated from variables, I recover the original order by sorting based on the starting offset recorded on the xqdoc:body.
The xqdoc:variable element has children xqdoc:name to give the variable name, xqdoc:comment to give the associated XQDoc comment, xqdoc:type to give the variable type, and xqdoc:body to give the variable's value. These can be translated directly into the corresponding information of an xsl:variable element and its associated comment. The body, with a small bit of string processing to eliminate the text before the ":=", is put into the select attribute of an xsl:sequence element.
The xqdoc:function element includes a lot of structured information. Not all of it is useful here. The xqdoc:comment child gets rendered out as an XML comment in the stylesheet. The xqdoc:name and xqdoc:return children supply the name and return type of the xsl:function element. Annotations, captured in xqdoc:annotations, map to attributes on the xsl:function element. The parameter names and types are captured in the xqdoc:parameter children of the xqdoc:parameters element, and rendered out as xsl:param elements. Finally, the function body itself is found in the xqdoc:body element. A small amount of string processing is required to capture the part between the opening and closing braces, and then the whole is popped into the select attribute of an xsl:sequence element.
The whole stylesheet is fairly trivial to write and comes to about 200 lines. See Appendix A.
To summarize, here is how various issues are handled using this approach:
Issue | How it is handled |
---|---|
List of functions and variables | Pulled from xqdoc:functions and xqdoc:variables |
Parameter names | Pulled from xqdoc:parameters |
Type information | Pulled from xqdoc:type |
Imported modules | Pulled from xqdoc:imports |
Namespaces | Pulled from xqdoc:namespaces and xqdoc:imports |
Ordering within output | Offsets on xqdoc:body |
Function and variable bodies | Extracted from xqdoc:body plus editing |
Module and function comments | Formatted from xqdoc:comment |
This approach is much more comprehensive and reliable, making use of the XQuery parsing at the core of XQDoc to do the heavy lifting.
Difficulties
Although the transform goes a long way towards producing an equivalent XSLT package from an XQuery function library module, the sad truth is that some manual editing is still required after the translation step. Some changes are cosmetic, but make the results more readable. Some are more substantial and stem from the key differences between XQuery expressions and XPath expressions allowable in an xsl:sequence select. Some alterations are quite mechanical in nature: others require significant reworking of the code, alas.
Here are some of the key issues:
- &quot;, &#xA;, &gt; and related annoyances
- FLWOR vs for and let
- switch and typeswitch
- node construction
- try...catch
- function annotations
First up is the cosmetic issue of having the function bodies rendered with a lot of needless entities. Since in my code I tend to use the double quotation mark for string delimiters and the arrow syntax for a lot of function calls, the code looks quite ugly with default output settings. Furthermore, all the indentation is lost with &#xA; character entities replacing newlines. A character map for &gt; and &#xA; fixes most of this, and the use of the Saxon serialization parameter saxon:single-quotes takes care of the rest. Having made that fix, that much less post-processing is required.
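A character map of this kind can be sketched as follows. The map name readable-xpath is a placeholder, and the Saxon-specific quoting parameter is deliberately left out so that the fragment stays within standard XSLT 3.0.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Attach the character map to the serializer settings. -->
  <xsl:output method="xml" use-character-maps="readable-xpath"/>

  <xsl:character-map name="readable-xpath">
    <!-- Write ">" literally instead of escaping it in attribute values. -->
    <xsl:output-character character="&gt;" string="&gt;"/>
    <!-- Write newlines literally instead of as character references. -->
    <xsl:output-character character="&#xA;" string="&#xA;"/>
  </xsl:character-map>

</xsl:stylesheet>
```

One caveat with this design: when the generated stylesheet is parsed again, the XML parser normalizes literal newlines in attribute values to spaces. That is harmless for XPath whitespace, but it would alter a string literal that contains a newline.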
The second post-processing issue stems from the difference between XQuery FLWOR expressions and XPath 3.1 for and let expressions. This manifests in several ways, all of which I take care of via manual editing.
Issue | Fix | Example: XQuery source | Example: translation for XSLT |
---|---|---|---|
Chains of let or for clauses | Add return or use comma-separated clauses on a single let | let $k := ($b - $a) div 3 let $ts := tail(this:linspace(4, $a, $b, true())) return ( $k * (2 * $f($ts[1]) - $f($ts[2]) + 2*$f($ts[3])) ) | let $k := ($b - $a) div 3 return let $ts := tail(this:linspace(4, $a, $b, true())) return ( $k * (2 * $f($ts[1]) - $f($ts[2]) + 2*$f($ts[3])) ) |
Use of at in for clause | Replace with indexes and count() | sum( for $bit at $i in $bits return $this:MULTIPLIERS64[$i + $offset] * $bit ) cast as xs:integer | sum( for $i in 1 to count($bits) return $this:MULTIPLIERS64[$i + $offset]*$bits[$i] ) cast as xs:integer |
Use of order by clause | Replace with sort() | for $datum in $data order by this:size($datum) descending return $datum | for $datum in sort($data, (), function($d) {-this:size($d)}) return $datum |
Use of where clause | Replace with predicate or conditional | for $j in 1 to $N where ($j idiv math:pow(2, $j - 1) mod 2 = 1) return $input[$j] | for $j in 1 to $N return if ($j idiv math:pow(2, $j - 1) mod 2 = 1) then $input[$j] else () |
Use of as | Remove | let $triangles as map(xs:string,item()*) := this:triangle-mesh($granularity, $n) | let $triangles := this:triangle-mesh($granularity, $n) |
These are fairly mechanical modifications and do not require a great deal of thought or effort. The lack of switch and typeswitch expressions in XPath is only slightly more tiresome. These need to be converted to conditionals with either equality comparisons or the use of the instance of operator. It is also usually wise to add a let variable to hold the operand, so that it is written and evaluated only once. In some cases the determination of what variable to use may need a little thought and analysis.
Often there are multiple ways to convert a problematic construct. Which approach to take is largely a matter of taste and convenience. Adding return to handle a sequence of let clauses is simpler in my case than converting to a list on a single let clause, because it is just repeatedly pasting in the same value (one keystroke in my editor) and makes for easier cross-comparisons. I also prefer the stronger syntactic signal.
Example: XQuery source | Example: translation for XSLT |
---|---|
switch(this:kind($region)) case "polygon" return path:length($region) case "ellipse" return ellipse:perimeter($region) case "quad" return edge:length($region) case "cubic" return edge:length($region) case "edge" return edge:length($region) case "slot" return this:length($region=>slot:body()) default return 0 |
let $kind := this:kind($region) return if ($kind="polygon") then path:length($region) else if ($kind="ellipse") then ellipse:perimeter($region) else if ($kind="quad") then edge:length($region) else if ($kind="cubic") then edge:length($region) else if ($kind="edge") then edge:length($region) else if ($kind="slot") then this:length($region=>slot:body()) else 0 |
The remaining issues require a great deal more effort and thought. Node construction and try/catch expressions cannot be translated within the context of XPath and need to be converted to XSLT equivalents instead. Often that means pulling apart the whole expression to re-express it in XSLT terms. Consider the example below: as you can see, everything needs to be converted to XSLT. It is quite an involved undertaking. Fortunately node construction plays a relatively limited role in my function libraries and try/catch expressions are only used in a handful of places.
Example: XQuery source | Example: XSLT translation |
---|---|
for $node in $nodes return typeswitch ($node) case element(svg:feDisplacementMap) return element {node-name($node)} { $node/(@* except (@scale)), attribute scale { util:decimal($node/@scale * $rescale, 1) }, this:replace-scale($node/*, $rescale) } case element() return element {node-name($node)} { $node/@*, this:replace-scale($node/*, $rescale) } default return $node |
<xsl:for-each select="$nodes">
<xsl:variable name="node" select="."/>
<xsl:choose>
<xsl:when test='
$node instance of element(svg:feDisplacementMap)'
>
<xsl:element name="{node-name($node)}">
<xsl:copy-of select="$node/(@* except @scale)"/>
<xsl:attribute name="scale" select='
util:decimal($node/@scale * $rescale, 1)'/>
<xsl:sequence select='
this:replace-scale($node/*, $rescale)'/>
</xsl:element>
</xsl:when>
<xsl:when test='$node instance of element()'>
<xsl:element name="{node-name($node)}">
<xsl:copy-of select="$node/@*"/>
<xsl:sequence select='
this:replace-scale($node/*, $rescale)'/>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:copy-of select="$node"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
|
The observant reader will note that the previous example shows a recursive transformation, which is what XSLT is designed to do. A more idiomatic translation is obviously possible. There are definitely tensions here between what is convenient for an initial conversion, what is idiomatic, and what is easier to maintain. I have opted for the less idiomatic translation, which is easier to maintain across the two forms.
The final difficulty poses conceptual problems as well as technical ones. I categorize named functions using annotations for documentation purposes. For example, all base randomization distribution functions are marked with the annotation %art:distribution. Such annotations can be transferred over to XSLT functions as attributes and serve essentially the same purpose. Other annotations, such as %saxon:memo-function, provide performance hints to the processor. These can be translated to the corresponding mechanism in XSLT (if there is one); for memoization, XSLT 3.0 provides the cache attribute on xsl:function.
Annotations on anonymous functions pose a more difficult problem. XPath 3.1 allows for inline function definitions, but not for function annotations on them. However, certain libraries make heavy use of function annotations on inline functions to give them usable names for metadata capture and debugging. The only solution is a wholesale rewrite: wrapping the function items in a map along with the necessary annotations. Unfortunately that changes the API for the callers, so a rewrite there is also necessary.[3]
Example: Original XQuery source | Example: Revised XQuery |
---|---|
declare function this:sdCappedTorus( $angle as xs:double, $ra as xs:double, $rb as xs:double ) as function(xs:double*) as xs:double* { let $sc := ( math:sin(math:radians($angle div 2)), math:cos(math:radians($angle div 2)) ) let $ra2 := $ra*$ra return %art:name("sd3:sdCappedTorus") function($point as xs:double*) as xs:double* { let $p := (abs(v:px($point)), tail($point)) let $pxy := (v:px($p), v:pz($p)) let $k := if (v:determinant($pxy,$sc) > 0) then v:dot($pxy, $sc) else v:magnitude($pxy) return math:sqrt(v:dot($p,$p) + $ra2 - 2.0*$ra*$k) - $rb } }; ... let $sdf := sd3:sdCappedTorus(60, 0.1, 0.4) return (util:function-name($sdf)||"="||$sdf((0.1, 0.2))) |
declare function this:sdCappedTorus( $angle as xs:double, $ra as xs:double, $rb as xs:double ) as map(*) { let $sc := ( math:sin(util:radians($angle div 2)), math:cos(util:radians($angle div 2)) ) let $ra2 := $ra*$ra return callable:named("sd3:sdCappedTorus", function($point as xs:double*) as xs:double* { let $p := (abs(v:px($point)), tail($point)) let $pxy := (v:px($p), v:pz($p)) let $k := if (v:determinant($pxy,$sc) > 0) then v:dot($pxy, $sc) else v:magnitude($pxy) return math:sqrt(v:dot($p,$p) + $ra2 - 2.0*$ra*$k) - $rb }) }; ... let $sdf := sd3:sdCappedTorus(60, 0.1, 0.4) =>callable:function() return (util:function-name($sdf)||"="||$sdf((0.1, 0.2))) |
This can then be translated into XSLT in the usual way.
Potential difficulties
There are a number of potential difficulties in an XQuery to XSLT translation that turn out not to be a problem for me, either because they are not particularly relevant to function libraries, or they are constructs that I haven't had occasion to use.
- Extended FLWOR clauses
- Ordered and unordered expressions
- Other declarations
- Crossing module imports
XQuery 3.0 extended the FLWOR expression with some additional clauses for grouping and counting, none of which are available in XPath 3.1. All of these would need to be converted into XSLT constructs. The entire FLWOR would need to be unpacked into an xsl:for-each or xsl:for-each-group instruction.
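As a sketch of what such an unpacking could look like, consider a hypothetical grouping FLWOR over a sequence of maps that each have a kind entry. The function name this:count-by-kind, the namespace, and the data shape are all invented for illustration.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:this="http://example.com/shapes"
    exclude-result-prefixes="xs this">

  <!--
    Hypothetical XQuery source:

      for $shape in $shapes
      group by $kind := $shape?kind
      return map { $kind : count($shape) }
  -->
  <xsl:function name="this:count-by-kind" as="map(*)*">
    <xsl:param name="shapes" as="map(*)*"/>
    <!-- The group by clause becomes the group-by attribute;
         the context item plays the role of $shape. -->
    <xsl:for-each-group select="$shapes" group-by="?kind">
      <!-- current-grouping-key() stands in for $kind,
           current-group() for the regrouped $shape. -->
      <xsl:sequence
          select="map { current-grouping-key() : count(current-group()) }"/>
    </xsl:for-each-group>
  </xsl:function>

</xsl:stylesheet>
```

The return expression survives as XPath, but the iteration and grouping have to move into XSLT instructions, so the function body can no longer be a single xsl:sequence.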
The XQuery module prologue includes a number of other declarations: schema imports, decimal formats, and various mode and option declarations. Some of these have a direct analogue in XSLT and some do not.
XQuery declaration | XSLT translation |
---|---|
import schema | xsl:import-schema instruction |
decimal-format | xsl:decimal-format instruction |
base-uri | xml:base on xsl:package |
default collation | default-collation attribute on xsl:package |
construction | Use appropriate validation attribute on specific constructors |
copy-namespaces | Use appropriate copy-namespaces and inherit-namespaces attributes on specific constructors |
default order | Make sure sort() calls use the right function |
ordering | Ignore: this is a performance hint |
ordered and unordered expressions | Ignore: these are performance hints |
boundary-space | No easy translation |
Serialization options | xsl:output instruction |
Other option declarations | Processor dependent translation, or none |
A more substantial problem would be the problem of crossing module imports. XQuery allows module A to import module B and module B to import module A. If there happen to be cyclic dependencies in functions or variables this will result in a dynamic error (e.g. stack overflow) but not a static one. XSLT packages are stricter, however. Cross imports are not permitted. This is sound software architectural practice. If the XQuery modules to be translated have crossing imports, there is nothing for it but to refactor them to avoid it.[4]
Maintenance
There is a final difficulty that deserves to be called out separately: maintaining both sets of libraries. Because not everything is automated here, new versions cannot be automatically regenerated. There are really only two options here and neither of them is great:
- Full automated translation: generating the stylesheets becomes a "compilation" of sorts, scriptable in a build system.
- Manual updates, with some automated assistance.
In theory, making the translation fully automatic should be possible: modify the XQDoc source to include structured output for all expressions so that they can be rendered out as XSLT. XQDoc works through the visitor pattern, defining functions for each kind of node in the parse tree. Currently it just prints the function body as a string, but it could continue visiting subcomponents and building up a structured result instead. For example, a switch expression could be rendered out as an xsl:choose element.
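To give a feel for the output such a structured translation might produce, here is a minimal sketch for a made-up switch expression. The function this:sides and its namespace are hypothetical, and this is hand-written rather than the result of any existing tool.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:this="http://example.com/shapes"
    exclude-result-prefixes="xs this">

  <!--
    Hypothetical XQuery source:

      declare function this:sides($kind as xs:string) as xs:integer
      {
        switch ($kind)
        case "triangle" return 3
        case "quad" return 4
        default return 0
      };
  -->
  <xsl:function name="this:sides" as="xs:integer">
    <xsl:param name="kind" as="xs:string"/>
    <xsl:choose>
      <!-- Each case clause becomes an xsl:when with a value comparison. -->
      <xsl:when test="$kind eq 'triangle'">
        <xsl:sequence select="3"/>
      </xsl:when>
      <xsl:when test="$kind eq 'quad'">
        <xsl:sequence select="4"/>
      </xsl:when>
      <!-- The default clause becomes xsl:otherwise. -->
      <xsl:otherwise>
        <xsl:sequence select="0"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

</xsl:stylesheet>
```

Even for three branches the XSLT form is several times longer than the original expression.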
In addition to the work involved, there are some downsides to this. First, it basically turns long XPath expressions into a series of XSLT elements instead, making the functions much less readable. That said, it ought to be possible to determine when a full translation was required and when the raw function body could be used. This approach also means forking XQDoc and having to maintain it rather than just leveraging updates.
Manually maintaining updates is not a great option, either. Doing all the work manually makes it error-prone. It would be easy to miss some change. Some semi-automated assistance here will help a lot. Here's the basic plan of attack:
- Create a baseline: the automatic (unedited) XSLT translation for the XQuery modules
- For a new release of the stylesheets, generate the automatic XSLT translation
- Calculate the difference between the baseline and the new automatic translation
- Insert and edit differences
In practice this is only a little tiresome when the first three steps are part of the automated build system and most updates are handled live in the moment: about as tiresome as creating edited release notes. It ensures that in every full release, all the XQuery changes have been captured.
For day-to-day changes a slightly different differences-of-differences approach worked well:
- Compare the working version of the XQuery module to the baseline in source control
- Compare the working version of the XSLT module to the baseline in source control
- Compare the two sets of differences
- Resolve inconsistencies as necessary
I created a tool that compares the differences of the XQuery version to the baseline in source control with the differences of the XSLT version to that baseline, to make sure that each check-in contained consistent changes on both sides. Filtering "return" out of the differences made them more useful, as this was a large fraction of the actual differences.
A Word on Performance
It is interesting to consider the relative performance of the XQuery functions versus the XSLT functions, and indeed the relative performance of the compiled XSLT functions used in Saxon-JS.
I have not done a detailed performance study, so these results should be taken as suggestive rather than definitive. The numbers were obtained with a simple Unix time command and using Saxon 11.4 or Saxon-JS 2.5 as the processor. The Saxon-JS numbers were run under node.js 16.18.1 over files compiled with Saxon 11.4. All times are in seconds unless indicated. These are timings on the running of various unit tests that had been translated into all three contexts. Since Saxon-JS doesn't have a -repeat option and, frankly, because it takes too long to run anyway, these were run just once so Java warm-up time is part of the measurement.
Test set | XQuery | XSLT | Saxon-JS |
---|---|---|---|
All tests | 36.7 | 40.7 | 11+ hours |
array test | 1.3 | 1.6 | 0.5 |
colour-space test | 5.5 | 5.9 | 38.3 |
complex test | 1.7 | 1.9 | 0.7 |
distributions test | 2.6 | 3.1 | 9.7 |
geometry test | 5.1 | 6.7 | 1478s |
point test | 1.6 | 1.8 | 0.7 |
polynomial test | 1.4 | 1.5 | 0.4 |
randomizer test | 3.1 | 3.5 | 31.2 |
rectangle test | 2.0 | 1.9 | 0.7 |
SDF test | 10.7 | 11.2 | 10.5 hours |
utilities test | 1.4 | 1.6 | 0.6 |
The XQuery and XSLT performance numbers are fairly similar, with the XSLT times generally running about ten percent slower. The Saxon-JS numbers are, frankly, baffling. Some tests run much more quickly, probably because the time to parse and prepare the XQuery or XSLT was already spent when the stylesheets were compiled to JSON, and because the JavaScript runtime initializes more quickly than the Java runtime. On the other hand, some of the times are much slower, unusably so. The bulk of the time is taken up in signed distance and certain geometric calculations (particularly intersections and interpolations), both of which do a lot of floating point mathematics and function calls. Still, there is no obvious reason for the numbers to be quite this bad, and it bears further investigation.[5]
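The "about ten percent" figure can be checked against the table. The following Python sketch recomputes the XSLT slowdown relative to XQuery for each test set. The timings are copied from the table above; the variable and function names are illustrative only. Because these are single runs, the results are no more than indicative.

```python
from statistics import mean, median

# (XQuery seconds, XSLT seconds) per unit-test set, copied from the table above
TIMINGS = {
    "array": (1.3, 1.6),
    "colour-space": (5.5, 5.9),
    "complex": (1.7, 1.9),
    "distributions": (2.6, 3.1),
    "geometry": (5.1, 6.7),
    "point": (1.6, 1.8),
    "polynomial": (1.4, 1.5),
    "randomizer": (3.1, 3.5),
    "rectangle": (2.0, 1.9),
    "SDF": (10.7, 11.2),
    "utilities": (1.4, 1.6),
}


def slowdown_percent(xquery_s: float, xslt_s: float) -> float:
    """Percentage by which the XSLT run is slower than the XQuery run."""
    return (xslt_s / xquery_s - 1.0) * 100.0


per_test = {name: slowdown_percent(*times) for name, times in TIMINGS.items()}

# The "All tests" row of the table: 36.7 s for XQuery, 40.7 s for XSLT
print(f"all tests: {slowdown_percent(36.7, 40.7):.1f}%")
print(f"mean of per-test slowdowns: {mean(per_test.values()):.1f}%")
print(f"median of per-test slowdowns: {median(per_test.values()):.1f}%")
for name, pct in sorted(per_test.items(), key=lambda item: item[1]):
    print(f"{name:>14}: {pct:+.1f}%")
```

The per-test figures vary quite a bit around the overall ratio, which is to be expected when each measurement is a single run that includes Java warm-up time.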
Conclusions and Summary
We have examined several approaches to single-sourcing XQuery and XSLT function libraries starting with an XQuery base.
Approach | Difficulty | Completeness | Result usable in |
---|---|---|---|
Simple import | Trivial | Complete | Saxon-PE, Saxon-EE; not Saxon-JS |
Standard import | Fairly simple | Some type fixup for best results | Product supporting load-xquery-module(); not Saxon-JS, Saxon-HE, or Saxon-PE |
String-based conversion | Moderate | Extensive editing for order, expressions, comments (improvement possible) | XSLT 3.0 processor with XPath 3.1 |
XQDoc transformation | Fairly simple | Substantial editing for expressions | XSLT 3.0 processor with XPath 3.1 |
Extended XQDoc processor[6] | Complex | Complete | XSLT 3.0 processor with XPath 3.1 |
I was working with Saxon as both my tool and my target. Other alternatives are available
with some other processors such as MarkLogic or BaseX that have introspection capabilities.
For example, the BaseX inspection module has functions that return structured information about a module
either in XQDoc format or its own internal format, the list of functions in a module,
structured information about a specific function (including the attached XQDoc), the
type of a value, and so forth. The xquery:parse()
function could be used to get the detailed parse tree of function bodies for more
complete automation instead of modifying the XQDoc processor.
Maintaining the parallel versions and running tests in all three environments (XQuery, XSLT, node.js) has had the quite unexpected benefit of improving the code. Part of this was just the extra proofreading that happens during the fix-up stage: every line of code is examined twice, once to create it and once to translate it. You notice things. In addition, the translation steps essentially introduced compilers into the mix, which have to check every line of code soon after it is written. In an ideal world unit tests would accomplish the same thing, but the truth is, something like "grind on an image for an hour" is not something I choose to have a unit test for, and something like "create a random octopus" is not something particularly amenable to a unit test, so not every module gets run. Bone-headed typos and errors have been introduced during major refactorings, and the translation step catches them. Finally, the different processors have slightly different quirks and assumptions, and fixing code to work properly in all of them makes it better for any of them.
With these techniques I have been able to realize my goal of creating XQuery libraries and using them in XSLT and translating them into XSLT as well. Automatic binding generation takes just a few seconds, and is incorporated as part of the build system. Converting a new module to XSLT can generally be accomplished in a minute or so, manual post-processing and all, which is sufficient for my needs. Automatic baselining and comparison has been incorporated into the build and release scripts, piggy-backing off the documentation generation, and has proved manageable. Code overall has improved.
Appendix A. Full XQDoc to XSL stylesheet
<xsl:stylesheet
    xmlns:doc="http://www.xqdoc.org/1.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias"
    xmlns:fn="http://www.w3.org/2005/02/xpath-functions"
    xmlns:art="http://mathling.com/art"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="doc fn saxon"
    version="3.0">

  <xsl:output method="xml" indent="yes" encoding="UTF-8"
      use-character-maps="bodymap"
      saxon:single-quotes="yes"
      saxon:line-length="120"
      saxon:attribute-order="name visibility *"/>

  <xsl:character-map name="bodymap">
    <xsl:output-character character=">" string=">"/>
    <xsl:output-character character="&#10;" string="&#10;"/>
    <xsl:output-character character="&#9;" string=" "/>
  </xsl:character-map>

  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

  <xsl:template match="/">
    <xsl:apply-templates select="//doc:xqdoc"/>
  </xsl:template>

  <xsl:template match="doc:xqdoc">
    <xsl:apply-templates select="doc:module/doc:comment"/>
    <axsl:package name="{doc:module/doc:uri}" version="3.0">
      <xsl:namespace name="this" select="doc:module/doc:uri"/>
      <xsl:apply-templates select="doc:namespaces|doc:imports" mode="namespaces"/>
      <xsl:attribute name="exclude-result-prefixes">
        <xsl:sequence select='string-join(( "this", "xs", doc:namespaces/doc:namespace/@prefix, doc:imports/doc:import/@prefix )[not(. = ("art","svg"))]," ")'/>
      </xsl:attribute>
      <axsl:expose component="variable" names="this:*" visibility="public"/>
      <axsl:expose component="function" names="this:*" visibility="public"/>
      <xsl:apply-templates select="doc:imports"/>
      <xsl:variable name="components" as="element()*">
        <xsl:sequence select="doc:variables/doc:variable"/>
        <xsl:sequence select="doc:functions/doc:function"/>
      </xsl:variable>
      <xsl:for-each select='sort($components, (), function($component) {number($component/doc:body/@start)})'>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
    </axsl:package>
  </xsl:template>

  <xsl:template match="doc:namespaces" mode="namespaces">
    <xsl:apply-templates select="doc:namespace" mode="namespaces"/>
  </xsl:template>

  <xsl:template match="doc:namespace" mode="namespaces">
    <xsl:namespace name="{@prefix}" select="@uri"/>
  </xsl:template>

  <xsl:template match="doc:imports" mode="namespaces">
    <xsl:apply-templates select="doc:import" mode="namespaces"/>
  </xsl:template>

  <xsl:template match="doc:import" mode="namespaces">
    <xsl:namespace name="{@prefix}" select="doc:uri"/>
  </xsl:template>

  <xsl:template match="doc:variable[@private]">
    <xsl:apply-templates select="doc:comment"/>
    <axsl:variable name="this:{doc:name}" visibility="private">
      <xsl:apply-templates select="doc:type"/>
      <xsl:apply-templates select="doc:annotations"/>
      <axsl:sequence>
        <xsl:attribute name="select">
          <xsl:value-of select='substring-after(doc:body,":=")'/>
        </xsl:attribute>
      </axsl:sequence>
    </axsl:variable>
  </xsl:template>

  <xsl:template match="doc:variable">
    <xsl:apply-templates select="doc:comment"/>
    <axsl:variable name="this:{doc:name}">
      <xsl:apply-templates select="doc:type"/>
      <xsl:apply-templates select="doc:annotations"/>
      <axsl:sequence>
        <xsl:attribute name="select">
          <xsl:value-of select='substring-after(doc:body,":=")'/>
        </xsl:attribute>
      </axsl:sequence>
    </axsl:variable>
  </xsl:template>

  <xsl:template match="doc:imports">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="doc:import">
    <axsl:use-package name="{doc:uri}" package-version="*"/>
  </xsl:template>

  <xsl:template match="doc:function[@private]">
    <xsl:apply-templates select="doc:comment"/>
    <axsl:function name="this:{doc:name}" visibility="private">
      <xsl:apply-templates select="doc:return"/>
      <xsl:apply-templates select="doc:annotations"/>
      <xsl:apply-templates select="doc:parameters"/>
      <axsl:sequence>
        <xsl:attribute name="select">
          <xsl:value-of select='replace(substring-after(doc:body,"{"),"[}]$","")'/>
        </xsl:attribute>
      </axsl:sequence>
    </axsl:function>
  </xsl:template>

  <xsl:template match="doc:function">
    <xsl:apply-templates select="doc:comment"/>
    <axsl:function name="this:{doc:name}">
      <xsl:apply-templates select="doc:return"/>
      <xsl:apply-templates select="doc:annotations"/>
      <xsl:apply-templates select="doc:parameters"/>
      <axsl:sequence>
        <xsl:attribute name="select">
          <xsl:value-of select='replace(substring-after(doc:body,"{"),"[}]$","")'/>
        </xsl:attribute>
      </axsl:sequence>
    </axsl:function>
  </xsl:template>

  <xsl:template match="doc:parameters">
    <xsl:apply-templates select="doc:parameter"/>
  </xsl:template>

  <xsl:template match="doc:parameter">
    <axsl:param name="{string(doc:name)}">
      <xsl:apply-templates select="doc:type"/>
    </axsl:param>
  </xsl:template>

  <xsl:template match="doc:return" as="attribute()">
    <xsl:apply-templates select="doc:type"/>
  </xsl:template>

  <xsl:template match="doc:type" as="attribute()">
    <xsl:attribute name="as">
      <xsl:choose>
        <xsl:when test='. eq ""'>empty-sequence()</xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="replace(.,'[)]as',') as ')"/><xsl:value-of select="@occurrence"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="doc:annotations" as="attribute()*">
    <xsl:apply-templates select="doc:annotation"/>
  </xsl:template>

  <xsl:template match="doc:annotation" as="attribute()">
    <xsl:choose>
      <xsl:when test="@name='art:non-deterministic'">
        <xsl:attribute name="new-each-time" select="'yes'"/>
      </xsl:when>
      <xsl:when test="@name='saxon:memo-function'">
        <xsl:attribute name="cache" select="'yes'"/>
      </xsl:when>
      <xsl:when test="@name='private'">
        <xsl:attribute name="visibility" select="'private'"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:attribute name="{@name}" select="string(doc:literal)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="doc:module/doc:comment">
    <xsl:variable name="contents" as="xs:string*">
      <xsl:for-each select='tokenize(doc:description,"\n")'>
        <xsl:sequence select='string(.)'/>
      </xsl:for-each>
      <xsl:sequence select='""'/>
      <xsl:for-each select="doc:* except (doc:description|doc:custom)">
        <xsl:sequence select='concat(" @", local-name(.),": ",string(.))'/>
      </xsl:for-each>
      <xsl:for-each select="doc:custom">
        <xsl:sequence select='concat(" @", local-name(.),":",string(./@tag)," ",string(.))'/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:comment>
      <xsl:sequence select='string-join($contents,"&#10; :")'/>
      <xsl:sequence select='"&#10; "'/>
    </xsl:comment>
  </xsl:template>

  <xsl:template match="doc:comment">
    <xsl:variable name="contents" as="xs:string*">
      <xsl:for-each select='tokenize(doc:description,"\n")'>
        <xsl:sequence select='string(.)'/>
      </xsl:for-each>
      <xsl:sequence select='""'/>
      <xsl:for-each select="doc:* except (doc:description|doc:custom)">
        <xsl:sequence select='concat(" @", local-name(.),": ",string(.))'/>
      </xsl:for-each>
      <xsl:for-each select="doc:custom">
        <xsl:sequence select='concat(" @", local-name(.),":",string(./@tag)," ",string(.))'/>
      </xsl:for-each>
    </xsl:variable>
    <xsl:comment>
      <xsl:sequence select='string-join($contents,"&#10; :")'/>
      <xsl:sequence select='"&#10; "'/>
    </xsl:comment>
  </xsl:template>

</xsl:stylesheet>
References
[BaseX] BaseX, basex.org. https://basex.org/
[Bettentrupp06] Ralf Bettentrupp, Sven Groppe, Jinghua Groppe, Stefan Böttcher, Le Gruenwald. A Prototype for Translating XSLT into XQuery. In Proceedings of the 8th International Conference on Enterprise Information Systems - Volume 1: ICEIS, pages 22-29. SciTePress. 2006. (doi:https://doi.org/10.5220/0002442100220029) Available at https://www.scitepress.org/papers/2006/24421/24421.pdf
[Fokoue05] Achille Fokoue, Kristoffer Høgsbro Rose, Jérôme Siméon, Lionel Villard. Compiling XSLT 2.0 into XQuery 1.0. In Proceedings of the Fourteenth International World Wide Web Conference, ACM Press, Chiba, Japan, May 2005, pp. 682-691. (doi:https://doi.org/10.1145/1060745.1060844) Available at https://www.researchgate.net/publication/221023408_Compiling_XSLT_20_into_XQuery_10
[XSLT] W3C: Michael Kay, editor. XSL Transformations (XSLT) Version 3.0 Recommendation. W3C, 8 June 2017. http://www.w3.org/TR/xslt-30/
[Laga06] Albin Laga, Praveen Madiraju, Darrel A. Mazzari and Gowri Dara. Translating XSLT into XQuery. In Proceedings of 15th International Conference on Software Engineering and Data Engineering (SEDE-2006), Los Angeles, California, July 6-8, 2006. An extended version, An Approach to Translate XSLT to XQuery, is available at http://www.mscs.mu.edu/~praveen/Research/XSLT2XQ/XSLT2XQ_Journal.pdf
[Lechner02] Stephan Lechner, Günter Preuner, Michael Schrefl. Translating XQuery into XSLT. In: Arisawa, H., Kambayashi, Y., Kumar, V., Mayr, H.C., Hunt, I. (eds) Conceptual Modeling for New Information Systems Technologies. ER 2001. Lecture Notes in Computer Science, vol 2465. Springer, Berlin, Heidelberg. 2002. (doi:https://doi.org/10.1007/3-540-46140-X_19) Available on request from authors at http://www.dke.jku.at/research/publications/details.xq?type=paper&code=Lech01a
[XQuery] W3C: Jonathan Robie, Michael Dyck, Josh Spiegel, editors. XQuery 3.1: An XML Query Language Recommendation. W3C, 21 March 2017. http://www.w3.org/TR/xquery-31/
[Saxon] Saxon, Saxonica https://www.saxonica.com/products/products.xml
[XQDoc] XQDoc, xqdoc.org. https://xqdoc.org/
[1] One could pull the internal prefix from the XQuery module itself, from the module
namespace declaration. However my coding style is to always use the prefix this
internal to an XQuery module and what is needed here is an outward-facing prefix,
not an inward-facing one.
[2] Although preparing for the implementation was a long week with a lot of assists from Emacs macros.
[3] It turns out this rewrite is necessary for using the compiled stylesheets in Saxon-JS
anyway because the full annotation mechanism relies on saxon:xquery
which is not available in that context.
[4] One of my reviewers spelled out a rather clever technique for avoiding refactoring to handle this problem by using static parameters used as flags on use-when attributes on the imports. I haven't had occasion to try this myself.
[5] A future release of SaxonJS should address this issue: I look forward to testing it.
[6] Not implemented.