Holstege, Mary. “Adventures in Single-Sourcing XQuery and XSLT.” Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). https://doi.org/10.4242/BalisageVol28.Holstege01.
Balisage: The Markup Conference 2023 July 31 - August 4, 2023
Balisage Paper: Adventures in Single-Sourcing XQuery and XSLT
Mary Holstege spent decades developing software in Silicon Valley, in and around markup
technologies and information extraction. She has most recently been pursuing artistic
endeavours. She holds a Ph.D. from Stanford University in Computer Science, for a
thesis on document representation.
This is a case study of a project to make a fairly substantial collection of XQuery
function libraries available in XSLT. Various approaches with different capabilities
are taken:
Generating a stub stylesheet with a simple import of the XQuery module.
Generating a more elaborate stylesheet with bindings for the functions and variables.
Semi-automatically converting XQuery to XSLT using string manipulations.
Semi-automatically converting XQuery to XSLT using the XQDoc XML as an intermediate
format.
There is consideration of the various sticking points and tradeoffs with these approaches.
I write programs to make art using XQuery and XSLT. The XQuery generates a domain-specific XML output, which the XSLT transforms to
SVG for rendering. To support this work I have created extensive function libraries
for geometric operations, tiling, curve generation, random number generation, image
and colour manipulations and much more. These libraries are written in XQuery.
However, there are several reasons to want the same functions available in XSLT:
To increase the abstraction level of the domain-specific XML
To share libraries more widely
To experiment with interactive art using Saxon-JS (Saxon)
Keeping the domain-specific XML more abstract and pushing the details of rendering
to the XSLT makes it easier to experiment with alternative presentations and designs.
For example, a stroke can be rendered with a different "brush", turning a simple line
into a more complex set of SVG objects. These brushes can be quite complex, involving
extensive geometric calculations and randomizations. The requirement here is simply
to be able to import the XQuery functions: translation to XSLT is not required. There
are non-standard and rather trivial mechanisms to do this, although with a large collection
of function libraries that is still a fair amount of repetitive work. The standard
way to make XQuery functions available in XSLT involves extensive and tedious binding
definitions.
Part of the goal of the function libraries is to share a variety of useful capabilities
with the XML community. The fact is that there are many more people using XSLT in
the world than XQuery. To the extent the collection of function libraries is useful,
and worth sharing, it is more useful and shared more broadly as XSLT rather than XQuery.
Importing into XSLT might be enough, except many processors do not support that, or
support it only as a paid option. Making the libraries available to XSLT users who
are unable or unwilling to pay the extra requires translation.
Finally, I wanted to experiment with interactive art in the browser without having
to recode everything in a language I'd prefer not to use, and Saxon-JS looked to provide
a vehicle for this. However, Saxon-JS only supports pure XSLT applications: packages
that import XQuery modules are not supported. Translation of the XQuery libraries
to XSLT packages is therefore a prerequisite.
There is also a fundamental requirement here: I am lazy and impatient. I don't want
to have to type the same thing over and over again. I have a lot of existing code
and I don't want to stop making art for months just to convert it all over. I want
a system or tool that is easy to implement and simplifies the process enough that
I will stick with it going forwards while meeting my other goals. There is ongoing
development of the libraries as well as the creation of new libraries, and that development
happens in XQuery. By the same token, I am not too concerned about perfecting the
tool for every case: I am happy to have the tool do most of the work and finish the
job in Emacs.
What follows is a description of the various approaches I took to making my XQuery
functions available in XSLT: first using bindings to import them as-is, and then performing
conversions into XSLT. I am not claiming all these approaches were good approaches.
Indeed: simple string parsing is not a great approach at all, even if it proved a
surprisingly effective one.
Calling XQuery Functions from XSLT
The truly easy way to make XQuery functions available in XSLT is decidedly non-standard:
the saxon:import-query instruction (which requires Saxon-PE or Saxon-EE) or its equivalent (e.g. MarkLogic's
xdmp:import-module).
To generate a binding for an XQuery module is a fairly simple bit of scripting.
This wrapper transform is given the location of the XQuery module and the preferred
XSLT prefix to use[1], and generates a small stub stylesheet by plucking out the module namespace. That
stub stylesheet can then be imported, providing access to the functions and variables
in the XQuery library.
To create the binding transform in a standard way is a little more involved. The stub
transform binds the results of the function load-xquery-module() to a variable. This is a map with information about the public functions and variables.
The script needs to then generate XSL functions that use this information to call
the appropriate function or return the appropriate variable value. Figure 3 shows what the bindings look like, once generated.
The variables are not actually needed here and could be collapsed into the body of
the function. I chose to keep the mechanics of function lookup out of the function
bodies themselves.
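As a rough illustration of what such bindings rely on, here is the same lookup written as a standalone XQuery 3.1 expression rather than as XSLT. This is only a sketch: the module URI, the location hint geo.xqy, and the names distance and ORIGIN are hypothetical, and a processor that supports load-xquery-module() is assumed. In a generated stylesheet, expressions of this shape would sit in the select attributes of the variables and functions.

```xquery
xquery version "3.1";

(: Hypothetical module: namespace, location, and member names are assumptions :)
let $ns := "http://example.com/geo"
let $module := load-xquery-module($ns, map { "location-hints": "geo.xqy" })

(: "functions" maps each QName to a map from arity to function item :)
let $distance := $module?functions(QName($ns, "distance"))?2

(: "variables" maps each QName to the value of that public variable :)
let $origin := $module?variables(QName($ns, "ORIGIN"))

return $distance($origin, (3, 4))
```

The arity lookup (?2) is what forces the generator to know the parameter list of every function: one binding is needed per name and arity.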
This is a substantially more involved bit of scripting. To accomplish it, there are
several problems we need to address:
List of functions and variables
Parameter names
Type information for parameters, function return values, and variables
Set of namespaces needed
I have taken two distinct approaches to solving these problems, which I'll get to
in section “Approaches”.
Translating XQuery Function Modules to XSLT
There are a number of papers on translating XSLT to XQuery.
Bettentrupp06 and Fokoue05 direct most of their energy to determining the way to translate the template selection
mechanism of XSLT, which is obviously of no concern when translating function libraries
in the other direction. Laga06 gives a nice side-by-side summary of how XSLT 2.0 constructs can be mapped into XQuery
1.0. It uses templates applied using some kind of ad hoc implementation.
When mapping from XSLT to XQuery there are issues about replicating the template matching
paradigm, issues about grouping constructs (for XQuery 1.0 as the target, at least),
and issues around contextual features of XSLT like tunnel parameters and rules around
precedence. Managing the syntax of XSLT is not problematic: it is XML and therefore
easy to parse. The XPath expressions are subsets of XQuery expressions.
Going the other way is more syntactically challenging: XQuery has a fairly large and
complex grammar, with a few subtleties around the use of non-reserved keywords and
white space. My problem is more constrained than a general translation problem as
well: I only wish to translate function libraries. Papers here are thin on the ground.
Lechner02 is based on working drafts of the XQuery 1.0 specification and of XSLT 2.0, although
the same principles hold. The approach is to do a complete translation of XQuery into
XSLT using a mapping from an intermediate abstract syntax tree. A great deal of the
effort centers around the problems of assigning nodes to variables, which is not an
issue in XSLT 3.0. I also found some very incomplete references to papers but not, in fact, the papers themselves.
The target here is XSLT 3.0 packages. For the most part, therefore, we can wrap the
body of the XQuery function inside the select attribute of an xsl:sequence instruction to define the XSLT implementation. There are some XQuery constructs in
the body that need to be translated, as we'll see in section “Difficulties”, but it is a good first step.
This kind of conversion needs everything we needed for the generated function bindings,
plus a few additional items:
Function annotations indicating caching hints or other key information
List of imported modules
Function and variable bodies
Approaches
Take 1: String parsing + extension functions
The initial idea was to chop up the module text with string functions to pull out
the key pieces of information. This has the advantage of being simple and works reasonably
well. For example, Figure 5 shows a function that parses the namespace prefixes and URIs from the module imports
in the text of an XQuery module.
This function gathers up text from the query prologue and chops it up into blocks
starting with import module namespace, pulling out the namespace and prefix from the rest of the line. A similar process
can be used to gather additional namespace prefixes and URIs from the namespace declarations.
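A minimal sketch of this kind of import parsing follows. It is not the function from the figure: the name local:parse-imports and the shape of the returned maps are assumptions made for illustration, and it inherits the weakness of the approach, in that a comment containing the key phrase will confuse it.

```xquery
xquery version "3.1";

(: Hypothetical helper: one map per "import module namespace" clause :)
declare function local:parse-imports($text as xs:string) as map(*)*
{
  (: tail() discards whatever precedes the first import clause :)
  for $block in tail(tokenize($text, "import\s+module\s+namespace\s+"))
  return map {
    (: the prefix is whatever sits before the first "=" :)
    "prefix": normalize-space(substring-before($block, "=")),
    (: the URI is the first quoted string: split on either quote character :)
    "uri": string(tokenize($block, '["'']')[2])
  }
};

local:parse-imports('
import module namespace geo = "http://example.com/geo" at "geo.xqy";
import module namespace rand = "http://example.com/rand" at "rand.xqy";
declare function local:f() { 1 };
')
```

For the sample text this should yield two maps, one for the prefix geo and one for the prefix rand, each with its namespace URI.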
Similarly, Figure 6 shows a function that parses function names, bodies, and parameter names from the
text of an XQuery module. If the target is simply the function bindings, the function
bodies are not relevant and can be ignored. We use the declare function keywords as the boundaries for tokenize to chop up the module text into function declaration segments and then the $ marker of variable names to parse out the parameter list. None of this is completely
guaranteed. It will fail if there are comments containing the key phrase or syntax.
In practice, it works pretty well.
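The same idea can be sketched for function names and parameter names. Again this is an illustration under assumptions rather than the code in the figure: local:parse-functions is a hypothetical name, function bodies are left out, and declarations with annotations between declare and function would not be matched by this simple pattern.

```xquery
xquery version "3.1";

(: Hypothetical helper: one map per "declare function" block :)
declare function local:parse-functions($text as xs:string) as map(*)*
{
  for $block in tail(tokenize($text, "declare\s+function\s+"))
  (: the function name runs up to the opening parenthesis :)
  let $name := normalize-space(substring-before($block, "("))
  (: the signature runs from there to the brace that opens the body :)
  let $signature := substring-before(substring-after($block, "("), "{")
  (: each parameter name follows a "$" and ends at whitespace, comma, or ")" :)
  let $parameters :=
    for $piece in tail(tokenize($signature, "\$"))
    return tokenize($piece, "[\s,)]")[1]
  return map { "name": $name, "parameters": array { $parameters } }
};

local:parse-functions('
declare function this:mix($a as xs:double, $b as xs:double) as xs:double
{
  ($a + $b) div 2
};
')
```

For the sample declaration this should report the name this:mix with parameters a and b. The type text between the names is simply skipped.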
Notice what we don't have parsed out here: type information. While the type information
is there in the text, certainly, simple string parsing is not going to cut it here.
Consider a type such as function (map(xs:string,map(xs:string,xs:integer*))) as map(xs:string,xs:integer)*. Good luck with that! Such nested type signatures require a different approach.
That different approach is to use introspection on the function definition itself,
that is, using code to find out information about the code itself. Introspection needs
a handle to the function item and a function that returns the return type and individual
parameter types from that function item.
I have used two different approaches to get my hands on the function item. The first
uses load-xquery-module() as in the XSLT stub module, and reads the function out of the map there. The other
uses function-lookup() inside a constructed query that imports the necessary module and is invoked via saxon:xquery(). The first is more standard, and with Saxon requires Saxon-EE. The second also works
in Saxon-PE.
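The second route amounts to constructing a query along the following lines. This is a sketch only: the module URI, location, and function name are hypothetical, and the way the query text is handed to the processor is omitted.

```xquery
xquery version "3.1";

(: Hypothetical module to inspect: URI and location are assumptions :)
import module namespace geo = "http://example.com/geo" at "geo.xqy";

(: function-lookup() returns the function item, or the empty sequence if
   no function with that name and arity is in scope :)
let $f := function-lookup(QName("http://example.com/geo", "distance"), 2)
return
  if (exists($f))
  then (function-name($f), function-arity($f))
  else "no such function"
```

The standard functions function-name() and function-arity() are as far as XQuery itself goes, so getting at the declared types still needs something processor-specific.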
Fine, so now we have a function item in our hands, so what? We still need to be able
to get the return type and the parameter types. Depending on the processor, there
may already be an extension function for this. For Saxon, I implemented a couple of
extension functions using the Saxon Java extension interface. The function ext:return-type() returns a string giving the sequence type of the function item's return value. The function ext:parameter-type() returns a string giving the type of the nth parameter of the function item. These are not terribly
complicated functions. Here, for example, is the guts of ext:return-type():
With these functions in hand, we can now march through the list of functions that
have been extracted, and emit XSL functions with the appropriate set of declarations.
Variables' type information turned out to be harder to find: it may well be possible,
but I couldn't quite figure out where the information was in the Java classes. Instead,
I make a basic guess based on the actual type of the variable and rely on post-editing
to fix it if necessary. This was sufficient for my needs as the number of public variables
is relatively small, as is the range of types used, so in practice there isn't a lot
of editing required.
To summarize, here is how various issues are handled using this approach:
List of functions and variables: Parsed from blocks separated by relevant keywords
Parameter names: Parsed from function declaration using the dollar sign and space as delimiters
Type information: Extension functions to get type from function items, plus guessing based on variable values
Imported modules: Parsed from blocks separated by relevant keywords
Namespaces: Parsed from blocks separated by relevant keywords
Ordering within output: Partial order from parse plus manual editing
Function and variable bodies: Parsed from function declaration plus manual editing
Module and function comments: Manual editing
All of this works reasonably well in getting the overall framework of the XSLT module
with all the functions and variables declared. There are a number of issues with it.
First, the text extraction isn't completely reliable. For most modules most of the
time it did all right, but it got confused in the case of commented out code. A preprocessing
step to remove comments could take care of that. Second, while I could keep the order
of variables consistent with the original, and the order of functions likewise, there
was no easy way to keep them consistent with each other as they are chopped out separately.
It was also not easy to preserve the function and variable documentation. Adding the
documentation and reordering things became a tedious manual chore.
More sophisticated string processing and additional logic could handle a lot of these
cases. Still, simple string processing is always going to run afoul of edge cases
when dealing with a complex recursive language. Fortunately, I found a better way
that was still easy enough to implement in a day or so.[2]
Take 2: XQDoc + transform
Long had I intended to create linked HTML documentation for my XQuery modules, to
make them more usable for the community. XQDoc is a commonly used tool for generating the documentation in a literate programming
fashion from the XQuery modules themselves. As I updated my internal comments and
started producing the documentation, it quickly became clear that the XQDoc intermediate
format had already done a lot of the heavy lifting of pulling out the necessary information
for generating the XSLT package. Furthermore, since there was a real parser behind
it, it was much more reliable. All that was required was an appropriate stylesheet.
The XQDoc XML output for a library module has a handful of top-level elements. The
xqdoc:module element includes children giving the module namespace URI and prefix and the top
comment. From this the stylesheet can render the top comment as an XML comment and
start defining the root xsl:package element.
The xqdoc:imports element includes a series of xqdoc:import elements. Each of these includes an attribute for the namespace prefix and a child
containing the namespace URI. These imports provide namespace declarations for the
XSLT xsl:package and can also be rendered out as xsl:use-package elements.
The xqdoc:namespaces element includes a series of xqdoc:namespace elements. Attributes on these elements define a set of additional namespace prefixes
and URIs to add to the xsl:package element. With some specific exceptions, I also add the namespace prefixes to the
exclude-result-prefixes attribute in the target stylesheet.
The two other top-level elements that matter in the XQDoc output are xqdoc:functions and xqdoc:variables. Although functions are separated from variables, I recover the original order by
sorting based on the starting offset recorded on the xqdoc:body.
The xqdoc:variable element has children xqdoc:name to give the variable name, xqdoc:comment to give the associated XQDoc comment, xqdoc:type to give the variable type, and xqdoc:body to give the variable's value. These can be translated directly into the corresponding
information of an xsl:variable element and its associated comment. The body, with a small bit of string processing
to eliminate the text before the ":=", is put into the select attribute of an xsl:sequence element.
The xqdoc:function element includes a lot of structured information. Not all of it is useful here. The
xqdoc:comment child gets rendered out as an XML comment in the stylesheet. The xqdoc:name and xqdoc:return children fit into the xsl:function element. Annotations, captured in xqdoc:annotations, map to attributes on the xsl:function element. The parameter names and types are captured in the xqdoc:parameter children of the xqdoc:parameters element, and rendered out as xsl:param elements. Finally, the function body itself is found in the xqdoc:body element. A small amount of string processing is required to capture the part between
the opening and closing braces, and then the whole is popped into the select attribute of an xsl:sequence element.
The whole stylesheet is fairly trivial to write and comes to about 200 lines. See
Appendix A.
To summarize, here is how various issues are handled using this approach:
List of functions and variables: Pulled from xqdoc:functions and xqdoc:variables
Parameter names: Pulled from xqdoc:parameters
Type information: Pulled from xqdoc:type
Imported modules: Pulled from xqdoc:imports
Namespaces: Pulled from xqdoc:namespaces and xqdoc:imports
Ordering within output: Offsets on xqdoc:body
Function and variable bodies: Extracted from xqdoc:body plus editing
Module and function comments: Formatted from xqdoc:comment
This approach is much more comprehensive and reliable, making use of the XQuery parsing
at the core of XQDoc to do the heavy lifting.
Difficulties
Although the transform goes a long way towards producing an equivalent XSLT package
from an XQuery function library module, the sad truth is that some manual editing
is still required after the translation step. Some changes are cosmetic, but make
the results more readable. Some are more substantial and stem from the key differences
between XQuery expressions and XPath expressions allowable in an xsl:sequence select. Some alterations are quite mechanical in nature: others require significant
reworking of the code, alas.
Here are some of the key issues:
&quot;, &#xA;, &gt; and related annoyances
FLWOR vs for and let
switch and typeswitch
node construction
try...catch
function annotations
First up is the cosmetic issue of having the function bodies rendered with a lot of
needless entities. Since in my code I tend to use the double quotation mark for string
delimiters and the arrow syntax for a lot of function calls, the code looks quite
ugly with default output settings. Furthermore, all the indentation is lost with &#xA;
character entities replacing newlines. A character map for &gt; and &#xA;
fixes most of this, and the use of the Saxon serialization parameter saxon:single-quotes takes care of the rest. Having made that fix, that much less post-processing is required.
The second post-processing issue stems from the difference between XQuery FLWOR expressions
and XPath 3.1 for and let expressions. This manifests in several ways, all of which
I take care of via manual editing.
Issue: Chains of let or for clauses
Fix: Add return or use comma-separated clauses on a single let
Example: XQuery source
    let $k := ($b - $a) div 3
    let $ts := tail(this:linspace(4, $a, $b, true()))
    return (
      $k * (2 * $f($ts[1]) - $f($ts[2]) + 2*$f($ts[3]))
    )
Example: translation for XSLT
    let $k := ($b - $a) div 3 return
    let $ts := tail(this:linspace(4, $a, $b, true()))
    return (
      $k * (2 * $f($ts[1]) - $f($ts[2]) + 2*$f($ts[3]))
    )

Issue: Use of at in for clause
Fix: Replace with indexes and count()
Example: XQuery source
    sum(
      for $bit at $i in $bits
      return $this:MULTIPLIERS64[$i + $offset] * $bit
    ) cast as xs:integer
Example: translation for XSLT
    sum(
      for $i in 1 to count($bits)
      return $this:MULTIPLIERS64[$i + $offset]*$bits[$i]
    ) cast as xs:integer

Issue: Use of order by clause
Fix: Replace with sort()
Example: XQuery source
    for $datum in $data
    order by this:size($datum) descending
    return $datum
Example: translation for XSLT
    for $datum in sort($data, (), function($d) {-this:size($d)})
    return $datum

Issue: Use of where clause
Fix: Replace with predicate or conditional
Example: XQuery source
    for $j in 1 to $N
    where ($j idiv math:pow(2, $j - 1) mod 2 = 1)
    return $input[$j]
Example: translation for XSLT
    for $j in 1 to $N
    return
      if ($j idiv math:pow(2, $j - 1) mod 2 = 1)
      then $input[$j]
      else ()

Issue: Use of as
Fix: Remove
Example: XQuery source
    let $triangles as map(xs:string,item()*) := this:triangle-mesh($granularity, $n)
Example: translation for XSLT
    let $triangles := this:triangle-mesh($granularity, $n)

These are fairly mechanical modifications and do not require a great deal of thought
or effort. The lack of switch and typeswitch expressions in XPath is only slightly more tiresome. These need to be converted to conditionals
with either equality comparisons or the use of the instance of operator. It is also usually wise to add a let variable to hold the operand, so that it is evaluated only once. In some cases the determination of what variable to use may need a little
thought and analysis.
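For the typeswitch case, the conversion can be sketched as follows. The two functions are hypothetical and exist only to put the forms side by side: each case becomes an instance of test, kept in the same order, since the first matching case wins in both forms.

```xquery
xquery version "3.1";

(: XQuery-only form :)
declare function local:describe($value as item()*) as xs:string
{
  typeswitch ($value)
  case xs:integer return "integer"
  case xs:string+ return "strings"
  case element() return "element"
  default return "other"
};

(: XPath 3.1-compatible form: the same tests as a chain of conditionals :)
declare function local:describe-xpath($value as item()*) as xs:string
{
  if ($value instance of xs:integer) then "integer"
  else if ($value instance of xs:string+) then "strings"
  else if ($value instance of element()) then "element"
  else "other"
};

(: Each comparison should be true if the two forms agree :)
(
  local:describe(42) = local:describe-xpath(42),
  local:describe(("a", "b")) = local:describe-xpath(("a", "b")),
  local:describe(<x/>) = local:describe-xpath(<x/>),
  local:describe(()) = local:describe-xpath(())
)
```

Here the operand is already a variable. When it is an expression, it would first be bound with let so that it is not repeated in every test.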
Often there are multiple ways to convert a problematic construct. Which approach to
take is largely a matter of taste and convenience. Adding return to handle a sequence of let clauses is simpler in my case than converting to a list on a single let clause, because
it is just repeatedly pasting in the same value (one keystroke in my editor) and makes
for easier cross-comparisons. I also prefer the stronger syntactic signal.
Example: XQuery source
    switch(this:kind($region))
    case "polygon" return path:length($region)
    case "ellipse" return ellipse:perimeter($region)
    case "quad" return edge:length($region)
    case "cubic" return edge:length($region)
    case "edge" return edge:length($region)
    case "slot" return
      this:length($region=>slot:body())
    default return 0
Example: translation for XSLT
    let $kind := this:kind($region) return
    if ($kind="polygon") then path:length($region)
    else if ($kind="ellipse") then ellipse:perimeter($region)
    else if ($kind="quad") then edge:length($region)
    else if ($kind="cubic") then edge:length($region)
    else if ($kind="edge") then edge:length($region)
    else if ($kind="slot") then
      this:length($region=>slot:body())
    else 0
The remaining issues require a great deal more effort and thought. Node construction
and try/catch expressions cannot be translated within the context of XPath and need
to be converted to XSLT equivalents instead. Often that means pulling apart the whole
expression to re-express it in XSLT terms. Consider the example below: as you can
see, everything needs to be converted to XSLT. It is quite an involved undertaking.
Fortunately node construction plays a relatively limited role in my function libraries
and try/catch expressions are only used in a handful of places.
Example: XQuery source
Example: XSLT translation
for $node in $nodes return
typeswitch ($node)
case element(svg:feDisplacementMap) return
element {node-name($node)} {
$node/(@* except (@scale)),
attribute scale {
util:decimal($node/@scale * $rescale, 1)
},
this:replace-scale($node/*, $rescale)
}
case element() return
element {node-name($node)} {
$node/@*,
this:replace-scale($node/*, $rescale)
}
default return $node
The observant reader here notes the previous example shows a recursive transformation,
which is what XSLT is designed to do. A more idiomatic translation is obviously possible.
There are definitely tensions here between what is convenient for an initial conversion,
what is idiomatic, and what is easier to maintain. I have opted for less idiomatic
but easier to maintain in the two forms.
The final difficulty poses conceptual problems as well as technical ones. I categorize
named functions using annotations for documentation purposes. For example, all base
randomization distribution functions are marked with the annotation %art:distribution. Such annotations can be transferred over to XSLT functions as attributes and serve
essentially the same purpose. Other annotations, such as %saxon:memo-function provide performance hints to the processor. These can be translated to the corresponding
mechanism in XSLT (if there is one).
Annotations on anonymous functions pose a more difficult problem. XPath 3.1 allows
for inline function definitions, but not for function annotations on them. However,
certain libraries make heavy use of function annotations on inline functions to give
them usable names for metadata capture and debugging. The only solution is a wholesale
rewrite: wrapping the function items in a map along with the necessary annotations.
Unfortunately that changes the API for the callers, so a rewrite there is also necessary.[3]
Example: Original XQuery source
    declare function this:sdCappedTorus(
      $angle as xs:double,
      $ra as xs:double,
      $rb as xs:double
    ) as function(xs:double*) as xs:double*
    {
      let $sc := (
        math:sin(math:radians($angle div 2)),
        math:cos(math:radians($angle div 2))
      )
      let $ra2 := $ra*$ra
      return
        %art:name("sd3:sdCappedTorus")
        function($point as xs:double*) as xs:double* {
          let $p := (abs(v:px($point)), tail($point))
          let $pxy := (v:px($p), v:pz($p))
          let $k :=
            if (v:determinant($pxy,$sc) > 0)
            then v:dot($pxy, $sc)
            else v:magnitude($pxy)
          return
            math:sqrt(v:dot($p,$p) + $ra2 - 2.0*$ra*$k) - $rb
        }
    };
    ...
    let $sdf := sd3:sdCappedTorus(60, 0.1, 0.4)
    return (util:function-name($sdf)||"="||$sdf((0.1, 0.2)))
Example: Revised XQuery
    declare function this:sdCappedTorus(
      $angle as xs:double,
      $ra as xs:double,
      $rb as xs:double
    ) as map(*)
    {
      let $sc := (
        math:sin(util:radians($angle div 2)),
        math:cos(util:radians($angle div 2))
      )
      let $ra2 := $ra*$ra
      return
        callable:named("sd3:sdCappedTorus",
          function($point as xs:double*) as xs:double* {
            let $p := (abs(v:px($point)), tail($point))
            let $pxy := (v:px($p), v:pz($p))
            let $k :=
              if (v:determinant($pxy,$sc) > 0)
              then v:dot($pxy, $sc)
              else v:magnitude($pxy)
            return
              math:sqrt(v:dot($p,$p) + $ra2 - 2.0*$ra*$k) - $rb
          })
    };
    ...
    let $sdf := sd3:sdCappedTorus(60, 0.1, 0.4)
      =>callable:function()
    return (util:function-name($sdf)||"="||$sdf((0.1, 0.2)))
This can then be translated into XSLT in the usual way.
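To make the wrapping idea concrete, here is a minimal sketch of what a callable helper of this kind might look like. The local: functions and the map keys are stand-ins invented for illustration, not the actual callable library: the point is only that the name travels in a map beside the function item instead of in an annotation.

```xquery
xquery version "3.1";

(: Hypothetical stand-ins for a callable library: names and map keys are assumptions :)
declare function local:named($name as xs:string, $impl as function(*)) as map(*)
{
  map { "name": $name, "impl": $impl }
};

(: Recover the name that an annotation would otherwise have carried :)
declare function local:name($callable as map(*)) as xs:string
{
  $callable?name
};

(: Unwrap the function item so that it can be called as before :)
declare function local:function($callable as map(*)) as function(*)
{
  $callable?impl
};

let $double := local:named(
  "demo:double",
  function($x as xs:double) as xs:double { 2 * $x }
)
return local:name($double) || "=" || local:function($double)(21)
```

With these definitions the final expression should evaluate to demo:double=42. Everything used here is plain XPath 3.1 apart from the function declarations themselves, which is what makes the revised form translatable.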
Potential difficulties
There are a number of potential difficulties in an XQuery to XSLT translation that
turn out not to be a problem for me, either because they are not particularly relevant
to function libraries, or they are constructs that I haven't had occasion to use.
Extended FLWOR clauses
Ordered and unordered expressions
Other declarations
Crossing module imports
XQuery 3.0 extended the FLWOR expression with some additional clauses for grouping
and counting, none of which are available in XPath 3.1. All of these would need to
be converted into XSLT constructs. The entire FLWOR would need to be unpacked into an xsl:for-each or xsl:for-each-group instruction.
The XQuery module prologue includes a number of other declarations: schema imports,
decimal formats, and various mode and option declarations. Some of these have a direct
analogue in XSLT and some do not.
| XQuery declaration | XSLT translation |
|---|---|
| import schema | xsl:import-schema instruction |
| decimal-format | xsl:decimal-format instruction |
| base-uri | xml:base on xsl:package |
| default collation | default-collation attribute on xsl:package |
| construction | Use appropriate validation attribute on specific constructors |
| copy-namespaces | Use appropriate copy-namespaces and inherit-namespaces attributes on specific constructors |
| default order | Make sure sort() calls use the right function |
| ordering | Ignore: this is a performance hint |
| ordered and unordered expressions | Ignore: these are performance hints |
| boundary-space | No easy translation |
| Serialization options | xsl:output instruction |
| Other option declarations | Processor dependent translation, or none |
A more substantial problem would be the problem of crossing module imports. XQuery
allows module A to import module B and module B to import module A. If there happen
to be cyclic dependencies in functions or variables this will result in a dynamic
error (e.g. stack overflow) but not a static one. XSLT packages are stricter, however.
Cross imports are not permitted. This is sound software architectural practice. If
the XQuery modules to be translated have crossing imports, there is nothing for it
but to refactor them to avoid it.[4]
Maintenance
There is a final difficulty that deserves to be called out separately: maintaining
both sets of libraries. Because not everything is automated here, new versions cannot
be automatically regenerated. There are really only two options here and neither of
them is great:
Fully automated translation: generating the stylesheets becomes a "compilation" of
sorts, scriptable in a build system.
Manual updates, with some automated assistance.
In theory, making the translation fully automatic should be possible: modify the XQDoc
source to include structured output for all expressions so that they can be rendered
out as XSLT. XQDoc works through the visitor pattern, defining functions for each
kind of node in the parse tree. Currently it just prints the function body as a string,
but it could continue visiting subcomponents and building up a structured result instead.
For example, a switch expression could be rendered out as an xsl:choose element.
In addition to the work involved, there are some downsides to this. First, it basically
turns long XPath expressions into a series of XSLT elements instead, making the functions
much less readable. That said, it ought to be possible to determine when a full translation
was required and when the raw function body could be used. This approach also means
forking XQDoc and having to maintain it rather than just leveraging updates.
Manually maintaining updates is not a great option, either. Doing all the work manually
makes it error-prone. It would be easy to miss some change. Some semi-automated assistance
here will help a lot. Here's the basic plan of attack:
Create a baseline: the automatic (unedited) XSLT translation for the XQuery modules
For a new release of the stylesheets, generate the automatic XSLT translation
Calculate the difference between the baseline and the new automatic translation
Insert and edit differences
In practice this is only a little tiresome when the first three steps are part of
the automated build system and most updates are handled live in the moment: about
as tiresome as creating edited release notes. It ensures that in every full release,
all the XQuery changes have been captured.
For day-to-day changes a slightly different differences-of-differences approach worked
well:
Compare the working version of the XQuery module to the baseline in source control
Compare the working version of the XSLT module to the baseline in source control
Compare the two sets of differences
Resolve inconsistencies as necessary
I created a tool that compares the differences of the XQuery version to the baseline
in source control with the differences of the XSLT version to that baseline, to make
sure that each check-in contained consistent changes on both sides. Filtering "return"
out of the differences made them more useful, as this was a large fraction of the
actual differences.
A Word on Performance
It is interesting to consider the relative performance of the XQuery functions versus
the XSLT functions, and indeed the relative performance of the compiled XSLT functions
used in Saxon-JS.
I have not done a detailed performance study, so these results should be taken as
suggestive rather than definitive. The numbers were obtained with a simple Unix time command and using Saxon 11.4 or Saxon-JS 2.5 as the processor. The Saxon-JS numbers
were run under node.js 16.18.1 over files compiled with Saxon 11.4. All times in seconds
unless indicated. These are timings on the running of various unit tests that had
been translated into all three contexts. Since Saxon-JS doesn't have a -repeat option and, frankly, because it takes too long to run anyway, these were run just
once so Java warm-up time is part of the measurement.
| Test set | XQuery | XSLT | Saxon-JS |
|---|---|---|---|
| All tests | 36.7 | 40.7 | 11+ hours |
| array test | 1.3 | 1.6 | 0.5 |
| colour-space test | 5.5 | 5.9 | 38.3 |
| complex test | 1.7 | 1.9 | 0.7 |
| distributions test | 2.6 | 3.1 | 9.7 |
| geometry test | 5.1 | 6.7 | 1478 s |
| point test | 1.6 | 1.8 | 0.7 |
| polynomial test | 1.4 | 1.5 | 0.4 |
| randomizer test | 3.1 | 3.5 | 31.2 |
| rectangle test | 2.0 | 1.9 | 0.7 |
| SDF test | 10.7 | 11.2 | 10.5 hours |
| utilities test | 1.4 | 1.6 | 0.6 |
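The single-run, wall-clock style of measurement behind these numbers can be sketched as follows. This is an illustrative Python equivalent of timing each command once, not the procedure actually used (which was the Unix time command). The function names are hypothetical, and the demonstration command is a trivial stand-in that should be replaced with the real processor invocations.

```python
import subprocess
import sys
import time


def time_once(command: list[str]) -> float:
    """Run a command exactly once and return elapsed wall-clock seconds.

    Process start-up (for example JVM or node.js initialization) is included,
    just as it is when timing a single cold run.
    """
    start = time.perf_counter()
    subprocess.run(command, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def time_suite(commands: dict[str, list[str]]) -> dict[str, float]:
    """Time each named command once, in order."""
    return {name: time_once(command) for name, command in commands.items()}


# Stand-in command so the sketch runs anywhere; substitute the actual
# XQuery, XSLT, and Saxon-JS test invocations here.
DEMO_COMMANDS = {"no-op": [sys.executable, "-c", "pass"]}
```

Calling time_suite(DEMO_COMMANDS) returns a dictionary mapping each name to its elapsed time. Because each command runs only once, the result mixes start-up cost with the work itself, which is the caveat noted above for the Java figures.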
The XQuery and XSLT performance numbers are fairly similar, with the XSLT times generally
running about ten percent slower. The Saxon-JS numbers are, frankly, baffling. Some
tests run much more quickly, probably because the cost of parsing and preparing the XQuery
or XSLT was already paid when the stylesheet was compiled to JSON, and because the JavaScript
runtime initializes more quickly than the Java runtime. On the other hand, some of the times
are much slower, unusably so. The bulk of the time is taken up in signed distance and certain
geometric calculations (particularly intersections and interpolations), both of which
do a lot of floating point mathematics and function calls. Still, there is no obvious
reason for the numbers to be quite this bad, and it bears further investigation.[5]
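As a quick check on these comparisons, the ratios can be computed directly from the timing table. The sketch below copies the figures from the table; the variable names are illustrative, and converting "10.5 hours" to 37800 seconds is a plain unit conversion.

```python
from statistics import median

# (XQuery seconds, XSLT seconds) per test set, copied from the table above.
TIMINGS = {
    "array": (1.3, 1.6),
    "colour-space": (5.5, 5.9),
    "complex": (1.7, 1.9),
    "distributions": (2.6, 3.1),
    "geometry": (5.1, 6.7),
    "point": (1.6, 1.8),
    "polynomial": (1.4, 1.5),
    "randomizer": (3.1, 3.5),
    "rectangle": (2.0, 1.9),
    "SDF": (10.7, 11.2),
    "utilities": (1.4, 1.6),
}

# XSLT time as a multiple of XQuery time, per test set.
xslt_ratio = {name: xslt / xquery for name, (xquery, xslt) in TIMINGS.items()}

median_ratio = median(xslt_ratio.values())   # typical per-test slowdown
overall_ratio = 40.7 / 36.7                  # from the "All tests" row

# Saxon-JS slowdown relative to XQuery for the two worst cases.
geometry_js_ratio = 1478 / 5.1
sdf_js_ratio = (10.5 * 3600) / 10.7
```

The median per-test ratio comes out at 1.125 and the "All tests" ratio at about 1.11, both in line with the "about ten percent" figure. The rectangle test is the only one where XSLT was faster. The Saxon-JS geometry and SDF runs come out roughly 290 and 3500 times slower than XQuery respectively, which gives a sense of how far outside the expected range they are.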
Conclusions and Summary
We have examined several approaches to single-sourcing XQuery and XSLT function libraries
starting with an XQuery base.
| Approach | Difficulty | Completeness | Result usable in |
|---|---|---|---|
| Simple import | Trivial | Complete | Saxon-PE, Saxon-EE; not Saxon-JS |
| Standard import | Fairly simple | Some type fixup for best results | Product supporting load-xquery-module(); not Saxon-JS, Saxon-HE, or Saxon-PE |
| String-based conversion | Moderate | Extensive editing for order, expressions, comments (improvement possible) | |
I was working with Saxon as both my tool and my target. Other alternatives are available
with some other processors such as MarkLogic or BaseX that have introspection capabilities.
For example, the BaseX inspection module has functions that return structured information about a module
either in XQDoc format or its own internal format, the list of functions in a module,
structured information about a specific function (including the attached XQDoc), the
type of a value, and so forth. The xquery:parse() function could be used to get the detailed parse tree of function bodies for more
complete automation instead of modifying the XQDoc processor.
Maintaining the parallel versions and running tests in all three environments (XQuery,
XSLT, node.js) has had a quite unexpected benefit of improving the code. Part of this
was just the extra proofreading that happens during the fix-up stage: every line of
code is examined twice, once to create it and once to translate it. You notice things.
In addition, the translation steps essentially introduced compilers into the mix,
which have to check every line of code soon after it is written. In an ideal world
unit tests would accomplish the same thing, but the truth is, something like "grind
on an image for an hour" is not something I choose to have a unit test for, and something
like "create a random octopus" is not something particularly amenable to a unit test,
so not every module gets run. Bone-headed typos and errors have been introduced during
major refactorings, and the translation step catches them. In addition, the different processors have slightly different
quirks and assumptions, and fixing code to work properly in all of them makes it better
for any of them.
With these techniques I have been able to realize my goal of creating XQuery libraries,
using them in XSLT, and translating them into XSLT as well. Automatic binding generation
takes just a few seconds, and is incorporated as part of the build system. Converting
a new module to XSLT can generally be accomplished in a minute or so, manual post-processing
and all, which is sufficient for my needs. Automatic baselining and comparison has
been incorporated into the build and release scripts, piggy-backing off the documentation
generation, and has proved manageable. Code overall has improved.
[Laga06] Albin Laga, Praveen Madiraju, Darrel A. Mazzari and Gowri Dara.
Translating XSLT into XQuery.
In Proceedings of 15th International Conference on Software Engineering and Data Engineering
(SEDE-2006), Los Angeles, California, July 6-8, 2006.
An extended version, "An Approach to Translate XSLT to XQuery", is available at http://www.mscs.mu.edu/~praveen/Research/XSLT2XQ/XSLT2XQ_Journal.pdf
[XQuery] W3C: Jonathan Robie, Michael Dyck, Josh Spiegel, editors.
XQuery 3.1: An XML Query Language
Recommendation. W3C, 21 March 2017.
http://www.w3.org/TR/xquery-31/
[1] One could pull the internal prefix from the XQuery module itself, from the module
namespace declaration. However, my coding style is to always use the prefix "this" internally within an XQuery module, and what is needed here is an outward-facing prefix,
not an inward-facing one.
[2] Although preparing for the implementation was a long week with a lot of assists from Emacs macros.
[3] It turns out this rewrite is necessary for using the compiled stylesheets in Saxon-JS
anyway because the full annotation mechanism relies on saxon:xquery which is not available in that context.
[4] One of my reviewers spelled out a rather clever technique for avoiding refactoring
to handle this problem by using static parameters used as flags on use-when attributes
on the imports. I haven't had occasion to try this myself.
[5] A future release of SaxonJS should address this issue: I look forward to testing it.