Introduction[1]
XSLT1.0 was originally developed primarily as a client-side processing technology, to increase data-adaptability and presentational flexibility with a declarative model supported in the browser. By 2006 all the major desktop browsers supported it, but the rise in importance of mobile client platforms and their code footprint pressures then sounded the death-knell. However, remnants of support for the XPath 1.0 component of XSLT can be found in most, but not all, of the current desktop browsers.
In the meantime, spurred mostly by unexpected takeup of XSLT and XQuery in server-side
XML-based architectures, the technologies have progressed through the 2.0
stage
(temporary variables, type model and declarations, grouping, canonical treatment of
sequences,
extensive function suites...) to the current candidate recommendation standards for
3.0/3.1 in
XSLT, XQuery and XPath. At this stage support for maps, arrays and higher-order functions
join
let constructs, increased standard function libraries and others to provide a core
set of
common functionality based on XPath. XQuery uses this with its own (superset of XPath)
syntax
to support query-based operation and reporting. XSLT adds a template-based push model,
within
an XML syntax, which is designed to be able to support streaming processing in suitable
use
cases.
These developments are sufficiently robust and powerful that, other circumstances
permitting, exploiting XSLT3.0/XPath 3.1 to process XML data is highly attractive.
But at
present this can only be performed in server-side situations – no browser developers
could
contemplate either the development effort or the necessary memory footprints needed
for the
2.0
level compilers for what they consider niche (i.e. non-mobile)
applications, let alone that required for 3.0+.
What has been developed extensively in browsers are JavaScript processors, both internal compilers, JIT runtimes and development libraries, such that very significant programmes can be executed in-browser[2]. And in general the level of conformance of and interoperability between implementations in different browsers is reasonable.
Exploiting this JavaScript option for supporting XSLT has been explored in a couple of cases:
-
Saxon-CEDelpratt2013 cross-compiled a stripped-down XSLT2.0 Saxon compiler from Java into JavaScript using Google's GWT technology. This worked, but the loaded code was large, very difficult to test and debug and very exposed to GWT's cross-browser capabilities.
-
During 2016 Saxonica developed Saxon-JS Lockett2016, a runtime environment for XSLT 3.0, implemented in JavaScript. This engine interprets compiled execution plans, presented as instruction trees describing the execution elements of an XSLT3.0 stylesheet. The execution plan is generated by an independent compiler, which performs all necessary parsing, optimisation, checking and code generation. Consequently the runtime footprint is modest (~ 200kB in minimised form) and programme execution incurs no compilation overhead.
This paper describes on-going work to construct a compiler for
XSLT3.0, generating the runtime execution plan for interpretation by Saxon-JS, written
in
XSLT3.0. It uses very heavily an ability to compile
XPath3.1 expressions
contained within the source stylesheet, described later. This makes it possible to
consider
compiling XSLT3.0 packages entirely within a JavaScript environment, such as a browser,
or
supporting the transform()
function in XPath in such environments. It also may
form the basis for a portable
Saxon compiler for XSLT3.0, when
SEF-consuming
runtimes for other language environments might be
possible.
Saxon was originally written for the Java platform, but despite the mantra of "write once, run everywhere", there has been increasing demand over the years for Saxon's functionality to be made available on platforms that Java cannot reach. Sometimes this requirement can be met by cross-compilation: thus Saxon on .NET is developed by translating Java bytecode (using IKVMC) to the .NET equivalent, Saxon for C/C++ and PHP is developed using a Java compiler (JET) that generates Intel machine code, and Saxon-CE on the browser was developed using GWT, which translates Java source code to JavaScript. Cross-compilation, however, has its limitations, one of which is a total dependency on third-party tools that may be discontinued at any time. The Saxon-JS product introduced a new approach, by splitting the compile-time and run-time parts of the product and with a JavaScript runtime created an efficient way of executing Saxon stylesheets in the browser. But for other environments (Node.js being perhaps the most obvious example), users really need to be able to compile stylesheets on the target platform (in this case JavaScript) rather than compiling them on a Java platform and then moving the compiled code across to the target platform for execution. The classic solution to this problem Richards1971 is to write the front end of the compiler in its own language (that is, in this case, in XSLT). Given the availability of a run-time for a particular target platform, the compiler immediately becomes available on that platform. The idea of using XSLT to implement a compiler is intriguing: on the one hand, a compiler can be seen as a pipeline of tree-to-tree transformations which makes it a classic XSLT application; on the other hand, the data model and instruction set of XSLT is designed for handling a rather different kind of data than that found in traditional compilers.
An earlier attempt to implement part of the compilation phase of an XSLT compiler (specifically the optimizer) was reported in Kay 2007. The earlier experiment proved unviable, largely because of the expense of using XSLT to make a small modification to a large tree, and then performing this operation repeatedly (and recursively). A further problem was the difficulty of holding generalized values (for example a sequence of integers) as properties on an XML tree. It turns out that these obstacles are largely removed by the introduction of maps into the XDM data model. Maps in Saxon are implemented using an immutable hash trie, and a property of this data structure is that despite the fact that modification to a map always returns a new map and leaves the existing map intact, the new map can share most of the data in the old map, and the cost of the modification (in both time and space) is constant, rather than being proportional to the size of the map. While one can envisage similar optimizations to the tree structures used to represent XML documents, the fact that nodes in an XML document have distinct identity, and links to their parent nodes, makes it far more difficult; furthermore, the primitive operations on maps (put, remove) translate far more easily into simple operations on a hash trie than the complex instructions in XSLT used to construct XML result trees.
In the rest of this section we describe the Saxon-JS runtime, the extensions that
were
added to support dynamic XPath evaluation, and discuss what the execution-plan
format is. In subsequent sections we then outline the approach for the compilation
taken so
far and some of the problems faced. The testing strategy is described and then finally
we
outline further work and draw conclusions.
Saxon-JS Runtime
The Saxon-JS runtime environment is written entirely in JavaScript and is intended to interpret and evaluate XSLT program execution plans and write results into the containing web page. These execution plans are presented as instruction trees describing the execution elements of an XSLT3.0 stylesheet in terms of sets of templates and their associated match patterns, global variables and functions, sequence constructors consisting of mixtures of XSLT instructions and XPath expression resolution programs and other static elements such as output format declarations.
Usually XSLT programs for Saxon-JS are compiled specifically for this target
environment, using Saxon-EE to generate an XML tree as a
Stylesheet Export File (SEF)[3]. This tree is loaded by a SaxonJS.transform()
within a web page,
which then executes the given stylesheet, with a possible XML document as source and
other
aspects of dynamic context, writing results to various sections of the web page via
xsl:result-document
directives. The engine supports specialist XSLT modes
(e.g. ixsl:on-click
) to trigger interactive responses, based on a model
originally developed for Saxon-CE.
Saxon-JS supports a very large proportion of XSLT3.0 and XPath3.1 functionality, the notable current absences being schema-awareness, streaming, functional items and higher-order functions. As all the static processing (parsing, static type checking, adding runtime checks.…) of the presented XSLT stylesheet is performed by Saxon-EE during its compilation operation, programs running in Saxon-JS should benefit from the very high levels of standards compliance that the standard Saxon product achieves. At runtime Saxon-JS only has to check for dynamic errors and often these are programmed in the execution plan as specific check instructions. The consequences are that
-
The Saxon-JS code footprint is very much smaller, consisting only of a runtime interpreter.
-
Execution starts immediately after loading the instruction tree – there is no compilation phase
-
But Saxon-JS assumes the code is correct, as all static errors have been removed and necessary dynamic checks have been added.
Dynamic evaluation of XPath expressions
In late 2016 we extended the Saxon-JS runtime with a facility (written in JavaScript)
to
support dynamic execution of XPath expressions (needed for the xsl:evaluate
instruction), which involved a parser for the XPath 3.1 grammar and a compiler for
the
resulting parse tree to an appropriate SEF tree. Full details of this are available
in Lumley2017, but some aspects will necessarily be described in this paper.
This feature (whose code is of the same order of size as Saxon-JS itself) is implemented
as
a dynamically loaded package in Saxon-JS, so those stylesheets that don't use dynamic
evaluation avoid the additional loadtime penalty. (This XPath3.1 support can also
be invoked
directly from JavaScript, permitting use outside an XSLT context.)
Stylesheet Export File
The Stylesheet Export File was originally developed as a format to describe an exported, compiled, XSLT package for use in Saxon's support of package linking for XSLT3.0. As such it is required to describe all the necessary information required to reconstruct an operational transform. Full descriptions of its form can be found in Lockett2016, but the reader should understand the following.
The SEF is written in XML, in the namespace
http://ns.saxonica.com/xslt/export
, as a package
element with
the following structure of components, corresponding to the top-level
components of a complete and statically compiled stylesheet. Each component is defined
as a
<co id="
number
">
element
surrounding the definition itself.
Global parameters & variables |
The global |
Global functions |
Each function with a set of arguments and a |
Named templates |
Each named template, i.e. those |
Modes |
The defined or implict modes for template rules, ie.e those
|
Keys & accumulators |
Declarations for |
Other declarations |
Definitions of the other declarations of the stylesheet ( |
As an example here are a couple of samples, with various sections elided for brevity:
Table I
XSLT |
<xsl:stylesheet … version="3.0"> </xsl:stylesheet> |
---|---|
SEF |
<package xmlns="http://ns.saxonica.com/xslt/export" version="30" packageVersion="1" saxonVersion="9.7.0.15" timeStamp="2017-05-29T13:11:19.791+01:00" target="JS"> <co id="0" binds=""> <mode onNo="TC" flags="dW" patternSlots="0"/> </co> <overridden/> <output> <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/> </output> <decimalFormat/> </package> |
XSLT |
<xsl:stylesheet … version="3.0"> <xsl:variable name="var1">Variable1</xsl:variable> <xsl:mode on-no-match="shallow-copy"/> <xsl:template name="A"> <A> <xsl:value-of select="$var1"/> </A> </xsl:template> <xsl:template match="a"> <xsl:call-template name="A"/> <xsl:next-match/> </xsl:template> </xsl:stylesheet> |
SEF |
<package xmlns="http://ns.saxonica.com/xslt/export" version="30" packageVersion="1" saxonVersion="9.7.0.15" timeStamp="2017-05-29T13:23:58.27+01:00" target="JS"> <co id="0" binds=""> <globalVariable name="Q{}var1" type="document-node()" … visibility="PRIVATE"> <doc … validation="preserve"> <valueOf flags="Sl"> <str val="Variable1"/> </valueOf> </doc> </globalVariable> </co> <co id="1" binds="0"> <template name="Q{}A" … slots="0"> <elem role="body" name="A"> <valueOf flags="l"> <gVarRef name="Q{}var1" bSlot="0"/> </valueOf> </elem> </template> </co> <co id="2" binds="1"> <mode onNo="SC" flags="dW" patternSlots="0"> <templateRule prec="0" prio="0.0" seq="0" rank="0" minImp="0" slots="0"… > <p.nodeTest role="match" test="element(Q{}a)" jsTest="var q=SaxonJS.U.nameOfNode(item); return SaxonJS.U.isNode(item) && item.nodeType==1 && q.uri==''&&q.local=='a';"/> <sequence role="action" … > <callT name="A" bSlot="0"/> <nextMatch flags="ti"/> </sequence> </templateRule> </mode> </co> <overridden/> <output> <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/> </output> <decimalFormat/> </package> |
The first example is the minimal stylesheet, which has implicit built-in template
rules
with 'text-only-copy' behaviour, described by the unnamed mode with no child template
rules.
In the second example the xsl:call-template name="A"
is bound to its target
named template component, and the $var1
reference to its definition, by the
@bSlot
reference to the component-binding-vector
(@binds
) on its enclosing component. The template rule has two children: one
with a match
role, which is a pattern definition and the other an
action
role, which is the sequence constructor to be used when the template
is selected.
The general form of the bodies of these components is an XML tree that defines an
instruction/function invocation, such as
<forEach>
arg0,arg1
</
.
The name of the element describes the operation, in this case a
for-each, where the first child is the generator instruction of the
for
values and the second child defines the sequence constructor
instruction for each of those when bound as the context item. Saxon-JS's main loop
is an
interpreter of such instructions, systematically evaluating instruction children as
arguments and producing suitable results, usually as (lazy) iterators over sequences
of
atomic values, maps/arrays and result-tree fragments.
It is not surprising that most of these instructions correspond to either productions in the XPath grammars or XSLT declarations or instructions, often using the same name, or camelCase() thereof, as the major internal semantics of evaluation within XSLT sequence constructors and XPath expressions is functional.
Some of these instructions are parameterised with attributes, such as <fn
name="
fn-name
>
which calls the given
system function. Other attributes describe possibly-needed properties, such as active
namespace bindings (e.g. ns=" =foo map=~"
records that the default namespace is
"foo"
and that the prefix map
is bound to its conventional
namespace for the map functions.) For some instructions, the arguments are positional
(such
as forEach
) – for others (such as templateRule
) the
@role
attribute is used for identification. In other cases instructions of
given name are used (such as catch
within try
).
As will become apparent later, for some of the important instructions there is no
difference between whether the original code was in an XPath expression or an XSLT
tree
fragment. For example both $in !
body
and
<xsl:for-each
select="$in">
body
</
are both
represented by forEach
. Similarly xsl:variable
bindings are
represented as let
instructions surrounding their definition as the first child
and a sequence of subsequent result-generating children as the second, in exactly
the same
manner as let $var := … return …
would in an XPath expression. This means it
should be possible to reuse much of the code already developed for XPath compilation
within
an XSLT situation.
Compiling the XSLT
The goal of the project is to construct a high-conformance compiler from XSLT3.0 to SEF, the compiler itself being written in XSLT3.0 with the addition of XPath compilation functions, present in Saxon-JS, developed for the support of dynamic XPath evaluation. At this stage the compiler (which has been externally compiled/exported into SEF) is intended to run within the JavaScript environment of a browser. Given that both source (XSLT3.0) and destination (SEF) are well-defined XML, it was anticipated that using XSLT3.0 for the compiler would give a strong advantage, and so it proved.
This project is still in development, so some of the steps described
will change as we tidy up the code, extend functionality and streamline
the execution. In particular we've chosen to use several passes (6 or 7?) across the
stylesheet/package trees to increase visibility for debugging and limit the
everything's empty cascade
, primarily using a central XML tree representation
of the evolving program
during initial development of the compiler. Some of the
phase separations shouldn't really be possible and might be plain wrong in the most
general
case, but for early development and the simpler majority of test cases, make development
more
transparent. An example is in the interaction of static variables and parameters and
conditional compilation on resource collection where originally collection was followed
by
static processing but are now merged into a single pass – see section “Preprocessing: package collection and static and conditional processing”.
The development methodology was to get something running quickly – just compiling
the
simplest of stylesheets and then attempt to attach to a (browser-based) testdriver
running the
XSLT3.0 test suite (see section “Testing”). This framework not only allowed selected
tests to be compiled, run and checked but also to examine various intermediate stages
of the
compilation state, as well as diving under the hood
with the browser-based
JavaScript debuggers. The process was then to choose one XSLT instruction (we chose
xsl:choose
) and get both the XSLT test situation and the compiler operational,
and then slowly expand the set of features supported [6].
Once the compiler is fully operational, and we have an executing large example corpus,
we
can begin to merge phases together and rationalise some of the internal data structures,
especially exploiting map()
and array()
types when a mixture of
properties and tree fragments are needed.
The reader should note also that there are often several different strategies that will produce the same result in Saxon-JS so the fact that code shown here is not the same as that which Saxon-EE would produce isn't necessarily an error. However we trust the reader will get a sense of the approach being taken.
The main phases of the compilation process required are:
-
XML parsing – not really part of the compiler, we assume this has already been done before we start.
-
preprocessing – evaluation of static variables, use-when, shadow attributes, include/import.
-
validation and normalization – essentially the kind of validation that could be done with a schema processor, checking that elements and attributes appear in the right place and have permitted values .Currently this is performed as part of the
XSLT to SEF instructions
process, but might be better to do it as a free-standing phase. For example, normalizing boolean attributes fromyes|1|true
totrue
early on can save a lot of boring code later. -
indexing components: building symbol tables for named objects such as variables, functions, named templates and attribute sets, resolving references, checking for duplication.
-
parsing of XPath expressions and patterns.
-
type analysis: inferring the types of expressions, checking (as far as is statically possible) that these are consistent with required types, adding run-time type checking code to the tree where necessary.
-
optimization: replacing constructs with equivalent constructs that are faster to evaluate (for example, rewriting (position() = 1 to 10) with (position() ge 1 and position() le 10). Currently little of this is being done, save that some arithmetic expressions over constants are resolved during XPath compilation.
-
streamability analysis: not being done at the moment, since Saxon-JS does not implement XSLT 3.0 streaming. (Note that work reported in Lumley2014 describes an XSLT implementation of streamability analysis, which may be of relevance to a future XSLT-based compiler product.)
-
code generation: normally the last phase of a compiler, in this architecture all that happens here is actions such as allocating slots for variables and component references, generating JavaScript code for node tests and name tests, and optional serialization of the final SEF tree as lexical XML, including a checksum to deter attempts at manual modification.
In the current design most of the representation passed between phases is a (single)
XML
tree, but we anticipate fuller use of maps and arrays, that XSLT3 supports, describing
more
collections
of components and associated properties.
A simple walk-through
We'll start by a simple walk-though of one of the first test cases we used – involving
xsl:choose
. The stylesheet is:
<xsl:stylesheet … version="2.0"> <xsl:template match="/doc"> <out> <xsl:for-each select="person"> <xsl:choose> <xsl:when test="sex='M'"> Male: </xsl:when> <xsl:when test="sex='F'"> Female: </xsl:when> <xsl:otherwise> Who knows?: </xsl:otherwise> </xsl:choose> <xsl:value-of select="name"/> </xsl:for-each> </out> </xsl:template> </xsl:stylesheet>
For the moment we'll ignore the static collection phase and show the result of the first conversion pass:
<package … version="30" packageVersion="1" saxon-version="Saxon-JS-internal" target="JS" name="TOP-LEVEL" relocatable="false"> <co id="0"> <mode onNo="TC" flags="dW" patternSlots="0"> <templateRule prec="0" seq="0" ns="xml=~ xsl=~" minImp="-10" flags="s"… select="node()" match="/doc" module="choose-0101.xsl"> <xpath role="match" xpath="/doc" ns="xml=~ xsl=~"/> <elem name="out" type="element()" role="action" ns="xml=~ xsl=~" namespaces=""> <forEach> <xpath role="select" xpath="person" ns="xml=~ xsl=~"/> <sequence> <choose type="item()*"> <xpath role="" xpath="sex='M'" ns="xml=~ xsl=~"/> <valueOf> <str val=" Male: "/> </valueOf> <xpath role="" xpath="sex='F'" ns="xml=~ xsl=~"/> <valueOf> <str val=" Female: "/> </valueOf> <true/> <valueOf> <str val=" Who knows?: "/> </valueOf> </choose> <valueOf flags="l"> <fn name="string-join" role="select"> <convert from="xs:untypedAtomic" to="xs:string"> <data> <xpath role="select" xpath="name" ns="xml=~ xsl=~"/> </data> </convert> <str val=" "/> </fn> </valueOf> </sequence> </forEach> </elem> </templateRule> </mode> </co> <overridden/> <output> <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/> </output> <decimalFormat/> </package>
The package basically consists of a single unnamed mode of templates with just one
template. We leave the @match
for this template as an xpath
child,
recording both its role and the XPath/pattern expression it defines. Then we collect
up all
its sequence constructor components (in this case the <out>
element) as the
second child labelled as the action role. (If the template had had
parameters these would have been top-level children of the rule, hence the use of
the role
identification attributes.)
Within that sequence constructor we compile recursively. The <out>
becomes an element-generating instruction with the required name as a parameter[7]. Its single child is a xsl:for-each
, which compiles to a
forEach
leaving its content selector as an xpath role="select"
child, and its sequence constructs as a sequence
around the two children, which
have compiled into a choose
and a valueOf
.
The valueOf
contains instructions to convert the result into a
space-separated concatenation of the evaluation of the name
XPath on the
context item[8]. The choose
has been expanded as a sequence of test/result pairs,
were the test is an XPath that will checked against context and the
xsl:otherwise
has a test of true()
, for which a specific
instruction true
is provided in Saxon-JS. (During the conversion of
xsl:choose
to choose
checking of static language errors such as
xsl:otherwise
not being last or incorrect children, is carried out trivially
by a series of tests such as
exists(xsl:otherwise/(following-sibling::*|following-sibling::text()[matches(.,'\S')])
During the next step we compile all the embedded XPath expressions, leaving their resulting instruction trees in situ:
<package...> <co id="0" binds=""> <mode onNo="TC" flags="dW" patternSlots="0"> <templateRule prec="0" seq="0"...> <docOrder role="match"> <slash> <fn name="root" type="node()?"> <treat type="node()" jsTest="return SaxonJS.U.isNode(item);" … > <dot type="item()" … /> </treat> </fn> <axis name="child" type="element()*" jsTest="var q=SaxonJS.U.nameOfNode(item); return item.nodeType==1 && q.uri=='' && q.local=='doc';"/> </slash> </docOrder> <elem name="out" type="element()" role="action" ns="xml=~ xsl=~" namespaces=""> <forEach forType="element()" type="item()*"> <axis role="select" name="child" type="element()*" jsTest="var q=SaxonJS.U.nameOfNode(item); return item.nodeType==1 && q.uri=='' && q.local=='person';" … /> <sequence type="item()*"> <choose type="item()*"> <gc role="" type="xs:boolean" card="1:1" comp="..." op="="… > <data type="xs:anyAtomicType*"> <axis name="child" type="element()*" jsTest="var q=SaxonJS.U.nameOfNode(item); return item.nodeType==1 && q.uri=='' && q.local=='sex';"/> </data> <str val="M" type="xs:string"/> </gc> <valueOf> <str val=" Male: "/> </valueOf> … </choose> <valueOf flags="l"> <fn name="string-join" role="select"> <convert from="xs:untypedAtomic" to="xs:string"> <data> <axis role="select" name="child" type="element()*" jsTest="var q=SaxonJS.U.nameOfNode(item); return item.nodeType==1 && q.uri=='' && q.local=='name';"… /> </data> </convert> <str val=" "/> </fn> </valueOf> </sequence> </forEach> </elem> </templateRule> </mode> </co> <overridden/> <output> <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/> </output> <decimalFormat/> </package>
Now the XPath expressions have been expanded, two of them to a plain axis
and one to a general comparison of an axis value against a string. Each of the
axis
instructions contains some JavaScript code to qualify node items for
inclusion in the axis search results. (Note well – these code statements have been
line-wrapped for brevity – in practice they must not contain newlines,
as JavaScript treats this as end of statement
.) Also the type result of the
expression has been determined and propagated upwards so that result trees are now
typed.
This becomes important in issues of whether type casts are necessary, dynamic
type/cardinality checks are required, what the type of the context-item is or if
statically-determined type errors have occurred.
Now we complete the process by converting the XPath determined for the match pattern into a true pattern:
<package...> <co id="0" binds=""> <mode onNo="TC" flags="dW" patternSlots="0"> <templateRule prec="0" seq="0" … prio="0.5"> <p.withUpper role="match" axis="parent" upFirst="true" type="element()*"> <p.nodeTest test="element()*" type="element()*" jsTest="var q=SaxonJS.U.nameOfNode(item); return SaxonJS.U.isNode(item) && item.nodeType==1 && q.uri=='' && q.local=='doc';"/> <p.nodeTest test="document-node()" type="node()?" jsTest="return SaxonJS.U.isNode(item) && (item.nodeType==9 || item.nodeType==11);"/> </p.withUpper> <elem name="out" type="element()"… > ….
In this case the pattern matches doc
when the parent node is a
document-node()
or a document fragment. The default priority for this pattern
has been determined, based on the complexity
of the pattern tree and written
onto the rule, when priority has not already been declared. This, along with the import
precedence and sequence position will be used to order the rules for matching and
using
xsl:next-match
[9].
As should be apparent, these code samples can be large. In the rest of the document I'll concentrate on the main operations and phases of the compilation process and some of the more complex constructs.
Preprocessing: package collection and static and conditional processing
The first phase of package formation is to collect together from the top level
package
, all the necessary resources in terms of required subsidiary
packages and imported and included stylesheets. At present we are only addressing
resources
in the form of XSLT stylesheets, not packages. At first glance this should be a
comparatively simple exercise with templates matching the collection directives
(xsl:import|xsl:include
), deferencing the document locations and including
contents recursively. However there are two, or three, complications.
Some components, such as templates, in stylesheets that are
imported have a lower precedence in execution that those in the
importing stylesheets, or sibling imported stylesheets. This is an important mechanism
for
libraries
in XSLT1.0 and 2.0 and requires accurate tracking of importation
position
.
In addition, in XSLT 3.0 stylesheets can be parameterised statically in two ways,
which
can influence the collection result significantly. Firstly elements of the stylesheet
can be
included conditionally by @[xsl:]use-when
decorations, using XPath expressions
that can involve static global variables and parameters as well as the
more usual features of the compiling environment, such as the implementation name
or version
number. Secondly the values of attributes on XSLT elements can be
computed at compile-time from such static variables. This feature, termed shadow
attributes, would for example permit different versions of a library to be
included at compile time, dependent upon a supplied parameter (using
_href="lib-{$version}.xsl"
) .
A further compilcation arises from the possibility that static (global) variables
in
imported or included stylesheets have effective scope not just within their containing
stylesheets, but also for all subsequent components in calling
stylesheets,
which infers they migrate to top level – effectively the scope then becomes
following::*
. This requires static variable bindings to migrate up along with
included and imported preprocessed stylesheets.
Given the necessity of transmitting state (static variables, import precedences) along the sibling axes of the stylesheets during this process, we use an iterator over the stylesheet which returns a map, of the general form:
<xsl:mode name="static" on-no-match="shallow-copy"/> <xsl:template match="xsl:stylesheet | xsl:transform" mode="static" priority="1.5" as="map(*)"> <xsl:param name="import-precedence" select="1" tunnel="true"/> <xsl:param name="static-vars" as="map(*)" select="map{}" tunnel="true"/> <xsl:variable name="me" select="."/> <xsl:iterate select="*"> <xsl:param name="static-vars" as="map(*)" select="map{}"/> <xsl:param name="precedence-base" select="$import-precedence"/> <xsl:param name="parts" select="()"/> <xsl:on-completion> <xsl:variable name="sheet" as="element()"> <xsl:for-each select="$me"> <xsl:copy> <xsl:sequence select="@*, $parts"/> </xsl:copy> </xsl:for-each> </xsl:variable> <xsl:sequence select="map{'sheet':$sheet, 'precedence':$precedence-base, 'static-vars':$static-vars}"/> </xsl:on-completion> <!– Process component --> </...> <xsl:template match="component" mode="static">....
where the static evaluation state can be altered for following-sibling
components, by a next-iteration
instruction with altered parameter values,
which also accumulates the components of the stylesheet though the parts
parameter.
First we evaluate and accumulate the state of the static variables and parameters.
These
are global variable or parameters of a stylesheet that have been declared
@static="yes"
. These are restricted in the sense that their values can only
be defined either by external compile-time binding (for parameters) or by an XPath
expression which involves the usual static context and reference to other
earlier-defined static variables or parameters only, though static variables
from included stylesheets do come into scope.
Pleasingly in the most common cases this turns out to be comparatively straightforward,
relying on the two restrictions: i) that static variables can only set
a value via an XPath expression in their @select
attributes (i.e. no sequence
constructors) and the static context for that expression excludes any user-defined
functions
in the package and ii) reference to variables can only be to prior static global variables.
These together mean that given the determined values of all previously defined static
variables, the value for the next variable can always be determined by an
<xsl:evaluate xpath="@select">
action, supplying a namespace context and a
map of those previously defined variable values on the @with-params
map
argument. So our process step for such variables is:
<xsl:when test="self::xsl:variable/@static = ('yes', 'true', '1')"> <xsl:variable name="possible.component" as="element()?"> <xsl:apply-templates select="." mode="#current"> <xsl:with-param name="static-vars" select="$static-vars" tunnel="true"/> </xsl:apply-templates> </xsl:variable> <xsl:if test="exists($possible.component)"> <xsl:variable name="value" as="item()*"> <xsl:evaluate xpath="@select" namespace-context="." with-params="$static-vars"/> </xsl:variable> <xsl:next-iteration> <xsl:with-param name="static-vars" as="map(*)" select="map:put($static-vars, resolve-QName(@name, .), $value)"/> <xsl:with-param name="parts" select="$parts,$possible.component"/> </xsl:next-iteration> </xsl:if> </xsl:when>
It is first necessary to process the variable declaration itself, as it can be subject
to conditional compilation (as discussed just below). If the declaration still
stands
we evaluate its @select
expression with all previous static
variables in scope and then add that name/value mapping to the static-variable state
for the
next iteration on subsequent components, with the variable definition added to the
list of
component parts[10].
The conditional inclusion directive (@[xsl:]use-when
) is processed by a
simple high-priority template, shown below, matching elements which include the directive.
When the directive is a shadow attribute it is first preprocessed to
expand the expression value. Then the expression is evaluated in current static variable
context. If its boolean value is true then the element is processed conventionally.
If not,
nothing is written to the expanded stylesheet.
<xsl:template match="xsl:*[@use-when | @_use-when] | *[@xsl:use-when | @xsl:_use-when]" priority="3" mode="static"> <xsl:param name="static-vars" as="map(*)" select="map{}" tunnel="true"/> <xsl:variable name="use-expression" as="xs:string"> <xsl:choose> <xsl:when test="@*:_use-when"> <xsl:apply-templates select="@*:_use-when" mode="#current"/> </xsl:when> <xsl:otherwise> <xsl:sequence select="@*:use-when"/> </xsl:otherwise> </xsl:choose> </xsl:variable> <xsl:variable name="use" as="xs:boolean"> <xsl:evaluate xpath="$use-expression" namespace-context="." with-params="$static-vars"/> </xsl:variable> <xsl:if test="$use"> <xsl:next-match/> </xsl:if> </xsl:template>
Shadow attributes are processed by a similarly simple template, writing a new attribute with the appropriate name.
Finally we can now track the importation precedence. The goal is that components in a higher-level stylesheet have higher precedence than those in any imported sheets, and lower than any following-sibling sheet. If we make a simplifying assumption, common in most designs and mandated in XSLT 1.0 and 2.0, that any importations, or inclusions that include importations, in a stylesheet precede any templates or other components that need to track precedence, then we can process in the same iteration:
<xsl:when test="self::xsl:import"> <xsl:variable name="processed-directive" as="element()?"> <xsl:apply-templates select="." mode="#current"/> </xsl:variable> <xsl:if test="$processed-directive"> <xsl:variable name="doc" as="document-node()"> <xsl:call-template name="collect-doc"> <xsl:with-param name="uri" select="resolve-uri($processed-directive/@href, base-uri(.))"/> </xsl:call-template> </xsl:variable> <xsl:variable name="collected" as="map(*)"> <xsl:apply-templates select="$doc/*" mode="#current"/> </xsl:variable> <xsl:variable name="max-precedence" as="xs:integer" select="xs:integer(max(($precedence-base, $collected?precedence)))"/> <xsl:next-iteration> <xsl:with-param name="precedence-base" as="xs:integer" select="$max-precedence + 1"/> <xsl:with-param name="parts" select="$parts,$collected?sheet/*"/> <xsl:with-param name="static-vars" select="$collected?static-vars"/> </xsl:next-iteration> </xsl:if> </xsl:when>
The technique processes the imported stylesheet recursively, recording precedence as an attribute on components that require it and then finds the maximum of the precedences recorded over the (expanded) stylesheet. One more that that precedence is then used for the remaining components.
However, XSLT 3.0 has liberalised the situation, such that importations, and inclusions that contain importations, can occur anywhere in the top level of a stylesheet. This means that templates that precede an importation will now have higher precedence than those imported, as well as those that follow. To handle this case the process is changed to use a map-base tree of importations and their precedences and a recursively-sensitive algorithm to distribute computed precedences across elements in this tree.
During this phase we also track and record active namespaces and the base URI for
each
element. During subsequent processing, with a plethora of temporary trees, ancestry
is
frequently lost and these important early properties often with it. They are currently
recorded as attributes in a separate namespace for later bulk removal. (This also
a suitable
point to carry out some minor forms of canonicalisation, such as normalizing whitespace
on
@as
type declarations. Most of these will probaby migrate to a separate
normalization phase.)
XSLT to SEF instructions
With the stylesheet expanded statically, the bulk of the compilation action involves conversion of XSLT instructions to suitable SEF elements and subtrees. At present we perform this in two phases – conversion without XPath compilation, followed by compilation of XPath expressions and typechecking.
Naturally XSLT's push
mode is highly suitable for the converstion of XSLT
components and instructions to SEF. Here are some simple examples:
<xsl:template match="xsl:for-each" mode="sef"> <xsl:param name="attr" as="attribute()*" select="()"/> <xsl:call-template name="check-select"/> <xsl:sequence select="f:check-sort(.)"/> <forEach> <xsl:sequence select="$attr"/> <xsl:iterate select="xsl:sort"> <xsl:param name="inner"> <xsl:apply-templates select="@select" mode="create.xpath"/> </xsl:param> <xsl:on-completion select="$inner"/> <xsl:next-iteration> <xsl:with-param name="inner"> <sort> <xsl:sequence select="$inner"/> <xsl:apply-templates select="." mode="#current"/> </sort> </xsl:with-param> </xsl:next-iteration> </xsl:iterate> <xsl:call-template name="sequence-constructor"/> </forEach> </xsl:template>
In compiling a xsl:for-each
we first check it has a @select
attribute (an error will be thrown by the named template if not) and then that any
xsl:sort
children are first. The forEach
SEF instruction has
possible attributes added through the $attr
parameter – this is usually to set
a possible role identifier, such as when the xsl:for-each
is the only (action)
child of a template rule. Then the selected content is created as an xpath
element nested inside any defined xsl:sort
declarations, by using an iterator.
Finally the sequence constructor instructions are generated using a canonical named
template. (Note that regardless of the number of xsl:sort
children, the
forEach
only has two children – the first a content-generating instruction
(e.g. sort[sort[xpath]]
), the second a sequence constructor.)
<xsl:template match="xsl:function" mode="sef"> <xsl:variable name="params" as="element()*"> <xsl:apply-templates select="xsl:param" mode="#current"/> </xsl:variable> <co> <function name="{f:EQName(@name,.)}" as="{(@as,'item()*')[1]}" slots="{count($params)}"> <xsl:call-template name="record-namespaces"/> <xsl:sequence select="$params"/> <sequence role="body"> <xsl:apply-templates select="* except xsl:param" mode="#current"/> </sequence> </function> </co> </xsl:template> <xsl:template match="xsl:function/xsl:param" mode="sef"> <arg slot="{position() - 1}" name="{f:EQName(@name,.)}"> <xsl:sequence select="@as"/> </arg> </xsl:template>
An xsl:function
is compiled to a separate package component with a
function
child, with name, type and number of argument slots properties. The
active namespaces are then recorded so that in subsequent XPath compiling the correct
prefix-namespace bindings are available. The parameters become named and typed
arg
children, each with an allocated value slot. Finally the sequence
constructor is compiled from all the non-parameters
and identified as having
the body role for the function.
Some other declarations on stylesheet elements (@default-collation
,
@expand-text
etc.) are also processed during this phase, by modifying the
static compiling context, which is currently distributed across a set of tunnelled
variables[11]and which can be retrieved for compilations that are dependent upon them. As
these declarations are orthogonal to each other, we arrange a series of priority-ordered
templates which pass on responsibility for final effect using xsl:next-match
with a suitable tunneled parameter. For example:
<xsl:template match="xsl:*[@xpath-default-namespace]" priority="3" mode="sef"> <xsl:param name="attr" as="attribute()*" select="()"/> <xsl:next-match> <xsl:with-param name="xpath-default-namespace" select="@xpath-default-namespace" tunnel="true"/> <xsl:with-param name="attr" select="$attr"/> </xsl:next-match> </xsl:template> <xsl:template match="xsl:*[@expand-text]" priority="3.3" mode="sef"> <xsl:param name="attr" as="attribute()*" select="()"/> <xsl:next-match> <xsl:with-param name="expand-text" as="xs:boolean" select="@expand-text = ('true', 'yes')" tunnel="true"/> <xsl:with-param name="attr" select="$attr"/> </xsl:next-match> </xsl:template>
Local variable and template or iteration parameter declarations are processed in part
at
this stage, but left as specific VARDEF
or PARAMDEF
elements, with
name and possible type attributes and their value generator as a child, either from
a
@select
expression or a sequence constructor. They will be finally converted
in the next phase.
At this point we have mappings of almost all of the XSLT declarations and instructions into SEF equivalents, save that XPath expressions, matching patterns and parameter and variable declarations have not been finalised.
XPath and typing
Now we have the main elements of the package as a series of components with subtrees.
Some of the components define global variables, parameters and functions, others define
named-templates and modes of template rules. Scattered throughout are uncompiled XPath
expressions described as <xpath
xpath="
expression
"… />
elements, as
well as unprocessed variable and parameter definitions as VARDEF|PARAMDEF
subtrees. We now proceed to compile these XPaths, by using the
saxon:compile-XPath()
function developed for use in the support of
xsl:evaluate
. But in this process, these XPath expressions can, and often do,
make reference to resources that are defined elsewhere in the stylesheet, namely
user-defined functions and global and local variables and parameters, or even the
type of
the context-item for its evaluation. Such reference covers not only their existence
(and
possible slot
for their value) but equally crucially the
type and cardinality of value/result/argument they deliver or expect.
For functions the process involves forming a map, keyed by name and arity, of each
of
the function bodies – this is passed in as part of the static context state to XPath
compilations – when the language
property of the compilation is
XSLT
then these definitions are searched for function calls if the standard
library fails to match[12]. Static type checking of both arguments and results of calls to these
user-defined functions is carried out in exactly the same manner as for built-in functions,
save that the call instruction is ufCall
rather than fn
and the
call contains a binding to the appropriate package component for the function body.
Global variable and parameter definitions are passed into XPath compilation as maps of name→slot and name→type after having themselves been compiled to both build value generation instructions and determine value type. (As they can acyclically reference each other, this is carried out in a similar manner as for local variables in scope.)
For local variables and parameters we need to handle things with regard to their
scoping. Recall that in the previous stage we left them as VARDEF
or
PARAMDEF
children, typically within a sequence constructor:
<xsl:variable name="words" select="tokenize(.)"/> <xsl:variable name="chars" select="sum($words!string-length(.))"/> <xsl:copy> <xsl:attribute name="words" select="count($words)"/> <xsl:attribute name="chars" select="$chars"/> </xsl:copy> → <sequence> <VARDEF name="words"> <xpath role="select" xpath="tokenize(.)"… /> </VARDEF> <VARDEF name="chars"> <xpath role="select" xpath="sum($words!string-length(.))" … /> </VARDEF> <copy flags="cin"> <sequence> <att name="words" type="attribute()" … > <fn name="string-join" role="select"> <convert from="xs:untypedAtomic" to="xs:string"> <data> <xpath role="select" xpath="count($words)" … /> </data> </convert> <str val=""/> </fn> </att> <att name="chars" type="attribute()" … > … <xpath role="select" xpath="$chars" … /> … </att> </sequence> </copy> </sequence>
The scope of variables in XSLT is along the
following-sibling::*/descendant-or-self::*
compound axis. Thus for an
instruction which has a child defining a variable we need to bind the variable and
ensure
the correct reference is made in any subsequent XPath. The mechanism for such in SEF
is a
let
, in exactly the same manner as that within an XPath expression, with two
children – the first defines the value, the second the result generator. The
let
itself describes the name of the variable and most importantly its value
binding slot. Thus we need to generate suitably nested let
s which we do with a
recursive template. For each VARDEF
encountered as the child of a
variable-containing element it i) compiles the XPaths in its definition using the
current
static context, ii) modifies static state to include the binding of the variable name,
its
determined type and its allocated slot to store the run-time determined value, iii)
recursively processes all following siblings, with the variable now in-scope
for reference in any XPath expressions and iv) wraps within a let
instruction
the variable defining instructions and a sequence constructor around the processed
remaining
siblings. Thus our example compiles to two nested let
s binding and referencing
variables in slots 0 and 1:
<let var="words" slot="0" type="xs:anyAtomicType?"> <fn role="select" name="tokenize" type="xs:string*" … > <cvUntyped type="undefined" to="xs:string" diag="0|0||tokenize"> <data diag="0|0||tokenize" type="xs:untypedAtomic?"> <dot type="element()" diag="0|0||tokenize"/> </data> </cvUntyped> </fn> <let var="chars" slot="1" type="xs:anyAtomicType?"> <fn role="select" name="sum" type="xs:anyAtomicType?" … > <forEach type="xs:integer" diag="0|0||sum"> <varRef name="Q{}words" slot="0" type="xs:string*"/> <fn name="string-length" type="xs:integer"> <dot type="xs:string" diag="0|0||string-length"/> </fn> </forEach> </fn> <copy flags="cin"> <sequence type="attribute()*"> <att name="words" type="attribute()" … > <fn name="string-join" role="select"> <convert from="xs:untypedAtomic" to="xs:string"> <data> <fn role="select" name="count" type="xs:integer" … > <varRef name="Q{}words" slot="0" type="xs:string*" … /> </fn> </data> </convert> <str val=""/> </fn> </att> <att name="chars" type="attribute()" … > <fn name="string-join" role="select"> <convert from="xs:untypedAtomic" to="xs:string"> <data> <varRef role="select" name="Q{}chars" slot="1" type="xs:anyAtomicType?" … /> </data> </convert> <str val=""/> </fn> </att> </sequence> </copy> </let> </let>
We can now look at typing. Checking and adapting to static or dynamic type is critical
for XPath and XSLT and even more so for Saxon-JS, as the runtime makes lots of assumptions
about argument types, or the presence of runtime type checking or casting instructions
being
present in the SEF tree. Consequences of error in processing the tree for these cases
usually result in a JavaScript error through a null pointer reference. The XPath compiler
has a built-in typechecker (in JavaScript) and we have mildly extended the compiler
such
that at least the top level result type appears on the compiled SEF tree as a
@type
attribute.
This means that after XPath compilation we should know what expected types will result from that expression. We can now do static type checking at the XSLT level, though at present parts of the code are a little rough-and-ready: eventually we'll converge to a consistent coherent type checker, either exposing the JavaScript one or a new one written in XSLT. The typing is used in a number of places:
-
Determining the type of variables and parameters, either from the inferred type of the
@select
expression, or the defined (@as)
type or otherwise form the sequence constructor, such that correct type information is used for compiling XPath expressions that reference those variables. -
Inferring the type of the context-item, either from a template matching or in the body of a context-altering instruction, such as
xsl:for-each
where the context-type fordot
in the body is the singular of the type returned by the@select
expression. -
Determining whether a cast, a dynamic check or other instruction should wrap an element in the generated SEF tree, or indeed whether a static type error must always result.
Patterns
Patterns to match items are critical to XSLT templates and are also used in other
features such as xsl:number
, accumulators and keys. Whilst looking like XPath
expressions, technically they have a different top-level grammar, some (subtree) productions
of which are identical to those of XPath. In XSLT3.0 pattern possibilities have increased
a
great deal over XSLT2.0, especially in matches for non-nodal items. Eventually we'll
have to
bite the bullet and produce a variant of the XPath compiler (written in JavaScript
and used
for dynamic evaluation) which handles the XSLT pattern grammar and generates SEF
instructions using the additional pattern-related instructions. This is particularly
the
case for positional predicates
in patterns, where integer values denote
qualifications on sequence position and the conversion becomes distinctly tricky and
sensitive.
However for a lot of the simpler patterns, it is possible to compile them as if they
were an XPath and then convert the (higher) sections of the SEF tree into suitable
pattern
elements. For example a/b[c]
will compile if treated as XPath as:
<docOrder type="element()*" … > <slash> <axis name="child" type="element()*" jsTest="var q=SaxonJS.U.nameOfNode(item); return item.nodeType==1 && q.uri=='' && q.local=='a';"/> <filter type="element()*"> <axis name="child" type="element()*" jsTest="… q.local=='b';"/> <fn name="exists" type="xs:boolean"> <axis name="child" type="element()*" jsTest="… q.local=='c';"/> </fn> </filter> </slash> </docOrder>
but as a pattern should be:
<p.withUpper role="match" axis="parent" upFirst="true" type="element()*"> <p.withPredicate type="element()*"> <p.nodeTest test="element()*" type="element()*" jsTest="var q=SaxonJS.U.nameOfNode(item); return SaxonJS.U.isNode(item) && item.nodeType==1 && q.uri=='' && q.local=='b';"/> <fn name="exists" type="xs:boolean"> <axis name="child" type="element()*" jsTest="… q.local=='c';"/> </fn> </p.withPredicate> <p.nodeTest test="element()*" type="element()*" jsTest="… q.local=='a';"/> </p.withUpper>
Such simple conversions can be done with a modest number of templates used in a push mode. With such a pattern we can then also determine the default priority for a template using it, by matching various elements in the resulting pattern tree. This conversion is also used to create a separate template for each member of a union of patterns on a template match, as they may have separate default priorities and type for the context item matched.
Component binding
With all the components now created, the final remaining step is to link them as
required through their component binding vectors. Each invocation instruction requiring
linking (such as applyT
for xsl:apply-templates
to the appropriate
mode
containing relevant template rules) has a @bSlot
attribute
which is a number indexing into the containing component @binds
attribute (the
binding vector), which is a whitespace sequence of numbers corresponding to the
@id
of the component linked. This process is performed by a simple collection
of mappings from global variables, named templates and modes and the components containing
their definitions and a recursive tree-walk using a small number of templates. With
this
operation the compilation is complete and the SEF tree can be used as an execution
plan for
evaluation of the given transform.
Testing
In the development of the dynamic XPath support, we managed to build a testdriver
framework, written in XSLT, that could exercise a great deal of the QT3 test suite,
passing
each required test to xsl:evaluate
and checking the result assertions and
reporting the results entirely within a webpage. (More details can be
found in Lumley2017.) By using this tool, which also enabled compiled
execution plans to be observed (using serialization) and the various phases of the
compilation
controlled, we managed to acheive very high levels of standards conformance, with
some 18,000+
tests passed and O(100) failures, mostly in obscure edge conditions.
Obviously a similar approach, using the ~9,000 relevant tests of the XSLT3.0 test
suite
would be attractive and, given previous experience, an important, even central, tool
in the
design development. The testdriver framework was generalized using a modest amount
of
conditional compilation and two differing test stub
XSLT stylesheets for XSLT
and XPath testing. This enabled standard tests to be employed from very early on,
and with
some option controls it was possible to examine various phases of the compilation
process on a
given test (such as static expansion, XSLT conversion, XPath expansion, pattern formation
and
component binding) to isolate bugs, faults and errors. Usually the procedure was to
select a
single test-set (comprising 10-100 test cases) and first work to get the compilation
phase for
all of them avoiding a JavaScript error (usually a null pointer reference). Then this
was
extended to the run of these compilations and checking the result assertions.
A useful consequence of this was that parts of the XSLT test suite exercised portions both of XPath semantics and Saxon-JS requirements that presumably hadn't been covered before, as we discovered (and corrected) a number of new errors, both in XPath compilation and the Saxon-JS runtime.
Does it work?
To start with, this was just a small experiment to generate an SEF for some simple
XSLT
using Saxon-JS in the browser, but it became clear fairly quickly that something more
substantial and with significant standards conformance might be possible. An early
goal was to
be able to get most of the XSLT tests for xsl:evaluate
at least running and
preferably passing. These tests involve a large mixture of namespace bindings, functions,
data-borne expressions and so forth, so getting this going would be, as was, quite
a milestone
– currently only 5 of 36 fail.
At the time of writing, of the 120 different categories of relevant test (stylesheet
attributes, declarations, instructions etc.) the tests in 40 of them are
working
(some 2,400 tests in total), which means the tests compile, run and
perhaps >90% of them pass or generate the wrong error[13]. The rest of the categories have not been addressed specifically, but there is a
lot of cross-dependence between the the categories of course.
Performance (at least on a high-end laptop) was pleasing. For example the compiling,
running and checking of the 51 tests for xsl:choose
takes some 7 seconds, the 36
tests for xsl:evaluate
similar.
One of the original goals of Saxon-JS was to reduce the loading memory footprint, which had been just under 1 MB for Saxon-CE. This has been achieved as the minified Saxon-JS runtime is about 220kB. XPath evaluation support (additional JavaScript) is dynamically loaded by Saxon-JS and takes roughly the same amount of memory (~180kB minified). The compiler (inside whatever front-end is being used) is currently loaded as a complete SEF into Saxon-JS and is about 800kB, but contains much information redundancy, as a ZIP compression reduces it to a tenth of its size.
The technology is however sufficiently advanced that a simple editor/executor for XSLT stylesheets can be run completely in the browser, using all the features of Saxon-JS to support drag-and-drop interactive programming.
One of the eventual acid tests for portability of a compiler-written-in-itself, is
an
ability to successfully compile itself into a form that can repeatedly
compile itself idempotently. In some previous work (Lumley2015) we showed
how an XSLT3.0→2.0 converter, written in XSLT3.0, could convert itself and the resulting
XSLT2.0 conversion successfully and recursively
convert the original XSLT3.0
source.
The current development is a long way from that stage yet, but theoretically the compiler is only using (some of) the features of XSLT for which it can compile successfully, so it should be possible. The XSLT3.0 repertoire supported by the current target platform (Saxon-JS) has limitations (higher-order functions might be most useful), but of course the compiler cannot use them. Interestingly enough are some of the features that the compiler (writer) hasn't used thus far – attribute sets and accumulators amongst them.
Possible developments and Conclusion
In this paper we definitely are reporting on on-going work. The first intention is
to
complete a compiler that will run in our current environment, that of Saxon-JS operating
in a
browser. With that complete
we can start to consider mechanisms to make some of
the technology more generic and exploitable on other platforms, probably starting
with
Node.js.
Completing the browser compiler
Parts of the compiler are being modified as new issues are uncovered and understanding increases. The original design was almost entirely based on a central XML tree and tunnelled variables handling separate components of static compiling state.
As illustrated in the section section “Preprocessing: package collection and static and conditional processing” we can begin to use compound (map, XML) data structures more heavily in the design. Two obvious candidates are firstly to group all the static context into a single map rather than a set of tunneled variables. The second is build a better type representation structure for the XSLT processing – currently compiled XPath expressions leave their determined type as an attribute, and in the XSLT this is mostly manipulated with string processing, which isn't really robust enough.
The XPath compiler already has a type system using a simple JavaScript property pair
(.baseType
, .card
) If we arrange that the XPath compiler returns
a tree/type (map) structure then we may be able to avoid the necessity of attribute-recorded
type properties[14].
Even more interestingly, we may be able to expose as an extension function the static type checking code that is already present in the XPath compiler to use from XSLT – this will ensure much greater type coherence between the two language implementations.
A generic compiler framework
This work was only possible because of the very sizeable effort in building the Saxon-JS runtime interpreter of SEF XML trees. Building the XPath compiler involved addition of an XPath parser (whose portability is dependent on the excellent REx REx parser-generator we have used for both XSLT and JavaScript implementation targets) followed by an parse tree reducer (in the spirit of Pemberton2013) written in JavaScript. The compiler itself (and the static type checker) were also JavaScript. But it would be entirely possible to write those later phases in XSLT, using some of the structures of and lessons from the compiler for XSLT, albeit at a (compile-time) performance penalty.
This then opens the possibility of reducing
the portability issue for a
new target to three stages – procuring/generating an XPath parser (and arranging tree
reduction), developing the data model and API and the minor
issue of building
the SEF interpreter itself.
Other points
One of the most useful features within this project (and indeed the XPath compiler
too)
has been the early writing of an extended, though non-compliant serialize()
function. Unlike the standard function it will serialise sequences of items into an
XPath-like form ( e.g. (1,'fred', <foo>)
) as well as expanding/indenting
XML trees and serializing map()
and array()
types. This meant that
during debugging it was possible to see reasonably full details of intermediate data
structures, either through xsl:message
or in xsl:result-document
to the browser screen, in ways that would be virtually impossible using the JavaScript
debugger. It is central to displaying intermediate stages of many of the developemtn
and
testing processes.
Acknowledgements
The authors thank those who urged the preparation of this paper – during the writing several parts of the compiler were cleaned, redesigned and debugged in a very satisfactory way!
References
[Delpratt2013] Delpratt, O'Neil and Kay,
Michael. Multi-user interaction using client-side XSLT.
In Proceedings of
XMLPrague 2013, pp1-23 (2013)
[online] http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf
[Kay 2007] Kay, Michael. Writing an XSLT
Optimizer in XSLT.
In Proceedings of Extreme Markup Languages. (2007).
[online] http://www.saxonica.com/papers/Extreme2007/EML2007Kay01.html
[Lockett2016] Lockett, Debbie and Kay, Michael.
Saxon-JS: XSLT 3.0 in the Browser.
In Proceedings of Balisage: The Markup
Conference 2016. Balisage Series on Markup Technologies, vol. 13 (2014).
doi:https://doi.org/10.4242/BalisageVol17.Lockett01. [online] http://www.balisage.net/Proceedings/vol17/html/Lockett01/BalisageVol16-Lockett01.html
[Lumley2014] Lumley, John. Analysing XSLT
Streamability.
In Proceedings of Balisage: The Markup Conference 2014. Balisage
Series on Markup Technologies, vol. 13 (2014).
doi:https://doi.org/10.4242/BalisageVol13.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html
[Lumley2015] Lumley, John. Two from Three (in
XSLT).
In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on
Markup Technologies, vol. 15 (2015).
doi:https://doi.org/10.4242/BalisageVol15.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html
[Lumley2017] Lumley, John, Lockett, Debbie and Kay,
Michael. XPath 3.1 in the Browser.
In Proceedings of XMLPrague 2017, pp 1-18
(2017)
[online] http://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf
[Pemberton2013] Pemberton, Steven.
Invisible XML.
In Proceedings of Balisage: The Markup Conference 2013.
Balisage Series on Markup Technologies, vol. 10 (2013).
doi:https://doi.org/10.4242/BalisageVol10.Pemberton01. [online] http://www.balisage.net/Proceedings/vol10/html/Pemberton01/BalisageVol10-Pemberton01.html
[Richards1971] Richards, Martin. The
portability of the BCPL compiler.
Software Practice and Experience, Vol 1, Issue 2,
April 1971pp 135-146 (1971). doi:https://doi.org/10.1002/spe.4380010204
[REx] Rademacher, Gunther: REx Parser
Generator.
[online] http://www.bottlecaps.de/rex/
[XPath3.1] Robie, Jonathan, Dyck, Michael and Spiegel,
Josh, Editors. XML Path Language (XPath) 3.1
. World Wide Web Consortium, 21
March 2017. [online] http://www.w3.org/TR/xpath-31/
[XPath.FO] Kay, Michael, Editor. XQuery and
XPath Functions and Operators 3.1
. World Wide Web Consortium, 21 March 2017.
[online] http://www.w3.org/TR/xpath-functions-31/
[XSLT3.0] Kay, Michael, Editor. XSL
Transformations (XSLT) Version 3.0
. World Wide Web Consortium, 18 April 2017.
[online] http://www.w3.org/TR/xslt-30/
[1] Much of this section is lifted verbatim from my XMLPrague paperLumley2017. The background is pretty similar, so why not practice reuse?
[2] In the process pushing client-based Java towards oblivion too.
[3] Apart from some additional Saxon-JS attributes, the tree is identical to that used
for linking separately-compiled packages for execution by a Java-based server-side
Saxon
engine – hence it shares the same degree of compilation correctness
as
any other Saxon-processed stylesheet.
[4] Sometimes a global variable/parameter is split into a pair of the global item and a generated global variable that takes care of the conversion. References to the variable/parameter are redirected though the variable.
[5] The parameters of the template are defined as the start of the sequence constuctor – their evaluation has the side-effect of setting a value in the appropriate parameter slot for interpolation later in the sequence constructor.
[6] Needless to say, new changes for an extra instruction can sometimes have deleterious effects on large swathes of other already-passed tests. Sometimes convergence is not as fast as hoped, but in the process problem understanding does increase.
[7] When the name contains an attribute value template, an alternative
compElem
instruction is used, where the first child is the name
generator.
[8] The stylesheet says nothing about a person
only having one
name
.
[9] Technically it isn't necessary to order at the compilation phase – Saxon-JS orders
the template rules according to the rules on conflict resolution as it forms an internal
Mode
object. But it can help in debugging the compiler to see templates
ordered correctly.
[10] This code still leaves an included static variable as its original definition – since we must have evaluated it, we could substitute the evaluated value.
[11] A map()
form might be more coherent, but first designs have used
separate variables.
[12] When the language is XSLT, the suite of supported functions expands to cover those only supported in XSLT. Currently this is known in the XPath compiler, but it would be possible to dynamically load a coherent map of function signatures covering both XSLT and user-supplied stylesheet functions.
[13] Whereas the complete (~18,000) QT3 tests could be run in O(5 mins), the XSLT tests
must perforce take quite a lot longer, if only for the numbers of embedded XPath
expressions within a stylesheet that must be compiled.. The ~800 Type
category tests take O(5mins) suggesting slower by O(20x). So far we haven't run
them all
.
[14] Though this is exceptional useful when the trees are serialized in debugging.