Introduction
XSLT1.0 was originally developed primarily as a client-side processing technology,
to
increase data-adaptability and presentational flexibility with a declarative model
supported
in the browser. By 2006 all the major desktop browsers supported it, but the rise
in
importance of mobile client platforms and their code footprint pressures then sounded
the
death-knell. However, remnants of support for the XPath 1.0 component of XSLT can
be found in
most, but not all, of the current desktop browsers.
In the meantime, spurred mostly by unexpected takeup of XSLT and XQuery in server-side
XML-based architectures, the technologies have progressed through the 2.0
stage
(temporary variables, type model and declarations, grouping, canonical treatment of
sequences,
extensive function suites...) to the current candidate recommendation standards for
3.0/3.1 in
XSLT, XQuery and XPath. At this stage support for maps, arrays and higher-order functions
join
let constructs, increased standard function libraries and others to provide a core
set of
common functionality based on XPath. XQuery uses this with its own (superset of XPath)
syntax
to support query-based operation and reporting. XSLT adds a template-based push model,
within
an XML syntax, which is designed to be able to support streaming processing in suitable
use
cases.
These developments are sufficiently robust and powerful that, other circumstances
permitting, exploiting XSLT3.0/XPath 3.1 to process XML data is highly attractive.
But at
present this can only be performed in server-side situations – no browser developers
could
contemplate either the development effort or the necessary memory footprints needed
for the
2.0
level compilers for what they consider niche (i.e. non-mobile)
applications, let alone that required for 3.0+.
What has been developed extensively in browsers are
JavaScript processors, both internal compilers, JIT runtimes and
development libraries, such that very significant programmes can be executed in-browser. And in general the level of conformance of and interoperability between
implementations in different browsers is reasonable.
Exploiting this JavaScript option for supporting XSLT has been explored in a couple
of
cases:
-
Saxon-CEDelpratt2013 cross-compiled a stripped-down XSLT2.0 Saxon
compiler from Java into JavaScript using Google's GWT technology. This worked, but
the
loaded code was large, very difficult to test and debug and very exposed to GWT's
cross-browser capabilities.
-
During 2016 Saxonica developed Saxon-JS
Lockett2016, a runtime environment for XSLT 3.0, implemented in
JavaScript. This engine interprets compiled execution plans, presented as instruction
trees describing the execution elements of an XSLT3.0 stylesheet. The execution plan
is
generated by an independent compiler, which performs all necessary parsing, optimisation,
checking and code generation. Consequently the runtime footprint is modest (~ 200kB
in
minimised form) and programme execution incurs no compilation overhead.
This paper describes on-going work to construct a compiler for
XSLT3.0, generating the runtime execution plan for interpretation by Saxon-JS, written
in
XSLT3.0. It uses very heavily an ability to compile
XPath3.1 expressions
contained within the source stylesheet, described later. This makes it possible to
consider
compiling XSLT3.0 packages entirely within a JavaScript environment, such as a browser,
or
supporting the transform()
function in XPath in such environments. It also may
form the basis for a portable
Saxon compiler for XSLT3.0, when
SEF-consuming
runtimes for other language environments might be
possible.
Saxon was originally written for the Java platform, but despite the mantra of "write
once,
run everywhere", there has been increasing demand over the years for Saxon's functionality
to
be made available on platforms that Java cannot reach. Sometimes this requirement
can be met
by cross-compilation: thus Saxon on .NET is developed by translating Java bytecode
(using
IKVMC) to the .NET equivalent, Saxon for C/C++ and PHP is developed using a Java compiler
(JET) that generates Intel machine code, and Saxon-CE on the browser was developed
using GWT,
which translates Java source code to JavaScript. Cross-compilation, however, has its
limitations, one of which is a total dependency on third-party tools that may be discontinued
at any time. The Saxon-JS product introduced a new approach, by splitting the compile-time
and
run-time parts of the product and with a JavaScript runtime created an efficient way
of
executing Saxon stylesheets in the browser. But for other environments (Node.js being
perhaps
the most obvious example), users really need to be able to compile stylesheets on
the target
platform (in this case JavaScript) rather than compiling them on a Java platform and
then
moving the compiled code across to the target platform for execution. The classic
solution to
this problem Richards1971 is to write the front end of the compiler in its
own language (that is, in this case, in XSLT). Given the availability of a run-time
for a
particular target platform, the compiler immediately becomes available on that platform.
The
idea of using XSLT to implement a compiler is intriguing: on the one hand, a compiler
can be
seen as a pipeline of tree-to-tree transformations which makes it a classic XSLT application;
on the other hand, the data model and instruction set of XSLT is designed for handling
a
rather different kind of data than that found in traditional compilers.
An earlier attempt to implement part of the compilation phase of an XSLT compiler
(specifically the optimizer) was reported in Kay 2007. The earlier
experiment proved unviable, largely because of the expense of using XSLT to make a
small
modification to a large tree, and then performing this operation repeatedly (and recursively).
A further problem was the difficulty of holding generalized values (for example a
sequence of
integers) as properties on an XML tree. It turns out that these obstacles are largely
removed
by the introduction of maps into the XDM data model. Maps in Saxon are implemented
using an
immutable hash trie, and a property of this data structure is that despite the fact
that
modification to a map always returns a new map and leaves the existing map intact,
the new map
can share most of the data in the old map, and the cost of the modification (in both
time and
space) is constant, rather than being proportional to the size of the map. While one
can
envisage similar optimizations to the tree structures used to represent XML documents,
the
fact that nodes in an XML document have distinct identity, and links to their parent
nodes,
makes it far more difficult; furthermore, the primitive operations on maps (put, remove)
translate far more easily into simple operations on a hash trie than the complex instructions
in XSLT used to construct XML result trees.
In the rest of this section we describe the Saxon-JS runtime, the extensions that
were
added to support dynamic XPath evaluation, and discuss what the execution-plan
format is. In subsequent sections we then outline the approach for the compilation
taken so
far and some of the problems faced. The testing strategy is described and then finally
we
outline further work and draw conclusions.
Saxon-JS Runtime
The Saxon-JS runtime environment is written entirely in JavaScript and is intended
to
interpret and evaluate XSLT program execution plans and write results into the containing
web page. These execution plans are presented as instruction trees describing the
execution
elements of an XSLT3.0 stylesheet in terms of sets of templates and their associated
match
patterns, global variables and functions, sequence constructors consisting of mixtures
of
XSLT instructions and XPath expression resolution programs and other static elements
such as
output format declarations.
Usually XSLT programs for Saxon-JS are compiled specifically for this target
environment, using Saxon-EE to generate an XML tree as a
Stylesheet Export File (SEF). This tree is loaded by a SaxonJS.transform()
within a web page,
which then executes the given stylesheet, with a possible XML document as source and
other
aspects of dynamic context, writing results to various sections of the web page via
xsl:result-document
directives. The engine supports specialist XSLT modes
(e.g. ixsl:on-click
) to trigger interactive responses, based on a model
originally developed for Saxon-CE.
Saxon-JS supports a very large proportion of XSLT3.0 and XPath3.1 functionality, the
notable current absences being schema-awareness, streaming, functional items and
higher-order functions. As all the static processing (parsing, static type checking,
adding
runtime checks.…) of the presented XSLT stylesheet is performed by Saxon-EE during
its
compilation operation, programs running in Saxon-JS should benefit from the very high
levels
of standards compliance that the standard Saxon product achieves. At runtime Saxon-JS
only
has to check for dynamic errors and often these are programmed in the execution plan
as
specific check instructions. The consequences are that
-
The Saxon-JS code footprint is very much smaller, consisting only of a runtime
interpreter.
-
Execution starts immediately after loading the instruction tree – there is no
compilation phase
-
But Saxon-JS assumes the code is correct,
as all static errors have been removed and necessary dynamic checks have been
added.
Dynamic evaluation of XPath expressions
In late 2016 we extended the Saxon-JS runtime with a facility (written in JavaScript)
to
support dynamic execution of XPath expressions (needed for the xsl:evaluate
instruction), which involved a parser for the XPath 3.1 grammar and a compiler for
the
resulting parse tree to an appropriate SEF tree. Full details of this are available
in Lumley2017, but some aspects will necessarily be described in this paper.
This feature (whose code is of the same order of size as Saxon-JS itself) is implemented
as
a dynamically loaded package in Saxon-JS, so those stylesheets that don't use dynamic
evaluation avoid the additional loadtime penalty. (This XPath3.1 support can also
be invoked
directly from JavaScript, permitting use outside an XSLT context.)
Stylesheet Export File
The Stylesheet Export File was originally developed as a format to describe an exported,
compiled, XSLT package for use in Saxon's support of package linking
for XSLT3.0. As such it is required to describe all the necessary
information required to reconstruct an operational transform. Full descriptions of
its form
can be found in Lockett2016, but the reader should understand the
following.
The SEF is written in XML, in the namespace
http://ns.saxonica.com/xslt/export
, as a package
element with
the following structure of components, corresponding to the top-level
components of a complete and statically compiled stylesheet. Each component is defined
as a
<co id="
number
">
element
surrounding the definition itself.
Global parameters & variables
|
The global xsl:param and xsl:variable children of the
stylesheet with a default value generator and for a parameter, an acceptor/converter
expression for externally defined values, as their children .
|
Global functions
|
Each function with a set of arguments and a body expression to
evaluate the result, on the assumption the argument values have already been bound
into the appropriate parameter slots.
|
Named templates
|
Each named template, i.e. those xsl:template[@name] with its sequence
constructing instructions as its child.
|
Modes
|
The defined or implict modes for template rules, ie.e those
xsl:template[@match] . Each mode contains all the template rules
associated with it, ordered by import precedence, priority and sequence position,
corresponding to the conflict resolution rules. (The body of a template may appear
multiple times when the match pattern is a union, as different rules with possibly
different priorities. Equally well a template rule may appear in multiple modes, and
the body may be shared with a named template as both @match and
@name can co-exist.) The mode also describes its default behaviours
(on-no-match="shallow-copy", on-multiple-match="fail" etc.) as encoded
attributes.
|
Keys & accumulators
|
Declarations for xsl:key and xsl:accumulator
definitions.
|
Other declarations
|
Definitions of the other declarations of the stylesheet (xsl:strip-space,
xsl:decimal-format, xsl:output ) appear as top level components of the
package .
|
As an example here are a couple of samples, with various sections elided for
brevity:
Table I
Some export file examples
XSLT |
<xsl:stylesheet … version="3.0">
</xsl:stylesheet>
|
SEF |
<package xmlns="http://ns.saxonica.com/xslt/export"
version="30" packageVersion="1" saxonVersion="9.7.0.15"
timeStamp="2017-05-29T13:11:19.791+01:00" target="JS">
<co id="0" binds="">
<mode onNo="TC" flags="dW" patternSlots="0"/>
</co>
<overridden/>
<output>
<property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
</output>
<decimalFormat/>
</package>
|
XSLT |
<xsl:stylesheet … version="3.0">
<xsl:variable name="var1">Variable1</xsl:variable>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template name="A">
<A>
<xsl:value-of select="$var1"/>
</A>
</xsl:template>
<xsl:template match="a">
<xsl:call-template name="A"/>
<xsl:next-match/>
</xsl:template>
</xsl:stylesheet>
|
SEF |
<package xmlns="http://ns.saxonica.com/xslt/export"
version="30" packageVersion="1" saxonVersion="9.7.0.15"
timeStamp="2017-05-29T13:23:58.27+01:00" target="JS">
<co id="0" binds="">
<globalVariable name="Q{}var1" type="document-node()"
… visibility="PRIVATE">
<doc … validation="preserve">
<valueOf flags="Sl">
<str val="Variable1"/>
</valueOf>
</doc>
</globalVariable>
</co>
<co id="1" binds="0">
<template name="Q{}A" … slots="0">
<elem role="body" name="A">
<valueOf flags="l">
<gVarRef name="Q{}var1" bSlot="0"/>
</valueOf>
</elem>
</template>
</co>
<co id="2" binds="1">
<mode onNo="SC" flags="dW" patternSlots="0">
<templateRule prec="0" prio="0.0" seq="0" rank="0" minImp="0" slots="0"… >
<p.nodeTest role="match" test="element(Q{}a)"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return SaxonJS.U.isNode(item)
&& item.nodeType==1
&& q.uri==''&&q.local=='a';"/>
<sequence role="action" … >
<callT name="A" bSlot="0"/>
<nextMatch flags="ti"/>
</sequence>
</templateRule>
</mode>
</co>
<overridden/>
<output>
<property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
</output>
<decimalFormat/>
</package>
|
The first example is the minimal stylesheet, which has implicit built-in template
rules
with 'text-only-copy' behaviour, described by the unnamed mode with no child template
rules.
In the second example the xsl:call-template name="A"
is bound to its target
named template component, and the $var1
reference to its definition, by the
@bSlot
reference to the component-binding-vector
(@binds
) on its enclosing component. The template rule has two children: one
with a match
role, which is a pattern definition and the other an
action
role, which is the sequence constructor to be used when the template
is selected.
The general form of the bodies of these components is an XML tree that defines an
instruction/function invocation, such as
<forEach>
arg0,arg1
</
.
The name of the element describes the operation, in this case a
for-each, where the first child is the generator instruction of the
for
values and the second child defines the sequence constructor
instruction for each of those when bound as the context item. Saxon-JS's main loop
is an
interpreter of such instructions, systematically evaluating instruction children as
arguments and producing suitable results, usually as (lazy) iterators over sequences
of
atomic values, maps/arrays and result-tree fragments.
It is not surprising that most of these instructions correspond to either productions
in
the XPath grammars or XSLT declarations or instructions, often using the same name,
or
camelCase() thereof, as the major internal semantics of evaluation
within XSLT sequence constructors and XPath expressions is functional.
Some of these instructions are parameterised with attributes, such as <fn
name="
fn-name
>
which calls the given
system function. Other attributes describe possibly-needed properties, such as active
namespace bindings (e.g. ns=" =foo map=~"
records that the default namespace is
"foo"
and that the prefix map
is bound to its conventional
namespace for the map functions.) For some instructions, the arguments are positional
(such
as forEach
) – for others (such as templateRule
) the
@role
attribute is used for identification. In other cases instructions of
given name are used (such as catch
within try
).
As will become apparent later, for some of the important instructions there is no
difference between whether the original code was in an XPath expression or an XSLT
tree
fragment. For example both $in !
body
and
<xsl:for-each
select="$in">
body
</
are both
represented by forEach
. Similarly xsl:variable
bindings are
represented as let
instructions surrounding their definition as the first child
and a sequence of subsequent result-generating children as the second, in exactly
the same
manner as let $var := … return …
would in an XPath expression. This means it
should be possible to reuse much of the code already developed for XPath compilation
within
an XSLT situation.
Compiling the XSLT
The goal of the project is to construct a high-conformance compiler from XSLT3.0 to
SEF,
the compiler itself being written in XSLT3.0 with the addition of XPath compilation
functions,
present in Saxon-JS, developed for the support of dynamic XPath evaluation. At this
stage the
compiler (which has been externally compiled/exported into SEF) is intended to run
within the
JavaScript environment of a browser. Given that both source (XSLT3.0) and destination
(SEF)
are well-defined XML, it was anticipated that using XSLT3.0 for the compiler would
give a
strong advantage, and so it proved.
This project is still in development, so some of the steps described
will change as we tidy up the code, extend functionality and streamline
the execution. In particular we've chosen to use several passes (6 or 7?) across the
stylesheet/package trees to increase visibility for debugging and limit the
everything's empty cascade
, primarily using a central XML tree representation
of the evolving program
during initial development of the compiler. Some of the
phase separations shouldn't really be possible and might be plain wrong in the most
general
case, but for early development and the simpler majority of test cases, make development
more
transparent. An example is in the interaction of static variables and parameters and
conditional compilation on resource collection where originally collection was followed
by
static processing but are now merged into a single pass – see section “Preprocessing: package collection and static and conditional processing”.
The development methodology was to get something running quickly – just compiling
the
simplest of stylesheets and then attempt to attach to a (browser-based) testdriver
running the
XSLT3.0 test suite (see section “Testing”). This framework not only allowed selected
tests to be compiled, run and checked but also to examine various intermediate stages
of the
compilation state, as well as diving under the hood
with the browser-based
JavaScript debuggers. The process was then to choose one XSLT instruction (we chose
xsl:choose
) and get both the XSLT test situation and the compiler operational,
and then slowly expand the set of features supported .
Once the compiler is fully operational, and we have an executing large example corpus,
we
can begin to merge phases together and rationalise some of the internal data structures,
especially exploiting map()
and array()
types when a mixture of
properties and tree fragments are needed.
The reader should note also that there are often several different strategies that
will
produce the same result in Saxon-JS so the fact that code shown here is not the same
as that
which Saxon-EE would produce isn't necessarily an error. However we trust the reader
will get
a sense of the approach being taken.
The main phases of the compilation process required are:
-
XML parsing – not really part of the compiler, we assume this has already been done
before we start.
-
preprocessing – evaluation of static variables, use-when, shadow attributes,
include/import.
-
validation and normalization – essentially the kind of validation that could be done
with a schema processor, checking that elements and attributes appear in the right
place
and have permitted values .Currently this is performed as part of the XSLT to SEF
instructions
process, but might be better to do it as a free-standing phase. For
example, normalizing boolean attributes from yes|1|true
to true
early on can save a lot of boring code later.
-
indexing components: building symbol tables for named objects such as variables,
functions, named templates and attribute sets, resolving references, checking for
duplication.
-
parsing of XPath expressions and patterns.
-
type analysis: inferring the types of expressions, checking (as far as is statically
possible) that these are consistent with required types, adding run-time type checking
code to the tree where necessary.
-
optimization: replacing constructs with equivalent constructs that are faster to
evaluate (for example, rewriting (position() = 1 to 10) with (position() ge 1 and
position() le 10). Currently little of this is being done, save that some arithmetic
expressions over constants are resolved during XPath compilation.
-
streamability analysis: not being done at the moment, since Saxon-JS does not
implement XSLT 3.0 streaming. (Note that work reported in Lumley2014
describes an XSLT implementation of streamability analysis, which may be of relevance
to a
future XSLT-based compiler product.)
-
code generation: normally the last phase of a compiler, in this architecture all that
happens here is actions such as allocating slots for variables and component references,
generating JavaScript code for node tests and name tests, and optional serialization
of
the final SEF tree as lexical XML, including a checksum to deter attempts at manual
modification.
In the current design most of the representation passed between phases is a (single)
XML
tree, but we anticipate fuller use of maps and arrays, that XSLT3 supports, describing
more
collections
of components and associated properties.
A simple walk-through
We'll start by a simple walk-though of one of the first test cases we used – involving
xsl:choose
. The stylesheet is:
<xsl:stylesheet … version="2.0">
<xsl:template match="/doc">
<out>
<xsl:for-each select="person">
<xsl:choose>
<xsl:when test="sex='M'">
Male: </xsl:when>
<xsl:when test="sex='F'">
Female: </xsl:when>
<xsl:otherwise>
Who knows?: </xsl:otherwise>
</xsl:choose>
<xsl:value-of select="name"/>
</xsl:for-each>
</out>
</xsl:template>
</xsl:stylesheet>
For the moment we'll ignore the static collection phase and show the result of the
first
conversion pass:
<package … version="30" packageVersion="1" saxon-version="Saxon-JS-internal"
target="JS" name="TOP-LEVEL" relocatable="false">
<co id="0">
<mode onNo="TC" flags="dW" patternSlots="0">
<templateRule prec="0" seq="0" ns="xml=~ xsl=~" minImp="-10" flags="s"…
select="node()" match="/doc" module="choose-0101.xsl">
<xpath role="match" xpath="/doc" ns="xml=~ xsl=~"/>
<elem name="out" type="element()"
role="action" ns="xml=~ xsl=~" namespaces="">
<forEach>
<xpath role="select" xpath="person" ns="xml=~ xsl=~"/>
<sequence>
<choose type="item()*">
<xpath role="" xpath="sex='M'" ns="xml=~ xsl=~"/>
<valueOf>
<str val="
Male: "/>
</valueOf>
<xpath role="" xpath="sex='F'" ns="xml=~ xsl=~"/>
<valueOf>
<str val="
Female: "/>
</valueOf>
<true/>
<valueOf>
<str val="
Who knows?: "/>
</valueOf>
</choose>
<valueOf flags="l">
<fn name="string-join" role="select">
<convert from="xs:untypedAtomic" to="xs:string">
<data>
<xpath role="select" xpath="name" ns="xml=~ xsl=~"/>
</data>
</convert>
<str val=" "/>
</fn>
</valueOf>
</sequence>
</forEach>
</elem>
</templateRule>
</mode>
</co>
<overridden/>
<output>
<property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
</output>
<decimalFormat/>
</package>
The package basically consists of a single unnamed mode of templates with just one
template. We leave the @match
for this template as an xpath
child,
recording both its role and the XPath/pattern expression it defines. Then we collect
up all
its sequence constructor components (in this case the <out>
element) as the
second child labelled as the action role. (If the template had had
parameters these would have been top-level children of the rule, hence the use of
the role
identification attributes.)
Within that sequence constructor we compile recursively. The <out>
becomes an element-generating instruction with the required name as a parameter. Its single child is a xsl:for-each
, which compiles to a
forEach
leaving its content selector as an xpath role="select"
child, and its sequence constructs as a sequence
around the two children, which
have compiled into a choose
and a valueOf
.
The valueOf
contains instructions to convert the result into a
space-separated concatenation of the evaluation of the name
XPath on the
context item. The choose
has been expanded as a sequence of test/result pairs,
were the test is an XPath that will checked against context and the
xsl:otherwise
has a test of true()
, for which a specific
instruction true
is provided in Saxon-JS. (During the conversion of
xsl:choose
to choose
checking of static language errors such as
xsl:otherwise
not being last or incorrect children, is carried out trivially
by a series of tests such as
exists(xsl:otherwise/(following-sibling::*|following-sibling::text()[matches(.,'\S')])
During the next step we compile all the embedded XPath expressions, leaving their
resulting instruction trees in situ:
<package...>
<co id="0" binds="">
<mode onNo="TC" flags="dW" patternSlots="0">
<templateRule prec="0" seq="0"...>
<docOrder role="match">
<slash>
<fn name="root" type="node()?">
<treat type="node()" jsTest="return SaxonJS.U.isNode(item);" … >
<dot type="item()" … />
</treat>
</fn>
<axis name="child" type="element()*"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return item.nodeType==1 &&
q.uri=='' && q.local=='doc';"/>
</slash>
</docOrder>
<elem name="out" type="element()" role="action" ns="xml=~ xsl=~" namespaces="">
<forEach forType="element()" type="item()*">
<axis role="select" name="child" type="element()*"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return item.nodeType==1 &&
q.uri=='' && q.local=='person';" … />
<sequence type="item()*">
<choose type="item()*">
<gc role="" type="xs:boolean" card="1:1" comp="..." op="="… >
<data type="xs:anyAtomicType*">
<axis name="child" type="element()*"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return item.nodeType==1 &&
q.uri=='' && q.local=='sex';"/>
</data>
<str val="M" type="xs:string"/>
</gc>
<valueOf>
<str val="
Male: "/>
</valueOf>
…
</choose>
<valueOf flags="l">
<fn name="string-join" role="select">
<convert from="xs:untypedAtomic" to="xs:string">
<data>
<axis role="select" name="child" type="element()*"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return item.nodeType==1 &&
q.uri=='' && q.local=='name';"… />
</data>
</convert>
<str val=" "/>
</fn>
</valueOf>
</sequence>
</forEach>
</elem>
</templateRule>
</mode>
</co>
<overridden/>
<output>
<property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
</output>
<decimalFormat/>
</package>
Now the XPath expressions have been expanded, two of them to a plain axis
and one to a general comparison of an axis value against a string. Each of the
axis
instructions contains some JavaScript code to qualify node items for
inclusion in the axis search results. (Note well – these code statements have been
line-wrapped for brevity – in practice they must not contain newlines,
as JavaScript treats this as end of statement
.) Also the type result of the
expression has been determined and propagated upwards so that result trees are now
typed.
This becomes important in issues of whether type casts are necessary, dynamic
type/cardinality checks are required, what the type of the context-item is or if
statically-determined type errors have occurred.
Now we complete the process by converting the XPath determined for the match pattern
into a true pattern:
<package...>
<co id="0" binds="">
<mode onNo="TC" flags="dW" patternSlots="0">
<templateRule prec="0" seq="0" … prio="0.5">
<p.withUpper role="match" axis="parent" upFirst="true" type="element()*">
<p.nodeTest test="element()*" type="element()*"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return SaxonJS.U.isNode(item) &&
item.nodeType==1 && q.uri=='' && q.local=='doc';"/>
<p.nodeTest test="document-node()" type="node()?"
jsTest="return SaxonJS.U.isNode(item) &&
(item.nodeType==9 || item.nodeType==11);"/>
</p.withUpper>
<elem name="out" type="element()"… >
….
In this case the pattern matches doc
when the parent node is a
document-node()
or a document fragment. The default priority for this pattern
has been determined, based on the complexity
of the pattern tree and written
onto the rule, when priority has not already been declared. This, along with the import
precedence and sequence position will be used to order the rules for matching and
using
xsl:next-match
.
As should be apparent, these code samples can be large. In the rest of the document
I'll
concentrate on the main operations and phases of the compilation process and some
of the
more complex constructs.
Preprocessing: package collection and static and conditional processing
The first phase of package formation is to collect together from the top level
package
, all the necessary resources in terms of required subsidiary
packages and imported and included stylesheets. At present we are only addressing
resources
in the form of XSLT stylesheets, not packages. At first glance this should be a
comparatively simple exercise with templates matching the collection directives
(xsl:import|xsl:include
), deferencing the document locations and including
contents recursively. However there are two, or three, complications.
Some components, such as templates, in stylesheets that are
imported have a lower precedence in execution that those in the
importing stylesheets, or sibling imported stylesheets. This is an important mechanism
for
libraries
in XSLT1.0 and 2.0 and requires accurate tracking of importation
position
.
In addition, in XSLT 3.0 stylesheets can be parameterised statically in two ways,
which
can influence the collection result significantly. Firstly elements of the stylesheet
can be
included conditionally by @[xsl:]use-when
decorations, using XPath expressions
that can involve static global variables and parameters as well as the
more usual features of the compiling environment, such as the implementation name
or version
number. Secondly the values of attributes on XSLT elements can be
computed at compile-time from such static variables. This feature, termed shadow
attributes, would for example permit different versions of a library to be
included at compile time, dependent upon a supplied parameter (using
_href="lib-{$version}.xsl"
) .
A further compilcation arises from the possibility that static (global) variables
in
imported or included stylesheets have effective scope not just within their containing
stylesheets, but also for all subsequent components in calling
stylesheets,
which infers they migrate to top level – effectively the scope then becomes
following::*
. This requires static variable bindings to migrate up along with
included and imported preprocessed stylesheets.
Given the necessity of transmitting state (static variables, import
precedences) along the sibling axes of the stylesheets during this
process, we use an iterator over the stylesheet which returns a map, of the general
form:
<xsl:mode name="static" on-no-match="shallow-copy"/>
<xsl:template match="xsl:stylesheet | xsl:transform" mode="static" priority="1.5" as="map(*)">
<xsl:param name="import-precedence" select="1" tunnel="true"/>
<xsl:param name="static-vars" as="map(*)" select="map{}" tunnel="true"/>
<xsl:variable name="me" select="."/>
<xsl:iterate select="*">
<xsl:param name="static-vars" as="map(*)" select="map{}"/>
<xsl:param name="precedence-base" select="$import-precedence"/>
<xsl:param name="parts" select="()"/>
<xsl:on-completion>
<xsl:variable name="sheet" as="element()">
<xsl:for-each select="$me">
<xsl:copy>
<xsl:sequence select="@*, $parts"/>
</xsl:copy>
</xsl:for-each>
</xsl:variable>
<xsl:sequence select="map{'sheet':$sheet,
'precedence':$precedence-base,
'static-vars':$static-vars}"/>
</xsl:on-completion>
<!– Process component -->
</...>
<xsl:template match="component" mode="static">....
where the static evaluation state can be altered for following-sibling
components, by a next-iteration
instruction with altered parameter values,
which also accumulates the components of the stylesheet though the parts
parameter.
First we evaluate and accumulate the state of the static variables and parameters.
These
are global variable or parameters of a stylesheet that have been declared
@static="yes"
. These are restricted in the sense that their values can only
be defined either by external compile-time binding (for parameters) or by an XPath
expression which involves the usual static context and reference to other
earlier-defined static variables or parameters only, though static variables
from included stylesheets do come into scope.
Pleasingly in the most common cases this turns out to be comparatively straightforward,
relying on the two restrictions: i) that static variables can only set
a value via an XPath expression in their @select
attributes (i.e. no sequence
constructors) and the static context for that expression excludes any user-defined
functions
in the package and ii) reference to variables can only be to prior static global variables.
These together mean that given the determined values of all previously defined static
variables, the value for the next variable can always be determined by an
<xsl:evaluate xpath="@select">
action, supplying a namespace context and a
map of those previously defined variable values on the @with-params
map
argument. So our process step for such variables is:
<xsl:when test="self::xsl:variable/@static = ('yes', 'true', '1')">
<xsl:variable name="possible.component" as="element()?">
<xsl:apply-templates select="." mode="#current">
<xsl:with-param name="static-vars" select="$static-vars" tunnel="true"/>
</xsl:apply-templates>
</xsl:variable>
<xsl:if test="exists($possible.component)">
<xsl:variable name="value" as="item()*">
<xsl:evaluate xpath="@select" namespace-context="." with-params="$static-vars"/>
</xsl:variable>
<xsl:next-iteration>
<xsl:with-param name="static-vars" as="map(*)"
select="map:put($static-vars, resolve-QName(@name, .), $value)"/>
<xsl:with-param name="parts" select="$parts,$possible.component"/>
</xsl:next-iteration>
</xsl:if>
</xsl:when>
It is first necessary to process the variable declaration itself, as it can be subject
to conditional compilation (as discussed just below). If the declaration still
stands
we evaluate its @select
expression with all previous static
variables in scope and then add that name/value mapping to the static-variable state
for the
next iteration on subsequent components, with the variable definition added to the
list of
component parts.
The conditional inclusion directive (@[xsl:]use-when
) is processed by a
simple high-priority template, shown below, matching elements which include the directive.
When the directive is a shadow attribute it is first preprocessed to
expand the expression value. Then the expression is evaluated in current static variable
context. If its boolean value is true then the element is processed conventionally.
If not,
nothing is written to the expanded stylesheet.
<xsl:template match="xsl:*[@use-when | @_use-when] | *[@xsl:use-when | @xsl:_use-when]"
priority="3" mode="static">
<xsl:param name="static-vars" as="map(*)" select="map{}" tunnel="true"/>
<xsl:variable name="use-expression" as="xs:string">
<xsl:choose>
<xsl:when test="@*:_use-when">
<xsl:apply-templates select="@*:_use-when" mode="#current"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="@*:use-when"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:variable name="use" as="xs:boolean">
<xsl:evaluate xpath="$use-expression" namespace-context="." with-params="$static-vars"/>
</xsl:variable>
<xsl:if test="$use">
<xsl:next-match/>
</xsl:if>
</xsl:template>
Shadow attributes are processed by a similarly simple template, writing a new attribute
with the appropriate name.
Finally we can now track the importation precedence. The goal is that components in
a
higher-level stylesheet have higher precedence than those in any imported sheets,
and lower
than any following-sibling sheet. If we make a simplifying assumption, common in most
designs and mandated in XSLT 1.0 and 2.0, that any importations, or inclusions that
include
importations, in a stylesheet precede any templates or other components
that need to track precedence, then we can process in the same iteration:
<xsl:when test="self::xsl:import">
<xsl:variable name="processed-directive" as="element()?">
<xsl:apply-templates select="." mode="#current"/>
</xsl:variable>
<xsl:if test="$processed-directive">
<xsl:variable name="doc" as="document-node()">
<xsl:call-template name="collect-doc">
<xsl:with-param name="uri" select="resolve-uri($processed-directive/@href, base-uri(.))"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="collected" as="map(*)">
<xsl:apply-templates select="$doc/*" mode="#current"/>
</xsl:variable>
<xsl:variable name="max-precedence" as="xs:integer"
select="xs:integer(max(($precedence-base, $collected?precedence)))"/>
<xsl:next-iteration>
<xsl:with-param name="precedence-base" as="xs:integer" select="$max-precedence + 1"/>
<xsl:with-param name="parts" select="$parts,$collected?sheet/*"/>
<xsl:with-param name="static-vars" select="$collected?static-vars"/>
</xsl:next-iteration>
</xsl:if>
</xsl:when>
The technique processes the imported stylesheet recursively, recording precedence
as an
attribute on components that require it and then finds the maximum of the precedences
recorded over the (expanded) stylesheet. One more that that precedence is then used
for the
remaining components.
However, XSLT 3.0 has liberalised the situation, such that importations, and inclusions
that contain importations, can occur anywhere in the top level of a stylesheet. This
means
that templates that precede an importation will now have higher
precedence than those imported, as well as those that follow. To handle this case
the
process is changed to use a map-base tree of importations and their precedences and
a
recursively-sensitive algorithm to distribute computed precedences across elements
in this
tree.
During this phase we also track and record active namespaces and the base URI for
each
element. During subsequent processing, with a plethora of temporary trees, ancestry
is
frequently lost and these important early properties often with it. They are currently
recorded as attributes in a separate namespace for later bulk removal. (This also
a suitable
point to carry out some minor forms of canonicalisation, such as normalizing whitespace
on
@as
type declarations. Most of these will probaby migrate to a separate
normalization phase.)
XSLT to SEF instructions
With the stylesheet expanded statically, the bulk of the compilation action involves
conversion of XSLT instructions to suitable SEF elements and subtrees. At present
we perform
this in two phases – conversion without XPath compilation, followed by compilation
of XPath
expressions and typechecking.
Naturally XSLT's push
mode is highly suitable for the converstion of XSLT
components and instructions to SEF. Here are some simple examples:
<xsl:template match="xsl:for-each" mode="sef">
<xsl:param name="attr" as="attribute()*" select="()"/>
<xsl:call-template name="check-select"/>
<xsl:sequence select="f:check-sort(.)"/>
<forEach>
<xsl:sequence select="$attr"/>
<xsl:iterate select="xsl:sort">
<xsl:param name="inner">
<xsl:apply-templates select="@select" mode="create.xpath"/>
</xsl:param>
<xsl:on-completion select="$inner"/>
<xsl:next-iteration>
<xsl:with-param name="inner">
<sort>
<xsl:sequence select="$inner"/>
<xsl:apply-templates select="." mode="#current"/>
</sort>
</xsl:with-param>
</xsl:next-iteration>
</xsl:iterate>
<xsl:call-template name="sequence-constructor"/>
</forEach>
</xsl:template>
In compiling a xsl:for-each
we first check it has a @select
attribute (an error will be thrown by the named template if not) and then that any
xsl:sort
children are first. The forEach
SEF instruction has
possible attributes added through the $attr
parameter – this is usually to set
a possible role identifier, such as when the xsl:for-each
is the only (action)
child of a template rule. Then the selected content is created as an xpath
element nested inside any defined xsl:sort
declarations, by using an iterator.
Finally the sequence constructor instructions are generated using a canonical named
template. (Note that regardless of the number of xsl:sort
children, the
forEach
only has two children – the first a content-generating instruction
(e.g. sort[sort[xpath]]
), the second a sequence constructor.)
<xsl:template match="xsl:function" mode="sef">
<xsl:variable name="params" as="element()*">
<xsl:apply-templates select="xsl:param" mode="#current"/>
</xsl:variable>
<co>
<function name="{f:EQName(@name,.)}" as="{(@as,'item()*')[1]}" slots="{count($params)}">
<xsl:call-template name="record-namespaces"/>
<xsl:sequence select="$params"/>
<sequence role="body">
<xsl:apply-templates select="* except xsl:param" mode="#current"/>
</sequence>
</function>
</co>
</xsl:template>
<xsl:template match="xsl:function/xsl:param" mode="sef">
<arg slot="{position() - 1}" name="{f:EQName(@name,.)}">
<xsl:sequence select="@as"/>
</arg>
</xsl:template>
An xsl:function
is compiled to a separate package component with a
function
child, with name, type and number of argument slots properties. The
active namespaces are then recorded so that in subsequent XPath compiling the correct
prefix-namespace bindings are available. The parameters become named and typed
arg
children, each with an allocated value slot. Finally the sequence
constructor is compiled from all the non-parameters
and identified as having
the body role for the function.
Some other declarations on stylesheet elements (@default-collation
,
@expand-text
etc.) are also processed during this phase, by modifying the
static compiling context, which is currently distributed across a set of tunnelled
variablesand which can be retrieved for compilations that are dependent upon them. As
these declarations are orthogonal to each other, we arrange a series of priority-ordered
templates which pass on responsibility for final effect using xsl:next-match
with a suitable tunneled parameter. For example:
<xsl:template match="xsl:*[@xpath-default-namespace]" priority="3" mode="sef">
<xsl:param name="attr" as="attribute()*" select="()"/>
<xsl:next-match>
<xsl:with-param name="xpath-default-namespace" select="@xpath-default-namespace"
tunnel="true"/>
<xsl:with-param name="attr" select="$attr"/>
</xsl:next-match>
</xsl:template>
<xsl:template match="xsl:*[@expand-text]" priority="3.3" mode="sef">
<xsl:param name="attr" as="attribute()*" select="()"/>
<xsl:next-match>
<xsl:with-param name="expand-text" as="xs:boolean"
select="@expand-text = ('true', 'yes')" tunnel="true"/>
<xsl:with-param name="attr" select="$attr"/>
</xsl:next-match>
</xsl:template>
Local variable and template or iteration parameter declarations are processed in part
at
this stage, but left as specific VARDEF
or PARAMDEF
elements, with
name and possible type attributes and their value generator as a child, either from
a
@select
expression or a sequence constructor. They will be finally converted
in the next phase.
At this point we have mappings of almost all of the XSLT declarations and instructions
into SEF equivalents, save that XPath expressions, matching patterns and parameter
and
variable declarations have not been finalised.
XPath and typing
Now we have the main elements of the package as a series of components with subtrees.
Some of the components define global variables, parameters and functions, others define
named-templates and modes of template rules. Scattered throughout are uncompiled XPath
expressions described as <xpath
xpath="
expression
"… />
elements, as
well as unprocessed variable and parameter definitions as VARDEF|PARAMDEF
subtrees. We now proceed to compile these XPaths, by using the
saxon:compile-XPath()
function developed for use in the support of
xsl:evaluate
. But in this process, these XPath expressions can, and often do,
make reference to resources that are defined elsewhere in the stylesheet, namely
user-defined functions and global and local variables and parameters, or even the
type of
the context-item for its evaluation. Such reference covers not only their existence
(and
possible slot
for their value) but equally crucially the
type and cardinality of value/result/argument they deliver or expect.
For functions the process involves forming a map, keyed by name and arity, of each
of
the function bodies – this is passed in as part of the static context state to XPath
compilations – when the language
property of the compilation is
XSLT
then these definitions are searched for function calls if the standard
library fails to match. Static type checking of both arguments and results of calls to these
user-defined functions is carried out in exactly the same manner as for built-in functions,
save that the call instruction is ufCall
rather than fn
and the
call contains a binding to the appropriate package component for the function body.
Global variable and parameter definitions are passed into XPath compilation as maps
of
name→slot and name→type after having themselves been compiled to both build value
generation
instructions and determine value type. (As they can acyclically reference each other,
this
is carried out in a similar manner as for local variables in scope.)
For local variables and parameters we need to handle things with regard to their
scoping. Recall that in the previous stage we left them as VARDEF
or
PARAMDEF
children, typically within a sequence constructor:
<xsl:variable name="words" select="tokenize(.)"/>
<xsl:variable name="chars" select="sum($words!string-length(.))"/>
<xsl:copy>
<xsl:attribute name="words" select="count($words)"/>
<xsl:attribute name="chars" select="$chars"/>
</xsl:copy>
→
<sequence>
<VARDEF name="words">
<xpath role="select" xpath="tokenize(.)"… />
</VARDEF>
<VARDEF name="chars">
<xpath role="select" xpath="sum($words!string-length(.))" … />
</VARDEF>
<copy flags="cin">
<sequence>
<att name="words" type="attribute()" … >
<fn name="string-join" role="select">
<convert from="xs:untypedAtomic" to="xs:string">
<data>
<xpath role="select" xpath="count($words)" … />
</data>
</convert>
<str val=""/>
</fn>
</att>
<att name="chars" type="attribute()" … >
…
<xpath role="select" xpath="$chars" … />
…
</att>
</sequence>
</copy>
</sequence>
The scope of variables in XSLT is along the
following-sibling::*/descendant-or-self::*
compound axis. Thus for an
instruction which has a child defining a variable we need to bind the variable and
ensure
the correct reference is made in any subsequent XPath. The mechanism for such in SEF
is a
let
, in exactly the same manner as that within an XPath expression, with two
children – the first defines the value, the second the result generator. The
let
itself describes the name of the variable and most importantly its value
binding slot. Thus we need to generate suitably nested let
s which we do with a
recursive template. For each VARDEF
encountered as the child of a
variable-containing element it i) compiles the XPaths in its definition using the
current
static context, ii) modifies static state to include the binding of the variable name,
its
determined type and its allocated slot to store the run-time determined value, iii)
recursively processes all following siblings, with the variable now in-scope
for reference in any XPath expressions and iv) wraps within a let
instruction
the variable defining instructions and a sequence constructor around the processed
remaining
siblings. Thus our example compiles to two nested let
s binding and referencing
variables in slots 0 and 1:
<let var="words" slot="0" type="xs:anyAtomicType?">
<fn role="select" name="tokenize" type="xs:string*" … >
<cvUntyped type="undefined" to="xs:string" diag="0|0||tokenize">
<data diag="0|0||tokenize" type="xs:untypedAtomic?">
<dot type="element()" diag="0|0||tokenize"/>
</data>
</cvUntyped>
</fn>
<let var="chars" slot="1" type="xs:anyAtomicType?">
<fn role="select" name="sum" type="xs:anyAtomicType?" … >
<forEach type="xs:integer" diag="0|0||sum">
<varRef name="Q{}words" slot="0" type="xs:string*"/>
<fn name="string-length" type="xs:integer">
<dot type="xs:string" diag="0|0||string-length"/>
</fn>
</forEach>
</fn>
<copy flags="cin">
<sequence type="attribute()*">
<att name="words" type="attribute()" … >
<fn name="string-join" role="select">
<convert from="xs:untypedAtomic" to="xs:string">
<data>
<fn role="select" name="count" type="xs:integer" … >
<varRef name="Q{}words" slot="0" type="xs:string*" … />
</fn>
</data>
</convert>
<str val=""/>
</fn>
</att>
<att name="chars" type="attribute()" … >
<fn name="string-join" role="select">
<convert from="xs:untypedAtomic" to="xs:string">
<data>
<varRef role="select" name="Q{}chars" slot="1" type="xs:anyAtomicType?" … />
</data>
</convert>
<str val=""/>
</fn>
</att>
</sequence>
</copy>
</let>
</let>
We can now look at typing. Checking and adapting to static or dynamic type is critical
for XPath and XSLT and even more so for Saxon-JS, as the runtime makes lots of assumptions
about argument types, or the presence of runtime type checking or casting instructions
being
present in the SEF tree. Consequences of error in processing the tree for these cases
usually result in a JavaScript error through a null pointer reference. The XPath compiler
has a built-in typechecker (in JavaScript) and we have mildly extended the compiler
such
that at least the top level result type appears on the compiled SEF tree as a
@type
attribute.
This means that after XPath compilation we should know what expected types will result
from that expression. We can now do static type checking at the XSLT level, though
at
present parts of the code are a little rough-and-ready: eventually we'll converge
to a
consistent coherent type checker, either exposing the JavaScript one or a new one
written in
XSLT. The typing is used in a number of places:
-
Determining the type of variables and parameters, either from the inferred type of
the @select
expression, or the defined (@as)
type or otherwise
form the sequence constructor, such that correct type information is used for compiling
XPath expressions that reference those variables.
-
Inferring the type of the context-item, either from a template matching or in the
body of a context-altering instruction, such as xsl:for-each
where the
context-type for dot
in the body is the singular of the type returned by
the @select
expression.
-
Determining whether a cast, a dynamic check or other instruction should wrap an
element in the generated SEF tree, or indeed whether a static type error must always
result.
Patterns
Patterns to match items are critical to XSLT templates and are also used in other
features such as xsl:number
, accumulators and keys. Whilst looking like XPath
expressions, technically they have a different top-level grammar, some (subtree) productions
of which are identical to those of XPath. In XSLT3.0 pattern possibilities have increased
a
great deal over XSLT2.0, especially in matches for non-nodal items. Eventually we'll
have to
bite the bullet and produce a variant of the XPath compiler (written in JavaScript
and used
for dynamic evaluation) which handles the XSLT pattern grammar and generates SEF
instructions using the additional pattern-related instructions. This is particularly
the
case for positional predicates
in patterns, where integer values denote
qualifications on sequence position and the conversion becomes distinctly tricky and
sensitive.
However for a lot of the simpler patterns, it is possible to compile them as if they
were an XPath and then convert the (higher) sections of the SEF tree into suitable
pattern
elements. For example a/b[c]
will compile if treated as XPath as:
<docOrder type="element()*" … >
<slash>
<axis name="child" type="element()*"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return item.nodeType==1 && q.uri=='' && q.local=='a';"/>
<filter type="element()*">
<axis name="child" type="element()*" jsTest="… q.local=='b';"/>
<fn name="exists" type="xs:boolean">
<axis name="child" type="element()*" jsTest="… q.local=='c';"/>
</fn>
</filter>
</slash>
</docOrder>
but as a pattern should be:
<p.withUpper role="match" axis="parent" upFirst="true" type="element()*">
<p.withPredicate type="element()*">
<p.nodeTest test="element()*" type="element()*"
jsTest="var q=SaxonJS.U.nameOfNode(item);
return SaxonJS.U.isNode(item) &&
item.nodeType==1 && q.uri=='' && q.local=='b';"/>
<fn name="exists" type="xs:boolean">
<axis name="child" type="element()*" jsTest="… q.local=='c';"/>
</fn>
</p.withPredicate>
<p.nodeTest test="element()*" type="element()*" jsTest="… q.local=='a';"/>
</p.withUpper>
Such simple conversions can be done with a modest number of templates used in a push
mode. With such a pattern we can then also determine the default priority for a template
using it, by matching various elements in the resulting pattern tree. This conversion
is
also used to create a separate template for each member of a union of patterns on
a template
match, as they may have separate default priorities and type for the context item
matched.
Component binding
With all the components now created, the final remaining step is to link them as
required through their component binding vectors. Each invocation instruction requiring
linking (such as applyT
for xsl:apply-templates
to the appropriate
mode
containing relevant template rules) has a @bSlot
attribute
which is a number indexing into the containing component @binds
attribute (the
binding vector), which is a whitespace sequence of numbers corresponding to the
@id
of the component linked. This process is performed by a simple collection
of mappings from global variables, named templates and modes and the components containing
their definitions and a recursive tree-walk using a small number of templates. With
this
operation the compilation is complete and the SEF tree can be used as an execution
plan for
evaluation of the given transform.
Does it work?
To start with, this was just a small experiment to generate an SEF for some simple
XSLT
using Saxon-JS in the browser, but it became clear fairly quickly that something more
substantial and with significant standards conformance might be possible. An early
goal was to
be able to get most of the XSLT tests for xsl:evaluate
at least running and
preferably passing. These tests involve a large mixture of namespace bindings, functions,
data-borne expressions and so forth, so getting this going would be, as was, quite
a milestone
– currently only 5 of 36 fail.
At the time of writing, of the 120 different categories of relevant test (stylesheet
attributes, declarations, instructions etc.) the tests in 40 of them are
working
(some 2,400 tests in total), which means the tests compile, run and
perhaps >90% of them pass or generate the wrong error. The rest of the categories have not been addressed specifically, but there is a
lot of cross-dependence between the the categories of course.
Performance (at least on a high-end laptop) was pleasing. For example the compiling,
running and checking of the 51 tests for xsl:choose
takes some 7 seconds, the 36
tests for xsl:evaluate
similar.
One of the original goals of Saxon-JS was to reduce the loading memory footprint,
which
had been just under 1 MB for Saxon-CE. This has been achieved as the minified Saxon-JS
runtime
is about 220kB. XPath evaluation support (additional JavaScript) is dynamically loaded
by
Saxon-JS and takes roughly the same amount of memory (~180kB minified). The compiler
(inside
whatever front-end is being used) is currently loaded as a complete SEF into Saxon-JS
and is
about 800kB, but contains much information redundancy, as a ZIP compression reduces
it to a
tenth of its size.
The technology is however sufficiently advanced that a simple editor/executor for
XSLT
stylesheets can be run completely in the browser, using all the features of Saxon-JS
to
support drag-and-drop interactive programming.
One of the eventual acid tests for portability of a compiler-written-in-itself, is
an
ability to successfully compile itself into a form that can repeatedly
compile itself idempotently. In some previous work (Lumley2015) we showed
how an XSLT3.0→2.0 converter, written in XSLT3.0, could convert itself and the resulting
XSLT2.0 conversion successfully and recursively
convert the original XSLT3.0
source.
The current development is a long way from that stage yet, but theoretically the compiler
is only using (some of) the features of XSLT for which it can compile successfully,
so it
should be possible. The XSLT3.0 repertoire supported by the current target platform
(Saxon-JS)
has limitations (higher-order functions might be most useful), but of course the compiler
cannot use them. Interestingly enough are some of the features that the compiler (writer)
hasn't used thus far – attribute sets and accumulators amongst them.
References
[Delpratt2013] Delpratt, O'Neil and Kay,
Michael. Multi-user interaction using client-side XSLT.
In Proceedings of
XMLPrague 2013, pp1-23 (2013)
[online] http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf
[Kay 2007] Kay, Michael. Writing an XSLT
Optimizer in XSLT.
In Proceedings of Extreme Markup Languages. (2007).
[online] http://www.saxonica.com/papers/Extreme2007/EML2007Kay01.html
[Lockett2016] Lockett, Debbie and Kay, Michael.
Saxon-JS: XSLT 3.0 in the Browser.
In Proceedings of Balisage: The Markup
Conference 2016. Balisage Series on Markup Technologies, vol. 13 (2014).
doi:https://doi.org/10.4242/BalisageVol17.Lockett01. [online] http://www.balisage.net/Proceedings/vol17/html/Lockett01/BalisageVol16-Lockett01.html
[Lumley2014] Lumley, John. Analysing XSLT
Streamability.
In Proceedings of Balisage: The Markup Conference 2014. Balisage
Series on Markup Technologies, vol. 13 (2014).
doi:https://doi.org/10.4242/BalisageVol13.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html
[Lumley2015] Lumley, John. Two from Three (in
XSLT).
In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on
Markup Technologies, vol. 15 (2015).
doi:https://doi.org/10.4242/BalisageVol15.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html
[Lumley2017] Lumley, John, Lockett, Debbie and Kay,
Michael. XPath 3.1 in the Browser.
In Proceedings of XMLPrague 2017, pp 1-18
(2017)
[online] http://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf
[Pemberton2013] Pemberton, Steven.
Invisible XML.
In Proceedings of Balisage: The Markup Conference 2013.
Balisage Series on Markup Technologies, vol. 10 (2013).
doi:https://doi.org/10.4242/BalisageVol10.Pemberton01. [online] http://www.balisage.net/Proceedings/vol10/html/Pemberton01/BalisageVol10-Pemberton01.html
[Richards1971] Richards, Martin. The
portability of the BCPL compiler.
Software Practice and Experience, Vol 1, Issue 2,
April 1971pp 135-146 (1971). doi:https://doi.org/10.1002/spe.4380010204
[REx] Rademacher, Gunther: REx Parser
Generator.
[online] http://www.bottlecaps.de/rex/
[XPath3.1] Robie, Jonathan, Dyck, Michael and Spiegel,
Josh, Editors. XML Path Language (XPath) 3.1
. World Wide Web Consortium, 21
March 2017. [online] http://www.w3.org/TR/xpath-31/
[XPath.FO] Kay, Michael, Editor. XQuery and
XPath Functions and Operators 3.1
. World Wide Web Consortium, 21 March 2017.
[online] http://www.w3.org/TR/xpath-functions-31/
[XSLT3.0] Kay, Michael, Editor. XSL
Transformations (XSLT) Version 3.0
. World Wide Web Consortium, 18 April 2017.
[online] http://www.w3.org/TR/xslt-30/