Introduction[1]

XSLT1.0 was originally developed primarily as a client-side processing technology, to increase data-adaptability and presentational flexibility with a declarative model supported in the browser. By 2006 all the major desktop browsers supported it, but the rise in importance of mobile client platforms and their code footprint pressures then sounded the death-knell. However, remnants of support for the XPath 1.0 component of XSLT can be found in most, but not all, of the current desktop browsers.

In the meantime, spurred mostly by unexpected takeup of XSLT and XQuery in server-side XML-based architectures, the technologies have progressed through the 2.0 stage (temporary variables, type model and declarations, grouping, canonical treatment of sequences, extensive function suites...) to the current candidate recommendation standards for 3.0/3.1 in XSLT, XQuery and XPath. At this stage support for maps, arrays and higher-order functions join let constructs, increased standard function libraries and others to provide a core set of common functionality based on XPath. XQuery uses this with its own (superset of XPath) syntax to support query-based operation and reporting. XSLT adds a template-based push model, within an XML syntax, which is designed to be able to support streaming processing in suitable use cases.

These developments are sufficiently robust and powerful that, other circumstances permitting, exploiting XSLT3.0/XPath 3.1 to process XML data is highly attractive. But at present this can only be performed in server-side situations – no browser developers could contemplate either the development effort or the necessary memory footprints needed for the 2.0 level compilers for what they consider niche (i.e. non-mobile) applications, let alone that required for 3.0+.

What has been developed extensively in browsers are JavaScript processors, both internal compilers, JIT runtimes and development libraries, such that very significant programmes can be executed in-browser[2]. And in general the level of conformance of and interoperability between implementations in different browsers is reasonable.

Exploiting this JavaScript option for supporting XSLT has been explored in a couple of cases:

  • Saxon-CEDelpratt2013 cross-compiled a stripped-down XSLT2.0 Saxon compiler from Java into JavaScript using Google's GWT technology. This worked, but the loaded code was large, very difficult to test and debug and very exposed to GWT's cross-browser capabilities.

  • During 2016 Saxonica developed Saxon-JS Lockett2016, a runtime environment for XSLT 3.0, implemented in JavaScript. This engine interprets compiled execution plans, presented as instruction trees describing the execution elements of an XSLT3.0 stylesheet. The execution plan is generated by an independent compiler, which performs all necessary parsing, optimisation, checking and code generation. Consequently the runtime footprint is modest (~ 200kB in minimised form) and programme execution incurs no compilation overhead.

This paper describes on-going work to construct a compiler for XSLT3.0, generating the runtime execution plan for interpretation by Saxon-JS, written in XSLT3.0. It uses very heavily an ability to compile XPath3.1 expressions contained within the source stylesheet, described later. This makes it possible to consider compiling XSLT3.0 packages entirely within a JavaScript environment, such as a browser, or supporting the transform() function in XPath in such environments. It also may form the basis for a portable Saxon compiler for XSLT3.0, when SEF-consuming runtimes for other language environments might be possible.

Saxon was originally written for the Java platform, but despite the mantra of "write once, run everywhere", there has been increasing demand over the years for Saxon's functionality to be made available on platforms that Java cannot reach. Sometimes this requirement can be met by cross-compilation: thus Saxon on .NET is developed by translating Java bytecode (using IKVMC) to the .NET equivalent, Saxon for C/C++ and PHP is developed using a Java compiler (JET) that generates Intel machine code, and Saxon-CE on the browser was developed using GWT, which translates Java source code to JavaScript. Cross-compilation, however, has its limitations, one of which is a total dependency on third-party tools that may be discontinued at any time. The Saxon-JS product introduced a new approach, by splitting the compile-time and run-time parts of the product and with a JavaScript runtime created an efficient way of executing Saxon stylesheets in the browser. But for other environments (Node.js being perhaps the most obvious example), users really need to be able to compile stylesheets on the target platform (in this case JavaScript) rather than compiling them on a Java platform and then moving the compiled code across to the target platform for execution. The classic solution to this problem Richards1971 is to write the front end of the compiler in its own language (that is, in this case, in XSLT). Given the availability of a run-time for a particular target platform, the compiler immediately becomes available on that platform. The idea of using XSLT to implement a compiler is intriguing: on the one hand, a compiler can be seen as a pipeline of tree-to-tree transformations which makes it a classic XSLT application; on the other hand, the data model and instruction set of XSLT is designed for handling a rather different kind of data than that found in traditional compilers.

An earlier attempt to implement part of the compilation phase of an XSLT compiler (specifically the optimizer) was reported in Kay 2007. The earlier experiment proved unviable, largely because of the expense of using XSLT to make a small modification to a large tree, and then performing this operation repeatedly (and recursively). A further problem was the difficulty of holding generalized values (for example a sequence of integers) as properties on an XML tree. It turns out that these obstacles are largely removed by the introduction of maps into the XDM data model. Maps in Saxon are implemented using an immutable hash trie, and a property of this data structure is that despite the fact that modification to a map always returns a new map and leaves the existing map intact, the new map can share most of the data in the old map, and the cost of the modification (in both time and space) is constant, rather than being proportional to the size of the map. While one can envisage similar optimizations to the tree structures used to represent XML documents, the fact that nodes in an XML document have distinct identity, and links to their parent nodes, makes it far more difficult; furthermore, the primitive operations on maps (put, remove) translate far more easily into simple operations on a hash trie than the complex instructions in XSLT used to construct XML result trees.

In the rest of this section we describe the Saxon-JS runtime, the extensions that were added to support dynamic XPath evaluation, and discuss what the execution-plan format is. In subsequent sections we then outline the approach for the compilation taken so far and some of the problems faced. The testing strategy is described and then finally we outline further work and draw conclusions.

Saxon-JS Runtime

The Saxon-JS runtime environment is written entirely in JavaScript and is intended to interpret and evaluate XSLT program execution plans and write results into the containing web page. These execution plans are presented as instruction trees describing the execution elements of an XSLT3.0 stylesheet in terms of sets of templates and their associated match patterns, global variables and functions, sequence constructors consisting of mixtures of XSLT instructions and XPath expression resolution programs and other static elements such as output format declarations.

Usually XSLT programs for Saxon-JS are compiled specifically for this target environment, using Saxon-EE to generate an XML tree as a Stylesheet Export File (SEF)[3]. This tree is loaded by a SaxonJS.transform() within a web page, which then executes the given stylesheet, with a possible XML document as source and other aspects of dynamic context, writing results to various sections of the web page via xsl:result-document directives. The engine supports specialist XSLT modes (e.g. ixsl:on-click) to trigger interactive responses, based on a model originally developed for Saxon-CE.

Saxon-JS supports a very large proportion of XSLT3.0 and XPath3.1 functionality, the notable current absences being schema-awareness, streaming, functional items and higher-order functions. As all the static processing (parsing, static type checking, adding runtime checks.…) of the presented XSLT stylesheet is performed by Saxon-EE during its compilation operation, programs running in Saxon-JS should benefit from the very high levels of standards compliance that the standard Saxon product achieves. At runtime Saxon-JS only has to check for dynamic errors and often these are programmed in the execution plan as specific check instructions. The consequences are that

  • The Saxon-JS code footprint is very much smaller, consisting only of a runtime interpreter.

  • Execution starts immediately after loading the instruction tree – there is no compilation phase

  • But Saxon-JS assumes the code is correct, as all static errors have been removed and necessary dynamic checks have been added.

Dynamic evaluation of XPath expressions

In late 2016 we extended the Saxon-JS runtime with a facility (written in JavaScript) to support dynamic execution of XPath expressions (needed for the xsl:evaluate instruction), which involved a parser for the XPath 3.1 grammar and a compiler for the resulting parse tree to an appropriate SEF tree. Full details of this are available in Lumley2017, but some aspects will necessarily be described in this paper. This feature (whose code is of the same order of size as Saxon-JS itself) is implemented as a dynamically loaded package in Saxon-JS, so those stylesheets that don't use dynamic evaluation avoid the additional loadtime penalty. (This XPath3.1 support can also be invoked directly from JavaScript, permitting use outside an XSLT context.)

Stylesheet Export File

The Stylesheet Export File was originally developed as a format to describe an exported, compiled, XSLT package for use in Saxon's support of package linking for XSLT3.0. As such it is required to describe all the necessary information required to reconstruct an operational transform. Full descriptions of its form can be found in Lockett2016, but the reader should understand the following.

The SEF is written in XML, in the namespace http://ns.saxonica.com/xslt/export, as a package element with the following structure of components, corresponding to the top-level components of a complete and statically compiled stylesheet. Each component is defined as a <co id="number"> element surrounding the definition itself.

Global parameters & variables

The global xsl:param and xsl:variable children of the stylesheet with a default value generator and for a parameter, an acceptor/converter expression for externally defined values, as their children[4] .

Global functions

Each function with a set of arguments and a body expression to evaluate the result, on the assumption the argument values have already been bound into the appropriate parameter slots.

Named templates

Each named template, i.e. those xsl:template[@name] with its sequence constructing instructions as its child.[5]

Modes

The defined or implict modes for template rules, ie.e those xsl:template[@match]. Each mode contains all the template rules associated with it, ordered by import precedence, priority and sequence position, corresponding to the conflict resolution rules. (The body of a template may appear multiple times when the match pattern is a union, as different rules with possibly different priorities. Equally well a template rule may appear in multiple modes, and the body may be shared with a named template as both @match and @name can co-exist.) The mode also describes its default behaviours (on-no-match="shallow-copy", on-multiple-match="fail" etc.) as encoded attributes.

Keys & accumulators

Declarations for xsl:key and xsl:accumulator definitions.

Other declarations

Definitions of the other declarations of the stylesheet (xsl:strip-space, xsl:decimal-format, xsl:output) appear as top level components of the package.

As an example here are a couple of samples, with various sections elided for brevity:

Table I

Some export file examples

XSLT
<xsl:stylesheet … version="3.0">    
</xsl:stylesheet>
          
SEF
<package xmlns="http://ns.saxonica.com/xslt/export"
   version="30" packageVersion="1" saxonVersion="9.7.0.15" 
   timeStamp="2017-05-29T13:11:19.791+01:00" target="JS">
 <co id="0" binds="">
  <mode onNo="TC" flags="dW" patternSlots="0"/>
 </co>
 <overridden/>
 <output>
  <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
 </output>
 <decimalFormat/>
</package>
XSLT
<xsl:stylesheet … version="3.0">
  <xsl:variable name="var1">Variable1</xsl:variable>
  <xsl:mode on-no-match="shallow-copy"/>
  <xsl:template name="A">        
    <A>
      <xsl:value-of select="$var1"/>
    </A>
  </xsl:template>
  <xsl:template match="a">
    <xsl:call-template name="A"/>
    <xsl:next-match/>
  </xsl:template>
</xsl:stylesheet>
SEF
<package xmlns="http://ns.saxonica.com/xslt/export"
 version="30" packageVersion="1" saxonVersion="9.7.0.15"
  timeStamp="2017-05-29T13:23:58.27+01:00" target="JS">
 <co id="0" binds="">
  <globalVariable name="Q{}var1" type="document-node()"
    …  visibility="PRIVATE">
   <doc … validation="preserve">
    <valueOf flags="Sl">
     <str val="Variable1"/>
    </valueOf>
   </doc>
  </globalVariable>
 </co>
 <co id="1" binds="0">
  <template name="Q{}A" … slots="0">
   <elem role="body" name="A">
    <valueOf flags="l">
     <gVarRef name="Q{}var1" bSlot="0"/>
    </valueOf>
   </elem>
  </template>
 </co>
 <co id="2" binds="1">
  <mode onNo="SC" flags="dW" patternSlots="0">
   <templateRule prec="0" prio="0.0" seq="0" rank="0" minImp="0" slots="0"… >
    <p.nodeTest role="match" test="element(Q{}a)" 
      jsTest="var q=SaxonJS.U.nameOfNode(item); 
              return SaxonJS.U.isNode(item)
              &amp;&amp; item.nodeType==1 
              &amp;&amp; q.uri==''&amp;&amp;q.local=='a';"/>
    <sequence role="action" … >
     <callT name="A" bSlot="0"/>
     <nextMatch flags="ti"/>
    </sequence>
   </templateRule>
  </mode>
 </co>
 <overridden/>
 <output>
  <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
 </output>
 <decimalFormat/>
</package>

The first example is the minimal stylesheet, which has implicit built-in template rules with 'text-only-copy' behaviour, described by the unnamed mode with no child template rules. In the second example the xsl:call-template name="A" is bound to its target named template component, and the $var1 reference to its definition, by the @bSlot reference to the component-binding-vector (@binds) on its enclosing component. The template rule has two children: one with a match role, which is a pattern definition and the other an action role, which is the sequence constructor to be used when the template is selected.

The general form of the bodies of these components is an XML tree that defines an instruction/function invocation, such as <forEach>arg0,arg1</. The name of the element describes the operation, in this case a for-each, where the first child is the generator instruction of the for values and the second child defines the sequence constructor instruction for each of those when bound as the context item. Saxon-JS's main loop is an interpreter of such instructions, systematically evaluating instruction children as arguments and producing suitable results, usually as (lazy) iterators over sequences of atomic values, maps/arrays and result-tree fragments.

It is not surprising that most of these instructions correspond to either productions in the XPath grammars or XSLT declarations or instructions, often using the same name, or camelCase() thereof, as the major internal semantics of evaluation within XSLT sequence constructors and XPath expressions is functional.

Some of these instructions are parameterised with attributes, such as <fn name="fn-name> which calls the given system function. Other attributes describe possibly-needed properties, such as active namespace bindings (e.g. ns=" =foo map=~" records that the default namespace is "foo" and that the prefix map is bound to its conventional namespace for the map functions.) For some instructions, the arguments are positional (such as forEach) – for others (such as templateRule) the @role attribute is used for identification. In other cases instructions of given name are used (such as catch within try).

As will become apparent later, for some of the important instructions there is no difference between whether the original code was in an XPath expression or an XSLT tree fragment. For example both $in !body and <xsl:for-each select="$in">body</ are both represented by forEach. Similarly xsl:variable bindings are represented as let instructions surrounding their definition as the first child and a sequence of subsequent result-generating children as the second, in exactly the same manner as let $var := … return … would in an XPath expression. This means it should be possible to reuse much of the code already developed for XPath compilation within an XSLT situation.

Compiling the XSLT

The goal of the project is to construct a high-conformance compiler from XSLT3.0 to SEF, the compiler itself being written in XSLT3.0 with the addition of XPath compilation functions, present in Saxon-JS, developed for the support of dynamic XPath evaluation. At this stage the compiler (which has been externally compiled/exported into SEF) is intended to run within the JavaScript environment of a browser. Given that both source (XSLT3.0) and destination (SEF) are well-defined XML, it was anticipated that using XSLT3.0 for the compiler would give a strong advantage, and so it proved.

This project is still in development, so some of the steps described will change as we tidy up the code, extend functionality and streamline the execution. In particular we've chosen to use several passes (6 or 7?) across the stylesheet/package trees to increase visibility for debugging and limit the everything's empty cascade, primarily using a central XML tree representation of the evolving program during initial development of the compiler. Some of the phase separations shouldn't really be possible and might be plain wrong in the most general case, but for early development and the simpler majority of test cases, make development more transparent. An example is in the interaction of static variables and parameters and conditional compilation on resource collection where originally collection was followed by static processing but are now merged into a single pass – see section “Preprocessing: package collection and static and conditional processing”.

The development methodology was to get something running quickly – just compiling the simplest of stylesheets and then attempt to attach to a (browser-based) testdriver running the XSLT3.0 test suite (see section “Testing”). This framework not only allowed selected tests to be compiled, run and checked but also to examine various intermediate stages of the compilation state, as well as diving under the hood with the browser-based JavaScript debuggers. The process was then to choose one XSLT instruction (we chose xsl:choose) and get both the XSLT test situation and the compiler operational, and then slowly expand the set of features supported [6].

Once the compiler is fully operational, and we have an executing large example corpus, we can begin to merge phases together and rationalise some of the internal data structures, especially exploiting map() and array() types when a mixture of properties and tree fragments are needed.

The reader should note also that there are often several different strategies that will produce the same result in Saxon-JS so the fact that code shown here is not the same as that which Saxon-EE would produce isn't necessarily an error. However we trust the reader will get a sense of the approach being taken.

The main phases of the compilation process required are:

  • XML parsing – not really part of the compiler, we assume this has already been done before we start.

  • preprocessing – evaluation of static variables, use-when, shadow attributes, include/import.

  • validation and normalization – essentially the kind of validation that could be done with a schema processor, checking that elements and attributes appear in the right place and have permitted values .Currently this is performed as part of the XSLT to SEF instructions process, but might be better to do it as a free-standing phase. For example, normalizing boolean attributes from yes|1|true to true early on can save a lot of boring code later.

  • indexing components: building symbol tables for named objects such as variables, functions, named templates and attribute sets, resolving references, checking for duplication.

  • parsing of XPath expressions and patterns.

  • type analysis: inferring the types of expressions, checking (as far as is statically possible) that these are consistent with required types, adding run-time type checking code to the tree where necessary.

  • optimization: replacing constructs with equivalent constructs that are faster to evaluate (for example, rewriting (position() = 1 to 10) with (position() ge 1 and position() le 10). Currently little of this is being done, save that some arithmetic expressions over constants are resolved during XPath compilation.

  • streamability analysis: not being done at the moment, since Saxon-JS does not implement XSLT 3.0 streaming. (Note that work reported in Lumley2014 describes an XSLT implementation of streamability analysis, which may be of relevance to a future XSLT-based compiler product.)

  • code generation: normally the last phase of a compiler, in this architecture all that happens here is actions such as allocating slots for variables and component references, generating JavaScript code for node tests and name tests, and optional serialization of the final SEF tree as lexical XML, including a checksum to deter attempts at manual modification.

In the current design most of the representation passed between phases is a (single) XML tree, but we anticipate fuller use of maps and arrays, that XSLT3 supports, describing more collections of components and associated properties.

A simple walk-through

We'll start by a simple walk-though of one of the first test cases we used – involving xsl:choose. The stylesheet is:

<xsl:stylesheet …  version="2.0">
  <xsl:template match="/doc">
    <out>
      <xsl:for-each select="person">
        <xsl:choose>
          <xsl:when test="sex='M'">
Male: </xsl:when>
          <xsl:when test="sex='F'">
Female: </xsl:when>
          <xsl:otherwise>
Who knows?: </xsl:otherwise>
        </xsl:choose>
        <xsl:value-of select="name"/>
      </xsl:for-each>
    </out>
  </xsl:template>
</xsl:stylesheet>

For the moment we'll ignore the static collection phase and show the result of the first conversion pass:

<package …  version="30" packageVersion="1" saxon-version="Saxon-JS-internal"
   target="JS" name="TOP-LEVEL" relocatable="false">
   <co id="0">
      <mode onNo="TC" flags="dW" patternSlots="0">
         <templateRule prec="0" seq="0" ns="xml=~ xsl=~" minImp="-10" flags="s"… 
            select="node()" match="/doc" module="choose-0101.xsl">
            <xpath role="match" xpath="/doc" ns="xml=~ xsl=~"/>
            <elem name="out" type="element()" 
               role="action" ns="xml=~ xsl=~" namespaces="">
               <forEach>
                  <xpath role="select" xpath="person" ns="xml=~ xsl=~"/>
                  <sequence>
                     <choose type="item()*">
                        <xpath role="" xpath="sex='M'" ns="xml=~ xsl=~"/>
                        <valueOf>
                           <str val="
Male: "/>
                        </valueOf>
                        <xpath role="" xpath="sex='F'" ns="xml=~ xsl=~"/>
                        <valueOf>
                           <str val="
Female: "/>
                        </valueOf>
                        <true/>
                        <valueOf>
                           <str val="
Who knows?: "/>
                        </valueOf>
                     </choose>
                     <valueOf flags="l">
                        <fn name="string-join" role="select">
                           <convert from="xs:untypedAtomic" to="xs:string">
                              <data>
                                 <xpath role="select" xpath="name" ns="xml=~ xsl=~"/>
                              </data>
                           </convert>
                           <str val=" "/>
                        </fn>
                     </valueOf>
                  </sequence>
               </forEach>
            </elem>
         </templateRule>
      </mode>
   </co>
   <overridden/>
   <output>
      <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
   </output>
   <decimalFormat/>
</package>

The package basically consists of a single unnamed mode of templates with just one template. We leave the @match for this template as an xpath child, recording both its role and the XPath/pattern expression it defines. Then we collect up all its sequence constructor components (in this case the <out> element) as the second child labelled as the action role. (If the template had had parameters these would have been top-level children of the rule, hence the use of the role identification attributes.)

Within that sequence constructor we compile recursively. The <out> becomes an element-generating instruction with the required name as a parameter[7]. Its single child is a xsl:for-each, which compiles to a forEach leaving its content selector as an xpath role="select" child, and its sequence constructs as a sequence around the two children, which have compiled into a choose and a valueOf.

The valueOf contains instructions to convert the result into a space-separated concatenation of the evaluation of the name XPath on the context item[8]. The choose has been expanded as a sequence of test/result pairs, were the test is an XPath that will checked against context and the xsl:otherwise has a test of true(), for which a specific instruction true is provided in Saxon-JS. (During the conversion of xsl:choose to choose checking of static language errors such as xsl:otherwise not being last or incorrect children, is carried out trivially by a series of tests such as

exists(xsl:otherwise/(following-sibling::*|following-sibling::text()[matches(.,'\S')])

During the next step we compile all the embedded XPath expressions, leaving their resulting instruction trees in situ:

<package...>
  <co id="0" binds="">
    <mode onNo="TC" flags="dW" patternSlots="0">
      <templateRule prec="0" seq="0"...>
        <docOrder role="match">
          <slash>
            <fn name="root" type="node()?">
              <treat type="node()" jsTest="return SaxonJS.U.isNode(item);" … >
                <dot type="item()" … />
              </treat>
            </fn>
            <axis name="child" type="element()*" 
               jsTest="var q=SaxonJS.U.nameOfNode(item); 
                       return item.nodeType==1 &amp;&amp;
                        q.uri=='' &amp;&amp; q.local=='doc';"/>
          </slash>
        </docOrder>
        <elem name="out" type="element()" role="action" ns="xml=~ xsl=~" namespaces="">
          <forEach forType="element()" type="item()*">
            <axis role="select" name="child" type="element()*" 
              jsTest="var q=SaxonJS.U.nameOfNode(item); 
                      return item.nodeType==1 &amp;&amp; 
                        q.uri=='' &amp;&amp; q.local=='person';" … />
            <sequence type="item()*">
              <choose type="item()*">
                 <gc role="" type="xs:boolean" card="1:1" comp="..." op="="… >
                    <data type="xs:anyAtomicType*">
                      <axis name="child" type="element()*" 
                        jsTest="var q=SaxonJS.U.nameOfNode(item); 
                                return item.nodeType==1 &amp;&amp; 
                                  q.uri=='' &amp;&amp; q.local=='sex';"/>
                    </data>
                    <str val="M" type="xs:string"/>
                </gc>
                <valueOf>
                  <str val="
Male: "/>
                </valueOf>
                 …
              </choose>
              <valueOf flags="l">
                <fn name="string-join" role="select">
                  <convert from="xs:untypedAtomic" to="xs:string">
                    <data>
                      <axis role="select" name="child" type="element()*"
                        jsTest="var q=SaxonJS.U.nameOfNode(item);
                                return item.nodeType==1 &amp;&amp; 
                                  q.uri=='' &amp;&amp; q.local=='name';"… />
                    </data>
                  </convert>
                  <str val=" "/>
                </fn>
              </valueOf>
            </sequence>
          </forEach>
        </elem>
      </templateRule>
    </mode>
  </co>
  <overridden/>
  <output>
    <property name="{http://saxon.sf.net/}stylesheet-version" value="30"/>
  </output>
  <decimalFormat/>
</package>

Now the XPath expressions have been expanded, two of them to a plain axis and one to a general comparison of an axis value against a string. Each of the axis instructions contains some JavaScript code to qualify node items for inclusion in the axis search results. (Note well – these code statements have been line-wrapped for brevity – in practice they must not contain newlines, as JavaScript treats this as end of statement.) Also the type result of the expression has been determined and propagated upwards so that result trees are now typed. This becomes important in issues of whether type casts are necessary, dynamic type/cardinality checks are required, what the type of the context-item is or if statically-determined type errors have occurred.

Now we complete the process by converting the XPath determined for the match pattern into a true pattern:

<package...>
   <co id="0" binds="">
      <mode onNo="TC" flags="dW" patternSlots="0">
         <templateRule prec="0" seq="0" …  prio="0.5">
            <p.withUpper role="match" axis="parent" upFirst="true" type="element()*">
               <p.nodeTest test="element()*" type="element()*"
                 jsTest="var q=SaxonJS.U.nameOfNode(item); 
                   return SaxonJS.U.isNode(item) &amp;&amp;
                     item.nodeType==1 &amp;&amp; q.uri=='' &amp;&amp; q.local=='doc';"/>
               <p.nodeTest test="document-node()"  type="node()?"
                 jsTest="return SaxonJS.U.isNode(item) &amp;&amp;
                    (item.nodeType==9 || item.nodeType==11);"/>
            </p.withUpper>
            <elem name="out" type="element()"… >
….

In this case the pattern matches doc when the parent node is a document-node() or a document fragment. The default priority for this pattern has been determined, based on the complexity of the pattern tree and written onto the rule, when priority has not already been declared. This, along with the import precedence and sequence position will be used to order the rules for matching and using xsl:next-match[9].

As should be apparent, these code samples can be large. In the rest of the document I'll concentrate on the main operations and phases of the compilation process and some of the more complex constructs.

Preprocessing: package collection and static and conditional processing

The first phase of package formation is to collect together from the top level package, all the necessary resources in terms of required subsidiary packages and imported and included stylesheets. At present we are only addressing resources in the form of XSLT stylesheets, not packages. At first glance this should be a comparatively simple exercise with templates matching the collection directives (xsl:import|xsl:include), deferencing the document locations and including contents recursively. However there are two, or three, complications.

Some components, such as templates, in stylesheets that are imported have a lower precedence in execution that those in the importing stylesheets, or sibling imported stylesheets. This is an important mechanism for libraries in XSLT1.0 and 2.0 and requires accurate tracking of importation position.

In addition, in XSLT 3.0 stylesheets can be parameterised statically in two ways, which can influence the collection result significantly. Firstly elements of the stylesheet can be included conditionally by @[xsl:]use-when decorations, using XPath expressions that can involve static global variables and parameters as well as the more usual features of the compiling environment, such as the implementation name or version number. Secondly the values of attributes on XSLT elements can be computed at compile-time from such static variables. This feature, termed shadow attributes, would for example permit different versions of a library to be included at compile time, dependent upon a supplied parameter (using _href="lib-{$version}.xsl") .

A further compilcation arises from the possibility that static (global) variables in imported or included stylesheets have effective scope not just within their containing stylesheets, but also for all subsequent components in calling stylesheets, which infers they migrate to top level – effectively the scope then becomes following::*. This requires static variable bindings to migrate up along with included and imported preprocessed stylesheets.

Given the necessity of transmitting state (static variables, import precedences) along the sibling axes of the stylesheets during this process, we use an iterator over the stylesheet which returns a map, of the general form:

<xsl:mode name="static" on-no-match="shallow-copy"/>

<xsl:template match="xsl:stylesheet | xsl:transform" mode="static" priority="1.5" as="map(*)">
  <xsl:param name="import-precedence" select="1" tunnel="true"/>
  <xsl:param name="static-vars" as="map(*)" select="map{}" tunnel="true"/>
  
  <xsl:variable name="me" select="."/>      

  <xsl:iterate select="*">
    <xsl:param name="static-vars" as="map(*)" select="map{}"/>
    <xsl:param name="precedence-base" select="$import-precedence"/>
    <xsl:param name="parts" select="()"/> 
    <xsl:on-completion>
      <xsl:variable name="sheet" as="element()">
        <xsl:for-each select="$me">
          <xsl:copy>
            <xsl:sequence select="@*, $parts"/>
          </xsl:copy>
        </xsl:for-each>
      </xsl:variable>
      <xsl:sequence select="map{'sheet':$sheet,
                                'precedence':$precedence-base,
                                'static-vars':$static-vars}"/>
    </xsl:on-completion>
    <!– Process component -->
  </...>

<xsl:template match="component" mode="static">....

where the static evaluation state can be altered for following-sibling components, by a next-iteration instruction with altered parameter values, which also accumulates the components of the stylesheet though the parts parameter.

First we evaluate and accumulate the state of the static variables and parameters. These are global variable or parameters of a stylesheet that have been declared @static="yes". These are restricted in the sense that their values can only be defined either by external compile-time binding (for parameters) or by an XPath expression which involves the usual static context and reference to other earlier-defined static variables or parameters only, though static variables from included stylesheets do come into scope.

Pleasingly in the most common cases this turns out to be comparatively straightforward, relying on the two restrictions: i) that static variables can only set a value via an XPath expression in their @select attributes (i.e. no sequence constructors) and the static context for that expression excludes any user-defined functions in the package and ii) reference to variables can only be to prior static global variables. These together mean that given the determined values of all previously defined static variables, the value for the next variable can always be determined by an <xsl:evaluate xpath="@select"> action, supplying a namespace context and a map of those previously defined variable values on the @with-params map argument. So our process step for such variables is:

<xsl:when test="self::xsl:variable/@static = ('yes', 'true', '1')">
  <xsl:variable name="possible.component" as="element()?">
    <xsl:apply-templates select="." mode="#current">
      <xsl:with-param name="static-vars" select="$static-vars" tunnel="true"/>
    </xsl:apply-templates>
  </xsl:variable>
  <xsl:if test="exists($possible.component)">
    <xsl:variable name="value" as="item()*">
      <xsl:evaluate xpath="@select" namespace-context="." with-params="$static-vars"/>
    </xsl:variable>
    <xsl:next-iteration>
      <xsl:with-param name="static-vars" as="map(*)"
        select="map:put($static-vars, resolve-QName(@name, .), $value)"/>
      <xsl:with-param name="parts" select="$parts,$possible.component"/>
    </xsl:next-iteration>
  </xsl:if>
</xsl:when>

It is first necessary to process the variable declaration itself, as it can be subject to conditional compilation (as discussed just below). If the declaration still stands we evaluate its @select expression with all previous static variables in scope and then add that name/value mapping to the static-variable state for the next iteration on subsequent components, with the variable definition added to the list of component parts[10].

The conditional inclusion directive (@[xsl:]use-when) is processed by a simple high-priority template, shown below, matching elements which include the directive. When the directive is a shadow attribute it is first preprocessed to expand the expression value. Then the expression is evaluated in current static variable context. If its boolean value is true then the element is processed conventionally. If not, nothing is written to the expanded stylesheet.

<xsl:template match="xsl:*[@use-when | @_use-when] | *[@xsl:use-when | @xsl:_use-when]"
        priority="3" mode="static">
   <xsl:param name="static-vars" as="map(*)" select="map{}" tunnel="true"/>
   <xsl:variable name="use-expression" as="xs:string">
     <xsl:choose>
       <xsl:when test="@*:_use-when">
         <xsl:apply-templates select="@*:_use-when" mode="#current"/>
       </xsl:when>
       <xsl:otherwise>
          <xsl:sequence select="@*:use-when"/>
       </xsl:otherwise>
     </xsl:choose>
   </xsl:variable>
   <xsl:variable name="use" as="xs:boolean">
      <xsl:evaluate xpath="$use-expression" namespace-context="." with-params="$static-vars"/>
   </xsl:variable>
   <xsl:if test="$use">
      <xsl:next-match/>
   </xsl:if>
</xsl:template>

Shadow attributes are processed by a similarly simple template, writing a new attribute with the appropriate name.

Finally we can now track the importation precedence. The goal is that components in a higher-level stylesheet have higher precedence than those in any imported sheets, and lower than any following-sibling sheet. If we make a simplifying assumption, common in most designs and mandated in XSLT 1.0 and 2.0, that any importations, or inclusions that include importations, in a stylesheet precede any templates or other components that need to track precedence, then we can process in the same iteration:

<xsl:when test="self::xsl:import">
  <xsl:variable name="processed-directive" as="element()?">
    <xsl:apply-templates select="." mode="#current"/>
  </xsl:variable>
  <xsl:if test="$processed-directive">
    <xsl:variable name="doc" as="document-node()">
      <xsl:call-template name="collect-doc">
        <xsl:with-param name="uri" select="resolve-uri($processed-directive/@href, base-uri(.))"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="collected" as="map(*)">
      <xsl:apply-templates select="$doc/*" mode="#current"/>
    </xsl:variable>
    <xsl:variable name="max-precedence" as="xs:integer"
      select="xs:integer(max(($precedence-base, $collected?precedence)))"/>
    <xsl:next-iteration>
      <xsl:with-param name="precedence-base" as="xs:integer" select="$max-precedence + 1"/>
      <xsl:with-param name="parts" select="$parts,$collected?sheet/*"/>
      <xsl:with-param name="static-vars" select="$collected?static-vars"/>
    </xsl:next-iteration>
  </xsl:if>
</xsl:when>

The technique processes the imported stylesheet recursively, recording precedence as an attribute on components that require it and then finds the maximum of the precedences recorded over the (expanded) stylesheet. One more that that precedence is then used for the remaining components.

However, XSLT 3.0 has liberalised the situation, such that importations, and inclusions that contain importations, can occur anywhere in the top level of a stylesheet. This means that templates that precede an importation will now have higher precedence than those imported, as well as those that follow. To handle this case the process is changed to use a map-base tree of importations and their precedences and a recursively-sensitive algorithm to distribute computed precedences across elements in this tree.

During this phase we also track and record active namespaces and the base URI for each element. During subsequent processing, with a plethora of temporary trees, ancestry is frequently lost and these important early properties often with it. They are currently recorded as attributes in a separate namespace for later bulk removal. (This also a suitable point to carry out some minor forms of canonicalisation, such as normalizing whitespace on @as type declarations. Most of these will probaby migrate to a separate normalization phase.)

XSLT to SEF instructions

With the stylesheet expanded statically, the bulk of the compilation action involves conversion of XSLT instructions to suitable SEF elements and subtrees. At present we perform this in two phases – conversion without XPath compilation, followed by compilation of XPath expressions and typechecking.

Naturally XSLT's push mode is highly suitable for the converstion of XSLT components and instructions to SEF. Here are some simple examples:

<xsl:template match="xsl:for-each" mode="sef">
  <xsl:param name="attr" as="attribute()*" select="()"/>
  <xsl:call-template name="check-select"/>
  <xsl:sequence select="f:check-sort(.)"/>
  <forEach>
    <xsl:sequence select="$attr"/>
    <xsl:iterate select="xsl:sort">
      <xsl:param name="inner">
        <xsl:apply-templates select="@select" mode="create.xpath"/>
      </xsl:param>
      <xsl:on-completion select="$inner"/>
      <xsl:next-iteration>
        <xsl:with-param name="inner">
          <sort>
            <xsl:sequence select="$inner"/>
            <xsl:apply-templates select="." mode="#current"/>
          </sort>
        </xsl:with-param>
      </xsl:next-iteration>
    </xsl:iterate>
    <xsl:call-template name="sequence-constructor"/>
  </forEach>
</xsl:template>

In compiling a xsl:for-each we first check it has a @select attribute (an error will be thrown by the named template if not) and then that any xsl:sort children are first. The forEach SEF instruction has possible attributes added through the $attr parameter – this is usually to set a possible role identifier, such as when the xsl:for-each is the only (action) child of a template rule. Then the selected content is created as an xpath element nested inside any defined xsl:sort declarations, by using an iterator. Finally the sequence constructor instructions are generated using a canonical named template. (Note that regardless of the number of xsl:sort children, the forEach only has two children – the first a content-generating instruction (e.g. sort[sort[xpath]]), the second a sequence constructor.)

 <xsl:template match="xsl:function" mode="sef">
   <xsl:variable name="params" as="element()*">
     <xsl:apply-templates select="xsl:param" mode="#current"/>
   </xsl:variable>
   <co>
     <function name="{f:EQName(@name,.)}" as="{(@as,'item()*')[1]}" slots="{count($params)}">
        <xsl:call-template name="record-namespaces"/>
        <xsl:sequence select="$params"/>
        <sequence role="body">
           <xsl:apply-templates select="* except xsl:param" mode="#current"/>
        </sequence>
     </function>
   </co>
</xsl:template>
    
<xsl:template match="xsl:function/xsl:param" mode="sef">
   <arg slot="{position() - 1}" name="{f:EQName(@name,.)}">
     <xsl:sequence select="@as"/>
   </arg>
</xsl:template>

An xsl:function is compiled to a separate package component with a function child, with name, type and number of argument slots properties. The active namespaces are then recorded so that in subsequent XPath compiling the correct prefix-namespace bindings are available. The parameters become named and typed arg children, each with an allocated value slot. Finally the sequence constructor is compiled from all the non-parameters and identified as having the body role for the function.

Some other declarations on stylesheet elements (@default-collation, @expand-text etc.) are also processed during this phase, by modifying the static compiling context, which is currently distributed across a set of tunnelled variables[11]and which can be retrieved for compilations that are dependent upon them. As these declarations are orthogonal to each other, we arrange a series of priority-ordered templates which pass on responsibility for final effect using xsl:next-match with a suitable tunneled parameter. For example:

<xsl:template match="xsl:*[@xpath-default-namespace]" priority="3" mode="sef">
  <xsl:param name="attr" as="attribute()*" select="()"/>
  <xsl:next-match>
    <xsl:with-param name="xpath-default-namespace" select="@xpath-default-namespace"
                tunnel="true"/>
    <xsl:with-param name="attr" select="$attr"/>
  </xsl:next-match>
</xsl:template>

<xsl:template match="xsl:*[@expand-text]" priority="3.3" mode="sef">
  <xsl:param name="attr" as="attribute()*" select="()"/>
  <xsl:next-match>
    <xsl:with-param name="expand-text" as="xs:boolean"
       select="@expand-text = ('true', 'yes')" tunnel="true"/>
    <xsl:with-param name="attr" select="$attr"/>
  </xsl:next-match>
</xsl:template>

Local variable and template or iteration parameter declarations are processed in part at this stage, but left as specific VARDEF or PARAMDEF elements, with name and possible type attributes and their value generator as a child, either from a @select expression or a sequence constructor. They will be finally converted in the next phase.

At this point we have mappings of almost all of the XSLT declarations and instructions into SEF equivalents, save that XPath expressions, matching patterns and parameter and variable declarations have not been finalised.

XPath and typing

Now we have the main elements of the package as a series of components with subtrees. Some of the components define global variables, parameters and functions, others define named-templates and modes of template rules. Scattered throughout are uncompiled XPath expressions described as <xpath xpath="expression"… /> elements, as well as unprocessed variable and parameter definitions as VARDEF|PARAMDEF subtrees. We now proceed to compile these XPaths, by using the saxon:compile-XPath() function developed for use in the support of xsl:evaluate. But in this process, these XPath expressions can, and often do, make reference to resources that are defined elsewhere in the stylesheet, namely user-defined functions and global and local variables and parameters, or even the type of the context-item for its evaluation. Such reference covers not only their existence (and possible slot for their value) but equally crucially the type and cardinality of value/result/argument they deliver or expect.

For functions the process involves forming a map, keyed by name and arity, of each of the function bodies – this is passed in as part of the static context state to XPath compilations – when the language property of the compilation is XSLT then these definitions are searched for function calls if the standard library fails to match[12]. Static type checking of both arguments and results of calls to these user-defined functions is carried out in exactly the same manner as for built-in functions, save that the call instruction is ufCall rather than fn and the call contains a binding to the appropriate package component for the function body.

Global variable and parameter definitions are passed into XPath compilation as maps of name→slot and name→type after having themselves been compiled to both build value generation instructions and determine value type. (As they can acyclically reference each other, this is carried out in a similar manner as for local variables in scope.)

For local variables and parameters we need to handle things with regard to their scoping. Recall that in the previous stage we left them as VARDEF or PARAMDEF children, typically within a sequence constructor:

<xsl:variable name="words" select="tokenize(.)"/>
<xsl:variable name="chars" select="sum($words!string-length(.))"/>
<xsl:copy>
  <xsl:attribute name="words" select="count($words)"/>
  <xsl:attribute name="chars" select="$chars"/>
</xsl:copy>
→
<sequence>
  <VARDEF name="words">
    <xpath role="select" xpath="tokenize(.)"… />
  </VARDEF>
  <VARDEF name="chars">
    <xpath role="select" xpath="sum($words!string-length(.))" … />
  </VARDEF>
  <copy flags="cin">
    <sequence>
      <att name="words" type="attribute()" … >
        <fn name="string-join" role="select">
          <convert from="xs:untypedAtomic" to="xs:string">
            <data>
              <xpath role="select" xpath="count($words)" … />
            </data>
          </convert>
          <str val=""/>
        </fn>
      </att>
      <att name="chars" type="attribute()" … >
         … 
           <xpath role="select" xpath="$chars" … />
         …
      </att>
    </sequence>
  </copy>
</sequence>

The scope of variables in XSLT is along the following-sibling::*/descendant-or-self::* compound axis. Thus for an instruction which has a child defining a variable we need to bind the variable and ensure the correct reference is made in any subsequent XPath. The mechanism for such in SEF is a let, in exactly the same manner as that within an XPath expression, with two children – the first defines the value, the second the result generator. The let itself describes the name of the variable and most importantly its value binding slot. Thus we need to generate suitably nested lets which we do with a recursive template. For each VARDEF encountered as the child of a variable-containing element it i) compiles the XPaths in its definition using the current static context, ii) modifies static state to include the binding of the variable name, its determined type and its allocated slot to store the run-time determined value, iii) recursively processes all following siblings, with the variable now in-scope for reference in any XPath expressions and iv) wraps within a let instruction the variable defining instructions and a sequence constructor around the processed remaining siblings. Thus our example compiles to two nested lets binding and referencing variables in slots 0 and 1:

<let var="words" slot="0" type="xs:anyAtomicType?">
  <fn role="select" name="tokenize" type="xs:string*" … >
    <cvUntyped type="undefined" to="xs:string" diag="0|0||tokenize">
       <data diag="0|0||tokenize" type="xs:untypedAtomic?">
         <dot type="element()" diag="0|0||tokenize"/>
       </data>
    </cvUntyped>
  </fn>
  <let var="chars" slot="1" type="xs:anyAtomicType?">
    <fn role="select" name="sum" type="xs:anyAtomicType?" … >
      <forEach type="xs:integer" diag="0|0||sum">
        <varRef name="Q{}words" slot="0" type="xs:string*"/>
        <fn name="string-length" type="xs:integer">
          <dot type="xs:string" diag="0|0||string-length"/>
        </fn>
      </forEach>
    </fn>
    <copy flags="cin">
      <sequence type="attribute()*">
         <att name="words" type="attribute()" … >
           <fn name="string-join" role="select">
             <convert from="xs:untypedAtomic" to="xs:string">
               <data>
                 <fn role="select" name="count" type="xs:integer" … >
                   <varRef name="Q{}words" slot="0" type="xs:string*" … />
                 </fn>
               </data>
             </convert>
             <str val=""/>
           </fn>
         </att>
         <att name="chars" type="attribute()" … >
           <fn name="string-join" role="select">
             <convert from="xs:untypedAtomic" to="xs:string">
               <data>
                 <varRef role="select" name="Q{}chars" slot="1" type="xs:anyAtomicType?" … />
               </data>
             </convert>
             <str val=""/>
           </fn>
         </att>
      </sequence>
    </copy>
  </let>
</let>

We can now look at typing. Checking and adapting to static or dynamic type is critical for XPath and XSLT and even more so for Saxon-JS, as the runtime makes lots of assumptions about argument types, or the presence of runtime type checking or casting instructions being present in the SEF tree. Consequences of error in processing the tree for these cases usually result in a JavaScript error through a null pointer reference. The XPath compiler has a built-in typechecker (in JavaScript) and we have mildly extended the compiler such that at least the top level result type appears on the compiled SEF tree as a @type attribute.

This means that after XPath compilation we should know what expected types will result from that expression. We can now do static type checking at the XSLT level, though at present parts of the code are a little rough-and-ready: eventually we'll converge to a consistent coherent type checker, either exposing the JavaScript one or a new one written in XSLT. The typing is used in a number of places:

  • Determining the type of variables and parameters, either from the inferred type of the @select expression, or the defined (@as) type or otherwise form the sequence constructor, such that correct type information is used for compiling XPath expressions that reference those variables.

  • Inferring the type of the context-item, either from a template matching or in the body of a context-altering instruction, such as xsl:for-each where the context-type for dot in the body is the singular of the type returned by the @select expression.

  • Determining whether a cast, a dynamic check or other instruction should wrap an element in the generated SEF tree, or indeed whether a static type error must always result.

Patterns

Patterns to match items are critical to XSLT templates and are also used in other features such as xsl:number, accumulators and keys. Whilst looking like XPath expressions, technically they have a different top-level grammar, some (subtree) productions of which are identical to those of XPath. In XSLT3.0 pattern possibilities have increased a great deal over XSLT2.0, especially in matches for non-nodal items. Eventually we'll have to bite the bullet and produce a variant of the XPath compiler (written in JavaScript and used for dynamic evaluation) which handles the XSLT pattern grammar and generates SEF instructions using the additional pattern-related instructions. This is particularly the case for positional predicates in patterns, where integer values denote qualifications on sequence position and the conversion becomes distinctly tricky and sensitive.

However for a lot of the simpler patterns, it is possible to compile them as if they were an XPath and then convert the (higher) sections of the SEF tree into suitable pattern elements. For example a/b[c] will compile if treated as XPath as:

<docOrder type="element()*" … >
  <slash>
    <axis name="child" type="element()*" 
      jsTest="var q=SaxonJS.U.nameOfNode(item); 
              return item.nodeType==1 &amp;&amp; q.uri=='' &amp;&amp; q.local=='a';"/>
    <filter type="element()*">
      <axis name="child" type="element()*" jsTest="… q.local=='b';"/>
      <fn name="exists" type="xs:boolean">
            <axis name="child" type="element()*" jsTest="… q.local=='c';"/>
      </fn>
    </filter>
  </slash>
</docOrder>

but as a pattern should be:

<p.withUpper role="match" axis="parent" upFirst="true" type="element()*">
  <p.withPredicate type="element()*">
    <p.nodeTest test="element()*" type="element()*"
      jsTest="var q=SaxonJS.U.nameOfNode(item); 
              return SaxonJS.U.isNode(item) &amp;&amp; 
                     item.nodeType==1 &amp;&amp; q.uri=='' &amp;&amp; q.local=='b';"/>
    <fn name="exists" type="xs:boolean">
      <axis name="child" type="element()*" jsTest="… q.local=='c';"/>
    </fn>
  </p.withPredicate>
  <p.nodeTest test="element()*" type="element()*" jsTest="…  q.local=='a';"/>
</p.withUpper>

Such simple conversions can be done with a modest number of templates used in a push mode. With such a pattern we can then also determine the default priority for a template using it, by matching various elements in the resulting pattern tree. This conversion is also used to create a separate template for each member of a union of patterns on a template match, as they may have separate default priorities and type for the context item matched.

Component binding

With all the components now created, the final remaining step is to link them as required through their component binding vectors. Each invocation instruction requiring linking (such as applyT for xsl:apply-templates to the appropriate mode containing relevant template rules) has a @bSlot attribute which is a number indexing into the containing component @binds attribute (the binding vector), which is a whitespace sequence of numbers corresponding to the @id of the component linked. This process is performed by a simple collection of mappings from global variables, named templates and modes and the components containing their definitions and a recursive tree-walk using a small number of templates. With this operation the compilation is complete and the SEF tree can be used as an execution plan for evaluation of the given transform.

Testing

In the development of the dynamic XPath support, we managed to build a testdriver framework, written in XSLT, that could exercise a great deal of the QT3 test suite, passing each required test to xsl:evaluate and checking the result assertions and reporting the results entirely within a webpage. (More details can be found in Lumley2017.) By using this tool, which also enabled compiled execution plans to be observed (using serialization) and the various phases of the compilation controlled, we managed to acheive very high levels of standards conformance, with some 18,000+ tests passed and O(100) failures, mostly in obscure edge conditions.

Obviously a similar approach, using the ~9,000 relevant tests of the XSLT3.0 test suite would be attractive and, given previous experience, an important, even central, tool in the design development. The testdriver framework was generalized using a modest amount of conditional compilation and two differing test stub XSLT stylesheets for XSLT and XPath testing. This enabled standard tests to be employed from very early on, and with some option controls it was possible to examine various phases of the compilation process on a given test (such as static expansion, XSLT conversion, XPath expansion, pattern formation and component binding) to isolate bugs, faults and errors. Usually the procedure was to select a single test-set (comprising 10-100 test cases) and first work to get the compilation phase for all of them avoiding a JavaScript error (usually a null pointer reference). Then this was extended to the run of these compilations and checking the result assertions.

A useful consequence of this was that parts of the XSLT test suite exercised portions both of XPath semantics and Saxon-JS requirements that presumably hadn't been covered before, as we discovered (and corrected) a number of new errors, both in XPath compilation and the Saxon-JS runtime.

Does it work?

To start with, this was just a small experiment to generate an SEF for some simple XSLT using Saxon-JS in the browser, but it became clear fairly quickly that something more substantial and with significant standards conformance might be possible. An early goal was to be able to get most of the XSLT tests for xsl:evaluate at least running and preferably passing. These tests involve a large mixture of namespace bindings, functions, data-borne expressions and so forth, so getting this going would be, as was, quite a milestone – currently only 5 of 36 fail.

At the time of writing, of the 120 different categories of relevant test (stylesheet attributes, declarations, instructions etc.) the tests in 40 of them are working (some 2,400 tests in total), which means the tests compile, run and perhaps >90% of them pass or generate the wrong error[13]. The rest of the categories have not been addressed specifically, but there is a lot of cross-dependence between the the categories of course.

Performance (at least on a high-end laptop) was pleasing. For example the compiling, running and checking of the 51 tests for xsl:choose takes some 7 seconds, the 36 tests for xsl:evaluate similar.

One of the original goals of Saxon-JS was to reduce the loading memory footprint, which had been just under 1 MB for Saxon-CE. This has been achieved as the minified Saxon-JS runtime is about 220kB. XPath evaluation support (additional JavaScript) is dynamically loaded by Saxon-JS and takes roughly the same amount of memory (~180kB minified). The compiler (inside whatever front-end is being used) is currently loaded as a complete SEF into Saxon-JS and is about 800kB, but contains much information redundancy, as a ZIP compression reduces it to a tenth of its size.

The technology is however sufficiently advanced that a simple editor/executor for XSLT stylesheets can be run completely in the browser, using all the features of Saxon-JS to support drag-and-drop interactive programming.

One of the eventual acid tests for portability of a compiler-written-in-itself, is an ability to successfully compile itself into a form that can repeatedly compile itself idempotently. In some previous work (Lumley2015) we showed how an XSLT3.0→2.0 converter, written in XSLT3.0, could convert itself and the resulting XSLT2.0 conversion successfully and recursively convert the original XSLT3.0 source.

The current development is a long way from that stage yet, but theoretically the compiler is only using (some of) the features of XSLT for which it can compile successfully, so it should be possible. The XSLT3.0 repertoire supported by the current target platform (Saxon-JS) has limitations (higher-order functions might be most useful), but of course the compiler cannot use them. Interestingly enough are some of the features that the compiler (writer) hasn't used thus far – attribute sets and accumulators amongst them.

Possible developments and Conclusion

In this paper we definitely are reporting on on-going work. The first intention is to complete a compiler that will run in our current environment, that of Saxon-JS operating in a browser. With that complete we can start to consider mechanisms to make some of the technology more generic and exploitable on other platforms, probably starting with Node.js.

Completing the browser compiler

Parts of the compiler are being modified as new issues are uncovered and understanding increases. The original design was almost entirely based on a central XML tree and tunnelled variables handling separate components of static compiling state.

As illustrated in the section section “Preprocessing: package collection and static and conditional processing” we can begin to use compound (map, XML) data structures more heavily in the design. Two obvious candidates are firstly to group all the static context into a single map rather than a set of tunneled variables. The second is build a better type representation structure for the XSLT processing – currently compiled XPath expressions leave their determined type as an attribute, and in the XSLT this is mostly manipulated with string processing, which isn't really robust enough.

The XPath compiler already has a type system using a simple JavaScript property pair (.baseType, .card) If we arrange that the XPath compiler returns a tree/type (map) structure then we may be able to avoid the necessity of attribute-recorded type properties[14].

Even more interestingly, we may be able to expose as an extension function the static type checking code that is already present in the XPath compiler to use from XSLT – this will ensure much greater type coherence between the two language implementations.

A generic compiler framework

This work was only possible because of the very sizeable effort in building the Saxon-JS runtime interpreter of SEF XML trees. Building the XPath compiler involved addition of an XPath parser (whose portability is dependent on the excellent REx REx parser-generator we have used for both XSLT and JavaScript implementation targets) followed by an parse tree reducer (in the spirit of Pemberton2013) written in JavaScript. The compiler itself (and the static type checker) were also JavaScript. But it would be entirely possible to write those later phases in XSLT, using some of the structures of and lessons from the compiler for XSLT, albeit at a (compile-time) performance penalty.

This then opens the possibility of reducing the portability issue for a new target to three stages – procuring/generating an XPath parser (and arranging tree reduction), developing the data model and API and the minor issue of building the SEF interpreter itself.

Other points

One of the most useful features within this project (and indeed the XPath compiler too) has been the early writing of an extended, though non-compliant serialize() function. Unlike the standard function it will serialise sequences of items into an XPath-like form ( e.g. (1,'fred', <foo>) ) as well as expanding/indenting XML trees and serializing map() and array() types. This meant that during debugging it was possible to see reasonably full details of intermediate data structures, either through xsl:message or in xsl:result-document to the browser screen, in ways that would be virtually impossible using the JavaScript debugger. It is central to displaying intermediate stages of many of the developemtn and testing processes.

Acknowledgements

The authors thank those who urged the preparation of this paper – during the writing several parts of the compiler were cleaned, redesigned and debugged in a very satisfactory way!

References

[Delpratt2013] Delpratt, O'Neil and Kay, Michael. Multi-user interaction using client-side XSLT. In Proceedings of XMLPrague 2013, pp1-23 (2013) [online] http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf

[Kay 2007] Kay, Michael. Writing an XSLT Optimizer in XSLT. In Proceedings of Extreme Markup Languages. (2007). [online] http://www.saxonica.com/papers/Extreme2007/EML2007Kay01.html

[Lockett2016] Lockett, Debbie and Kay, Michael. Saxon-JS: XSLT 3.0 in the Browser. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 13 (2014). doi:https://doi.org/10.4242/BalisageVol17.Lockett01. [online] http://www.balisage.net/Proceedings/vol17/html/Lockett01/BalisageVol16-Lockett01.html

[Lumley2014] Lumley, John. Analysing XSLT Streamability. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). doi:https://doi.org/10.4242/BalisageVol13.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html

[Lumley2015] Lumley, John. Two from Three (in XSLT). In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 15 (2015). doi:https://doi.org/10.4242/BalisageVol15.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html

[Lumley2017] Lumley, John, Lockett, Debbie and Kay, Michael. XPath 3.1 in the Browser. In Proceedings of XMLPrague 2017, pp 1-18 (2017) [online] http://archive.xmlprague.cz/2017/files/xmlprague-2017-proceedings.pdf

[Pemberton2013] Pemberton, Steven. Invisible XML. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:https://doi.org/10.4242/BalisageVol10.Pemberton01. [online] http://www.balisage.net/Proceedings/vol10/html/Pemberton01/BalisageVol10-Pemberton01.html

[Richards1971] Richards, Martin. The portability of the BCPL compiler. Software Practice and Experience, Vol 1, Issue 2, April 1971pp 135-146 (1971). doi:https://doi.org/10.1002/spe.4380010204

[REx] Rademacher, Gunther: REx Parser Generator. [online] http://www.bottlecaps.de/rex/

[XPath3.1] Robie, Jonathan, Dyck, Michael and Spiegel, Josh, Editors. XML Path Language (XPath) 3.1. World Wide Web Consortium, 21 March 2017. [online] http://www.w3.org/TR/xpath-31/

[XPath.FO] Kay, Michael, Editor. XQuery and XPath Functions and Operators 3.1. World Wide Web Consortium, 21 March 2017. [online] http://www.w3.org/TR/xpath-functions-31/

[XSLT3.0] Kay, Michael, Editor. XSL Transformations (XSLT) Version 3.0. World Wide Web Consortium, 18 April 2017. [online] http://www.w3.org/TR/xslt-30/



[1] Much of this section is lifted verbatim from my XMLPrague paperLumley2017. The background is pretty similar, so why not practice reuse?

[2] In the process pushing client-based Java towards oblivion too.

[3] Apart from some additional Saxon-JS attributes, the tree is identical to that used for linking separately-compiled packages for execution by a Java-based server-side Saxon engine – hence it shares the same degree of compilation correctness as any other Saxon-processed stylesheet.

[4] Sometimes a global variable/parameter is split into a pair of the global item and a generated global variable that takes care of the conversion. References to the variable/parameter are redirected though the variable.

[5] The parameters of the template are defined as the start of the sequence constuctor – their evaluation has the side-effect of setting a value in the appropriate parameter slot for interpolation later in the sequence constructor.

[6] Needless to say, new changes for an extra instruction can sometimes have deleterious effects on large swathes of other already-passed tests. Sometimes convergence is not as fast as hoped, but in the process problem understanding does increase.

[7] When the name contains an attribute value template, an alternative compElem instruction is used, where the first child is the name generator.

[8] The stylesheet says nothing about a person only having one name.

[9] Technically it isn't necessary to order at the compilation phase – Saxon-JS orders the template rules according to the rules on conflict resolution as it forms an internal Mode object. But it can help in debugging the compiler to see templates ordered correctly.

[10] This code still leaves an included static variable as its original definition – since we must have evaluated it, we could substitute the evaluated value.

[11] A map() form might be more coherent, but first designs have used separate variables.

[12] When the language is XSLT, the suite of supported functions expands to cover those only supported in XSLT. Currently this is known in the XPath compiler, but it would be possible to dynamically load a coherent map of function signatures covering both XSLT and user-supplied stylesheet functions.

[13] Whereas the complete (~18,000) QT3 tests could be run in O(5 mins), the XSLT tests must perforce take quite a lot longer, if only for the numbers of embedded XPath expressions within a stylesheet that must be compiled.. The ~800 Type category tests take O(5mins) suggesting slower by O(20x). So far we haven't run them all.

[14] Though this is exceptional useful when the trees are serialized in debugging.

John Lumley

jωL Research

Saxonica

A Cambridge engineer by background, John Lumley created the AI group at Cambridge Consultants in the early 1980s and then joined HPLabs Bristol as one of its founding members. He worked there for 25 years, managing and contributing in a variety of software/systems fields, latterly specialising in XSLT-based document engineering, in which he subsequently gained a PhD in early retirement. Rarely happier than when writing XSLT to write XSLT to write XSLT, he is currently helping develop the Saxon XSLT processor for Saxonica and consulting on various aspects of using XSLT.

Debbie Lockett

Software Engineer

Saxonica

Debbie Lockett joined the Saxonica development team in 2014 following post-doctoral research in Pure Mathematics at the University of Leeds. Her Ph.D and further research generally involved symmetries of infinite relational structures. Since moving into the "real" world of software development at Saxonica, Debbie has worked on performance benchmarking, developing the tools for creating Saxonica's product documentation, and the implementation of XQuery 3.1 features, as well as the development of Saxon-JS.

Michael Kay

Director

Saxonica

Michael Kay has been developing the Saxon product since 1998, initially as a spare-time activity at ICL and then Software AG, but since 2004 within the Saxonica company which he founded. He holds a Ph.D from the University of Cambridge where he studied databases under the late Maurice Wilkes, and spent 24 years with ICL, mainly working on the development of database software. He is the editor of the W3C XSLT specification.