Why?
For many problems in document engineering, exploiting the meta-syntax standard of
XML, and
the functional
and declarative tree-transformational language XSLT, furnishes
many advantages in both design efficiency, robustness and potential codebase reuse.
Over the
past 14 years the XSLT language has evolved through three major versions from an initial
simple pattern-matching transformation mechanism to a complete (homoiconic) language
with XML
trees as the principal data type, functional properties and a large suite of suitable
datatypes and extensive function libraries. Anyone working in document engineering
using XML
over that period would have tracked the changes in XSLT and often modified code to
take
advantage of the evolving functionality.
The author has been researching ideas for an extensible architecture for variable-data
documents (DDF
) over many years (Lumley2005, LumleyPhD, Lumley2013). This architecture was based heavily on
XML documents containing mixtures of subtrees from several different XML
dialects
– standards, such as XSLT, SVG and parts of XSL-FO, and elements and
attributive properties describing specialist layout and document-structure aspects.
Almost all the processing machinery for these documents was constructed in XSLT, in
two
principal areas. Firstly the effect of interpolation of variable data through sections
of the
document structure was processed according to XSLT semantics (much as the first phase
of
XSL-FO) by a compilation and execution mechanism. Secondly the layout interdependencies
(flows, constraints, text-layout etc.) were resolved by a top-down recursive process,
satisfying declared parent-children
geometric constraints, along with a system
of single-assignment presentational variables
.
The main implementation toolset to support this architecture was a set of about 45 XSLT files containing definitions for some 700 templates and functions[1]. The toolset was used mainly for a publication step, binding a document to some data, evaluating it and then generating a PDF snapshot of the SVG sections of the result for printing.
Until 2011 this toolset was written entirely in XSLT 2.0, but thereafter some sections
were converted to use features from XSLT 3.0, generally to increase the coherence,
and the
declarative nature
of the code. Specific examples include:
-
Using
xsl:iterate
to define the operation of a large pagination processor, rather than use of a small set of mutally recursive templates. (This will not be described in detail in this paper, but involves some 15+ choices to be made for each component to be allocated, and a couple ofaccumulated
variables, such asnext-y
position and current floating and kept components.) -
Describing the set of currently active
presentational variables
(already layed-out sections of content that can be reused in a similar fashion to XSLT variables) as amap{}
rather than a pair ofstack-frame
tunnelled variables. -
Making some of the processing code sections more generic by passing some of the functionality (e.g. sibling-relative geometric constraint solving) as a function item to a higher-order processing template, rather than having a larger set of similarly structured routines, each implementing the specific constraint solution for each case.
This paper is about reversing
automatically some of these syntactic and
semantic changes to accomodate running code that has migrated to XSLT 3.0 on a 2.0
platform,
for which the original target was a browser-based Saxon-CE SaxonCE
implementation. The conversion processor is itself written totally in XSLT. Many parts
of it
of course involve a degree of emulation and partial evaluation of XSLT within an XSLT
program
itself, as well as processing some actions that might normally be found within the
workings of
an XSLT processor – a fun time for those who relish quoting, self-reference, meta-abstraction
and multi-level recursion!
The XSLT standards
Since its inception in the mid 90s, the world of XML has been governed by standards. Originally attempting to regularise the extension of web pages, XML was developed as a meta-syntax for markup, aimed at using a strict tree-based representation of propertied element nodes containing sub-trees or text nodes. Very soon work started on developing XSL as a formatted and paginated alternative to HTML for documents with professional appearance. As we’re all well aware, the two aspects of XSL, formatting and variable document generation, split into two orthogonal standards – XSL-FO and XSLT. The latter developed, with XPath (and associated XQuery), into a full declarative XML-transformational language. In most recent versions XSLT/XPath/XQuery have become full functional programming languages, with XML trees as central data type.
Significant work under the auspices of W3C has developed and finalised these standards for XSLT/XPath/XQuery[2] through three major versions over the past 15 years. In each case a very comprehensive specification has been developed, reviewed, criticised and modified in cycles that are typically 3 years long and involve a few dozen contributors. Examples, test cases and experimental/operational implementations are all used to develop and finalise the specification, which is often followed by a further 3 years of polishing.
Once a standard (version) has been finalised, a degree of stability should then encourage developers of both implementations and applications to build and support software, without the language’s syntax or semantics altering. Of course it is exactly such full-scale use of a language that could expose shortcomings or new features that are needed to increase utility. The art of developing standards is to anticipate as much as is needed to get a useful and robust set of features that i) is adequate for significant application use but ii) not too complex for implementation or application.
XSLT 2.0 to XSLT 3.0
The major changes in XSLT (and XPath) between version 2.0 (2007) and version 3.0 (2014) are:
-
Support for streamed processing of large documents. This added a number of features to mitigate the need for
backtracking
through the source document in common design problems. Some of those were defined at the XSLT instruction level, such asxsl:stream
,xsl:iterate
andxsl:accumlator
; others were declarations associated with streaming-related properties, such asxsl:mode
; yet others were added to the XPath repertoire, such asmap{}
, and others to the function library (e.g.snapshot()
). -
Definitions for packaging of XSLT material to support non-source level delivery of resources, including mechanisms for visibility and overriding of within-package components and version-based selection.
-
Templates can match atomic items (including function types), rather than just nodes.
-
Dynamic error trapping and recovery mechanisms (
xsl:try/xsl:catch
) -
Dynamic evaluation of XPath expressions (
xsl:evaluate
). -
Support for higher-order functions and function-typed items in XPath.
-
Local variable bindings within XPath (
let $x := value return ....
). -
More support for
compile-time
, i.e. static, evaluation of stylesheets, such as shadow attributes and static declarations. -
Convenience additions to syntax and the core functional library, such as '
||
' being a string concatentation operator.
Why downgrade?
Two serious questions immediately arise:
-
If you have an operational program written in XSLT 3.0, why would you want to downgrade it to 2.0?
-
If you are designing to run on an XSLT 2.0 why would you want to write code in XSLT 3.0 ?
The first point is the main instigator for this work and is described below. As far
as
the second point is concerned, code written in 3.0 can be more coherent than the 2.0
equivalents. Indeed in the description of xsl:iterate
in the standard, it is noted that:
"There are two main reasons for providing the
xsl:iterate
instruction. One is that many XSLT users find writing recursive functions to be a difficult skill, and this construct promises to be easier to learn."
Moreover, the existence of a reasonable
semi-automatic transformer can
act as a stimulus to both developers and implementors, providing a means to write
test and
exemplar programs within 3.0 that themselves can be tested before full XSLT 3.0 processor
ipeentations are available.
Implementation code for parts of the variable document architecture mentioned earlier
had been upgraded to XSLT 3.0, to take advantage of the new features in increasing
coherence, robustness and re-usability of the code. (For example many new
parent-children
layouts, such as all-children-in-a-circle
,
could be added merely by defining the XPath function required to work out the positions
and
place each component of a sequence – all encapsulation and child-recursive evaluation
was
handled in common. Similarly the addition of maps simplified a stack-frame implementation
of
single-assigment presentational variables.) As mentioned earlier, the author's variable
document architecture was used principally to publish a set of instances of a variable
data
document bound to a sequence of data instances. As such it can be considered to be
a
server-side operation. In 2013 the release of Saxon-CE as a fully-compliant
browser-based XSLT 2.0 implementation opened the possibility of
looking at client-side and potentially interactive operation of the
document architecture. However, by this point some significant sections of the
implementation had migrated to XSLT 3.0, as well as altering or extending the semantics
of
some of the operations.
We now have the problem of two branches – the old 2.0 version, (which had not of course been updated to reflect changing semantics) and the now master 3.0 version. We could retrofit (and then develop both branches in parallel) but another option was worth exploring – could sections of the 3.0 code be converted to an operationally equivalent 2.0 code automatically? The answer is of course yes.
Assumptions
All this work reported here is based on the following assumptions:
-
The practitioner is (highly) skilled in XSLT2.0/3.0 ! (This is vital – when errors occur, and they will, tracking them down across a 2.0/3.0 mixture can be entertaining and require considerable expertise.)
-
The practitioner can act as an oracle as far as the source code is concerned. As an example, if string literals in XPath expressions can be guaranteed not to contain substrings that might be mistaken for some elements of XPath syntax (e.g. a variable interpolation such as
$some.var
), then various transformations can be performed using regular expression substitution, avoiding the need to carry out complete XPath expression parsing and modification of parse trees – a relatively expensive operation in a complete XSLT system. -
We are not attempting complete back-conversion. It is assumed that some heavily-used 3.0 features can be left to automated process, but others may require conditional code development by the programmer. Some features may require parallel development of 2.0 and 3.0 versions. Using
@xsl:use-when()
with version conditions can be exceptionally helpful to accomodate such changes. (For example the use of a higher-order architecture for layout processing described earlier, was conditionally retained for true 3.0 conditions, leaving 2.0 parallel code for the 2.0 case.) -
Some (optional) aspects of XSLT (3.0) have been ignored completely. These include schema-awareness and validation, any aspect of streaming, function-type items, higher-order functions, templates matching atomic items, dynamic error trapping (
xsl:try
), multi-thread declaration (xsl:fork
,xsl:merge)
, detailed serialization control, and so forth. -
Any support for packaging will be confined to
flattening
packages into resultant non-packaged stylesheets. -
Error tracking and trapping isn't of highest priority in the final runtime environment. For example there will be no equivalent of
xsl:try/catch
and compile/run source line numbers will likely be highly errant. -
Other aspects
outside
the stylesheet, such as unusual entry invocations (initial-function
,initial-context-item
) are ignored. -
Some aspects may differ in slight degree. For example preserving correct whitespace treatment may be difficult.
Note
Some of the examples shown, especially XPath expressions, are particularly simple,
possibly redundant or just plain daft, and may well have other shorter equivalent
forms
that would be perfectly legal in XSLT2.0. However they are short enough to comprehend
and
their complete treatment can hopefully be followed completely by hand
, by
the reader.
Similar work
There has been a small amount of previous investigation of adding XSLT features through
transforming XSLT. Of closest relevance is the work of Oliver Becker Becker2000, more than a decade ago, on an XSLT loop compiler. Additional
instructions in a separate namespace (loop:for
, loop:while
,
loop:do
, loop:last
loop:update
) can be added to an XSLT stylesheet to define iterative constructs,
which are then compiled
by an XSLT transform (some 500 lines, of which about
half is concerned with validation) into an equivalent using recursive named templates
and
parameters. As an example this re-working of a number-summing example from Michael
Kay's XSLT
Programmer's Reference:
<xsl:template match="/"> <xsl:variable name="list" select="normalize-space(.)"/> <xsl:variable name="total" select="0" /> <loop:while test="contains($list,' ')"> <loop:last> <xsl:value-of select="number($total) + number($list)" /> </loop:last> <loop:update name="list" select="substring-after($list, ' ')" /> <loop:update name="total" select="number($total) + number(substring-before($list, ' '))" /> </loop:while> </xsl:template>
is transformed into the following XSLT:
<xsl:template match="/"> <xsl:variable name="list" select="normalize-space(.)"/> <xsl:variable name="total" select="0"/> <xsl:call-template name="while-loop-d1e12"> <xsl:with-param name="list" select="$list"/> <xsl:with-param name="total" select="$total"/> </xsl:call-template> </xsl:template> <xsl:template name="while-loop-d1e12"> <xsl:param name="list"/> <xsl:param name="total"/> <xsl:choose> <xsl:when test="contains($list,' ')"> <xsl:call-template name="while-loop-d1e12"> <xsl:with-param name="list" select="substring-after($list, ' ')"/> <xsl:with-param name="total" select="number($total) + number(substring-before($list, ' '))"/> </xsl:call-template> </xsl:when> <xsl:otherwise> <xsl:value-of select="number($total) + number($list)"/> </xsl:otherwise> </xsl:choose> </xsl:template>
David Carlisle in 2006 (Carlisle2006) published a series of XSLT2 stylesheets that supported the manipulation of XQuery expressions, converting them into other syntaxes, including XSLT. This work employed an adjunct (Java-based) parser for XQuery expressions to produce an XML parse tree which was then processed by XSLT (a stylesheet of ~2000 lines) into appropriate mixtures of XSLT instructions and XPath expressions. For example, the XQuery:
declare variable $x := <a> <b>zzz</b> </a>; $x/b
is transformed into the stylesheet:
<xsl:stylesheet ....> <xsl:param name="input" as="item()" select="1"/> <xsl:output indent="yes"/> <xsl:variable name="x" as="item()*"> <xsl:for-each select="$input"> <xsl:element name="a"> <xsl:element name="b"> <xsl:text>zzz</xsl:text> </xsl:element> </xsl:element> </xsl:for-each> </xsl:variable> <xsl:template name="main"> <xsl:for-each select="$input"> <xsl:sequence select="$x/ b "/> </xsl:for-each> </xsl:template> </xsl:stylesheet>
Embedded XML content in the XQuery expressions are mapped to function invocations that produce the appropriate sequence values.
With the introduction of XSLT 2.0, Dimitre Novatchev showed that it was possible to
implement higher-order functions (Novatchev2006) by exploiting uniquely
named elements to act as template references to function items. Each
function()
item is represented by two XSLT-defined functions (a zero-arity to
return the identifying element and the true-arity version to do the work) and a template
that
matches the unique template reference element and invokes the working
function.
Higher-order functions (e.g foldl()
) are written to use this
route-via-template
technique, and a complete set of higher-order features has
been implemented. Note that this approach requires all function items to be created
in XSLT
space – XPath local functions (function($args) {....}
) are not supportable
directly. The library developed during the work, FXSL, is very extensive.
Equivalent constructs
Whilst both XSLT 2.0 and 3.0 operate generally on the same basic data types, there
are
subtle differences in areas such as their environment interface (serialization,
unparsed-text()
, etc.) and the additional function
types in
XSLT3.0. This means that in general it will not be possible to write an
equivalent XSLT 2.0 program that behaves externally exactly as a defined 3.0 one.
But for many
practical cases there should be many sections of code that can. In this section we
discuss the
type of code transformations that seem reasonably susceptible. Most are in XSLT, though
a few
are in XPath, and others involve compile-time evaluation.
XSLT constructs
Many of the additional XSLT instructions were targetted to support streaming but have
other more general utility and may be susceptible to reversal
in certain
conditions. Those I have attempted so to engineer include:
xsl:mode
The mode declaration was added to support declaration of a mode as
streamable
, but also gave a very useful means to declare the default
behaviour of that mode, i.e. what should happen in the absence of any other template
matching a given node in push mode. (The default behaviour of text-only
was
too restrictive and many coding errors arose because of missing identity
transforms for a named mode.)[3] The declaration is of the following form:
<xsl:mode name="fred" on-no-match="shallow-copy" .../>
where default behaviours can be shallow or deep copy or skip, text-only-copy or failure. These declarations can be replaced with a small number of template rules, along the lines of those shown in XSLT3.0 Built-in Template Rules :
<xsl:template match="@*|node()" mode="fred" priority="-1000"> <xsl:copy> <xsl:apply-templates select="@*" mode="#current"/> <xsl:apply-templates select="node()" mode="#current"/> </xsl:copy> </xsl:template>
The priority is highly negative to ensure these have priority lower than any true
templates, but higher than the built-in
rules (text-only-copy) of XSLT 2.0.
In practice with stylesheet importation this substitution needs to be placed below
the
lowest level of importation precedence as well, which suggests a specialist stylesheet,
containing all the mode declaration expansions, should be added to
the importation tree at suitable low level. Use of xsl:apply-imports
(rare in
my experience) would complicate matters further. (Supporting the
@on-multiple-match
property is probably less useful and more dependent upon
the XSLT implementation being employed.)
xsl:iterate
The iteration instruction was added to support a 'move-forward' processing model for streaming with some forward transmission of accumulated and computed data. A very simple example, which adds chapter number attributes, might be:
<xsl:template match="/"> <xsl:variable name="today" select="current-date()"/> <xsl:iterate select="//chapter"> <xsl:param name="no" select="1"/> <xsl:copy> <xsl:attribute name="no" select="$no"/> <xsl:attribute name="date" select="$today"/> <xsl:sequence select="@*|*"/> </xsl:copy> <xsl:next-iteration> <xsl:with-param name="no" select="$no + 1"/> </xsl:next-iteration> </xsl:iterate> </xsl:template>
The iteration processes each item in the selected sequence in turn, returning the
evaluation of the sequence constructor for this item as context and potentially modifying
values of parameters for the processing of the next and subsequent members. Children
xsl:next-iteration
, xsl:break
and
xsl:on-completion
provide means to invoke controlled continuation, early
exit and completion postlude and tidying.[4] .(Of course there are many extant mechanisms to do this particular example in
XSLT 2.0 or even XSLT 1.0, not least xsl:number
, but a simple example helps
to explain the equivalence.)
For most cases an equivalent structure for XSLT 2.0 involves use of a recursive named template, which for the previous example could look like:
<xsl:template name="d3e14.iterate-0"> <xsl:param name="iterate.sequence"/> <xsl:param name="no"/> <xsl:param name="today" tunnel="yes"/> <xsl:variable name="head" select="$iterate.sequence[1]"/> <xsl:variable name="tail" select="subsequence($iterate.sequence,2)"/> <xsl:for-each select="$head"> <xsl:copy> <xsl:attribute name="no" select="$no"/> <xsl:attribute name="date" select="$today"/> <xsl:sequence select="@*|*"/> </xsl:copy> <xsl:call-template name="d3e14.iterate-0"> <xsl:with-param name="iterate.sequence" select="$tail"/> <xsl:with-param name="no" select="$no + 1"/> </xsl:call-template> </xsl:for-each> </xsl:template> .... <xsl:template match="/"> <xsl:variable name="today" select="current-date()"/> <xsl:call-template name="d3e14.iterate-0"> <xsl:with-param name="iterate.sequence" select="//chapter"/> <xsl:with-param name="no" select="1"/> <xsl:with-param name="today" select="$today" tunnel="yes"/> </xsl:call-template> </xsl:template>
Text value templates {}
Text value templates are a syntactic convenience to interpolate XPath expressions
within result tree text nodes. For example if this element has its closest
ancestor::*[@expand-text]
defining the tag to a true value
(yes|1|true
):
<weekday>Today is {format-date(current-date(),'Fn')}</weekday>then it could could be replaced by an equivalent construct:
<weekday> <xsl:value-of> <xsl:text>Today is </xsl:text> <xsl:value-of select="format-date(current-date(),'Fn')"/> </xsl:value-of> </weekday>
xsl:evaluate
Various vendor extensions to support dynamic XPath resolution have been standardised
in XSLT3.0 with the xsl:evaluate
instruction that takes a string defining an
XPath expression, a possible context item, a set of possible parameters that can referred
to within the expression and some optional contextual information such as namespace
bindings. The error-free result is a sequence of items resulting from evaluating that
given expression on the context.
Whilst it is highly unlikely that this instruction can be emulated to full
specification, there are two possible routes. The most effective, if the chosen target
XSLT implementation supports one, is to map into one of these vendor extension functions.
For example saxon:evaluate()
can support much of the functionality, with
suitable mappings for parameters and minor modifcations of the presented XPath expression
string[5].
A last-ditch
possibility is to implement an XPath interpreter in XSLT
2.0. Using a full XPath parser, implemented in XPath 2.0 (see
later) we can generate a full parse tree. A simple recursive interpreter matching
the XPath grammar productions can generate a sequence of result items by working its
way
through the tree. Requests for built-in functions (e.g. FunctionCall
name="sum"
) are mapped to calls to the appropriate built-in, having pre-evaluated
the argument subtrees. This mechanism can work, but is unsuprisingly very very
slow.
XPath expressions
Whilst most focus has been on XSLT instructions, some XPath 3.0 expressions might also be susceptible to reversal in certain cases. In the most general case substitution will involve accurate and full parsing of an XPath expression to its parse tree. In restricted cases (that the developer could indicate), regular-expression substitution on the XPath expression strings might be sufficient. Here are some possibilities:
let
Local variable bindings in XPath expressions are supported through let $var :=
expr0 return expr1
where
$var
is now in-scope in expr1
. (The
for
directive will be similar for singleton sequences, but let
preserves sequence values.). In simple cases, where the let
s are nested from
the outside, it may be possible to convert to an equivalent set of XSLT variables.
For example:
<xsl:value-of select="let $i := (1 to 6), $avg := avg($i) return $i[. gt $avg]"/>could be replaced by:
<xsl:value-of> <xsl:variable name="i" select="1 to 6"/> <xsl:variable name="avg" select="avg($i)"/> <xsl:sequence select="$i[. gt $avg]"/> </xsl:value-of>
where we've effectively pulled the let tree
into a series of local
bindings in the XSLT space. When we can place these variable bindings within an entirely
local context (i.e. the sequence constructor of the xsl:value-of
) then we
will not risk name clashes. When the let
is buried below other constructs
(e.g. a for
) then we cannot use this technique, as XSLT instructions cannot
appear within XPath expressions, and we must look to further expansion of the XPath
tree
into the XSLT space. This is described further in section “Manipulating mixed XSLT/XPath trees”.
map
Maps were again added to support dynamic collection and storage of data through a
streamed processing operation. A common use of map{}
is to associate a series
of keys to values that are sequences of items (i.e. item()*
). For example in
the author's work on document layout a tunnelled variable $lay:variables as
map(xs:string,item()*)
contained bindings to named sections of layed-out
components, that could be reused or examined (e.g.
$lay:variables('background')
might give all the items in the page
background).
There are a number of possibilities for representing maps in certain (limited) circumstances. In the case where maps are isolated, i.e. attached to variables and not mixed with other items in a general sequence[6], then a possible scheme is to emulate the map with a key-and-value stack of the following general form:
where the <binding/>
elements are reserved forms and clearly
distinguishable from normal
content. On the left foo
binds to a sequence of length 3, bar to one of length 2 and
charlie to an empty sequence. After some update (i.e. an addition
to the map in some more local context) shown on the right, foo now
binds to a doubleton sequence, in this case result tree nodes that were bound to the
variable $elements
.
By searching for the last binding for a given key we can
simulate map member updating
, whilst still preserving locality of scope.
(As all variable bindings in XSLT/XPath are single-assignment with variable name
overriding strictly constrained to following-sibling::*/descendant-or-self::*
situations, any local addition to a map is effectively copy-and-add
which
only has effect in the local scope.) The length attached to the
binding is used to extract the appropriate number of values.[7].
To implement map lookup for this scheme we substitute the interpolation
$some.name(
name)
by a call to the
(stylesheet-defined) function
map:get($some-name,
name)
[8] which looks up the last binding entry in $some-name
for the given
name (hence preserving locality of scope) and then returns the appropriate subsequence
of
the stack from immediately behind the binding element[9].
This approach has some limitations that may be important depending upon the
application. Most derive from the fact that the representation is not a single item
(which
a map(*)
is) but a sequence. This means it cannot be treated as an
item by some other manipulation – it will produce incorrect results for
functions and operators that do inspect items, such as count()
,
is
and instance of
[10]. The function empty()
will be unable to distinguish between an
empty map and no map (i.e. exists($map) and map:size($map) = 0
would be
indistinguishable from empty($map)
). Equally well this scheme cannot
represent a sequence of maps. In the author's limited experience most
conventional
use of maps rarely involves such existential manipulation of
arbitrary maps. Perhaps mostly simply, this scheme only works for interpolations of
keyed-values from maps.
An alternative method, which separates the stack from the keys, requires binding to
two separate XSLT variables[11] e.g. xsl:variable name="
name"
as="map(*)
→ xsl:variable
name="
name.stack" as="item()*
,
xsl:variable name="
name.keys"
as="element(map:binding)*
. Now the bindings contain an offset and length into the
stack. This approach can be more efficient in lookup, but is significantly more restricted
in the situations in which it can be deployed.
Maps in XSLT 3.0 can be created through three possible mechanisms: XPath syntax
(map{ key : value,...}
)[12], XPath constructor functions (map:entry(), map:merge()
) or XSLT
instructions (xsl:map
, xsl:map-entry
). How and to what extent,
these features might be simulated in 2.0 is discussed
later.
Arrays, being introduced in XPath3.1, which can store mixed arrays of items (including
sequences of items), could be emulated in a similar manner, using a reserved marker
element between array members, and defining emulations of the constructor and accessor
functions (array:get()
etc.). Restrictions on existential
manipulation of these emulations are similar to those for map()
.
Convenience functions and operators
XPath 3.0 has added a number of convenience functions and operators (XPath.FO), again often to support streaming, but which are of more general utility and can increase the coherence of code. Some of them include:
-
head()
andtail()
– reducing a sequence. This can of course be replaced by a number of equivalent XPath forms, but the most coherent is to use equivalent XSLT-defined functions:<xsl:function name="f:head" as="item()?"> <xsl:param name="seq" as="item()*"/> <xsl:sequence select="$seq[1]"/> </xsl:function> <xsl:function name="f:tail" as="item()*"> <xsl:param name="seq" as="item()*"/> <xsl:sequence select="subsequence($seq,2)"/> </xsl:function>
where
f:
is bound to some reserved namespace. -
innermost()
andoutermost()
– producing lowest and highest ancestry nodes. Again these are most simple supported with equivalent XSLT-defined functions:<xsl:function name="f:innermost" as="node()*"> <xsl:param name="nodes" as="node()*"/> <xsl:sequence select="$nodes except $nodes/ancestor::node()"/> </xsl:function> <xsl:function name="f:outermost" as="node()*"> <xsl:param name="nodes" as="node()*"/> <xsl:sequence select="$nodes[not(ancestor::node() intersect $nodes)]/."/> </xsl:function>
-
A number of exisiting 2.0 functions have reduced argument forms added, such as
string-join($seq)
which defaults the second (joiner) argument to a zero-length string. Again such invocations (if detected) can be linked to a simple currying function:<xsl:function name="f:string-join" as="xs:string"> <xsl:param name="arg1" as="xs:string*"/> <xsl:sequence select="fn:string-join($arg1,'')"/> </xsl:function>
where
fn:
is bound to the normal XPath function namespace.
||
(string concatenation)
This is defined as a binary operator such that $a || $b
is equivalent to
fn:concat($a,$b)
. Thus if the associativity can be analysed such operations
can be replaced by calls to concat()
.
!
(simple map)
This permits sequences of general items to be processed in an analgous manner to
/
with sequences of nodes. Thus (1 to 5)! (. + 10)
is almost
equivalent to for $i in (1 to 5) return ($i + 10)
, though technically the
context focus for the right hand operand of !
is set to each of the left in
turn; this would have effect on context-defaulting
functions such as
name()
which would need to have default arguments added.
Compile-time modification
Some features are aimed at evaluating or modifying the XSLT statically at compile
time,
with a @static
property being supported on global variables and parameters. For
many of the situations envisaged, the 2.0 program will be projected from the master
3.0
version under conditions which are effectively static. Thus many of
these components may be pre-evaluated. In some cases, where the static variables are
used to
control conditional code use, this might be essential.
As an example default push-mode behaviour for a given mode might be selected statically through a toggle and a mutually exclusive pair of declarations:
<xsl:variable name="process-all" select="false()" static="yes"/> <xsl:mode name="process" on-no-match="shallow-copy" use-when="$process-all"/> <xsl:mode name="process" on-no-match="deep-copy" use-when="not($process-all)"/>
Clearly, provided XPath expressions can be parsed and evaluated statically, this choice can be predetermined. (In fact it has to, otherwise conflicting templates at the same priority will be placed in the resulting stylesheet.) In a similar manner, XSLT provides shadow attributes whose values can be computed statically. In this way the choice above could be rephrased as:
<xsl:variable name="process-all" select="false()" static="yes"/> <xsl:mode name="process2" _on-no-match="{if($process-all) then 'deep' else 'shallow'}-copy"/>
which is similarly capable of being evaluated during the program transformation.
Obviously to perform this evaluation we need to be able to evaluate XPath expressions
with associated variable bindings dynamically. Within XSLT3.0 this
can be acheived through the xsl:evaluate
instruction and is discussed further
in section “Conditional compilation”.
Transforming the code
The previous section has described some of the possible 3.0 constructs that might be reverse-engineered to 2.0 equivalents and what those forms might look like. In this section we describe how such alterations might be performed, i.e. what code would be needed to convert a candidate XSLT 3.0 stylesheet into a 2.0 version.
One great advantage with XSLT is that it is homoiconic, that is
programs are written in a syntax that is (one of) its main data types, XML[13]. Hence reading and generating XSLT from XSLT is comparatively easy, though
abstraction levels and quoting
problems can be tricky. However, a significant
proportion of an XSLT program's functionality is described in XPath expressions, which
at the
XML syntax levels are merely string values of attributes. For some simple cases some
dependencies can be analysed through clever use of regular expressions, an example
being
finding names of normal variable references when they don't appear within string literals.
However in general a parse tree for XPath expressions is needed. That requires an
extension
library, but such things are possible, especially with tools such as Gunther Rademacher's
REx
REx.
Note
The methods described in this paper assume that the source XSLT is syntactically and semantically valid. Addition of full error checking would complicate the code significantly and it is anticipated that correct operation of the 3.0 code would be tested before such conversions implied here.
The basic method of course is to employ push-mode template matching. I'll describe
the
approach for two XSLT instructions (xsl:mode
and xsl:iteration
),
text value templates, an XPath construct (map{}
) and static evaluation and
conditional compilation of the XSLT program.
Converting xsl:mode
Transforming the xsl:mode
instruction is relatively straightforward. A
template matching an xsl:mode
defining no-match behaviour has conditional
choices of content, derived effectively from XSLT3.0 Built-in Template
Rules, chosen dependent upon the @on-no-match
attribute value:
<xsl:template match="xsl:mode[@on-no-match]" <xsl:param name="priority" select="-1000"/> <xsl:variable name="node.patt">*|text()|comment()|processing-instruction()</xsl:variable> <xsl:choose> <xsl:when test="@on-no-match='text-only-copy'"/> <xsl:otherwise> <xsl:choose> <xsl:when test="@on-no-match='shallow-copy'"> <X:template match="@*|{$node.patt}|document-node()" mode="{@name}" priority="{$priority}"> <X:copy> <X:apply-templates select="@*" mode="#current"/> <X:apply-templates select="node()" mode="#current"/> </X:copy> </X:template> </xsl:when> <xsl:when test="@on-no-match='deep-copy'"> <X:template match="@*|{$node.patt}|document-node()" mode="{@name}" priority="{$priority}"> <X:copy-of select="."/> </X:template> </xsl:when> ...... </xsl:choose> </xsl:otherwise> </xsl:choose> </xsl:template>
The prefix X:
is bound to an aliased namespace that will become
xsl:
in the output. The priority is chosen, as remarked earlier, to be lower
than any priority found for that mode within the entire expanded stylesheet, but higher
than
the (text-only-copy) default rules. (It should be possible to find a reasonable minimum
priority through min(xsl:template/@priority) - 1
.) Text-only-copy behaviour is
the built-in default, so a request for that requires no substituted templates. (More
strictly, these defaults should be placed in the lowest precedence
position, which means in a separate stylesheet itself imported at lowest precedence,
i.e. as
the first import of the toplevel stylesheet.)
Converting xsl:iterate
Processing the xsl:iterate
instruction into the recursive template
declaration and xsl:call-template
instruction pair shown earlier is rather more
complex and needs to be a recursive process as of course the bodies of iterations
can
themselves contain further iterations.
Luckily using push-template programming supports such recursion naturally. The process has two parts: formation of the templates (as top-level declarations in the result stylesheet) for each of the iterate instructions found, and generation of the specific call.
There are several specific aspects of the xsl:iterate
instruction that must
be considered. Firstly, the only result(s) of the instruction arise from the evaluation
of
the sequence constructors contained within. Consequently these sequence
constructors need to be preserved in the resulting code. Secondly, the iterative nature
of
the instruction is performed by xsl:next-iteration
directives which potentially
modify transmitted (state) information through parameters – these will be converted
into
suitably parameterised xsl:call-template
instructions. Thirdly, extraordinary
exit and postlude processing needs to be supported. Finally, the whole
xsl:iterate
instruction operates within a local scope within which variables
can exist and whose values can be interpolated. Any solution must preserve such bindings,
including local bindings within nested iteration constructs.
The transformation process involves the following stages:
-
Find each top level stylesheet declaration (template, function or variable) that contains an
xsl:iterate
. For each of these generate a uniquely named template corresponding to each found iteration (which may be nested) as an additional top-level declaration. -
For each of these iteration instructions determine the in-scope XSLT variables and parameters, using using an XPath
ancestor-or-self::*/preceding-sibling::(xsl:variable|xsl:param)
(Global variables can be determined separately and allocated to$global.variables
). Then determine which variables are actually used in the parameter lists and body of the iteration. As all references can only be through XPath expressions, these will only occur in a small number of attributes within the body - principally@select
or@test
or attibute and text value templates ({...}
). These attributes and value templates can be found with a suitable XPath lookup. -
If it can be guaranteed that string literals within these XPath expressions do not contain sequences that could be read as variable references (i.e. of the form '
$Qname...
') then we can find the names of referenced variables through regular expressions:\$(\i\c*)
. We can then reduce the set of in-scope variables to just those needed and use this set to both generate the list of parameters to be added to the named template, and the parameter bindings to be set on the call[14]. Similar issues apply to scope for on-completion. The parameters are designatedtunnelled
to support pass-through access during recursion.When this restriction on string literals within the XPath expressions cannot be guaranteed, then full XPath parsing will be required. This is discussed in section “Expanding XPath Expressions”
-
The sequence to be iterated over is bound to a reserved parameter (
$iterate.sequence
)[15]. The body of the named template is then effectively:<xsl:variable name="head" select="$iterate.sequence[1]"/> <xsl:variable name="tail" select="subsequence($iterate.sequence,2)"/> <xsl:choose> <xsl:when test="empty($head)"> on-completion sequence constructor </xsl:when> <xsl:otherwise> <xsl:for-each select="$head"> body sequence constructor </xsl:for-each> ....
The body sequence constructor usually contains one or more
xsl:next-iteration
instructions which will have been transformed to :<xsl:call-template name="iteration.name"> <xsl:with-param name="iterate.sequence" select="$tail"/> <xsl:param ...../> </xsl:call-template>
which invokes the processing of the rest of the sequence with potentially modified parameter bindings. In the absence of a next-iteration directive[16], a simple tail call is added to the end of the
xsl:for-each
to support the default behaviour. In the absence of anon-completion
instruction, the choose can of course be replaced just by the for-each.xsl:break
instructions terminate the closest surrounding iteration leaving a possible completion component. In this case the break is replaced by either its sequence constructor or an interpolation of its@select
expression.
Converting text value templates
Interpolation of text value templates is controlled by @[xsl:]expand-text
attributes attached to ancestor elements. Processing these is straightforward, with
a
pre-emptive template setting a boolean state, and text nodes which contain such text
value
templates being processed by regular expression analysis:
<xsl:template match="*[@xsl:expand-text]|xsl:*[@expand-text]" priority="6"> <xsl:next-match> <xsl:with-param name="expand-text" as="xs:boolean" tunnel="yes" select="(@expand-text|@xsl:expand-text) = ('yes','true')"/> </xsl:next-match> </xsl:template> <xsl:template match="*/@xsl:expand-text|xsl:*/@expand-text" priority="6"/> <xsl:template match="text()[contains(.,'{')]"> <xsl:param name="expand-text" as="xs:boolean" select="false()" tunnel="yes"/> <xsl:choose> <xsl:when test="$expand-text"> <X:value-of> <xsl:analyze-string select="." regex="{$reg.AVT}" expand-text="yes"> <xsl:matching-substring> <X:value-of select="{regex-group(1)}"/> </xsl:matching-substring> <xsl:non-matching-substring> <X:text>{.}</X:text> </xsl:non-matching-substring> </xsl:analyze-string> </X:value-of> </xsl:when> <xsl:otherwise> <xsl:next-match/> </xsl:otherwise> </xsl:choose> </xsl:template>
Converting map{}
The major issue is whether a full XPath expression parsing is required. As hinted
previously, with some (oracle-provided) guarantees on the content of string literals
(or
more accurately the absence of certain forms in such strings), simpler mechanisms
using
regular expressions may suffice. The most important is that the regular expression
\$\i\c*\((\i\c*\))
can recognise such map interpolations.
The model for map equivalents suggested earlier, using a stackframe, requires two
different actions: interpolation from the map stack and adding values to the stack.
As
described earlier a map interpolation can be replaced with a function call referencing
the
stack frame: $some-name(
name)
→
map:get($some-name,
name)
. One implication
of this approach is that a map cannot be a value
in another map, i.e. we
cannot support anonymous maps – they must all have (variable) names within XSLT/XPath
scope.
How do we build the map construction features? The main technique is to try to convert
other constructors to a call on emulations of the map construction
functions, such as
map:entry(
$key,$value)
,
map:merge(
$maps)
and so forth.
When the value of the map is defined by XSLT instructions such as:
<xsl:variable name="map1" as="map(*)"> <xsl:map> <xsl:map-entry key="'one'" select="'First'"/> <xsl:map-entry key="'rest'" select="'Second','Third','Fourth'"/> </xsl:map> </xsl:variable> <xsl:sequence select="'1:',$map1('one')"/> <xsl:variable name="map1" as="map(*)"> <xsl:map> <xsl:sequence select="$map1"/> <xsl:map-entry key="'foo'" select="1,2,3,4,5"/> <xsl:map-entry key="'one'" select="'Primus','inter','pares'"/> </xsl:map> </xsl:variable> <xsl:sequence select="'2:',$map1('one')"/>
then a push-template implementation along the following lines:
<xsl:template match="xsl:variable[@as='map(*)'][*]"> <xsl:copy> <xsl:sequence select="@name"/> <xsl:attribute name="as">item()*</xsl:attribute> <xsl:apply-templates select="xsl:map" mode="#current"/> </xsl:copy> </xsl:template> <xsl:template match="xsl:map"> <xsl:apply-templates select="*" mode="#current"/> </xsl:template> <xsl:template match="xsl:map-entry[@select]"> <X:sequence select="map:entry(({@key}),({@select}))"/> </xsl:template>
produces a working form with suitable constructs:
<xsl:variable name="map1" as="item()*"> <xsl:sequence select="map:entry(('one'),('First'))"/> <xsl:sequence select="map:entry(('rest'),('Second','Third','Fourth'))"/> </xsl:variable> <xsl:sequence select="'1:',map:get($map1,'one')"/> → 1: First <xsl:variable name="map1" as="item()*"> <xsl:sequence select="$map1"/> <xsl:sequence select="map:entry(('foo'),(1,2,3,4,5))"/> <xsl:sequence select="map:entry(('one'),('Primus','inter','pares'))"/> </xsl:variable> <xsl:sequence select="'2:',map:get($map1,'one')"/> → 2: Primus inter pares
When maps are built with calls to the map constructor functions (e.g.
map:entry()
) then by providing suitable emulation functions in a
stylesheet-defined library, XPath expressions need not be touched. Some of the emulation
functions are as follows:
<xsl:function name="map:entry" as="item()+"> <xsl:param name="key" as="xs:anyAtomicType"/> <xsl:param name="value" as="item()*"/> <xsl:sequence select="$value"/> <binding name="{$key}" length="{count($value)}"/> </xsl:function> <xsl:function name="map:merge" as="item()*"> <xsl:param name="maps" as="item()*"/> <xsl:sequence select="$maps"/> </xsl:function> <xsl:function name="map:keys" as="xs:anyAtomicType*"> <xsl:param name="maps" as="item()*"/> <xsl:sequence select="distinct-values($maps[self::binding]/@key)"/> </xsl:function>
(Since our map emulation is merely a sequence, a sequence of such maps is already
a map,
so map:merge()
just passes through). Using these functions as interfaces
decouples the exact details of the representation from the actual use of maps.
When maps are defined in XPath map syntax map{ }
, we
need to modify the XPath parse tree. Ignoring tokens, the tree for map{ 1 :=
('fred',3+4), 2 := 'bert'}
is shown on the left:
Table I
<MapExpr> <Literal type="xs:integer" value="1"/> <Expr> <Literal type="xs:string" value="'fred'"/> <AdditiveExpr op="+"> <Literal type="xs:integer" value="3"/> <Literal type="xs:integer" value="4"/> </AdditiveExpr> </Expr> <Literal type="xs:integer" value="2"/> <Literal type="xs:string" value="'bert'"/> </MapExpr> |
<Expr> <FunctionCall name="map:entry"> <Literal type="xs:integer" value="1"/> <Expr> <Literal type="xs:string" value="'fred'"/> <AdditiveExpr op="+"> <Literal type="xs:integer" value="3"/> <Literal type="xs:integer" value="4"/> </AdditiveExpr> </Expr> </FunctionCall> <FunctionCall name="map:entry"> <Literal type="xs:integer" value="2"/> <Literal type="xs:string" value="'bert'"/> </FunctionCall> </Expr> |
so if we surround each pair of children with a function call to
map:entry()
and replace the MapExpr
element with
Expr
, we get the tree on the right that would have been parsed from
(map:entry(1,('fred',3+4)), map:entry(2,'bert'))
, which is the desired
representation for the initialised map. Back-conversion of the parse tree into text
correctly modifies the XPath expression. (If the representation of the map is something
other than a sequence of map:entry()
results, such as using
map-start...map-end
markers, then surrounding this construct with a
function call to map:merge()
will be adequate.)
Conditional compilation
To support conditional compilation (either through the @use-when
directive
or within shadow attributes), we need two activities: firstly to collect and detemine
the
values of all the (global) static variables and secondly to evaluate the XPath expression
of
the @use-when
or attribute value templates within shadow attributes, using
those bindings. It is possible for the values of static variables to be interpolated
within
@use-when
directives (which may be attached to static variable declarations)
or shadow attributes, so these two processes must be handled concurrently.
For both cases the most complex is to determine the values of the static variables.
Luckily they are defined to have their values determined only by
@select
attributes (i.e. XPath expressions, no XSLT instructions can
influence) and reference between static variables (which must all be top-level children
of
stylesheets and have unique names) is only permitted in a reverse direction. Hence
an
iteration across the top-level children of a stylesheet, evaluating any static variables
with possible variable bindings already accumulated into a map, evaluating the effect
of
static variables on the top-level trees and determining @use-when
effects, will
produce the statically evaluated top-level stylesheet children:
<xsl:template match="xsl:stylesheet|xsl:transform" mode="X:static" as="element()"> <xsl:copy> <xsl:sequence select="@*"/> <xsl:iterate select="*"> <xsl:param name="vars" as="map(*)" select="map{}"/> <xsl:variable name="conditional" as="element()?"> <xsl:apply-templates select="." mode="#current"> <xsl:with-param name="static.var.values" select="$vars" tunnel="yes"/> </xsl:apply-templates> </xsl:variable> <xsl:choose> <xsl:when test="$conditional/self::xsl:variable[@static='yes']"> <xsl:variable name="value" as="item()*"> <xsl:evaluate xpath="@select" with-params="$vars"/> </xsl:variable> <xsl:copy> <xsl:sequence select="@* except (@static|@use-when)"/> <xsl:attribute name="select" select="X:select($value)"/> </xsl:copy> <xsl:next-iteration> <xsl:with-param name="vars" select="map:merge(($vars,map{@name := $value}))"/> </xsl:next-iteration> </xsl:when> <xsl:otherwise> <xsl:sequence select="$conditional"/> <xsl:next-iteration/> </xsl:otherwise> </xsl:choose> </xsl:iterate> </xsl:copy> </xsl:template>
(The function X:select()
converts atomic items into suitable XPath
expression values, such as surrounding xs:string
with quotes.) Toplevel
children are processed with matching templates for @use-when
and shadow attributes[17]:
<xsl:template match="xsl:*[@use-when] | *[@xsl:use-when]" mode="X:static" priority="5"> <xsl:param name="static.var.values" tunnel="yes"/> <xsl:variable name="use-when" as="xs:boolean"> <xsl:evaluate xpath="@use-when|@xsl:use-when" with-params="$static.var.values"/> </xsl:variable> <xsl:if test="$use-when"> <xsl:next-match/> </xsl:if> </xsl:template>Shadow attributes (which can appear only on XSLT elements) can be detected with
xsl:*/@*[starts-with(name(.),'_')]
and any value templates
processed using string analysis – a properly named attribute with value is then substituted.
All these actions can take place in an early 'static processing' phase (which is still
effectively in the 3.0 space) before subsequent code substitutions are made.
During this phase it is also possible to project the consequences of
static
information that will be true in the execution context of the
transformed stylesheet. For example the built-in function
system-property(
property-name)
can yield
information about the implementation, such as the version of XSLT supported. This
is often
used within conditional code, such as use-when="system-property('xsl:version') =
'3.0'"
. In this case we anticipate that the transformed stylesheet will operate
under a regime where system-property('xsl:version')
has the value
'2.0'
, so if we replace such a function call with its expected value, the
conditionality can be projected during the transformation.
Expanding XPath Expressions
In the work reported so far we've mostly managed to short-cut determining XPath expression
properties by using regular expressions on the XPath string representations. (This
is one of
the reasons programmer oracle
powers are useful.) In the more general and
robust cases XPath expressions must be parsed fully to recover necessary information
or
generate correctly modified expressions.
Luckily Gunther Rademacher's REx parser-generator can produce
an effective parser for XPath 3.0 implemented entirely in XSLT (2.0). This can be
set to
generate an XML tree of the complete expression parse, whose element names correspond
to the
left-hand sides of the XPath EBNF productions. This tree can then be subjected to
some
post-processing (namespace remapping, collapsing of singleton leaf sub-trees, attaching
operators to suitable binary nodes, etc...) before returning the completed parse-tree
for
conversion
[18].
The compact parse tree can then be modified as required, most readily through a specialist
push-template mode. For example, in the conversion of XPath syntax map constructors
(map{}
) as described earlier, the following
template is sufficient:
<xsl:template match="MapExpr" mode="XPath.3to2"> <xsl:variable name="real" as="item()*"> <xsl:apply-templates select="*" mode="#current"/> </xsl:variable> <Expr> <xsl:for-each select="1 to (count($real) idiv 2)"> <FunctionCall name="map:entry"> <xsl:sequence select="subsequence($real,. * 2 - 1,2)"/> </FunctionCall> </xsl:for-each> </Expr> </template>
Other simple cases using this technique include string concatenation (||
)
where StringConcatExpr
→ FunctionCall name="concat"
.
In cases where XSLT instructions need to be introduced, as with let
at an
outermost level , we permit adding those XSLT instructions into the parse tree, in
this case
placing variable bindings as the sequence constructor of a suitably named XSLT
variable:
<xsl:template match="LetExpr" mode="XPath.3to2"> <xsl:apply-templates select="SimpleLetBinding" mode="#current"/> <X:sequence> <xsl:apply-templates select="* except SimpleLetBinding" mode="#current"/> </X:sequence> </xsl:template> <xsl:template match="SimpleLetBinding" mode="XPath.3to2"> <X:variable name="{@var}"> <xsl:sequence select="@as"/> <xsl:apply-templates select="*" mode="#current"/> </X:variable> </xsl:template>
After modifications are completed, the parse tree can be projected. XPath elements
will be
flattened to equivalent XPath expression strings, XSLT instructions will remain. In
the
absence of any XSLT instructions the string will usually be placed in the XPath-holding
attribute (e.g. @select
). When XSLT instructions are present the result will be a
mixed sequence, placed as the sequence constructor of the enclosing element, within
which
string items (representing XPath expressions) will be placed in the @select
attributes of xsl:sequence
instructions, and the XSLT instructions will stand in place[19].
Manipulating mixed XSLT/XPath trees
In the previous section I've discussed some initial manipulation of a parse tree which is principally XPath, but is transformed into a mixture of XSLT and XPath. Along similar lines to that used in Lumley2014 this can be generalised to manipulating a tree that contains arbitrary mixtures from both languages. For example consider the fragment:
<xsl:for-each select=" let $avg := avg(books/@pages) return books[number(@pages) gt $avg]"> <too-big> <xsl:value-of select="@title"/> </too-big> </xsl:for-each>
where the presence of the let
in the for-each selection means that the
variable assignment must be performed within the XSLT space in 2.0, whereas of course
in 3.0
it is defined in XPath context. Figure 2 shows the type of process that
will be required. The first tree shows the parse of the XSLT instruction and its constituent
XPath trees, placed as immediate children of their carrying instructions. The left-hand
let
performs the select role for the for-each; the
right-hand tree (a too-big
element) will be evaluated for each selected item
collectively yielding the sequence value of the for-each construct.
At stage 2 we have started replacing the let
with a sequence of an XSLT
binding of the local variable (avg
), followed by a interpolation of that value
within the predicate. As this is encapsulated in a sequence constructor, name locality
is
preserved. But of course this is not correctly executing XSLT - this new item should
behave in
the select role for the for-each
. In the third tree we have
lifted this above the for-each
, binding its value to a unique XSLT variable
(which is typed item()*
) and interpolating its value in the for-each
selector. Now what we have is legal – there is no XSLT embedded within XPath. Thus
flattening
the XPath trees back into carrying attributes (@select
, @test
) will
yield a correct XSLT2.0 construction.
Similar methods may be needed for xsl:apply-templates
,
xsl:choose/xsl:when
, xsl:if
and attribute value templates. The
main test is the presence of a buried xsl:variable
within an XPath expression
tree after first stage modification.
In a similar manner sometimes XPath constructs need to be lifted to XSLT space because of descendant variable bindings. Figure 3 shows the process for
<xsl:template match="/"> <xsl:for-each select=" for $b in books return let $hasAppendix := exists($b/appendix) return if($hasAppendix) then $b else ()"> <appendix> <xsl:value-of select="@title"/> </appendix> </xsl:for-each>where a
let
is buried within an XPath
for
expression effectively in the sequence constructor
role. Following the let replacement as shown before, it is replaced by a sequence
of an XSLT
variable binding and an interpolative use within an if
test as shown at stage 2
of Figure 3. But it is still under an XPath construct (the
ForExpr
) which is acting in the select role for the
xsl:for-each
.
Migrating upwards, whilst still preserving the for loop behaviour, can be achieved
by
converting the ForExpr
from XPath to XSLT space, and as before binding it into a
unique variable, whose declaration is promoted in front of the surrounding use. The
resulting
final XSLT is:
<xsl:template match="/"> <xsl:variable name="ForExpr.d274e8" as="item()*"> <xsl:for-each select="books"> <xsl:variable name="b" as="item()" select="."/> <xsl:variable name="hasAppendix" select="exists($b/appendix)"/> <xsl:sequence select="( if($hasAppendix) then $b else () )"/> </xsl:for-each> </xsl:variable> <xsl:for-each select="$ForExpr.d274e8"> <appendix> <xsl:value-of select="@title"/> </appendix> </xsl:for-each> </xsl:template>
Inclusions
Stylesheets often include resources from other stylesheets, using xsl:include
and xsl:import
redirection instructions. For most of this work, these are
not flattened, but used as links to discover the programmatic reach of
the top-level stylesheet. An equivalent 2.0 tree of files is generated with the same
relative
naming structures retained. This is essential to preserve import precedences.
Testing
It is helpful to know whether this program transformation technique works. One might consider using one of the extensive test suites developed for XSLT 3.0, but these tend to be very small stylesheets focussed on one particular aspect of XSLT semantics. Better is to compare the outputs of an XSLT3.0 stylesheet operating on a given input with the result from the transformed program, expecting of course deep-equality in the resulting XML. A more extreme test possibility is discussed in the conclusion.
Since the converter is supposed to convert 3.0 features into 2.0, then perfectly legal
2.0
facilities should remain untouched. In practice, since a 2.0 stylesheet is assumed
to be
legal
XSLT 2.0, then the bodies of such stylesheets are not processed, but
passed through unchanged.
The original stimulus for this work was to convert a 3.0-based document layout system into one that would run on the 2.0 environment of browser-based Saxon-CE. Indeed, this was successful, enabling admittedly simple documents to be processed in both a server-side (3.0) context and a browser-based (2.0) one.
To illustrate this more directly, Figure 4 and Figure 5 show runs of the same document source code
using the original XSLT 3.0
version of the layout library (comprising some 30 different files), and its 2.0 modified
version. The document not only shows various forms of layout, for which their original
support
code requires both XSLT 3.0 (xsl:iterate
, xsl:evaluate
) and XPath
3.0 (map{}
...) facilities, but also self-examines the library and reports on XSLT
versions and features found.
Here we can see that the vast majority of the layout library stylesheets are defined
in
XSLT3.0 and there is use of both XSLT 3.0 instructions and XPath 3.0 operators. (Figures
for
the XPath operators are the number of attributes (@select
etc.) which contain one
or more of that type of operator.) Having processed the layout library through the
XSLT 3 to
2 converter, we can again process the source document using the modified (XSLT2.0)
library and
get the following output.
The document reports that it was processed in XSLT2.0 mode, by the Saxon HE processor
(which supports no XSLT3.0 features). The layout sections to the left produce the
same result
as in the 3.0 evaluation. The inspection
element reports that all stylesheets
used were declared to be in XSLT2.0 and that no XSLT3.0 features were found in that
set. The
extra
five stylesheets are accounted for by the support library, added by the
converter, which provide various emulation functions and XPath parsing and evaluation
support.
Support for the dynamic XPath evaluation used within the left hand traffic
lights
layout is very expensive indeed here, involving full parsing and emulation of
document-borne XPath expressions entirely by XSLT2.0 code. (If a built-in evaluator
is
available, such as saxon:evaluate()
, almost two orders of magnitude improvement
in speed can be expected.)
Conclusion
I've shown that it is possible to convert some aspects of XSLT 3.0 syntax and semantics
into equivalent XSLT 2.0 code, using only XSLT stylesheets as the conversion mechanism.
In the
hands of experienced XSLT programmers, who can provide oracle
control
information, these tools can be used to automate considerable sections of such conversion.
The
remainder can be accomodated by alternative conditional code sections controlled by
@use-when
directives, which can be retained or removed by the converters during
static evaluation. (Recall that the converter knows
that the target
xsl:version
will be 2.0
, so it can evaluate accordingly during
conversion.)
Some specific lessons learned include:
-
With a full XPath parser, it's often actually a great deal easier programmatically, using a small sequence of templates, and certainly much more robust, to carry out alterations to XPath expression strings on the parse trees themselves, rather than employing regular expression modifications discussed earlier.
-
Relativity between imported and included stylesheets that are modified during conversion needs to be managed very carefully, to retain correct precedence semantics. This has proven to be especially the case in the
self-processing
example discussed below. -
Namespace management also needs much care, to ensure that namespace references within attribute values are retained. Again that became critical in the converter
self-processing
, where innocuousexclude-result-prefixes="xs math"
declarations raised many problems. -
Processing XSLT with XSLT is such fun.
Can it process itself?
I stated earlier that the converter code was written in XSLT 3.0. This of course begs the question whether it can generate a 2.0 version of itself. Obviously this depends upon the complexity of the code written. At the time of writing, the some 1500 lines appear to use only the following 3.0 features:
-
The string concatenation operator (
||
) reasonably heavily, mainly to reduce clutter in forming expressions. -
Text value templates (
{...}
) to reduce code size. -
map{}
to hold bindings of static variables to support processing of static features such as@use-when
. -
xsl:mode
for every one of the half-dozen modes employed. -
xsl:iterate
to handle and track variable bindings, mostly for static resolution and condiitonal compilation. (Interestingly, conversion ofxsl:iterate
itself does not at present involve use of iteration, though perhaps the view of variable scoping is over-generous). -
xsl:evaluate
to evaluate static variables for substitution and processing of@use-when
directives.
Of these features, string concatenation would require full XPath parsing, as it uses
operator associativity extensively and is thus very unsuitable for regular expression
handling. (A refactoring to use concat()
is probably overdue – actually they
aren't used in very complex situations.) The use of map{}
is comparatively
simple, but does involve functional constructors. The most thorny issue is use of
xsl:evaluate
. In (commercial) Saxon 2.0 implementations there is a
saxon:evaluate()
function that can be be exploited, with suitable variable
bindings as discussed earlier. Unfortunately this isn't a option in the current open-source
Saxon-CE[20].
Any system involving program generating program inevitably encounters quoting problems
-
how to distinguish real program from program to be written. In XSLT the
xsl:namespace-alias
declaration helps solve this issue, in this case by
declaring the X:
prefixed namespace to map into the XSLT namespace in the final
result. However if the compiler is processing itself, then input elements marked with
that
prefix need to be preserved
. This requires a remapping and
re-alisassing.
Full XPath parsing was added to the repertoire and the the system tuned by examining it operating on itself. The first goal was of course to generate a result that will compile successfully. Having reached that point, further tuning was required to get successful operation of the generated compiler on small tests, whose runs could be compared against benchmarks.
If this transformation is successful then of course the resulting 2.0 version of the
converter should again be capable of converting its 3.0 master copy into 2.0,
producing an identical copy of the converting stylesheet. Thus if
Convert3
is the converting stylesheet, then
let Convert2 := Convert3(Convert3); Convert2(Convert3) == Convert2
should yield the given equality.The author can report that this is now indeed the case[21]. Stability and idempotency has been tested successfully through checking that
Convert2(Convert3) = (Convert2(Convert3))(Convert3)
Alternatively this is best shown through the following diagram, where the final two generated versions of the converter are found to be identical:
Further possibilities
The list of assumptions (section “Assumptions”) excluded a number of areas that
might be interesting to explore. Streaming could be treated as a transparent operation,
but
of course there would be no performance guarantees and unless full streamability analysis
was undertaken, programs that would ordinarily be deemed unstreamable
would
still operate. Simple packaging based on static analysis, flattening, visibility projection
and overriding replacement is a distinct possibility: provided all stylesheet packages
are
available, the package processing is effectively a static operation [22] .
The most intriguing would be to investigate what sort of function-valued item behaviour
could be supported, by attempting to attach to Dimitre Novatchev's FXSL framework
(Novatchev2006). Most of the standard
higher-order functions
have implementations within the library, and static definitions of function items
can be
determined from XPath parses, so the issue is whether these can be converted to appropriate
template-reference elements and associated template and function
trios. Whilst this paper has described a useful model for maps in some circumstances,
a
system that could accomodate dynamic functions may support maps more accurately,
as
essentially they are functions over a finite discrete domain of key values.
Acknowledgements
Most of the impetus for this work came from the work of Michael Kay, Philip Fearon and O'Neil Delpratt when they released SaxonCE in 2011. Offering a possibility of trying my document system directly in a browser environment acted as the stimulus to looking at detailed program transformation of the by-then-extensive codebase. Little did I realise that a few years later I'd be working with them.
References
[Becker2000] Becker, Oliver. The XSLT Loop
Compiler
. [online] http://www2.informatik.hu-berlin.de/~obecker/XSLT/loop-compiler/
[Carlisle2006] Carlisle, David. XQ2XML: XML
syntaxes for XQuery
. [online] http://monet.nag.co.uk/xq2xml/
[SaxonCE] Delpratt, O'Neil and Kay, Michael.
Multi-user interaction using client-side XSLT
. [online] XML Prague 2013
proceedings, pp1–22.
http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf
[Lumley2005] Lumley, John, Gimson, Roger and Rees,
Owen. A Framework for Structure, Layout & Function in Documents
. Proceedings
of the 2005 ACM symposium on Document engineering. doi:https://doi.org/10.1145/1096601.1096615.
[online] http://www.hpl.hp.com/techreports/2005/HPL-2005-95R1.pdf
[Lumley2013] Lumley, John. Functional,
Extensible, SVG-based variable documents
. Proceedings of the 2013 ACM symposium on
Document engineering, pp 131-140. doi:https://doi.org/10.1145/2494266.2494274.
[online] http://dl.acm.org/citation.cfm?doid=2494266.2494274
[Lumley2014] Lumley, John. Analysing XSLT
Streamability
.
doi:https://doi.org/10.4242/BalisageVol13.Lumley01. August
2014. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html
[LumleyPhD] Lumley, John. Documents as
Functions
. University of Nottingham, PhD Thesis. June 2012.
[online] http://etheses.nottingham.ac.uk/2631/
[Novatchev2006] Novatchev, Dimitre.
Higher-Order Functional Programming with XSLT 2.0 and FXSL
. Proceedings of
Extreme Markup Languages, Montreal 2006.
[online] http://conferences.idealliance.org/extreme/html/2006/Novatchev01/EML2006Novatchev01.html
[REx] Rademacher, Gunther. REx Parser
Generator
. [online] http://www.bottlecaps.de/rex/
[XPath3.0] Robie, Jonathan, Chamberlin, Don, Dyck,
Michael and Snelson, John, Editors. XML Path Language (XPath) 3.0
. World Wide
Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-30/
[XPath.FO] Kay, Michael, Editor. XQuery and
XPath Functions and Operators 3.0
. World Wide Web Consortium, 08 April 2014.
[online] http://www.w3.org/TR/xpath-functions-30/
[XSLT2.0] Kay, Michael, Editor. XSL
Transformations (XSLT) Version 2.0 (Second Edition)
. World Wide Web Consortium, 23
January 2007. [online] http://www.w3.org/TR/xslt20/
[XSLT3.0] Kay, Michael, Editor. XSL
Transformations (XSLT) Version 3.0
. World Wide Web Consortium, 2 October 2014.
[online] http://www.w3.org/TR/xslt-30/
[1] There were some dozen additional Java classes for handling encapsulated operations
and
functions such as within-paragraph
text layout, or image size lookup, but
these aren't germane to this paper – all main layout was processed through XSLT
code.
[2] Whilst there are differences, for the rest of this paper, unless stated otherwise, the term XSLT or XPath is used to refer to any of the three.
[3] There have also been suggestions for this construct being used to declare default typing for a mode, or even acting as a wrapper around a whole suite of templates....
[4] An xsl:iterate
with none of these directives behaves as
xsl:for-each
, save that the latter could execute for all selected nodes
in parallel.
[5] Whilst in later versions of Saxon this is only supported in the commercial editions, which support XSLT 3.0 anyway, in the earlier (free) Saxon-B product, which supports XSLT 2.0 extensively, this extension function is available.
[6] Ironically, in the author's XSLT streamability analysis tool (Lumley2014), just such mixed sequences were used to return both XML trees and property information, the maps being filtered out for separate processing at a later stage.
[7] With the addition of map ids to the keys and a map-end
marker, it
might be possible to represent singleton values of map(*)
type within
such maps.
[8] It has the same arity as, but a different signature
(item()*,xs:anyAtomicType)
from, the standard
map:get(map(*),xs:anyAtomicType)
and shamelessly exploits the fact that
in XSLT2.0 the namespace associated with maps is not
reserved.
[9] The @length
property of the binding may appear redundant (search to
the next binding...) but the length of the sequence is known when the entry is created
and it makes retrieval of the key's associated value more efficient.
[10] Is is possible to write specialist filter functions such as
isMapItem()
which can differentiate between map entries and non-map
entries in such sequences. This is used in some of the processing of presentational
variables described in Lumley2013, but they are used in very
specific circumstances.
[11] The reader won't be surprised that this was the system used in the original 2.0 version of the layout processor, which became more coherent by using maps in 3.0
[12] At the time of writing, the map key/value association operator is being changed
from its initial draft string of
to :=
.
:
[13] Other examples of homoiconic languages include of course LISP, Prolog, Mathematica and Tcl, not forgetting lowest-level machine code.
[14] We make the assumption the original XSLT 3.0 is valid. If so, we should be able
to just look at the references to get the required variables. By filtering the
in-scope variable definitions by reference needs we can retain any type information
(@as
) on the generated parameters, which increases robustness and
possible performance.
[15] Placing these variables in a reserved namespace reduces risk of name conflict
[16] Technically if any tail position lacks a next-iteration directive then it behaves as if one exists, albeit with no modification of parameters.
[17] Strictly this analysis should be carried out on the expanded
stylesheet importation/inclusion tree, but the situation for a single stylesheet is
described for simplicity.
[18] This parser was used extensively as the basis of the author's XSLT streamability analysis tool Lumley2014.
[19] This mixed-sequence approach is only possible for XSLT elements which require exactly
one of @select
or a child sequence constructor, such as
xsl:variable
or xsl:sort
. Other situations involving XPath
expressions such as xsl:if
test="
expr
"
, where any element
sequence constructors carry result trees, would require closely-preceding temporary
variables to be set to an appropriate value and referenced from the test.
[20] It is possible to produce modified versions of the open-source
Saxon-CE with an extension function that can support an equivalent of
saxon:evaluate()
.
[21] Hard pounding this, gentlemen.
This activity exercises problems of
quoting, namespace preservation and URI relativity more acutely than most other XSLT
work.
[22] The XPath parsing code, wrapping around Gunther Rademacher's generated parser, is written in XLST3.0 package format and flattened to use in current non-packaged implementations.