Braaksma, Abel. “In pursuit of streamable stylesheet functions in XSLT 3.0.” Presented at Balisage: The Markup Conference 2014, Washington, DC, August 5 - 8, 2014. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). https://doi.org/10.4242/BalisageVol13.Braaksma01.
Balisage: The Markup Conference 2014 August 5 - 8, 2014
Balisage Paper: In pursuit of streamable stylesheet functions in XSLT 3.0
Abel Braaksma is an invited expert of the XSL and XQuery Working Group and is
creator and owner of Exselt, a
streaming XSLT 3.0 processor. Next to his XSL work for the Working Group he runs
a consultancy and outsourcing firm Abrasoft, specializing in data aggregation and XML in .NET
environments. He has over 15 years experience in XML and related technologies.
You can contact him about Exselt or XML, XSLT and C# / F# related inquiries. His
personal thoughts on technological challenges and XSLT in particular are
collected on his blog http://undermyhat.org.
With the XSLT Working Draft being in Last Call since December 2013 and the XSL
Working Group working hard to get the latest bugs fixed, it is only a matter of time
that the XSLT 3.0 Draft becomes a Candidate Recommendation, locked for changes and
suitable for implementors to adopt.
One of the bugs the Working Group received was about the inability to create
stylesheet functions that take streamable nodes as an argument. The group considered
the omission and decided to ask me to write up a proposal. We discussed several
iterations of the proposal until recently it was adopted and added to the internal
Working Draft.
This paper investigates the ways in which stylesheet functions can be made
streamable and why it is such a complex to task to make them so. It summarizes the
rules that have been adopted so far according to the public bug entry and shows the
possibilities it gives for stylesheet and package authors. While the impact on the
specification is minimal, the impact for authors of packages and stylesheet authors
in general is potentially big and opens up a whole world of new possibilities in
streaming.
This paper discusses very recent changes to the XSLT 3.0 specification that are
still under discussion in the related BugZilla bug entries. When the specification
is updated, rules laid out in this paper need updating as well. I will publish those
updates at http://exselt.net/papers
in DocBook and PDF formats.
Prerequisites
To read and understand this paper, a basic understanding of XSLT 3.0 and streaming
is desirable. For an introduction on streaming, you can refer to two earlier papers
in this series, Braaksma 2014a, about the Ten
rules of thumb of streaming and Braaksma 2014b, about
Streaming XSLT design patterns.
Disclaimer
This paper relies on information that can be found in the public XSLT Last Call WD at the time of this writing, and in some of the public XSLT
3.0 BugZilla bugs (see W3C BugZilla). While I am an invited expert for the XSL Working Group, I do not speak at
there behalf, and any thoughts I lay out in this paper are my own and are not
necessarily the thoughts of the XSL Working Group.
Changes to the Last Call Working Draft will be discussed through the same BugZilla
and
will ultimately result in the publication of a new version of the specification in
XSLT Latest Version. Where this paper refers to XPath, XPath functions and
operators or the XDM, it uses XPath 3.0, XPath and XQuery F&O 3.0
and XQuery and XPath Data Model 3.0.
Since the XSLT 3.0 specification is not final yet, it is possible that syntax or
semantics of constructs used in this paper change in the future or are dropped in
their
entirety. Large parts of this paper rely on discussions and conclusions of functionality
reported in the public section of BugZilla. Where the text of this paper deviates
from
the public Working Draft and where these changes are publicly available through
BugZilla, I will state so by using footnotes.
Introduction
Streamability analysis is a complex subject, yet the basics are, as is often the case,
relatively trivial: just write your stylesheet in such a way that it only uses the
child
axis, plus an occasional escape to node properties such as attributes on the ancestor
or
self axes, and you have essentially written a streamable stylesheet.
But in practice, it is not so trivial. Rules in the specification are complex and
even
hard-core spec-readers have trouble following them. They are meant for implementors
and
not necessarily for programmers or users, which is why these rules have to take care
of
every possible corner case and yet make sure that for general use, the rules work
with
the least possible surprises.
This paper tries to fill in the gap between spec-prose and tutorial. While there
currently are no tutorials on the subject of streamable stylesheet functions, this
paper
aims to explain the basics and several rather advanced concepts such as recursive
streamable functions, in a way that it becomes understandable for a larger
public.
A streamable stylesheet function is a function that can take a streamed node from
a
streaming input document, while still passing the tests for being guaranteed streamable.
Being able to write streamable stylesheet functions is import for package designers,
so
that they can create functions that work alike for streaming and non-streaming
scenarios.
This section will briefly discussed terminology used throughout this paper. This
overview does not try to be complete.
Guaranteed streamable: determines
whether a particular construct, instruction or declaration is streamable
according to the streamability rules in section 19 of the XSLT 3.0
specification. If it is, any processor that is conformant with the
streamability feature will be able to process your input using streaming,
that is, by reading the input in forward-only mode without keeping the whole
document in memory.
Posture: the posture of a construct
determines the state of a streamed node in the output of that construct.
Often, this will be the same as the input posture. It is used in
streamability analysis to determine whether a construct returns streamed
nodes and in what way. Many constructs are capable of returning streamed
nodes, but if all rules are followed, the usages of these constructs are
limited in such a way that they can only return a subset that is compatible
with streaming. The different postures are:
Grounded: the construct
does not return a reference to a streamed node. An expression or
instruction operating on a grounded posture is allowed to be
free-ranging. Examples: fn:currentDateTime,
fn:copy-of, xsl:copy-of, a
variable reference.
Climbing: the construct can
return nodes that refer climb the tree through the ancestor or
attribute axis. Examples: parent::foo,
author/@name, ...
Striding: the construct can
return non-overlapping nodes, typically only on the child axis.
Examples: any child-select expression, any match pattern,
fn:zero-or-one,
fn:outermost.
Crawling: the construct can
return overlapping nodes on a downward axis. Examples:
foo/descendant-or-self::bar,
./b//c.
Roaming: the construct can
return nodes that can be anywhere in the input tree. Examples:
following::node(),
ancestor::foo/bar,
preceding-sibling::price.
Sweep: the sweep of a construct
determines how the current read position of the input stream is changed as a
result of executing the construct. The different sweeps are:
Motionless: the read
position does not change. This is true for expressions that only
operate on grounded nodes or data, and for expressions that
request a property of a node or the ancestor axis[1]. Examples: fn:has-children(.),
@name, if(./name()) then ...,
ancestor-or-self::para/@xml:lang, any constant
and any expression not operating on a streamed node.
Consuming: the read
position changes in a forward-only direction. Any expression
that requires read-ahead is considered a consuming expression,
unless the current node is a childless node, such as text- and
comment nodes. Consuming constructs form the heart of
streamability analysis. Any construct may have at most one
consuming construct. Examples: fn:count(foo),
x/y, <xsl:value-of select="name"
/>.
Free-ranging: the construct
cannot be evaluated by using forward-only movement of the read
pointer of the input stream, for instance when evaluation of a
filter expression requires look-ahead. Typically, a free-ranging
construct is also roaming. Examples: foo[bar],
parent::author/book, any xsl:sort
on streamed nodes, @* | foo, if(a) then b
else c, fn:reverse(index).
Usage: the usage determines what an
operand of a construct does when it receives a streamed node as argument.
The following usages are defined:
Inspection: if the argument
is a streamed node, it will inspect the node, but not consume
the node, resulting in a motionless sweep (assuming no other
part of the expression or construct consumes the input). An
inspection operand is the only operand that can take a climbing,
motionless expression as its argument.
Absorption: if the argument
is a streamed node, it will consume the node, resulting in a
consuming sweep. Examples are arguments of fn:data,
fn:string, constructor functions,
xsl:value-of/@select,
xsl:apply-templates/@select and text value and
attribute value templates.
Transmission: if the
argument is a streamed node, the result will also be a streamed
node. Depending on the function or construct, this can result in
any kind of sweep. Examples are arguments of
fn:outermost, fn:remove,
fn:subsequence,
xsl:sequence/@select.
Navigation: if the argument
is a streamed node, there is no guarantee that calling the
construct will be streamable. Typically, a navigational operand
will result in free-ranging and roaming streamability analysis
results. Examples are arguments of
xsl:call-template (the implicit context item,
making it impossible to use this instruction with streamed
nodes), first argument of fn:fold-left, last
argument of fn:key, fn:innermost, any
untyped argument to a stylesheet function not specifically
marked streamable.
The combination of posture, sweep and usage determines whether a certain construct
is
guaranteed streamable or not. Constructs can be nested, and constructs have operands.
What exactly constitutes a construct and operands is outside the scope of this paper[2], except for stylesheet functions, which will be explained in the following
sections.
A case for streamable functions
Functions are a nice machinery in XSLT 2.0 that have surplaced most of the
xsl:call-template scenarios from XSLT 1.0. The benefit of being able to
create a function and call it directly inside an XPath expression or in a pattern
filter
has shown its use-cases. More often than not, functions tend to operate on atomic
types,
such as strings and integers, because the natural habitat for nodes processing is
done
declaratively by template matching. This is often called The XSLT
way of doing things, see Welker 2008, Gerstbach 2006 and Lenz 2005, and even today, this is still
often advocated on the XSL Mailing List[3] and on discussion fora such as StackOverflow[4] and Experts-Exchange.com. However, even advocates of using templates over
(recursive) functions, such as Dimitre Novatchev, sometimes choose a function over
a
template based scenario because it is clearer or because it is simply a better tool
for
that particular job[5], or in the words of Jeni Tennison in Tennison 2001:
“If the result follows the structure of the source, then
a push method is more natural — the source drives the process. If the result
has a substantially different structure from the source, then a pull method
is more natural — the result drives the process.”
It should be said: in XSLT 3.0, the natural way of doing things with nodes from an
input tree is to use template declarations. This is no other than previous XSLT
versions. Perhaps it is even stronger now that it is possible to apply templates on
atomic values such as sequences of strings as well[6]. But this does not help with complex patterns, which can only be replaced by
functions. Also, it does not help in scenarios where reusing a result in an expression
is important, which is where functions come into play: they can be used inside
expressions and call-template nor apply-template can do that (unless wrapped inside
a
function, that is).
The current working draft, which is in Last Call, does not have a provision for
streamable functions. That is, it is impossible in any which way to pass a streamed
node
to a function. Dimitre Novatchev's example from the StackOverflow question would
therefore not be possible in a streaming scenario. Now that more and more people have
gotten used to write functions, and with the advent of (precompiled) XSLT packages
containing lots and lots of library functions, it seems unfair that they cannot continue
doing so in streamable scenarios, nor does it seem fair that library vendors will
not
have any means to create library packages with functions that are
streaming-aware.
The XSL Working Group has considered this and as the solution of Bug 25679
shows, the proposal was considered and adopted into the internal working draft. The
following sections will explain my personal analysis on the situation and section “Streamability of stylesheet functions according to the specification” will summarize which parts of this analysis match
the specification. Since the discussion on this bug report is not final yet, anything
in
that section is very preliminary and may change without prior notice.
Stylesheet functions in streaming scenarios before they were allowed to be
streamable
In the current XSL Transformations 3.0 Last Call Working Draft, calls on stylesheet
functions are analysed based on the declared type of the arguments. If an argument
is
typed as an atomizing type, such as xs:string or xs:integer,
the usage of that argument will be absorption, in other
words, it will absorb a streamed node, which in most cases results in a consuming sweep. The result of such function calls is defined
as grounded. If the argument is untyped or is a type
that can take nodes, it is disallowed to pass a streamed node to the function call,
because there is no way of knowing statically what will happen with that node inside
the
function. I call this the safe bet: disallowing
streamed nodes as arguments makes streamability analysis easier. Example:
In this example, the function f:square takes an argument $i
that is of type xs:integer. Upon calling the function, the processor will
have to atomize the value or node. To do so, it must process all its children, which
is
why such a usage is considered consuming. In this
case, the element number will be atomized.
This is a typical way of writing functions and it works perfectly well with streaming.
However, if you want the argument to be a node and get some properties of a node or
process the node's children, it gets trickier, since passing a node to a function
is
disallowed with streaming. Consider the following:
In this example, the call to the function is not streamable. We pass on a node to
the
function and the rules in the current Working Draft state that it has usage navigation[7], which has the effect that the whole expression, in this case the match
pattern author[f:hasname(.)], will be roaming and free-ranging, in other
words, it is not guaranteed streamable.
It is possible to call the function in some situations, by using the
fn:copy-of or fn:snapshot functions. But that means that
users of your function must have enough knowledge to know what atomizing or grounding function to use before passing on an argument. In
this particular case, it would make little sense to use either of these functions,
because a match pattern must be motionless, so it is not possible to use the function
in
a match pattern predicate expression. This is essentially true for any function that
does not consume the input tree. With the current rules in place, even if a function
does not consume the input tree, you still need to create a copy of the tree to call
that function. Hardly efficient and in many cases impossible with streaming, because
the
input node may simply be too big to be copied.
Were you to use it in another context, you still have to choose between
fn:copy-of and fn:snapshot. This function would not work
with fn:copy-of, because that function does not copy the ancestor axis. In
other words, you must know the internals of the function to find out that you need
to
call it with a call to fn:snapshot. And even then, if the function requires
information from the original document, such as the base-uri or other properties that are not copied with either of those
functions, you are out of luck.
In essence, these rules disallow stylesheet authors to write a function that take
nodes as arguments, and in the case of library package authors, they will not be able
to
write functions that take nodes that work the same way in a streaming scenario and
a
non-streaming one. They would have to educate library function users how to use their
functions in a streaming scenario and many functions will never work in streaming
scenarios because their usage would always be consuming even though the function body does not actually consume the
input node. In other words, it makes writing functions for use in streamable stylesheets
next to useless.
The challenge of streamable stylesheet functions
It is often trivial to see at a glance that a function is motionless and that it would not hurt using it on a streamed input node.
Take, for instance, Code listing: get attribute of ancestor from the previous section. It only uses
the ancestor axis and tests whether one exists with an attribute @name.
Since walking the ancestor axis is motionless (the processor is required to keep a
stack
of the ancestor nodes and their properties) and the attribute axis is too, any call
on
that function would be motionless with respect to its argument.
This shows us one thing: it is possible to write functions that can be statically
analyzed to be motionless. But this function has
special properties: the body of the function is motionless and the result of the function is grounded. That means, it can never return any nodes, it can either
return true() or false(). But can we always statically
determine that that is the case? Let us write a slightly different function, this
time
we return the attribute node, instead of testing for it:
The function still takes a node as its argument, but this time it returns a reference
to an attribute of the ancestor axis of that node. The attribute axis has climbing posture, which is limited in that you cannot
navigate downwards again from it. In the first usage of this function, in the predicate
of the pattern, the input is the current node, but a predicate only has to be true
(nodes are there) or false (no nodes). After that, there is no navigation away from
the
climbing posture, so this could be considered
streamable.
If we take a look at the second example, we see there an
xsl:apply-templates. By definition, assuming the rest of your
stylesheet is guaranteed streamable, any apply-templates will be an atomizing construct:
templates must be grounded and therefore, passing on nodes to a template will atomize
those nodes. Just as we saw before, atomizing means that the node is consumed. In
streamability terms this means it has usage absorption. But to consume a node, its children must be visited, which
is a downward movement. The argument has a climbing
posture because it ends with f:hasname(.), which we know returns attribute
nodes, which are climbing.
There is a caveat, however: if the climbing
expression returns childless nodes, consuming such a node will not harm the streaming
process: there are no children to consume. The General
Streamability Rules[8] in the specification have a special provision for this: childless nodes with
a climbing posture are allowed in an absorption context.
So it can be made streamable, right? Wrong! For a construct to be streamable it must
be statically determinable to be so. Here, we have a function with an unspecified
return
type. If we had written as="attribute()", the processor would be able to
detemine that the returned nodes would always be childless and it could be used in
an
absorption context.
But this is only one example. What happens when the stylesheet function returns a
climbing node and we pass it a crawling expression? What happens if the function returns
crawling (overlapping) nodes and the input is
climbing? And how about recursive functions, or
within packages, with abstract or overriden functions?
Let us look at some possible approaches to tackle these issues.
Approach 1: analyzing stylesheet functions statically on their own
Arguably the easiest approach from both the user's point of view and for
implementors alike is to try to find a method to analyze the streamability of
stylesheet functions statically, without taking into account any possible context
the stylesheet function is called in.
Suppose you have a stylesheet function my:for-each which traverses
all nodes in a node set and returns a certain aggregate result. By its definition,
traversing the node set, it will consume the input tree, just like
xsl:for-each would. Writing such a function that it works equally
well for different kinds of input postures is hard, perhaps even impossible. A
function call like my:for-each(head//section) may have to traverse over
overlapping nodes, the expression in the argument is a crawling expression and looping over a crawling node set is not allowed in streaming because certain
buffering is required to cache the overlapping nodes. If the loop is not going to
consume the individual nodes, this is alright[9], but otherwise, it is not guaranteed
streamable.
For such a function to be statically analyzable without knowing what context it is
used in, it is allowed to traverse the input tree, but it is not allowed to consume
the individual nodes of the streamed input tree. A function that can potentially
work with any input posture, albeit climbing,
striding or crawling might look like the following:
This example loops over all the nodes, but does not consume the individual nodes,
it only requests a property of those nodes using the attribute axis, which is
motionless and its result is climbing. And a
motionless expression on either a climbing,
striding or crawling context is always allowed[10].
Even though this example is exemplary for showing the possibility of writing a
function that can take any posture, it limits our possibilities greatly. Suppose we
want to write an implementation of my:for-each that does consume the individual nodes in the streamed node
set, for instance as in the following example:
Whether or not this example would use the child axis within the
xsl:for-each or directly without the loop, for the streamability
analysis this does not matter. The child axis, if applied on a climbing or crawling axis, is
always roaming and has a sweep of free-ranging[11]. This means that the expression my:for-each2(child::book)
is allowed, but the expressions my:for-each2(listing//book) and
my:for-each(title/ancestor::book) are not, because the latter two
have arguments with crawling and climbing postures respectively and can therefor not be
consumed.
This leaves us with essentially two options for static analysis of streamable
stylesheet functions, without having to take the function calls into account:
Option 1: force stylesheet authors to
only write functions that are capable of dealing with any input posture.
This means that the body of functions must be motionless, or if they are
not, they must use a higher-order
operand[12] such as the shown xsl:for-each construct in the
example.
Option 2: limit the allowed posture
to only one posture, so that caller and callee are guaranteed to have
the same input posture, always. Most likely candidate is the striding posture, because it is the most
common posture when processing an input document. This would mean that
the argument that can take nodes always has its context posture set to
striding and that the only allowed
argument posture when calling the function must also be striding.
Both options have severe drawbacks, but they have the advantage of being very
clear-cut and easy to communicate or explain. As it turns out, the current approach
taken by the XSL Working Group, as explained in XSL Bug
25679, takes the approach of Option 2:
the result posture of a function must be striding
or grounded and the argument to a function must
also be striding or grounded. To accomplish this, the parameter that can take nodes,
which can at most be one parameter, will have a context posture of striding and is by itself motionless.
Approach 2: analyzing stylesheet functions from the function call
Radically different from the previous approach is to analyze the function's body
only when we actually know what the context is of the argument that takes a streamed
node. For this approach, we simply ignore analyzing the declaration of the function
on itself.
If we take the example from Code listing: streamable my:for-each failing with argument postures climbing and crawling and call it with a
striding argument, for instance
my:for-each2(*/books/book), the processor, upon encountering such
function call, takes the posture of the argument and sets the context posture of
$node-set to this posture, in this case: striding. With this posture, the processor can now analyze the
function body and return a result of that analysis, not surprisingly the same as in
the previous section: a consuming sweep and a
grounded result posture (because the function
does not return any nodes, the result is grounded).
If user were to call the function with a crawling
argument, say my:for-each2(*/books//book) (mark the extra slash), the
parameter $node-set is set to a context posture of crawling and the resulting analysis of the whole function
body will be roaming and free-ranging. Exactly as expected. And with a climbing argument, or even a roaming argument, the result would be the same.
Table I: Posture and sweep of my:for-each1 function, depending on argument
Argument posture
Function posture my:for-each1
Resulting posture for function-call
Resulting sweep for function call
Grounded
Grounded
Grounded
Motionless
Climbing
Climbing
Climbing
Motionless
Striding
Climbing
Climbing
Motionless or consuming
Crawling
Climbing
Climbing
Motionless or consuming
Roaming
Roaming
Roaming
Free-ranging
The advantage is clear, by postponing analysis of the function body until it is
called, the programmer has more flexibility in creating the function and the user
will get an error if he uses the function with an invalid argument. This is also the
disadvantage: without a clear rule how to call the function, the user may have
trouble understanding why one argument works and the other does not. The previous
table showed a function, my:for-each1, that works with any posture as
argument input (except roaming, but that posture
will always result in an unstreamable result, whether it is a user-defined function,
a build-in function or any other construct). The following table shows the drawback
of this approach, where only upon calling the function it becomes apparent that
certain postures are disallowed:
Table II: Posture and sweep of my:for-each2 function, depending on argument
Argument posture
Function posture my:for-each2
Resulting posture for function-call
Resulting sweep for function call
Grounded
Grounded
Grounded
Motionless
Climbing
Roaming
Roaming
Free-ranging
Striding
Striding
Striding
Consuming
Crawling
Roaming
Roaming
Free-ranging
Roaming
Roaming
Roaming
Free-ranging
As we can see, only one argument posture (bar grounded, which is always allowed), striding, is allowed with the function my:for-each2,
and its sweep is consuming. This is similar to our
expectations, as in the previous section, we already saw that this function only
worked with a striding argument posture. But in the
previous approach, for Option 2, we considered this
the only allowed posture for function arguments to stylesheet functions, which made
it pretty clear for end users. Here, the user does not know beforehand what the
argument posture can be, in fact, he needs intricate knowledge of the function body
to find out what posture is allowed.
While this approach allows more flexibility, it also introduces more room for
error, at least at the static development stage. Of course, the documentation of a
function could contain information of how it should be used, but who reads
documentation anyway?
Another advantage, though, is the potential that many existing functions might
"just work". In fact, quite some functions that can be found online are often short
and meant for a certain given input. As it turns out, if you take, for instance, a
look at the functions on www.xsltfunctions.com that can take nodes as their arguments, quite some
of them are streamable right out of the box if we would allow this approach.
Approach 3: analyzing stylesheet functions based on static posture assessments
If we try to combine the previous two approaches into one, and try to achieve both
static analysis of function bodies and clear information for the users of the
functions, we achieve that by decorating the functions with an extra attribute that
can take one or more postures. Let us call the attribute
argument-posture and add it to xsl:param, with at most
one parameter allowed to have that attribute (the challenge of multiple arguments
that can take nodes is discussed in section “Multiple arguments that can take nodes”). The value
for the attribute is set to be one or more of grounded climbing striding
crawling roaming. The postures grounded
and roaming are redundant, because the former is
always allowed and results in a motionless sweep
and the latter is never allowed, resulting in a free-ranging sweep.
If we take the previous example, my:for-each1 and rewrite it with
this additional decorating attribute, it looks like this:
This tells the processor that the programmer declares the function to be
streamable with each and every one of these postures. The processor can then
statically assess the function body with each posture one by one and report back
whether or not this assessment is correct or not. In this case, it would compile
successfully, because all postures yield correct results.
Still, you might argue, the different input postures do not necessarily mean that
the sweeps are all the same. True, but the sweep is something that can never be
statically determined for a declaration, as it is dependent on the context of the
caller. For instance, if the function is called with a motionless but striding argument,
the resulting sweep is also motionless, because the
xsl:for-each with a motionless
argument and its body containing a motionless
sequence constructor all together yield a motionless call to the function, even though it is striding. This is
true, for instance, for the following call:
This example shows the power of the current rules in the XSL Working Draft. The
existing rules allow a function call inside a predicate in a pattern. Because the
context item expression, (.) is motionless and takes the context posture as its result posture,
which in this case is striding, the result of
calling the function with this argument is also motionless and has a result posture of climbing (which follows from our table above). Finally, the result
must be atomized to be able to compare it to a string. For a climbing expression that returns an attribute, atomization is
allowed and motionless.
As we can see, this third approach gives us more flexibility, but the drawback is
that it requires programmers to understand what a posture is. One of the design
goals of the streamability rules is that they can be applied without knowing all the
internals, and bringing one of those internals, the posture of a construct, to the
surface, may not be desirable.
In the next sections we dive deeper in the analysis of stylesheet functions and
try to come up with a set of rules that works for different inputs, not only
striding, and that works also with recursion or
non-final packaged functions, subjects that have not yet been touched in this
section.
In pursuit of flexible rules for guaranteed streamable stylesheet functions
The previous section introduced three possible approaches. In the following sections
we will research these approaches further. In fact, the focus will be on Approach 2 above.
In the specification, under the rules on path expressions[13] and axis steps[14], a table shows what happens if the output posture of one part of the path
expression is the input posture of the other part of it. Stylesheet functions are
not
much difference. If we take the lessons learned in that section and we reproduce that
table here and adjust it slightly for stylesheet functions, we can deduct that the
following rules apply:
Table III: Posture and sweep of stylesheet functions
Argument posture
Function posture
Resulting posture
Allowed sweepa
a)
This is the highest allowed resulting sweep of the
function body resulting in the given result posture,
if the sweep is higher then the sweep of the
function body is effectively free-ranging.
b)
The term "direct-transitional" is not a posture, it
applies to constructs such as "$n/self::foo" or
"$n", where $n is the streamed
parameter. Such expressions are motionless and take the
posture of their input as their output posture.
c)
The current rules on streamability do not allow a
striding or crawling posture after a climbing or crawling
posture, the result of such analysis is always roaming and
free-ranging.
Grounded
Any
Grounded
Motionless
Any
Grounded
Grounded
Sweep of function body
Any
Direct-transitionalb
argument posture
Motionless
Climbing
Climbing
Climbing
Motionless
Climbing
Stridingc
Roaming
Free-ranging
Climbing
Crawlingc
Roaming
Free-ranging
Striding
Climbing
Climbing
Motionless
Striding
Striding
Striding
Consuming
Striding
Crawling
Crawling
Consuming
Crawling
Climbing
Climbing
Motionless
Crawling
Stridingc
Roaming
Free-ranging
Crawling
Crawlingc
Roaming
Free-ranging
Roaming
any
Roaming
Free-ranging
To determine the posture of a function body, it is easiest to consider the argument
that can take nodes similar to the context item expression ".". This
expression is motionless on itself, but if used with usage absorption it may become consuming
depending on the input posture of the expression or instruction it appears in, called
the context posture. For functions that means, it takes the
posture of the argument to that function, in other words, the context posture of the parameter that can take streamed nodes is the
posture of the argument to the function. Example:
The body of the function f:thirdchild has a posture striding because of the child axis step. The usage of the
function is transitional, which means it can return
nodes. Using the table above, we can lookup the result posture and sweep of any call
to
the function. The first use in this example is in xsl:apply-templates,
which we have seen before takes a usage of absorption, meaning that overlapping nodes are not allowed[15]. The operand here is a call to our function with the current node as
argument. Inside any template, by definition the context posture, and thus the posture
of the context item expression, is striding. The
table then gives us for argument is striding,
function body is striding, then result posture is
also striding and the sweep is consuming.
Looking at the second use of the function we see another
xsl:apply-templates, this time with a crawling path expression (the descendant axis is returns a crawling
posture, it can have overlapping nodes) as argument to the function. From the table
this
gives us for argument is crawling, function body
striding the result posture of roaming and a sweep of free-ranging. Even without the table we can see from the argument that
it returns overlapping nodes and that a processor will have to potentially look back
and
forward to find the third child of each overlapping node. If there is no overlap,
this
problem would not arise, but statically we do not know that, and the processor has
to
assume the worst.
Multiple arguments that can take nodes
With streaming, it is not possible to have one construct take multiple downward select
expressions, or more generally, multiple consuming expressions. Suppose we were to
allow
a function call to take multiple arguments that are streamed nodes, this rule would
backfire. An function call such as f:equals(foo, bar) has two downward
select expressions, foo and bar. Because there is no way the
processor can know in what order these appear in the input document, it is possible
that
foo appears before bar, or the other way around, which
means that the processor may have to look back to find the other child. This is the
general reason why multiple downward selects are not allowed in any single
construct.
We could argue that one argument can be a grounded
node and another a streamed node. But that does not help us much further, because
it
hits the same problems as with the non-streamable functions we showed earlier: how
can
we tell the user of the function to choose between fn:snapshot and
fn:copy-of and how can he determine what argument can take a streamed
node and what argument cannot?
Another argument to allow multiple arguments that can take nodes is that one argument
can be motionless in the function body and the other is not. Together they would be
consuming, or if both are motionless, together they would be motionless. But this
still
does not deal with the fact that to get a node into the function, that node first
needs
to be consumed, unless it is the context node or an ancestor of the context node.
These
restrictions would be very limiting and also very hard to define in spec
language.
The resolution is to keep things simple. Instead of allowing multiple arguments that
take nodes, streamable functions can have a maximum of one parameter that can take
nodes. The other arguments must either be non-node types, or absent.
For this section I consider four kinds of recursion[16]:
Direct recursion: function calls itself
inside the body of the function.
Indirect recursion: function calls
itself indirectly through another function it calls.
Indeterminate recursion: function
applies templates and a template at some level calls the function.
Dynamic recursion: a dynamic function
call resolves, or can resolve to a function which is currently
executed.
Direct recursion
To understand the challenges when dealing with recursive functions it is enough to
look at a most trivial example, which is clearly streamable and uses infinite direct
recursion:
This function does not have an exit-clause for the recursion and will run
indefinitely, likely causing a stackoverflow error. But it is legal XSLT and it is
streamable in the sense that it no movement occurs on the streamed argument.
First we need to determine the posture of the argument to the
function, here it is striding.
This sets the posture of $n to striding as well. The next step is to determine the
posture and sweep of the function body.
To determine that, we need to determine the posture and sweep of the
contained sequence constructor, which is a single
xsl:sequence with here one operand, the
select-expression.
The posture and sweep of xsl:sequence is equal to the
posture and sweep of its select expression[17], here f:recur($n).
We know the posture of $n, but to determine the posture
of the function call f:recur we need to analyze the
function body again, which we already did, but without having a
conclusive sweep or posture yet. We've reached an eternal analysis loop,
the function is non-analyzable.
It is not trivial to resolve this dead end situation. One way forward could be to
split the function body artificially into two operands and analyze each
individually: one for the part without the recursion point[18], setting the recursive function to the empty sequence (),
and one for the argument to the recursion point. This will help somewhat, but feels
wrong and is hard to proof right. What would happen if a function returns streamed
nodes and gets a climbing input and the output is
of a different posture?
Looking back at Table III: Posture and sweep of stylesheet functions we can deduct that
any function call with a climbing or crawling argument must yield a climbing or crawling result. The
only argument that is allowed to return a different result posture is an input with
a striding posture. This makes sense, because the
natural process for streaming is to go through the input document depth-first, which
is the striding posture and once you change
direction upwards, you cannot go down again: it would make a construct roaming. To
summarize, we can consider that:
A climbing input argument must
remain climbing to be streamable, so if the input is climbing, the
recursive argument must also be climbing and the body cannot contain an
expression in another direction.
A striding input argument can
change to climbing but not to
crawling in recursion, because
that would validate the next point. If it changes to climbing the whole function body must be
motionless, because you cannot both climb and consume, and if it stays
striding the whole function body
must be striding and can be either
motionless or consuming.
A crawling input argument can only
take motionless expressions, it cannot turn magically back into
striding[19] nor is it allowed to have a crawling expression operate on another crawling expression.
Anything roaming will always stay
roaming.
The following table shows what this means for different argument postures of calls
to a function, what the recursive argument (the one at the recursion point) can be
and the result posture of the recursive call is and what the posture of the function
body must be (it must be the same) and what the sweep is of the recursive call. The
last column stresses the maximum sweep of the function body with the given
arguments. Note that all these analyses are in the context
of the initial function call, not the recursive call, which makes it
possible to allow one function body to take different postures as input:
Table IV: Posture and sweep of recursive stylesheet function calls
Argument posture
Recursive argument posture
Recursive function body & call posturea
Recursive function call sweep
Maximum function body sweep
a)
The function body must match the recursive call
posture or be grounded, otherwise the posture is
roaming.
b)
In fact, if the argument is grounded, the recursive
argument will also be grounded.
c)
Because on a next iteration, the argument will be
crawling, not striding anymore, it would be a crawling
expr. on a crawling expr. which is not guaranteed
streamable.
Grounded
Any (grounded)b
Groundedb
Motionless
Motionless
any
Direct-transitional
Posture of argument
Motionless
Consuming
Climbing
Climbing
Climbing
Motionless
Motionless
Climbing
Non-climbing
Roaming
Free-ranging
n/a
Striding
Climbing
Climbing
Motionless
Motionless
Striding
Striding
Striding
Consuming
Consuming
Striding
Crawling
Roamingc
Free-rangingc
n/a
Crawling
Climbing
Climbing
Motionless
Motionless
Crawling
Non-climbing
Roaming
Free-ranging
n/a
Roaming
any
Roaming
Free-ranging
n/a
For this table to work in practice, we can allow at most one recursion point, or
each recursion point must have the same posture and at most one can be consuming. Once you have found the posture
and sweeps of the recursive function call, you can continue the analysis of the
function body and then apply the previous Table III: Posture and sweep of stylesheet functions.
The rules in this table are laid out in such a way that it is not possible to have
a climbing expression as the argument to a
recursive function at a recursion point and somewhere else in the same function body
a consuming construct or a non-climbing result. This is necessary, because if we
would allow the combination of a climbing
recursion point and a striding function body,
then on either the initial call (with a climbing
argument) or the recursive call (if the initial call would have been striding for instance) would call the function with an
incompatible climbing posture, which would then
be consumed. And a climbing posture can never be
consumed. More specifically, it prevents a function like the following to be
considered streamable:
Glancing over the example might already give a hint as to why it cannot possibly
be streamable: the xsl:value-of can consume a streamed node and that
streamed node is likely to be at a climbing axis because of the climbing posture of
the recursive argument. If we take the item in the table into account and consider
calling this function with a striding argument, as with f:climbstr(a/b)
then we find in the entry for striding and
climbing that the body of the function must
at most be motionless. It is consuming, because of the xsl:value-of so
the analysis fails and the function is roaming
and free-ranging.
If we go back to our original example (see Code listing: trivial direct recursion) and apply
the rules of this table, we have one recursion point, f:recur($n), the
argument is direct-transitional, that is, it is
motionless and takes the posture of its caller, in this case the caller of the
function. According to the table above, we can use any posture as input, which will
then become the result posture, and the sweep of the function body follows as
motionless and grounded. In our case, the call has a child-select expression in
f:recur(y), which has posture striding and sweep consuming. The
result posture and sweep of the whole template is then grounded and consuming.
Indirect recursion
With indirect recursion there are extra layers of calls before the recursion takes
place, as in A calls B calls C calls A again. The question that rises is whether the
rules in section “Direct recursion” also hold if a function calls itself
indirectly. Indirect recursion is a type of recursion that can also be determined
statically. Let us look at an example:
The example is a very verbose and clumsy way of getting all attributes of all
ancestor-or-self nodes that have a certain name. For instance, with the
following input and stylesheet, the output will be 3.14 1.64:
But whether it is clumsy or not is not the matter. It serves as a basic example of
indirect recursion and the function is clearly streamable because it only traverses
the ancestor and attribute axes, which are motionless. However, the question is, of
course, whether this example is streamable by following the rules from
Table IV: Posture and sweep of recursive stylesheet function calls.
If we try that a first time, we quickly find that it is not immediately obvious
how to apply it, because the functions above are not directly recursive. Recall that
we said in section “Direct recursion” we wrote we can allow at most
one recursion point, or each recursion point must have the same posture and at
most one can be consuming . There are two recursion points here, the one
inside f:filterattr that calls f:take-ancestor and the one
inside f:take-ancestor that calls f:filterattr. The
posture of the first one takes the input posture of the argument, it is Direct-transitional, the second one traverses the
ancestor axis, which gives it a climbing posture.
Finally, the argument inside the template example in Code listing: indirect recursive function call example
has itself a posture of striding, because the
context item expression takes the context posture
which in the case of xsl:template is striding.
In short, if we start at the template, we start with posture striding, then striding again
(because it is direct-transitional), then climbing and on the next recursive iteration this
repeats, but now from climbing as input to
f:filter-attr, then climbing again
(because direct-transitional takes over the posture
of the argument) and so it continues climbing until all recursive iterations have
finished.
This breaks the rule we set to ourselves: the posture changed from striding to climbing.
Yet, we also established that the example functions were streamable. The conflict
arises from the argument $n being direct-transitional, which works like a cameleon and changes the
posture along the way. One resolution to this specific situation is to allow
direct-transitional, provided the rest of the
body of the function is motionless, in which case it does no harm. Another, perhaps
easier solution is to force the same posture in, same posture
out rule and rewrite the argument to this function as follows:
The change to using the ancestor-or-self axis also changes the initial posture to
climbing, which in turn changes the posture of
$n to be climbing as well, making
all postures in all recursion points climbing.
While this relatively trivial example seemed to show an edge-case because it used
a recursive argument posture of direct-transitional, it taught us two things: one, even a trivial
example can quickly appear hard once we try to apply streamability rules on them and
two, to keep things simple, it is best to stick a set of ground rules, even though
that potentially means that we slightly limit certain use-cases.
In practice, indirect recursion is found pretty rarely in XSLT programming, even
though XSLT is itself a functional language, it is much more common to use direct
recursion. But if a scenario requires indirect recursion, it is possible, even with
streaming.
Indeterminate recursion
Another challenge comes from indeterminate recursion as in the following
example:
In this situation, the analysis of the function declaration and the template
declaration can be done independently, without ever finding the recursion point.
However, when this code is run, the recursion kicks in (assuming the argument is an
element). There is, however, one big advantage to this type of recursion: the rules
on
templates define that the result of a template declaration must be grounded[20]. Therefore, by definition, the result of any
xsl:apply-templates is also grounded, which gives the function body above a grounded posture as well. In turn, the call to f:indeter
results in a grounded posture, which concludes the
circle. In addition, any select-expression in xsl:apply-templates must be
atomizable, which in streamability terms means that it cannot be climbing or crawling[21]. This leads us to a temporary conclusion: if the arguments to a function are
striding they can be used in indeterminate
recursive functions. If the arguments are not striding, the analysis fails and makes the call on the function
roaming and free-ranging.
Following Table III: Posture and sweep of stylesheet functions and taking the same
approach that the body of the function should be analyzed during the call to the
function, we can conclude that it is impossible for a function body to apply
templates on something else then a striding
posture. The one exception that is allowed in the General Streamability Rules[22]is for a climbing posture of childless
nodes, which only applies to the attribute axis. This means that we can consider the
following example streamable as well, even though we pass a climbing, or even a crawling
expression to the function:
Whether or not that function will be called indeterminate-recursively or not is
irrelevant, because on the second iteration, the selection will be empty, because
attribute nodes do not have children.
In short: indeterminate recursion does not have influence on the streamability
analysis, nor does it break it. The existing rules suffice.
Dynamic recursion
Dynamic recursion looks a bit like indeterminate recursion in the sense that
whether or not recursion takes place cannot be determined statically. In XPath 3.0,
it is possible to bind a variable to a function[23], this is commonly referred to as higher order functions. The concept is
not new, already in 2001, Dimitre Novatchev showed in his paper
Novatchev 2001 that writing higher order functions is possible
in XSLT 1.0 by means of what he calls template
references[24], a concept also applied to the functional coding patterns in
Mangano 2005, which was further popularized as the Novatchev technique in Kay 2008.
It took until XSLT and XPath 3.0 that function items became first-class citizens,
see my own paper Braaksma 2013 for an extended coverage of that subject
for XSLT and Grust 2013 for a coverage for XQuery and
databases.
For the rest of this section, we will assume you have a basic understanding of
higher order functions in XSLT and XPath. The closure of higher order functions
cannot contain the context item, so for our analysis we should only worry about
arguments that can be streamed nodes, just like with regular functions.
The nature of a dynamic function call is that the function it refers to is only
known at runtime, when the variable is bound to a function item. As a result, there
is not much we can say of the posture and sweep of the function it refers to until
it is actually invoked. Following the rules on streamability, the result of binding
a variable to a function item is always grounded. This makes sense, because at the
moment of binding, a function item is assigned to a variable and no streamed nodes
are possibly involved in the result.
Once a function item is evaluated, its arguments are known. Provided that the
arguments themselves are not roaming and
free-ranging, the analysis will thenceforth
look at the bound function. If this function is streamable, we can do the analysis
from the previous sections as if it was a direct function call. The biggest
difference lies in the fact that the analysis is done during execution. This means
that an error for whether a call is not streamable will be a dynamic error.
This is no different for dynamic recursion. If dynamic evaluation leads to a
recursive call, the posture of the argument, the recursive call and the function
body must be matched to Table IV: Posture and sweep of recursive stylesheet function calls. If the result is
roaming and free-ranging, the recursive call is not guaranteed
streamable.
While it is theoretically possible to allow such dynamic streamability rules, the
rules in the specification are meant to guarantee
streamability statically. As a result, any potential dynamic streamability is prohibited. Processors may allow dynamic
streamability as a vendor extension, but the specification will consider passing a
node to a dynamic function call not guaranteed
streamable, dynamic recursion, as suggested in this section, is
therefor not possible with the current set of rules on static
guaranteed streamability.
Analyzable and non-analyzable functions
Analyzable stylesheet functions are functions that can be analyzed by themselves,
as we saw in . Non-analyzable stylesheet functions are
those that can only be analyzed from the function call, because the body of the
function on itself cannot be analyzed without knowing what the argument is. Most of
the previous sections was about non-analyzable function calls.
Spec-wise, the term non-analyzable refers to the
set of functions that cannot be analyzed statically and are given very conservative
assumptions on streamability. This is true for recursive stylesheet functions and
non-final streamable functions (see section “Non-final streamable functions in packages”). The term non-analyzable does not necessarily mean that the
function is not analyzable, however, the general rule is that it requires a call,
or
knowledge of the context of a call to the function to find out whether or not the
function is streamable. We have seen in the previous sections that it is possible
to
allow quite a variety of postures, even for non-analyzable functions, but in the
specification such functions are limited to a posture of striding and are always considered consuming even if in fact they are not.
Non-final streamable functions in packages
In XSLT 3.0, a new feature, packages allows
programmers to create a collection of modes, functions, named and matching templates,
accumulators etc and pack them in a, potentially pre-compilable, package, which can then be included in other packages or stylesheets
with xsl:use-package[25]. Components in that package can have a visibility attribute that can be set to public, private, final or abstract. If the visibility is
set to private it can only be used in the local
package, and if it is set to final the declaration
cannot be overridden, but can be used also in other packages. Visibility abstract means there is no implementation yet, and it must be
implemented by a using package (one that includes the package). And public, the default, means that a declaration can be
overridden by using xsl:override, which will then take precedence over the
original implementation. This can be considered similar to virtual
methods in object-oriented languages like C++ and C#.
The result for determining whether a function can be guaranteed
streamable or not cannot be determined statically at the moment that a
package is compiled, because it is possible that it will later be overridden and any
calls to that function can be impacted if the implementation, and the streamability,
changes.
As briefly explained in the previous section, if a function is defined with visibility
abstract or public, in other words, if they are non-final and non-private, then they are
considered non-analyzable by the specification. This
means that the processor cannot, and does not have to, try to analyze function calls
too
deeply: it only expects a striding posture and it will
not try to determine whether or not the function is consuming or motionless when you
call the function: it assumes the worst and will consider the call consuming.
A proposal from me in the bug entry [26]awaits reaction from the working group. It shows a way that processors can
use to statically detemine the posture of non-final streamable functions. However,
if
that proposal were excepted, it will pose limitations on overriding non-final streamable
stylesheet functions. I think, however, that not being able to override a function,
and
not being able to write non-final functions, is too severe a limitation. Even if a
function is non-final, if you override it to make it final, you will have to rewrite
the
whole function body, you cannot use xsl:original(), because that would call
into the non-final function again, which takes too much of conservative approach.
An alternative approach is the approach taken for the recursive functions in the
previous sections. For all analysis there, we always considered the function call
itself
to get the whole picture. For functions that are non-final, such an approach can also
work. Because the processor only knows what function to call when it encounters the
function call itself, it should include the body of that function at that moment into
the analysis. This can be done statically, there is no need to first run the
stylesheet.
A potential problem, however, still arises if someone writes a package function that
works correctly with a climbing result posture, but
cannot be used if that is changed into a function override with a striding result posture. Even more, a motionless function could be changed into a consuming function, which will greatly impede existing calls to that
function. One way to remedy that is to disallow overriding functions with a different
result posture or with a different sweep. That means that any given call will need
to be
analyzed against the new function and the xsl:original() function call, as
if it still existed in scope, otherwise it will not be possible for the processor
to
determine whether there is a change in posture or sweep.
Whether or not this turns out to be a feasible option in practice remains to be seen.
After all, there is always a trade-off between usability of a feature and the complexity
of the rules or the complexity for implementations. If the rules become too complex
for
anybody to understand, then there is littly chance it will be used in practice, and
there will be little incentive for existing implementors to implement such feature,
let
alone write specification-tests for it.
An improvement: type-determined posture
In the past few sections we have seen that analyzing a function that can return nodes
can be quite hard and intricate. However, if you know what type your function will
return, you can use the as-attribute on the xsl:function
declaration. This will tell the processor that the result must be converted into the
type defined in that attribute.
In cases where the type is not a node, the processor can go much further with static
analysis than in situations where the type is not known, because if the type is not
known, the processor must assume the worst. Suppose you declare the function with
as="xs:integer", there is no way your function can return nodes. For
recursion this is a big help, because the processor now knows the result of the
recursion will not contain nodes. Whether or not there is overlap in the argument
expression is now less relevant and the posture of any function call, even the recursive
ones, will always be grounded. This makes the
requirement to match the argument with the result posture go away and simplifies both
writing and analysis.
Let us look at an example, adopted from an example by Michael Sperberg-McQueen:
The example takes a node, processes its parents recursively and returns the names
in
reverse order (deepest first) separated by slashes. Because the function is declared
to
return a grounded result, namely
xs:string, there is no need to assume the function can return nodes,
because it cannot. Through Table IV: Posture and sweep of recursive stylesheet function calls we find
that if the argument posture is striding and the
recursive argument posture is climbing, as is the
case here, the recursive function call is considered climbing as well and the function body must then result in either a
climbing or grounded posture.
In this case, the function call itself does not have to be set to be climbing, because we know from the xs:string
result type that the result will always be grounded.
This simplifies the analysis in that we can now consider the function argument to
be of
either absorption or inspection usage. If the function body as a whole consumes the node
referenced by $n it will be absorption and
the analysis fails, because a climbing expression with absorption usage is not allowed. Because the only other operands on the
streamed argument are fn:count and fn:name, both having
inspection operand usage and both being motionless, the result of the analysis as
a
whole is that the function is guaranteed streamable, does not consume the streamed
argument which we can summarize as having usage inspection, and while it is direct-recursive we know it can be called
with any argument, albeit climbing, striding or crawling.
We could expand the type-determined analysis further by including whether or not the
streamable argument has a cardinality of zero-or-one or has a cardinality of
zero-or-more. In the case of zero-or-one there is no way that overlapping nodes can
be
passed on to the function, which can simplify analysis of a function call with a
crawling argument, because it will fail dynamically
if the crawling argument returns more than one node.
This special-case scenario has meanwhile been tackled and been generalized for any
situation and cardinality[27], which can also by applied to stylesheet function call arguments.
Miscelleneous
Inline functions and streamability
Inline functions are a new XPath 3.0 capability[28] that allows you to write an anonymous function inside an XPath
expression. The definition of inline function expressions prohibits referencing the
context item, which makes it not directly possible to refer to the context. However,
the closure of an inline function contains all local variable bindings, which makes
it possible to refer to a variable outside of the inline function, which itself
could be bound to a streamed node. The streamability rules for
xsl:variable and xsl:param prohibit referencing a
streamed node, but one exception is inside a streamable stylesheet function, where
it is possible to have a parameter bound to a streamed node.
There are two solutions possible. One is to come up with complex rules for inline
functions referencing the streamed parameter, the other is to simply prohibit using
a streamed parameter inside an inline function body. This in itself would not be
consistent with the definition in XPath, which states that all local variable
bindings are available in the dynamic context of the inline function, which leads
us
to a third option: simply disallow inline function expressions within the body of
a
streamable stylesheet function.
This third option would be in line with the current streamability rules for inline
function declarations[29], which states that they are, by definition, grounded and motionless, simply
because it is not possible to pass a streamed node into the closure of the inline
function body.
Partial function application and streamability
Partial function application is also a new XPath 3.0 capability by writing down a
question mark as a placeholder for an argument[30], and apply the other arguments already. The result is a function item
with less arguments than the original and some arguments already filled in. An
example is the expression index-of(?, ?, "http://my-collation"), which
returns a function item that presets the collation argument to
http://my-collation, resulting in a two-argument function that
always uses the same collation.
The current rules are under the rules on function calls[31] state that, unless the function is focus-dependent, that the general streamability rules apply, which means that it is
treated as a normal function call, but the placeholder arguments are ignored.
If we take into consideration that binding a streamed node to a variable and
passing it around is prohibited, a similar rule should be applied to partial
function application: if the function is a streamable stylesheet function, the
argument that can take a streamed node, must be the placeholder, or a grounded node
must be supplied, otherwise it would become a roaming and free-ranging
expression.
In other words, partial function application works with user-defined streamable
stylesheet functions, as long as you do not try to bind a streaming node from the
current context to it. You should do that at a later stage, when you actually call
the function.
Named function references and streamability
Named function references are another XPath 3.0 feature[32]: you can create a function item of an existing function by using the
syntax FunctionName#ArgCount, where ArgCount is a literal
giving the arity of the function. For instance, count#0 will give a
reference to the zero-argument function of count and
my:filter#3 will give a reference to a three-argument function
my:filter.
The function item returned by such an expression can be called as a normal
function by applying parentheses and arguments, which in turn will call the function
you referred to to begin with. In practice, this syntax is useful for binding
variables to existing named functions.
Creating a named function reference is an atomic action and does not involve
references to nodes. As a result, a named function reference itself is always
grounded. There is one exception, if you try to create a reference to a function
that is focus-dependent. In such cases, just like in previous rules, the result is
roaming and free-ranging. This is in fact the current rule for streamability of
named function references in the specification[33].
Since stylesheet functions by definition cannot be focus-dependent, this exception
does not apply to named function references that refer to a (streamable) stylesheet
function.
Dynamic function calls and streamability
A dynamic function call, also a new XPath 3.0 feature[34], is a call to a function item that is bound to a variable. Suppose you
have <xsl:variable name="fref" select=" 'name#1' " />, which binds
the one-argument version of the name-function to $fref, then you can
call that function by adding parentheses and arguments, the same way you would have
done if the variable were the actual function:
$fref(child::*[1]).
Since it is not possible to know at runtime what function the variable is bound
to, analysis can only take place once the function is actually called. The current
rules[35] state that all arguments have operand usage navigation, which means that you can only call a dynamic function
when you actually create a grounded copy of a
streamed node. In case you wanted to apply it to a streamed node you are out of
luck.
The rules make an exception in case the signature of the function item is known,
in which case type-determined usage, as in part
explained in section “An improvement: type-determined posture”, can be used. That means that, if an
argument is declared as a non-node type, the usage typically becomes absorption.
Streamability of stylesheet functions according to the specification
The current approach taken by the rules in Bug 25679 is a
pessimistic one. For stylesheet functions to return nodes, they must be striding and are allowed to be either motionless or consuming. As a further
limitation, which actually follows from forcing the striding posture on function bodies, is that the argument to the
function must itself also be striding. In fact, it is
not possible, by the current definition, to have a streamed node with crawling or climbing posture
act as an argument to a streamable stylesheet function.
By limiting the result posture of the stylesheet function to one allowed posture,
it
is easier to write rules for both analyzable and non-analyzable stylesheet functions,
see section “Analyzable and non-analyzable functions”. However, because the bug entry is quite fresh and the
discussion is still ongoing, there is little conclusive I can tell about the rules
that
will eventually make it into the next public version of the specification[36]. For instance, in Comment#1
of the same public bug entry I have proposed some of the rules from Table III: Posture and sweep of stylesheet functions. Whether or not any or all of these suggestions will
make it into the specification will remain to be seen.
Even if the specification will only allow relatively pessimistic stylesheet functions,
vendors are still allowed to use wider rules on streamability. In the case of functions,
Exselt will allow any function that is streamable, as explained
in the previous section, by using an optimistic operational mode at user option.
The streamable attribute on xsl:function
In none of the previous sections in this paper have I mentioned the
streamable attribute on an xsl:function declaration.
Since most of the paper discussed streamability of a stylesheet function in the
context of the function call, whether or not a function is streamable then depends
on its implementation and there is no need for the processor to know beforehand what
functions to analyze and what not.
The streamability rules in the specification are made such that the processor must
statically determine whether or not a function declaration is streamable or not.
Since it is likely that any given stylesheet will have both streamable and
non-streamable stylesheet functions, it is necessary to tell the processor which are
and which are not streamable. For that, the xsl:function declaration
gets a new attribute, similar to the attribute of xsl:accumulator and
xsl:mode, which determines whether or not a function should be
analyzed for streamability.
The attribute takes a value of yes or no. If the value
is yes, then the function must be guaranteed
streamable and the function can be called with a streamable node as
an argument.
I think that such an attribute is not necessary if we take the approach in this
paper and analyze the entire function only upon the actual function call. If that
function call is inside a streamable context and a streamable node is passed as an
argument, the body of the function is analyzed with the context posture of the
streamable argument set as the posture of the streamable parameter reference.
Status of current processors with respect to streamable stylesheet functions
Since streamable stylesheet functions are a relative recent addition to the
specification and lots of it is not yet publicly available, there are currently no
processors available that fully support streamable stylesheet functions.
At the time of this writing, of the two streaming processors that I know of, Exselt and Saxon, the former currently has a full streamability analysis for all
constructs but a limited ability to analyze stylesheet functions for streamability.
However, the available set of tests is still growing and the rules on streamability
may
change between now and the actual presentation of this paper, or even afterwards,
since
the specification is not final yet and work on streamable stylesheet functions may
change the current rules. Apart from analyzing a function for streamability, Exselt
chooses an optimistic streaming approach: if you write your stylesheet in a correct
way
with forward expressions only, your stylesheet will process streaming input in a
streamable way.
For Saxon, I do not know the actual current status on streamable stylesheet functions,
but I do know that they plan to do it in the not-so-far future.
If you are interested in streaming, or more specifically in streamable stylesheet
functions, keep an eye out on the website of Exselt and Saxon as it is likely that
in
the near future, both processors will support this feature.
Conclusion
The previous sections have shown that writing streamable stylesheet functions is not
that hard, as long as you do not require recursion, or want to write overridable
non-final functions. But if you do require recursion, the simplest rule to remember
is
same posture in, same posture out. As long as you
stick to that rule, and forget about the exceptional cases, you are in safe streaming
waters.
We have also seen that the current resolution in the specification taken to allow
streamable stylesheet functions is far more limiting, allowing only a posture of
striding for the input arguments and in the case of
recursive functions or non-final functions, always concludes that a call to such a
function is consuming, even if it is not. This
conservative approach has a valid reason, though: it tries to keep things simple and
the
rules understandable and implementable. But even with this limitations in mind, it
gives
stylesheet authors quite a wide range of possibilities to write streamable stylesheet
functions.
This paper has shown how a relative small change to the specification opens up the
way
for stylesheet and library package writers to write streamable stylesheet functions,
which can be used in both streamable and non-streamable scenarios alike[37]. This change was an important and vital one and has brought us very close to
making the streamability rules as mature as they can be and ready for widespread use
in
stylesheets and packages. Hopefully this paper has given potential library writers
a
strong hint that they should take the extra step to write their library packages with
streamability in mind to be useful for both the streaming and non-streaming
markets.
[Grust 2013]
Grust, Torsten and Ulrich, Alexander.
First-Class Functions for First-Order Database Engines
Presented at International Symposium on Database Programming Languages.
In Proceedings of the 14th International Symposium on Database Programming Languages
(DBPL 2013), Trento, Italy.
arXiv:1308.0158:
http://arxiv.org/pdf/1308.0158v1.
[Gerstbach 2006]
Gerstbach, Peter.
Generating Structured Documents to Create Reports by Integrating Data from CMS/DMS
and EAI Systems: pp 27-28.
Master thesis. At: Softwaretechnik und Interaktive Systeme der Technischen Universit¨at
Wien. May 2006.
http://www.gerstbach.at/2006/thesis/. Web.
[Kay 2008]
Kay, Michael.
XSLT 2.0 and XPath 2.0 Programmer's Reference
2nd edition, 5 May 2008: pp 251+. Published by O'Reilly Media. ISBN: 0470192747. Print.
[Lenz 2005]
Lenz, Evan.
XSLT 1.0 Pocket Reference (Pocket Reference).
1st edition, 19 August 2005: pp 23-40. Published by O'Reilly Media. ISBN: 0596100086.
Chapter 3 (pages 23-40) is available online: http://lenzconsulting.com/how-xslt-works/.
Print.
[Mangano 2005]
Mangano, San.
XSLT Cookbook, 2nd Edition, Solutions and Examples for XML and XSLT Developers
2nd edition, 21 December 2005: pp 686-732. Published by O'Reilly Media. ISBN: 0596009747.
Print.
[XPath 3.0]
Robie, Jonathan; Chamberlin, Don; Dyck, Michael; Snelson, John.
XML Path Language (XPath) 3.0 Recommendation
8 April 2014. W3 Consortium. W3C Recommendation.
http://www.w3.org/TR/xpath-30/.
Web.
[XProc]
Walsh, Norman; Milowski, Alex; Thompson, Henry S.
XProc: An XML Pipeline Language
11 May 2010. W3 Consortium. W3C Recommendation.
http://www.w3.org/TR/xproc/.
Web.
[XSLT Latest Version]
Kay, Michael.
XSL Transformations (XSLT) Version 3.0, Latest Version
Undated. W3 Consortium. W3C Working Draft / Last Call Working Draft / Candidate Recommendation
/ Proposed Recommendation / Recommendation.
http://www.w3.org/TR/xslt-30/.
Web.
[1] The ancestor-or-self axis is available during
streaming and requesting properties on nodes on that
axis is allowed. However, it is not possible to navigate
away from that axis, doing so would result in a
free-ranging and roaming expression.
[5] See, for instance, his answer in question 16631213 at StackOverflow, which uses recursion on element
nodes, it's an example where functions are arguably a better choice than
template matching.
[6] See section 5.6
Patterns, specifically the part on predicate
patterns in the XSLT 3.0 Working Draft.
[7] Before user-defined stylesheet functions were allowed to be streamable and
take on streamed nodes, the rules were as in the current Working Draft:
For a call to a stylesheet function, the general streamability rules
apply. There is one operand role for each argument in the function
signature, and its operand usage is the type-determined usage based on the
declared type of that argument., see section 19.8.7.12 Streamability of Function Calls in XSLT 3.0 Working Draft.
Please note that this and other references to the WD will change when the next
version of the specification comes out.
[8] See section 19.8.1. in XSL Transformations 3.0.
[9] See section 19.8.4.17 Streamability of xsl:for-each in the XSLT
3.0 Working Draft, specifically the rule The posture of the
instruction is the posture of the contained sequence constructor,
assessed with the context posture and context item type set to the
posture and type of the select expression..
[11] The rule that a child axis is not guaranteed
streamable if applied on an axis other than an axis with
striding posture follows from 19.8.7.7 Streamability of Axis Steps in the XSLT Working Draft,
in the table under item 6 in the list.
[12] A higher-order operand is an
operand that can change focus, such as xsl:for-each
or xsl:for-each-group, see the definition of higher-order
operand in the XSLT 3.0 Working
Draft.
[13] See 19.8.7.6 Streamability of Axis Steps in XSL Transformations 3.0.
[14] See 19.8.7.7 Streamability of Axis Steps in XSL Transformations 3.0.
[15] A public bug entry Bug
25185 tries to resolve this by allowing limited buffering of crawling
postures in a absorption contexts such as these, if that bug is accepted and
resolved, overlapping nodes as with crawling posture is allowed with usage
absorption, at least in the general case.
[16] The XSL Transformations 3.0 Working Draft does not use this terminology, but
it helps differentiate different situations for this discussion.
[18] I use the term recursion point for the
place in the code where the recursion is invoked, but this is not a term
that is used in the specification.
[21] One exception is allowed for selections that only select childless nodes, such
as attribute nodes. These are climbing but do
not contain children and are henceforth allowed.
[36] The next version will probably be a Candidate
Recommendation, considering that the current status is Last Call Working Draft, meaning that the
specification is ready for implementors and only bugs found that come from
implementation issues will be fixed. No new features will be added.
[37] Any streamable construct will work exactly the same in a non-streaming
scenario. Functions or modes written with streaming in mind can be used with
both streaming and non-streaming input without alterations.
Braaksma, Abel. Efficient XML processing with XSLT 3.0 and higher order functions: pp 23-40.
Presented at XML Prague 2013, a conference on markup languages and data on the web, Prague, Czechia, Feb 8-10, 2013.
In XML Prague Proceedings 2013.
http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf.
Web.
Braaksma, Abel.
Streaming for the masses, an introduction to streaming with XSLT: pp 29-80.
Presented at XML Prague 2014, a conference on markup languages and data on the web, Prague, Czechia, Feb 14-16, 2014.
In XML Prague Proceedings 2014.
http://archive.xmlprague.cz/2014/files/xmlprague-2014-proceedings.pdf.
Web.
Grust, Torsten and Ulrich, Alexander.
First-Class Functions for First-Order Database Engines
Presented at International Symposium on Database Programming Languages.
In Proceedings of the 14th International Symposium on Database Programming Languages
(DBPL 2013), Trento, Italy.
arXiv:1308.0158:
http://arxiv.org/pdf/1308.0158v1.
Gerstbach, Peter.
Generating Structured Documents to Create Reports by Integrating Data from CMS/DMS
and EAI Systems: pp 27-28.
Master thesis. At: Softwaretechnik und Interaktive Systeme der Technischen Universit¨at
Wien. May 2006.
http://www.gerstbach.at/2006/thesis/. Web.
Mangano, San.
XSLT Cookbook, 2nd Edition, Solutions and Examples for XML and XSLT Developers
2nd edition, 21 December 2005: pp 686-732. Published by O'Reilly Media. ISBN: 0596009747.
Print.
Walsh, Norman; Milowski, Alex; Thompson, Henry S.
XProc: An XML Pipeline Language
11 May 2010. W3 Consortium. W3C Recommendation.
http://www.w3.org/TR/xproc/.
Web.
Kay, Michael.
XSL Transformations (XSLT) Version 3.0 W3C Last Call Working Draft
12 December 2013. W3 Consortium. Last Call Working Draft.
http://www.w3.org/TR/2013/WD-xslt-30-20130201/.
Web.