Notes
Updates
This paper discusses very recent changes to the XSLT 3.0 specification that are still under discussion in the related BugZilla bug entries. When the specification is updated, rules laid out in this paper need updating as well. I will publish those updates at http://exselt.net/papers in DocBook and PDF formats.
Prerequisites
To read and understand this paper, a basic understanding of XSLT 3.0 and streaming is desirable. For an introduction on streaming, you can refer to two earlier papers in this series, Braaksma 2014a, about the Ten rules of thumb of streaming and Braaksma 2014b, about Streaming XSLT design patterns.
Disclaimer
This paper relies on information that can be found in the public XSLT Last Call WD at the time of this writing, and in some of the public XSLT 3.0 BugZilla bugs (see W3C BugZilla). While I am an invited expert for the XSL Working Group, I do not speak at there behalf, and any thoughts I lay out in this paper are my own and are not necessarily the thoughts of the XSL Working Group.
Changes to the Last Call Working Draft will be discussed through the same BugZilla and will ultimately result in the publication of a new version of the specification in XSLT Latest Version. Where this paper refers to XPath, XPath functions and operators or the XDM, it uses XPath 3.0, XPath and XQuery F&O 3.0 and XQuery and XPath Data Model 3.0.
Since the XSLT 3.0 specification is not final yet, it is possible that syntax or semantics of constructs used in this paper change in the future or are dropped in their entirety. Large parts of this paper rely on discussions and conclusions of functionality reported in the public section of BugZilla. Where the text of this paper deviates from the public Working Draft and where these changes are publicly available through BugZilla, I will state so by using footnotes.
Introduction
Streamability analysis is a complex subject, yet the basics are, as is often the case, relatively trivial: just write your stylesheet in such a way that it only uses the child axis, plus an occasional escape to node properties such as attributes on the ancestor or self axes, and you have essentially written a streamable stylesheet.
But in practice, it is not so trivial. Rules in the specification are complex and even hard-core spec-readers have trouble following them. They are meant for implementors and not necessarily for programmers or users, which is why these rules have to take care of every possible corner case and yet make sure that for general use, the rules work with the least possible surprises.
This paper tries to fill in the gap between spec-prose and tutorial. While there currently are no tutorials on the subject of streamable stylesheet functions, this paper aims to explain the basics and several rather advanced concepts such as recursive streamable functions, in a way that it becomes understandable for a larger public.
A streamable stylesheet function is a function that can take a streamed node from a streaming input document, while still passing the tests for being guaranteed streamable. Being able to write streamable stylesheet functions is import for package designers, so that they can create functions that work alike for streaming and non-streaming scenarios.
This paper first introduces you to part of the terminology used in the specification and at several sections in this paper, see section “Streaming terminology”. It then explains typical use-cases for streamable functions in section “A case for streamable functions”, after which it explains the challenges we face when dealing with streamability analysis and streamable functions in section “The challenge of streamable stylesheet functions”.
Streamability of standard, non-recursive functions is explained in section “In pursuit of flexible rules for guaranteed streamable stylesheet functions” and section “Multiple arguments that can take nodes”; and the rather complex subject of streamability for recursive functions is dealt with in depth in section “Recursive streamable functions” and its sub-sections. Futher sections deal with packages and function inheritance with respect to streamability (see section “Non-final streamable functions in packages”) and a proposal for a general improvement to the analysis of the posture in section “Streaming terminology”. And in section “Miscelleneous” several related subjects are covered on inline functions, partial function application, named function refverences and dynamic function calls.
The current state of the specification is discussed in section “Streamability of stylesheet functions according to the specification” and the current state of processors with regard to streamability analysis of stylesheet functions in section “Status of current processors with respect to streamable stylesheet functions”.
Streaming terminology
This section will briefly discussed terminology used throughout this paper. This overview does not try to be complete.
-
Guaranteed streamable: determines whether a particular construct, instruction or declaration is streamable according to the streamability rules in section 19 of the XSLT 3.0 specification. If it is, any processor that is conformant with the streamability feature will be able to process your input using streaming, that is, by reading the input in forward-only mode without keeping the whole document in memory.
-
Posture: the posture of a construct determines the state of a streamed node in the output of that construct. Often, this will be the same as the input posture. It is used in streamability analysis to determine whether a construct returns streamed nodes and in what way. Many constructs are capable of returning streamed nodes, but if all rules are followed, the usages of these constructs are limited in such a way that they can only return a subset that is compatible with streaming. The different postures are:
-
Grounded: the construct does not return a reference to a streamed node. An expression or instruction operating on a grounded posture is allowed to be free-ranging. Examples:
fn:currentDateTime
,fn:copy-of
,xsl:copy-of
, a variable reference. -
Climbing: the construct can return nodes that refer climb the tree through the ancestor or attribute axis. Examples:
parent::foo
,author/@name
,..
. -
Striding: the construct can return non-overlapping nodes, typically only on the child axis. Examples: any child-select expression, any match pattern,
fn:zero-or-one
,fn:outermost
. -
Crawling: the construct can return overlapping nodes on a downward axis. Examples:
foo/descendant-or-self::bar
,./b//c
. -
Roaming: the construct can return nodes that can be anywhere in the input tree. Examples:
following::node()
,ancestor::foo/bar
,preceding-sibling::price
.
-
-
Sweep: the sweep of a construct determines how the current read position of the input stream is changed as a result of executing the construct. The different sweeps are:
-
Motionless: the read position does not change. This is true for expressions that only operate on grounded nodes or data, and for expressions that request a property of a node or the ancestor axis[1]. Examples:
fn:has-children(.)
,@name
,if(./name()) then ...
,ancestor-or-self::para/@xml:lang
, any constant and any expression not operating on a streamed node. -
Consuming: the read position changes in a forward-only direction. Any expression that requires read-ahead is considered a consuming expression, unless the current node is a childless node, such as text- and comment nodes. Consuming constructs form the heart of streamability analysis. Any construct may have at most one consuming construct. Examples:
fn:count(foo)
,x/y
,<xsl:value-of select="name" />
. -
Free-ranging: the construct cannot be evaluated by using forward-only movement of the read pointer of the input stream, for instance when evaluation of a filter expression requires look-ahead. Typically, a free-ranging construct is also roaming. Examples:
foo[bar]
,parent::author/book
, anyxsl:sort
on streamed nodes,@* | foo
,if(a) then b else c
,fn:reverse(index)
.
-
-
Usage: the usage determines what an operand of a construct does when it receives a streamed node as argument. The following usages are defined:
-
Inspection: if the argument is a streamed node, it will inspect the node, but not consume the node, resulting in a motionless sweep (assuming no other part of the expression or construct consumes the input). An inspection operand is the only operand that can take a climbing, motionless expression as its argument.
-
Absorption: if the argument is a streamed node, it will consume the node, resulting in a consuming sweep. Examples are arguments of
fn:data
,fn:string
, constructor functions,xsl:value-of/@select
,xsl:apply-templates/@select
and text value and attribute value templates. -
Transmission: if the argument is a streamed node, the result will also be a streamed node. Depending on the function or construct, this can result in any kind of sweep. Examples are arguments of
fn:outermost
,fn:remove
,fn:subsequence
,xsl:sequence/@select
. -
Navigation: if the argument is a streamed node, there is no guarantee that calling the construct will be streamable. Typically, a navigational operand will result in free-ranging and roaming streamability analysis results. Examples are arguments of
xsl:call-template
(the implicit context item, making it impossible to use this instruction with streamed nodes), first argument offn:fold-left
, last argument offn:key
,fn:innermost
, any untyped argument to a stylesheet function not specifically marked streamable.
-
The combination of posture, sweep and usage determines whether a certain construct is guaranteed streamable or not. Constructs can be nested, and constructs have operands. What exactly constitutes a construct and operands is outside the scope of this paper[2], except for stylesheet functions, which will be explained in the following sections.
A case for streamable functions
Functions are a nice machinery in XSLT 2.0 that have surplaced most of the
xsl:call-template
scenarios from XSLT 1.0. The benefit of being able to
create a function and call it directly inside an XPath expression or in a pattern
filter
has shown its use-cases. More often than not, functions tend to operate on atomic
types,
such as strings and integers, because the natural habitat for nodes processing is
done
declaratively by template matching. This is often called The XSLT
way of doing things, see Welker 2008, Gerstbach 2006 and Lenz 2005, and even today, this is still
often advocated on the XSL Mailing List[3] and on discussion fora such as StackOverflow[4] and Experts-Exchange.com. However, even advocates of using templates over
(recursive) functions, such as Dimitre Novatchev, sometimes choose a function over
a
template based scenario because it is clearer or because it is simply a better tool
for
that particular job[5], or in the words of Jeni Tennison in Tennison 2001:
“If the result follows the structure of the source, then a push method is more natural — the source drives the process. If the result has a substantially different structure from the source, then a pull method is more natural — the result drives the process.”
It should be said: in XSLT 3.0, the natural way of doing things with nodes from an input tree is to use template declarations. This is no other than previous XSLT versions. Perhaps it is even stronger now that it is possible to apply templates on atomic values such as sequences of strings as well[6]. But this does not help with complex patterns, which can only be replaced by functions. Also, it does not help in scenarios where reusing a result in an expression is important, which is where functions come into play: they can be used inside expressions and call-template nor apply-template can do that (unless wrapped inside a function, that is).
The current working draft, which is in Last Call, does not have a provision for streamable functions. That is, it is impossible in any which way to pass a streamed node to a function. Dimitre Novatchev's example from the StackOverflow question would therefore not be possible in a streaming scenario. Now that more and more people have gotten used to write functions, and with the advent of (precompiled) XSLT packages containing lots and lots of library functions, it seems unfair that they cannot continue doing so in streamable scenarios, nor does it seem fair that library vendors will not have any means to create library packages with functions that are streaming-aware.
The XSL Working Group has considered this and as the solution of Bug 25679 shows, the proposal was considered and adopted into the internal working draft. The following sections will explain my personal analysis on the situation and section “Streamability of stylesheet functions according to the specification” will summarize which parts of this analysis match the specification. Since the discussion on this bug report is not final yet, anything in that section is very preliminary and may change without prior notice.
Stylesheet functions in streaming scenarios before they were allowed to be streamable
In the current XSL Transformations 3.0 Last Call Working Draft, calls on stylesheet
functions are analysed based on the declared type of the arguments. If an argument
is
typed as an atomizing type, such as xs:string
or xs:integer
,
the usage of that argument will be absorption, in other
words, it will absorb a streamed node, which in most cases results in a consuming sweep. The result of such function calls is defined
as grounded. If the argument is untyped or is a type
that can take nodes, it is disallowed to pass a streamed node to the function call,
because there is no way of knowing statically what will happen with that node inside
the
function. I call this the safe bet: disallowing
streamed nodes as arguments makes streamability analysis easier. Example:
In this example, the function f:square
takes an argument $i
that is of type xs:integer
. Upon calling the function, the processor will
have to atomize the value or node. To do so, it must process all its children, which
is
why such a usage is considered consuming. In this
case, the element number
will be atomized.
This is a typical way of writing functions and it works perfectly well with streaming. However, if you want the argument to be a node and get some properties of a node or process the node's children, it gets trickier, since passing a node to a function is disallowed with streaming. Consider the following:
In this example, the call to the function is not streamable. We pass on a node to
the
function and the rules in the current Working Draft state that it has usage navigation[7], which has the effect that the whole expression, in this case the match
pattern author[f:hasname(.)]
, will be roaming and free-ranging, in other
words, it is not guaranteed streamable.
It is possible to call the function in some situations, by using the
fn:copy-of
or fn:snapshot
functions. But that means that
users of your function must have enough knowledge to know what atomizing or grounding function to use before passing on an argument. In
this particular case, it would make little sense to use either of these functions,
because a match pattern must be motionless, so it is not possible to use the function
in
a match pattern predicate expression. This is essentially true for any function that
does not consume the input tree. With the current rules in place, even if a function
does not consume the input tree, you still need to create a copy of the tree to call
that function. Hardly efficient and in many cases impossible with streaming, because
the
input node may simply be too big to be copied.
Were you to use it in another context, you still have to choose between
fn:copy-of
and fn:snapshot
. This function would not work
with fn:copy-of
, because that function does not copy the ancestor axis. In
other words, you must know the internals of the function to find out that you need
to
call it with a call to fn:snapshot
. And even then, if the function requires
information from the original document, such as the base-uri or other properties that are not copied with either of those
functions, you are out of luck.
In essence, these rules disallow stylesheet authors to write a function that take nodes as arguments, and in the case of library package authors, they will not be able to write functions that take nodes that work the same way in a streaming scenario and a non-streaming one. They would have to educate library function users how to use their functions in a streaming scenario and many functions will never work in streaming scenarios because their usage would always be consuming even though the function body does not actually consume the input node. In other words, it makes writing functions for use in streamable stylesheets next to useless.
The challenge of streamable stylesheet functions
It is often trivial to see at a glance that a function is motionless and that it would not hurt using it on a streamed input node.
Take, for instance, Code listing: get attribute of ancestor from the previous section. It only uses
the ancestor axis and tests whether one exists with an attribute @name
.
Since walking the ancestor axis is motionless (the processor is required to keep a
stack
of the ancestor nodes and their properties) and the attribute axis is too, any call
on
that function would be motionless with respect to its argument.
This shows us one thing: it is possible to write functions that can be statically
analyzed to be motionless. But this function has
special properties: the body of the function is motionless and the result of the function is grounded. That means, it can never return any nodes, it can either
return true()
or false()
. But can we always statically
determine that that is the case? Let us write a slightly different function, this
time
we return the attribute node, instead of testing for it:
The function still takes a node as its argument, but this time it returns a reference to an attribute of the ancestor axis of that node. The attribute axis has climbing posture, which is limited in that you cannot navigate downwards again from it. In the first usage of this function, in the predicate of the pattern, the input is the current node, but a predicate only has to be true (nodes are there) or false (no nodes). After that, there is no navigation away from the climbing posture, so this could be considered streamable.
If we take a look at the second example, we see there an
xsl:apply-templates
. By definition, assuming the rest of your
stylesheet is guaranteed streamable, any apply-templates will be an atomizing construct:
templates must be grounded and therefore, passing on nodes to a template will atomize
those nodes. Just as we saw before, atomizing means that the node is consumed. In
streamability terms this means it has usage absorption. But to consume a node, its children must be visited, which
is a downward movement. The argument has a climbing
posture because it ends with f:hasname(.)
, which we know returns attribute
nodes, which are climbing.
There is a caveat, however: if the climbing expression returns childless nodes, consuming such a node will not harm the streaming process: there are no children to consume. The General Streamability Rules[8] in the specification have a special provision for this: childless nodes with a climbing posture are allowed in an absorption context.
So it can be made streamable, right? Wrong! For a construct to be streamable it must
be statically determinable to be so. Here, we have a function with an unspecified
return
type. If we had written as="attribute()"
, the processor would be able to
detemine that the returned nodes would always be childless and it could be used in
an
absorption context.
But this is only one example. What happens when the stylesheet function returns a climbing node and we pass it a crawling expression? What happens if the function returns crawling (overlapping) nodes and the input is climbing? And how about recursive functions, or within packages, with abstract or overriden functions?
Let us look at some possible approaches to tackle these issues.
Approach 1: analyzing stylesheet functions statically on their own
Arguably the easiest approach from both the user's point of view and for implementors alike is to try to find a method to analyze the streamability of stylesheet functions statically, without taking into account any possible context the stylesheet function is called in.
Suppose you have a stylesheet function my:for-each
which traverses
all nodes in a node set and returns a certain aggregate result. By its definition,
traversing the node set, it will consume the input tree, just like
xsl:for-each
would. Writing such a function that it works equally
well for different kinds of input postures is hard, perhaps even impossible. A
function call like my:for-each(head//section)
may have to traverse over
overlapping nodes, the expression in the argument is a crawling expression and looping over a crawling node set is not allowed in streaming because certain
buffering is required to cache the overlapping nodes. If the loop is not going to
consume the individual nodes, this is alright[9], but otherwise, it is not guaranteed
streamable.
For such a function to be statically analyzable without knowing what context it is used in, it is allowed to traverse the input tree, but it is not allowed to consume the individual nodes of the streamed input tree. A function that can potentially work with any input posture, albeit climbing, striding or crawling might look like the following:
This example loops over all the nodes, but does not consume the individual nodes, it only requests a property of those nodes using the attribute axis, which is motionless and its result is climbing. And a motionless expression on either a climbing, striding or crawling context is always allowed[10].
Even though this example is exemplary for showing the possibility of writing a
function that can take any posture, it limits our possibilities greatly. Suppose we
want to write an implementation of my:for-each
that does consume the individual nodes in the streamed node
set, for instance as in the following example:
Whether or not this example would use the child axis within the
xsl:for-each
or directly without the loop, for the streamability
analysis this does not matter. The child axis, if applied on a climbing or crawling axis, is
always roaming and has a sweep of free-ranging[11]. This means that the expression my:for-each2(child::book)
is allowed, but the expressions my:for-each2(listing//book)
and
my:for-each(title/ancestor::book)
are not, because the latter two
have arguments with crawling and climbing postures respectively and can therefor not be
consumed.
This leaves us with essentially two options for static analysis of streamable stylesheet functions, without having to take the function calls into account:
-
Option 1: force stylesheet authors to only write functions that are capable of dealing with any input posture. This means that the body of functions must be motionless, or if they are not, they must use a higher-order operand[12] such as the shown
xsl:for-each
construct in the example. -
Option 2: limit the allowed posture to only one posture, so that caller and callee are guaranteed to have the same input posture, always. Most likely candidate is the striding posture, because it is the most common posture when processing an input document. This would mean that the argument that can take nodes always has its context posture set to striding and that the only allowed argument posture when calling the function must also be striding.
Both options have severe drawbacks, but they have the advantage of being very clear-cut and easy to communicate or explain. As it turns out, the current approach taken by the XSL Working Group, as explained in XSL Bug 25679, takes the approach of Option 2: the result posture of a function must be striding or grounded and the argument to a function must also be striding or grounded. To accomplish this, the parameter that can take nodes, which can at most be one parameter, will have a context posture of striding and is by itself motionless.
Approach 2: analyzing stylesheet functions from the function call
Radically different from the previous approach is to analyze the function's body only when we actually know what the context is of the argument that takes a streamed node. For this approach, we simply ignore analyzing the declaration of the function on itself.
If we take the example from Code listing: streamable my:for-each failing with argument postures climbing and crawling and call it with a
striding argument, for instance
my:for-each2(*/books/book)
, the processor, upon encountering such
function call, takes the posture of the argument and sets the context posture of
$node-set
to this posture, in this case: striding. With this posture, the processor can now analyze the
function body and return a result of that analysis, not surprisingly the same as in
the previous section: a consuming sweep and a
grounded result posture (because the function
does not return any nodes, the result is grounded).
If user were to call the function with a crawling
argument, say my:for-each2(*/books//book)
(mark the extra slash), the
parameter $node-set
is set to a context posture of crawling and the resulting analysis of the whole function
body will be roaming and free-ranging. Exactly as expected. And with a climbing argument, or even a roaming argument, the result would be the same.
Doing the same analysis for the first example, Code listing: streamable my:for-each working with any argument posture, the result would be different each time, showing the advantage of this approach:
Table I: Posture and sweep of my:for-each1 function, depending on argument
Argument posture | Function posture my:for-each1 |
Resulting posture for function-call | Resulting sweep for function call |
---|---|---|---|
Grounded | Grounded | Grounded | Motionless |
Climbing | Climbing | Climbing | Motionless |
Striding | Climbing | Climbing | Motionless or consuming |
Crawling | Climbing | Climbing | Motionless or consuming |
Roaming | Roaming | Roaming | Free-ranging |
The advantage is clear, by postponing analysis of the function body until it is
called, the programmer has more flexibility in creating the function and the user
will get an error if he uses the function with an invalid argument. This is also the
disadvantage: without a clear rule how to call the function, the user may have
trouble understanding why one argument works and the other does not. The previous
table showed a function, my:for-each1
, that works with any posture as
argument input (except roaming, but that posture
will always result in an unstreamable result, whether it is a user-defined function,
a build-in function or any other construct). The following table shows the drawback
of this approach, where only upon calling the function it becomes apparent that
certain postures are disallowed:
Table II: Posture and sweep of my:for-each2 function, depending on argument
Argument posture | Function posture my:for-each2 |
Resulting posture for function-call | Resulting sweep for function call |
---|---|---|---|
Grounded | Grounded | Grounded | Motionless |
Climbing | Roaming | Roaming | Free-ranging |
Striding | Striding | Striding | Consuming |
Crawling | Roaming | Roaming | Free-ranging |
Roaming | Roaming | Roaming | Free-ranging |
As we can see, only one argument posture (bar grounded, which is always allowed), striding, is allowed with the function my:for-each2
,
and its sweep is consuming. This is similar to our
expectations, as in the previous section, we already saw that this function only
worked with a striding argument posture. But in the
previous approach, for Option 2, we considered this
the only allowed posture for function arguments to stylesheet functions, which made
it pretty clear for end users. Here, the user does not know beforehand what the
argument posture can be, in fact, he needs intricate knowledge of the function body
to find out what posture is allowed.
While this approach allows more flexibility, it also introduces more room for error, at least at the static development stage. Of course, the documentation of a function could contain information of how it should be used, but who reads documentation anyway?
Another advantage, though, is the potential that many existing functions might "just work". In fact, quite some functions that can be found online are often short and meant for a certain given input. As it turns out, if you take, for instance, a look at the functions on www.xsltfunctions.com that can take nodes as their arguments, quite some of them are streamable right out of the box if we would allow this approach.
Approach 3: analyzing stylesheet functions based on static posture assessments
If we try to combine the previous two approaches into one, and try to achieve both
static analysis of function bodies and clear information for the users of the
functions, we achieve that by decorating the functions with an extra attribute that
can take one or more postures. Let us call the attribute
argument-posture
and add it to xsl:param
, with at most
one parameter allowed to have that attribute (the challenge of multiple arguments
that can take nodes is discussed in section “Multiple arguments that can take nodes”). The value
for the attribute is set to be one or more of grounded climbing striding
crawling roaming
. The postures grounded
and roaming are redundant, because the former is
always allowed and results in a motionless sweep
and the latter is never allowed, resulting in a free-ranging sweep.
If we take the previous example, my:for-each1
and rewrite it with
this additional decorating attribute, it looks like this:
This tells the processor that the programmer declares the function to be streamable with each and every one of these postures. The processor can then statically assess the function body with each posture one by one and report back whether or not this assessment is correct or not. In this case, it would compile successfully, because all postures yield correct results.
Still, you might argue, the different input postures do not necessarily mean that
the sweeps are all the same. True, but the sweep is something that can never be
statically determined for a declaration, as it is dependent on the context of the
caller. For instance, if the function is called with a motionless but striding argument,
the resulting sweep is also motionless, because the
xsl:for-each
with a motionless
argument and its body containing a motionless
sequence constructor all together yield a motionless call to the function, even though it is striding. This is
true, for instance, for the following call:
This example shows the power of the current rules in the XSL Working Draft. The
existing rules allow a function call inside a predicate in a pattern. Because the
context item expression, (.)
is motionless and takes the context posture as its result posture,
which in this case is striding, the result of
calling the function with this argument is also motionless and has a result posture of climbing (which follows from our table above). Finally, the result
must be atomized to be able to compare it to a string. For a climbing expression that returns an attribute, atomization is
allowed and motionless.
As we can see, this third approach gives us more flexibility, but the drawback is that it requires programmers to understand what a posture is. One of the design goals of the streamability rules is that they can be applied without knowing all the internals, and bringing one of those internals, the posture of a construct, to the surface, may not be desirable.
In the next sections we dive deeper in the analysis of stylesheet functions and try to come up with a set of rules that works for different inputs, not only striding, and that works also with recursion or non-final packaged functions, subjects that have not yet been touched in this section.
In pursuit of flexible rules for guaranteed streamable stylesheet functions
The previous section introduced three possible approaches. In the following sections we will research these approaches further. In fact, the focus will be on Approach 2 above.
In the specification, under the rules on path expressions[13] and axis steps[14], a table shows what happens if the output posture of one part of the path expression is the input posture of the other part of it. Stylesheet functions are not much difference. If we take the lessons learned in that section and we reproduce that table here and adjust it slightly for stylesheet functions, we can deduct that the following rules apply:
Table III: Posture and sweep of stylesheet functions
Argument posture | Function posture | Resulting posture | Allowed sweepa |
---|---|---|---|
a) This is the highest allowed resulting sweep of the function body resulting in the given result posture, if the sweep is higher then the sweep of the function body is effectively free-ranging. b) The term "direct-transitional" is not a posture, it applies to constructs such as "$n/self::foo" or "$n", where $n is the streamed parameter. Such expressions are motionless and take the posture of their input as their output posture. c) The current rules on streamability do not allow a striding or crawling posture after a climbing or crawling posture, the result of such analysis is always roaming and free-ranging. |
|||
Grounded | Any | Grounded | Motionless |
Any | Grounded | Grounded | Sweep of function body |
Any | Direct-transitionalb | argument posture | Motionless |
Climbing | Climbing | Climbing | Motionless |
Climbing | Stridingc | Roaming | Free-ranging |
Climbing | Crawlingc | Roaming | Free-ranging |
Striding | Climbing | Climbing | Motionless |
Striding | Striding | Striding | Consuming |
Striding | Crawling | Crawling | Consuming |
Crawling | Climbing | Climbing | Motionless |
Crawling | Stridingc | Roaming | Free-ranging |
Crawling | Crawlingc | Roaming | Free-ranging |
Roaming | any | Roaming | Free-ranging |
To determine the posture of a function body, it is easiest to consider the argument
that can take nodes similar to the context item expression "."
. This
expression is motionless on itself, but if used with usage absorption it may become consuming
depending on the input posture of the expression or instruction it appears in, called
the context posture. For functions that means, it takes the
posture of the argument to that function, in other words, the context posture of the parameter that can take streamed nodes is the
posture of the argument to the function. Example:
The body of the function f:thirdchild
has a posture striding because of the child axis step. The usage of the
function is transitional, which means it can return
nodes. Using the table above, we can lookup the result posture and sweep of any call
to
the function. The first use in this example is in xsl:apply-templates
,
which we have seen before takes a usage of absorption, meaning that overlapping nodes are not allowed[15]. The operand here is a call to our function with the current node as
argument. Inside any template, by definition the context posture, and thus the posture
of the context item expression, is striding. The
table then gives us for argument is striding,
function body is striding, then result posture is
also striding and the sweep is consuming.
Looking at the second use of the function we see another
xsl:apply-templates
, this time with a crawling path expression (the descendant axis is returns a crawling
posture, it can have overlapping nodes) as argument to the function. From the table
this
gives us for argument is crawling, function body
striding the result posture of roaming and a sweep of free-ranging. Even without the table we can see from the argument that
it returns overlapping nodes and that a processor will have to potentially look back
and
forward to find the third child of each overlapping node. If there is no overlap,
this
problem would not arise, but statically we do not know that, and the processor has
to
assume the worst.
Multiple arguments that can take nodes
With streaming, it is not possible to have one construct take multiple downward select
expressions, or more generally, multiple consuming expressions. Suppose we were to
allow
a function call to take multiple arguments that are streamed nodes, this rule would
backfire. An function call such as f:equals(foo, bar)
has two downward
select expressions, foo
and bar
. Because there is no way the
processor can know in what order these appear in the input document, it is possible
that
foo
appears before bar
, or the other way around, which
means that the processor may have to look back to find the other child. This is the
general reason why multiple downward selects are not allowed in any single
construct.
We could argue that one argument can be a grounded
node and another a streamed node. But that does not help us much further, because
it
hits the same problems as with the non-streamable functions we showed earlier: how
can
we tell the user of the function to choose between fn:snapshot
and
fn:copy-of
and how can he determine what argument can take a streamed
node and what argument cannot?
Another argument to allow multiple arguments that can take nodes is that one argument can be motionless in the function body and the other is not. Together they would be consuming, or if both are motionless, together they would be motionless. But this still does not deal with the fact that to get a node into the function, that node first needs to be consumed, unless it is the context node or an ancestor of the context node. These restrictions would be very limiting and also very hard to define in spec language.
The resolution is to keep things simple. Instead of allowing multiple arguments that take nodes, streamable functions can have a maximum of one parameter that can take nodes. The other arguments must either be non-node types, or absent.
Recursive streamable functions
In Table III: Posture and sweep of stylesheet functions in section “In pursuit of flexible rules for guaranteed streamable stylesheet functions” we have seen that it is possible to write functions with any kind of result posture. The result posture is then dependent on the argument passed to the function. This works for the trivial case where the function body is simple and can be analyzed. But what happens if we try to apply those rules with a function body that is recursive?
For this section I consider four kinds of recursion[16]:
-
Direct recursion: function calls itself inside the body of the function.
-
Indirect recursion: function calls itself indirectly through another function it calls.
-
Indeterminate recursion: function applies templates and a template at some level calls the function.
-
Dynamic recursion: a dynamic function call resolves, or can resolve to a function which is currently executed.
Direct recursion
To understand the challenges when dealing with recursive functions it is enough to look at a most trivial example, which is clearly streamable and uses infinite direct recursion:
This function does not have an exit-clause for the recursion and will run indefinitely, likely causing a stackoverflow error. But it is legal XSLT and it is streamable in the sense that it no movement occurs on the streamed argument.
If we analyze this function using Table III: Posture and sweep of stylesheet functions, we run into a problem:
-
First we need to determine the posture of the argument to the function, here it is striding.
-
This sets the posture of
$n
to striding as well. The next step is to determine the posture and sweep of the function body. -
To determine that, we need to determine the posture and sweep of the contained sequence constructor, which is a single
xsl:sequence
with here one operand, the select-expression. -
The posture and sweep of
xsl:sequence
is equal to the posture and sweep of its select expression[17], heref:recur($n)
. -
We know the posture of
$n
, but to determine the posture of the function callf:recur
we need to analyze the function body again, which we already did, but without having a conclusive sweep or posture yet. We've reached an eternal analysis loop, the function is non-analyzable.
It is not trivial to resolve this dead end situation. One way forward could be to
split the function body artificially into two operands and analyze each
individually: one for the part without the recursion point[18], setting the recursive function to the empty sequence ()
,
and one for the argument to the recursion point. This will help somewhat, but feels
wrong and is hard to proof right. What would happen if a function returns streamed
nodes and gets a climbing input and the output is
of a different posture?
Looking back at Table III: Posture and sweep of stylesheet functions we can deduct that any function call with a climbing or crawling argument must yield a climbing or crawling result. The only argument that is allowed to return a different result posture is an input with a striding posture. This makes sense, because the natural process for streaming is to go through the input document depth-first, which is the striding posture and once you change direction upwards, you cannot go down again: it would make a construct roaming. To summarize, we can consider that:
-
A climbing input argument must remain climbing to be streamable, so if the input is climbing, the recursive argument must also be climbing and the body cannot contain an expression in another direction.
-
A striding input argument can change to climbing but not to crawling in recursion, because that would validate the next point. If it changes to climbing the whole function body must be motionless, because you cannot both climb and consume, and if it stays striding the whole function body must be striding and can be either motionless or consuming.
-
A crawling input argument can only take motionless expressions, it cannot turn magically back into striding[19] nor is it allowed to have a crawling expression operate on another crawling expression.
-
Anything roaming will always stay roaming.
The following table shows what this means for different argument postures of calls to a function, what the recursive argument (the one at the recursion point) can be and the result posture of the recursive call is and what the posture of the function body must be (it must be the same) and what the sweep is of the recursive call. The last column stresses the maximum sweep of the function body with the given arguments. Note that all these analyses are in the context of the initial function call, not the recursive call, which makes it possible to allow one function body to take different postures as input:
Table IV: Posture and sweep of recursive stylesheet function calls
Argument posture | Recursive argument posture | Recursive function body & call posturea | Recursive function call sweep | Maximum function body sweep |
---|---|---|---|---|
a) The function body must match the recursive call posture or be grounded, otherwise the posture is roaming. b) In fact, if the argument is grounded, the recursive argument will also be grounded. c) Because on a next iteration, the argument will be crawling, not striding anymore, it would be a crawling expr. on a crawling expr. which is not guaranteed streamable. |
||||
Grounded | Any (grounded)b | Groundedb | Motionless | Motionless |
any | Direct-transitional | Posture of argument | Motionless | Consuming |
Climbing | Climbing | Climbing | Motionless | Motionless |
Climbing | Non-climbing | Roaming | Free-ranging | n/a |
Striding | Climbing | Climbing | Motionless | Motionless |
Striding | Striding | Striding | Consuming | Consuming |
Striding | Crawling | Roamingc | Free-rangingc | n/a |
Crawling | Climbing | Climbing | Motionless | Motionless |
Crawling | Non-climbing | Roaming | Free-ranging | n/a |
Roaming | any | Roaming | Free-ranging | n/a |
For this table to work in practice, we can allow at most one recursion point, or each recursion point must have the same posture and at most one can be consuming. Once you have found the posture and sweeps of the recursive function call, you can continue the analysis of the function body and then apply the previous Table III: Posture and sweep of stylesheet functions.
The rules in this table are laid out in such a way that it is not possible to have a climbing expression as the argument to a recursive function at a recursion point and somewhere else in the same function body a consuming construct or a non-climbing result. This is necessary, because if we would allow the combination of a climbing recursion point and a striding function body, then on either the initial call (with a climbing argument) or the recursive call (if the initial call would have been striding for instance) would call the function with an incompatible climbing posture, which would then be consumed. And a climbing posture can never be consumed. More specifically, it prevents a function like the following to be considered streamable:
Glancing over the example might already give a hint as to why it cannot possibly
be streamable: the xsl:value-of
can consume a streamed node and that
streamed node is likely to be at a climbing axis because of the climbing posture of
the recursive argument. If we take the item in the table into account and consider
calling this function with a striding argument, as with f:climbstr(a/b)
then we find in the entry for striding and
climbing that the body of the function must
at most be motionless. It is consuming, because of the xsl:value-of
so
the analysis fails and the function is roaming
and free-ranging.
If we go back to our original example (see Code listing: trivial direct recursion) and apply
the rules of this table, we have one recursion point, f:recur($n)
, the
argument is direct-transitional, that is, it is
motionless and takes the posture of its caller, in this case the caller of the
function. According to the table above, we can use any posture as input, which will
then become the result posture, and the sweep of the function body follows as
motionless and grounded. In our case, the call has a child-select expression in
f:recur(y)
, which has posture striding and sweep consuming. The
result posture and sweep of the whole template is then grounded and consuming.
Indirect recursion
With indirect recursion there are extra layers of calls before the recursion takes place, as in A calls B calls C calls A again. The question that rises is whether the rules in section “Direct recursion” also hold if a function calls itself indirectly. Indirect recursion is a type of recursion that can also be determined statically. Let us look at an example:
The example is a very verbose and clumsy way of getting all attributes of all
ancestor-or-self nodes that have a certain name. For instance, with the
following input and stylesheet, the output will be 3.14 1.64
:
But whether it is clumsy or not is not the matter. It serves as a basic example of indirect recursion and the function is clearly streamable because it only traverses the ancestor and attribute axes, which are motionless. However, the question is, of course, whether this example is streamable by following the rules from Table IV: Posture and sweep of recursive stylesheet function calls.
If we try that a first time, we quickly find that it is not immediately obvious
how to apply it, because the functions above are not directly recursive. Recall that
we said in section “Direct recursion” we wrote we can allow at most
one recursion point, or each recursion point must have the same posture and at
most one can be consuming
. There are two recursion points here, the one
inside f:filterattr
that calls f:take-ancestor
and the one
inside f:take-ancestor
that calls f:filterattr
. The
posture of the first one takes the input posture of the argument, it is Direct-transitional, the second one traverses the
ancestor axis, which gives it a climbing posture.
Finally, the argument inside the template example in Code listing: indirect recursive function call example
has itself a posture of striding, because the
context item expression takes the context posture
which in the case of xsl:template
is striding.
In short, if we start at the template, we start with posture striding, then striding again
(because it is direct-transitional), then climbing and on the next recursive iteration this
repeats, but now from climbing as input to
f:filter-attr
, then climbing again
(because direct-transitional takes over the posture
of the argument) and so it continues climbing until all recursive iterations have
finished.
This breaks the rule we set to ourselves: the posture changed from striding to climbing.
Yet, we also established that the example functions were streamable. The conflict
arises from the argument $n
being direct-transitional, which works like a cameleon and changes the
posture along the way. One resolution to this specific situation is to allow
direct-transitional, provided the rest of the
body of the function is motionless, in which case it does no harm. Another, perhaps
easier solution is to force the same posture in, same posture
out rule and rewrite the argument to this function as follows:
The change to using the ancestor-or-self axis also changes the initial posture to
climbing, which in turn changes the posture of
$n
to be climbing as well, making
all postures in all recursion points climbing.
While this relatively trivial example seemed to show an edge-case because it used a recursive argument posture of direct-transitional, it taught us two things: one, even a trivial example can quickly appear hard once we try to apply streamability rules on them and two, to keep things simple, it is best to stick a set of ground rules, even though that potentially means that we slightly limit certain use-cases.
In practice, indirect recursion is found pretty rarely in XSLT programming, even though XSLT is itself a functional language, it is much more common to use direct recursion. But if a scenario requires indirect recursion, it is possible, even with streaming.
Indeterminate recursion
Another challenge comes from indeterminate recursion as in the following example:
In this situation, the analysis of the function declaration and the template
declaration can be done independently, without ever finding the recursion point.
However, when this code is run, the recursion kicks in (assuming the argument is an
element). There is, however, one big advantage to this type of recursion: the rules
on
templates define that the result of a template declaration must be grounded[20]. Therefore, by definition, the result of any
xsl:apply-templates
is also grounded, which gives the function body above a grounded posture as well. In turn, the call to f:indeter
results in a grounded posture, which concludes the
circle. In addition, any select-expression in xsl:apply-templates
must be
atomizable, which in streamability terms means that it cannot be climbing or crawling[21]. This leads us to a temporary conclusion: if the arguments to a function are
striding they can be used in indeterminate
recursive functions. If the arguments are not striding, the analysis fails and makes the call on the function
roaming and free-ranging.
Following Table III: Posture and sweep of stylesheet functions and taking the same approach that the body of the function should be analyzed during the call to the function, we can conclude that it is impossible for a function body to apply templates on something else then a striding posture. The one exception that is allowed in the General Streamability Rules[22]is for a climbing posture of childless nodes, which only applies to the attribute axis. This means that we can consider the following example streamable as well, even though we pass a climbing, or even a crawling expression to the function:
Whether or not that function will be called indeterminate-recursively or not is irrelevant, because on the second iteration, the selection will be empty, because attribute nodes do not have children.
In short: indeterminate recursion does not have influence on the streamability analysis, nor does it break it. The existing rules suffice.
Dynamic recursion
Dynamic recursion looks a bit like indeterminate recursion in the sense that whether or not recursion takes place cannot be determined statically. In XPath 3.0, it is possible to bind a variable to a function[23], this is commonly referred to as higher order functions. The concept is not new, already in 2001, Dimitre Novatchev showed in his paper Novatchev 2001 that writing higher order functions is possible in XSLT 1.0 by means of what he calls template references[24], a concept also applied to the functional coding patterns in Mangano 2005, which was further popularized as the Novatchev technique in Kay 2008. It took until XSLT and XPath 3.0 that function items became first-class citizens, see my own paper Braaksma 2013 for an extended coverage of that subject for XSLT and Grust 2013 for a coverage for XQuery and databases.
For the rest of this section, we will assume you have a basic understanding of higher order functions in XSLT and XPath. The closure of higher order functions cannot contain the context item, so for our analysis we should only worry about arguments that can be streamed nodes, just like with regular functions.
The nature of a dynamic function call is that the function it refers to is only known at runtime, when the variable is bound to a function item. As a result, there is not much we can say of the posture and sweep of the function it refers to until it is actually invoked. Following the rules on streamability, the result of binding a variable to a function item is always grounded. This makes sense, because at the moment of binding, a function item is assigned to a variable and no streamed nodes are possibly involved in the result.
Once a function item is evaluated, its arguments are known. Provided that the arguments themselves are not roaming and free-ranging, the analysis will thenceforth look at the bound function. If this function is streamable, we can do the analysis from the previous sections as if it was a direct function call. The biggest difference lies in the fact that the analysis is done during execution. This means that an error for whether a call is not streamable will be a dynamic error.
This is no different for dynamic recursion. If dynamic evaluation leads to a recursive call, the posture of the argument, the recursive call and the function body must be matched to Table IV: Posture and sweep of recursive stylesheet function calls. If the result is roaming and free-ranging, the recursive call is not guaranteed streamable.
While it is theoretically possible to allow such dynamic streamability rules, the rules in the specification are meant to guarantee streamability statically. As a result, any potential dynamic streamability is prohibited. Processors may allow dynamic streamability as a vendor extension, but the specification will consider passing a node to a dynamic function call not guaranteed streamable, dynamic recursion, as suggested in this section, is therefor not possible with the current set of rules on static guaranteed streamability.
Analyzable and non-analyzable functions
Analyzable stylesheet functions are functions that can be analyzed by themselves, as we saw in . Non-analyzable stylesheet functions are those that can only be analyzed from the function call, because the body of the function on itself cannot be analyzed without knowing what the argument is. Most of the previous sections was about non-analyzable function calls.
Spec-wise, the term non-analyzable refers to the set of functions that cannot be analyzed statically and are given very conservative assumptions on streamability. This is true for recursive stylesheet functions and non-final streamable functions (see section “Non-final streamable functions in packages”). The term non-analyzable does not necessarily mean that the function is not analyzable, however, the general rule is that it requires a call, or knowledge of the context of a call to the function to find out whether or not the function is streamable. We have seen in the previous sections that it is possible to allow quite a variety of postures, even for non-analyzable functions, but in the specification such functions are limited to a posture of striding and are always considered consuming even if in fact they are not.
Non-final streamable functions in packages
In XSLT 3.0, a new feature, packages allows
programmers to create a collection of modes, functions, named and matching templates,
accumulators etc and pack them in a, potentially pre-compilable, package, which can then be included in other packages or stylesheets
with xsl:use-package
[25]. Components in that package can have a visibility attribute that can be set to public, private, final or abstract. If the visibility is
set to private it can only be used in the local
package, and if it is set to final the declaration
cannot be overridden, but can be used also in other packages. Visibility abstract means there is no implementation yet, and it must be
implemented by a using package (one that includes the package). And public, the default, means that a declaration can be
overridden by using xsl:override
, which will then take precedence over the
original implementation. This can be considered similar to virtual
methods in object-oriented languages like C++ and C#.
The result for determining whether a function can be guaranteed streamable or not cannot be determined statically at the moment that a package is compiled, because it is possible that it will later be overridden and any calls to that function can be impacted if the implementation, and the streamability, changes.
As briefly explained in the previous section, if a function is defined with visibility abstract or public, in other words, if they are non-final and non-private, then they are considered non-analyzable by the specification. This means that the processor cannot, and does not have to, try to analyze function calls too deeply: it only expects a striding posture and it will not try to determine whether or not the function is consuming or motionless when you call the function: it assumes the worst and will consider the call consuming.
A proposal from me in the bug entry [26]awaits reaction from the working group. It shows a way that processors can
use to statically detemine the posture of non-final streamable functions. However,
if
that proposal were excepted, it will pose limitations on overriding non-final streamable
stylesheet functions. I think, however, that not being able to override a function,
and
not being able to write non-final functions, is too severe a limitation. Even if a
function is non-final, if you override it to make it final, you will have to rewrite
the
whole function body, you cannot use xsl:original()
, because that would call
into the non-final function again, which takes too much of conservative approach.
An alternative approach is the approach taken for the recursive functions in the previous sections. For all analysis there, we always considered the function call itself to get the whole picture. For functions that are non-final, such an approach can also work. Because the processor only knows what function to call when it encounters the function call itself, it should include the body of that function at that moment into the analysis. This can be done statically, there is no need to first run the stylesheet.
A potential problem, however, still arises if someone writes a package function that
works correctly with a climbing result posture, but
cannot be used if that is changed into a function override with a striding result posture. Even more, a motionless function could be changed into a consuming function, which will greatly impede existing calls to that
function. One way to remedy that is to disallow overriding functions with a different
result posture or with a different sweep. That means that any given call will need
to be
analyzed against the new function and the xsl:original()
function call, as
if it still existed in scope, otherwise it will not be possible for the processor
to
determine whether there is a change in posture or sweep.
Whether or not this turns out to be a feasible option in practice remains to be seen. After all, there is always a trade-off between usability of a feature and the complexity of the rules or the complexity for implementations. If the rules become too complex for anybody to understand, then there is littly chance it will be used in practice, and there will be little incentive for existing implementors to implement such feature, let alone write specification-tests for it.
An improvement: type-determined posture
In the past few sections we have seen that analyzing a function that can return nodes
can be quite hard and intricate. However, if you know what type your function will
return, you can use the as
-attribute on the xsl:function
declaration. This will tell the processor that the result must be converted into the
type defined in that attribute.
In cases where the type is not a node, the processor can go much further with static
analysis than in situations where the type is not known, because if the type is not
known, the processor must assume the worst. Suppose you declare the function with
as="xs:integer"
, there is no way your function can return nodes. For
recursion this is a big help, because the processor now knows the result of the
recursion will not contain nodes. Whether or not there is overlap in the argument
expression is now less relevant and the posture of any function call, even the recursive
ones, will always be grounded. This makes the
requirement to match the argument with the result posture go away and simplifies both
writing and analysis.
Let us look at an example, adopted from an example by Michael Sperberg-McQueen:
The example takes a node, processes its parents recursively and returns the names
in
reverse order (deepest first) separated by slashes. Because the function is declared
to
return a grounded result, namely
xs:string
, there is no need to assume the function can return nodes,
because it cannot. Through Table IV: Posture and sweep of recursive stylesheet function calls we find
that if the argument posture is striding and the
recursive argument posture is climbing, as is the
case here, the recursive function call is considered climbing as well and the function body must then result in either a
climbing or grounded posture.
In this case, the function call itself does not have to be set to be climbing, because we know from the xs:string
result type that the result will always be grounded.
This simplifies the analysis in that we can now consider the function argument to
be of
either absorption or inspection usage. If the function body as a whole consumes the node
referenced by $n
it will be absorption and
the analysis fails, because a climbing expression with absorption usage is not allowed. Because the only other operands on the
streamed argument are fn:count
and fn:name
, both having
inspection operand usage and both being motionless, the result of the analysis as
a
whole is that the function is guaranteed streamable, does not consume the streamed
argument which we can summarize as having usage inspection, and while it is direct-recursive we know it can be called
with any argument, albeit climbing, striding or crawling.
We could expand the type-determined analysis further by including whether or not the streamable argument has a cardinality of zero-or-one or has a cardinality of zero-or-more. In the case of zero-or-one there is no way that overlapping nodes can be passed on to the function, which can simplify analysis of a function call with a crawling argument, because it will fail dynamically if the crawling argument returns more than one node. This special-case scenario has meanwhile been tackled and been generalized for any situation and cardinality[27], which can also by applied to stylesheet function call arguments.
Miscelleneous
Inline functions and streamability
Inline functions are a new XPath 3.0 capability[28] that allows you to write an anonymous function inside an XPath
expression. The definition of inline function expressions prohibits referencing the
context item, which makes it not directly possible to refer to the context. However,
the closure of an inline function contains all local variable bindings, which makes
it possible to refer to a variable outside of the inline function, which itself
could be bound to a streamed node. The streamability rules for
xsl:variable
and xsl:param
prohibit referencing a
streamed node, but one exception is inside a streamable stylesheet function, where
it is possible to have a parameter bound to a streamed node.
There are two solutions possible. One is to come up with complex rules for inline functions referencing the streamed parameter, the other is to simply prohibit using a streamed parameter inside an inline function body. This in itself would not be consistent with the definition in XPath, which states that all local variable bindings are available in the dynamic context of the inline function, which leads us to a third option: simply disallow inline function expressions within the body of a streamable stylesheet function.
This third option would be in line with the current streamability rules for inline function declarations[29], which states that they are, by definition, grounded and motionless, simply because it is not possible to pass a streamed node into the closure of the inline function body.
Partial function application and streamability
Partial function application is also a new XPath 3.0 capability by writing down a
question mark as a placeholder for an argument[30], and apply the other arguments already. The result is a function item
with less arguments than the original and some arguments already filled in. An
example is the expression index-of(?, ?, "http://my-collation")
, which
returns a function item that presets the collation argument to
http://my-collation
, resulting in a two-argument function that
always uses the same collation.
The current rules are under the rules on function calls[31] state that, unless the function is focus-dependent, that the general streamability rules apply, which means that it is treated as a normal function call, but the placeholder arguments are ignored.
If we take into consideration that binding a streamed node to a variable and passing it around is prohibited, a similar rule should be applied to partial function application: if the function is a streamable stylesheet function, the argument that can take a streamed node, must be the placeholder, or a grounded node must be supplied, otherwise it would become a roaming and free-ranging expression.
In other words, partial function application works with user-defined streamable stylesheet functions, as long as you do not try to bind a streaming node from the current context to it. You should do that at a later stage, when you actually call the function.
Named function references and streamability
Named function references are another XPath 3.0 feature[32]: you can create a function item of an existing function by using the
syntax FunctionName#ArgCount
, where ArgCount
is a literal
giving the arity of the function. For instance, count#0
will give a
reference to the zero-argument function of count
and
my:filter#3
will give a reference to a three-argument function
my:filter
.
The function item returned by such an expression can be called as a normal function by applying parentheses and arguments, which in turn will call the function you referred to to begin with. In practice, this syntax is useful for binding variables to existing named functions.
Creating a named function reference is an atomic action and does not involve references to nodes. As a result, a named function reference itself is always grounded. There is one exception, if you try to create a reference to a function that is focus-dependent. In such cases, just like in previous rules, the result is roaming and free-ranging. This is in fact the current rule for streamability of named function references in the specification[33].
Since stylesheet functions by definition cannot be focus-dependent, this exception does not apply to named function references that refer to a (streamable) stylesheet function.
Dynamic function calls and streamability
A dynamic function call, also a new XPath 3.0 feature[34], is a call to a function item that is bound to a variable. Suppose you
have <xsl:variable name="fref" select=" 'name#1' " />
, which binds
the one-argument version of the name-function to $fref
, then you can
call that function by adding parentheses and arguments, the same way you would have
done if the variable were the actual function:
$fref(child::*[1])
.
Since it is not possible to know at runtime what function the variable is bound to, analysis can only take place once the function is actually called. The current rules[35] state that all arguments have operand usage navigation, which means that you can only call a dynamic function when you actually create a grounded copy of a streamed node. In case you wanted to apply it to a streamed node you are out of luck.
The rules make an exception in case the signature of the function item is known, in which case type-determined usage, as in part explained in section “An improvement: type-determined posture”, can be used. That means that, if an argument is declared as a non-node type, the usage typically becomes absorption.
Streamability of stylesheet functions according to the specification
The current approach taken by the rules in Bug 25679 is a pessimistic one. For stylesheet functions to return nodes, they must be striding and are allowed to be either motionless or consuming. As a further limitation, which actually follows from forcing the striding posture on function bodies, is that the argument to the function must itself also be striding. In fact, it is not possible, by the current definition, to have a streamed node with crawling or climbing posture act as an argument to a streamable stylesheet function.
By limiting the result posture of the stylesheet function to one allowed posture, it is easier to write rules for both analyzable and non-analyzable stylesheet functions, see section “Analyzable and non-analyzable functions”. However, because the bug entry is quite fresh and the discussion is still ongoing, there is little conclusive I can tell about the rules that will eventually make it into the next public version of the specification[36]. For instance, in Comment#1 of the same public bug entry I have proposed some of the rules from Table III: Posture and sweep of stylesheet functions. Whether or not any or all of these suggestions will make it into the specification will remain to be seen.
Even if the specification will only allow relatively pessimistic stylesheet functions, vendors are still allowed to use wider rules on streamability. In the case of functions, Exselt will allow any function that is streamable, as explained in the previous section, by using an optimistic operational mode at user option.
The streamable
attribute on xsl:function
In none of the previous sections in this paper have I mentioned the
streamable
attribute on an xsl:function
declaration.
Since most of the paper discussed streamability of a stylesheet function in the
context of the function call, whether or not a function is streamable then depends
on its implementation and there is no need for the processor to know beforehand what
functions to analyze and what not.
The streamability rules in the specification are made such that the processor must
statically determine whether or not a function declaration is streamable or not.
Since it is likely that any given stylesheet will have both streamable and
non-streamable stylesheet functions, it is necessary to tell the processor which are
and which are not streamable. For that, the xsl:function
declaration
gets a new attribute, similar to the attribute of xsl:accumulator
and
xsl:mode
, which determines whether or not a function should be
analyzed for streamability.
The attribute takes a value of yes
or no
. If the value
is yes
, then the function must be guaranteed
streamable and the function can be called with a streamable node as
an argument.
I think that such an attribute is not necessary if we take the approach in this paper and analyze the entire function only upon the actual function call. If that function call is inside a streamable context and a streamable node is passed as an argument, the body of the function is analyzed with the context posture of the streamable argument set as the posture of the streamable parameter reference.
Status of current processors with respect to streamable stylesheet functions
Since streamable stylesheet functions are a relative recent addition to the specification and lots of it is not yet publicly available, there are currently no processors available that fully support streamable stylesheet functions.
At the time of this writing, of the two streaming processors that I know of, Exselt and Saxon, the former currently has a full streamability analysis for all constructs but a limited ability to analyze stylesheet functions for streamability. However, the available set of tests is still growing and the rules on streamability may change between now and the actual presentation of this paper, or even afterwards, since the specification is not final yet and work on streamable stylesheet functions may change the current rules. Apart from analyzing a function for streamability, Exselt chooses an optimistic streaming approach: if you write your stylesheet in a correct way with forward expressions only, your stylesheet will process streaming input in a streamable way.
For Saxon, I do not know the actual current status on streamable stylesheet functions, but I do know that they plan to do it in the not-so-far future.
If you are interested in streaming, or more specifically in streamable stylesheet functions, keep an eye out on the website of Exselt and Saxon as it is likely that in the near future, both processors will support this feature.
Conclusion
The previous sections have shown that writing streamable stylesheet functions is not that hard, as long as you do not require recursion, or want to write overridable non-final functions. But if you do require recursion, the simplest rule to remember is same posture in, same posture out. As long as you stick to that rule, and forget about the exceptional cases, you are in safe streaming waters.
We have also seen that the current resolution in the specification taken to allow streamable stylesheet functions is far more limiting, allowing only a posture of striding for the input arguments and in the case of recursive functions or non-final functions, always concludes that a call to such a function is consuming, even if it is not. This conservative approach has a valid reason, though: it tries to keep things simple and the rules understandable and implementable. But even with this limitations in mind, it gives stylesheet authors quite a wide range of possibilities to write streamable stylesheet functions.
This paper has shown how a relative small change to the specification opens up the way for stylesheet and library package writers to write streamable stylesheet functions, which can be used in both streamable and non-streamable scenarios alike[37]. This change was an important and vital one and has brought us very close to making the streamability rules as mature as they can be and ready for widespread use in stylesheets and packages. Hopefully this paper has given potential library writers a strong hint that they should take the extra step to write their library packages with streamability in mind to be useful for both the streaming and non-streaming markets.
References
[Braaksma 2013]
Braaksma, Abel. Efficient XML processing with XSLT 3.0 and higher order functions
: pp 23-40.
Presented at XML Prague 2013, a conference on markup languages and data on the web, Prague, Czechia, Feb 8-10, 2013.
In XML Prague Proceedings 2013.
http://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf.
Web.
[Braaksma 2014a]
Braaksma, Abel.
Streaming for the masses, an introduction to streaming with XSLT
: pp 29-80.
Presented at XML Prague 2014, a conference on markup languages and data on the web, Prague, Czechia, Feb 14-16, 2014.
In XML Prague Proceedings 2014.
http://archive.xmlprague.cz/2014/files/xmlprague-2014-proceedings.pdf.
Web.
[Braaksma 2014b]
Braaksma, Abel.
Streaming Design Patterns or: How I Learned to Stop Worrying and Love the
Stream
: pp 24-53.
Presented at XML London 2014, UK XML Conference, London, United Kingdom, June 7-8, 2014.
In XML London Proceedings 2014.
http://xmllondon.com/2014/xmllondon-2014-proceedings.pdf. Web.
doi:https://doi.org/10.14337/xmllondon14.braaksma01
[Grust 2013]
Grust, Torsten and Ulrich, Alexander.
First-Class Functions for First-Order Database Engines
Presented at International Symposium on Database Programming Languages.
In Proceedings of the 14th International Symposium on Database Programming Languages
(DBPL 2013), Trento, Italy.
arXiv:1308.0158:
http://arxiv.org/pdf/1308.0158v1.
[XPath and XQuery F&O 3.0]
Kay, Michael.
XPath and XQuery Functions and Operators 3.0 Recommendation
8 April 2014. W3 Consortium. W3C Recommendation.
http://www.w3.org/TR/xpath-functions-30/.
Web.
[Gerstbach 2006]
Gerstbach, Peter.
Generating Structured Documents to Create Reports by Integrating Data from CMS/DMS
and EAI Systems
: pp 27-28.
Master thesis. At: Softwaretechnik und Interaktive Systeme der Technischen Universit¨at
Wien. May 2006.
http://www.gerstbach.at/2006/thesis/. Web.
[Kay 2008]
Kay, Michael.
XSLT 2.0 and XPath 2.0 Programmer's Reference
2nd edition, 5 May 2008: pp 251+. Published by O'Reilly Media. ISBN: 0470192747. Print.
[Lenz 2005]
Lenz, Evan.
XSLT 1.0 Pocket Reference (Pocket Reference)
.
1st edition, 19 August 2005: pp 23-40. Published by O'Reilly Media. ISBN: 0596100086.
Chapter 3 (pages 23-40) is available online: http://lenzconsulting.com/how-xslt-works/.
Print.
[Mangano 2005]
Mangano, San.
XSLT Cookbook, 2nd Edition, Solutions and Examples for XML and XSLT Developers
2nd edition, 21 December 2005: pp 686-732. Published by O'Reilly Media. ISBN: 0596009747.
Print.
[Novatchev 2001]
Novatchev, Dimitre.
The Functional Programming Language XSLT — A proof through examples
November 2001.
Seen at: Sommer Semester 2010 course material University of Magdeburg 2010
http://edu.cs.uni-magdeburg.de/EC/lehre/sommersemester-2010/funktionale-programmierung/uebungen/gruppe-2/aufgabenblatt-12/XSLTasFP.pdf.
Online reference:
http://fxsl.sourceforge.net/articles/FuncProg/Functional%20Programming.html. Print.
[Tennison 2001]
Tennison, Jeni.
Rescuing XSLT from Niche Status.
.
17 February 2001.
MulberryTech XSL Mailing List Archive.
http://www.biglist.com/lists/xsl-list/archives/200102/msg01143.html. Web.
[W3C BugZilla]
Various contributors.
Bugzilla - Public W3C Bug / Issue tracking system
1997 - 2014. W3 Consortium.
https://www.w3.org/Bugs/Public/.
Web.
[Welker 2008]
Welker, Eddie.
Advantages of push-style XSLT over pull-style
25 November 2008.
http://www.eddiewelker.com/2008/11/25/push-style-xslt-vs-pull-style/.
Web.
[XQuery and XPath Data Model 3.0]
Walsh, Norman; Berglund, Anders; Snelson, John.
XQuery and XPath Data Model 3.0 Recommendation
8 April 2014. W3 Consortium. W3C Recommendation.
http://www.w3.org/TR/xpath-datamodel-30/.
Web.
[XPath 3.0]
Robie, Jonathan; Chamberlin, Don; Dyck, Michael; Snelson, John.
XML Path Language (XPath) 3.0 Recommendation
8 April 2014. W3 Consortium. W3C Recommendation.
http://www.w3.org/TR/xpath-30/.
Web.
[XProc]
Walsh, Norman; Milowski, Alex; Thompson, Henry S.
XProc: An XML Pipeline Language
11 May 2010. W3 Consortium. W3C Recommendation.
http://www.w3.org/TR/xproc/.
Web.
[XSLT Latest Version]
Kay, Michael.
XSL Transformations (XSLT) Version 3.0, Latest Version
Undated. W3 Consortium. W3C Working Draft / Last Call Working Draft / Candidate Recommendation
/ Proposed Recommendation / Recommendation.
http://www.w3.org/TR/xslt-30/.
Web.
[XSLT Last Call WD]
Kay, Michael.
XSL Transformations (XSLT) Version 3.0 W3C Last Call Working Draft
12 December 2013. W3 Consortium. Last Call Working Draft.
http://www.w3.org/TR/2013/WD-xslt-30-20130201/.
Web.
[1] The ancestor-or-self axis is available during streaming and requesting properties on nodes on that axis is allowed. However, it is not possible to navigate away from that axis, doing so would result in a free-ranging and roaming expression.
[2] See section 19 of the XSL 3.0 specification.
[4] See, for instance, StackOverflow question 2835567 on templates and recursion.
[5] See, for instance, his answer in question 16631213 at StackOverflow, which uses recursion on element nodes, it's an example where functions are arguably a better choice than template matching.
[6] See section 5.6 Patterns, specifically the part on predicate patterns in the XSLT 3.0 Working Draft.
[7] Before user-defined stylesheet functions were allowed to be streamable and
take on streamed nodes, the rules were as in the current Working Draft:
For a call to a stylesheet function, the general streamability rules
apply. There is one operand role for each argument in the function
signature, and its operand usage is the type-determined usage based on the
declared type of that argument.
, see section 19.8.7.12 Streamability of Function Calls in XSLT 3.0 Working Draft.
Please note that this and other references to the WD will change when the next
version of the specification comes out.
[8] See section 19.8.1. in XSL Transformations 3.0.
[9] See section 19.8.4.17 Streamability of xsl:for-each
in the XSLT
3.0 Working Draft, specifically the rule The posture of the
instruction is the posture of the contained sequence constructor,
assessed with the context posture and context item type set to the
posture and type of the select expression.
.
[10] That the attribute axis in any context results in a posture of climbing and a sweep of motionless follows from the rules in section 19.8.1 General Rules for Streamability, rule 2.e and the rules on the axis steps itself, which can be found in 19.8.7.7 Streamability of Axis Steps.
[11] The rule that a child axis is not guaranteed streamable if applied on an axis other than an axis with striding posture follows from 19.8.7.7 Streamability of Axis Steps in the XSLT Working Draft, in the table under item 6 in the list.
[12] A higher-order operand is an
operand that can change focus, such as xsl:for-each
or xsl:for-each-group
, see the definition of higher-order
operand in the XSLT 3.0 Working
Draft.
[13] See 19.8.7.6 Streamability of Axis Steps in XSL Transformations 3.0.
[14] See 19.8.7.7 Streamability of Axis Steps in XSL Transformations 3.0.
[15] A public bug entry Bug 25185 tries to resolve this by allowing limited buffering of crawling postures in a absorption contexts such as these, if that bug is accepted and resolved, overlapping nodes as with crawling posture is allowed with usage absorption, at least in the general case.
[16] The XSL Transformations 3.0 Working Draft does not use this terminology, but it helps differentiate different situations for this discussion.
[18] I use the term recursion point for the place in the code where the recursion is invoked, but this is not a term that is used in the specification.
[19] See the table under 19.8.7..7 Streamability of Axis Steps in XSL Transformations 3.0.
[20] See section 6.6.3, Streamable Templates in XSL Transformations 3.0, second
bullet-point: The sequence constructor contained in the body of the
xsl:template element is grounded
.
[21] One exception is allowed for selections that only select childless nodes, such as attribute nodes. These are climbing but do not contain children and are henceforth allowed.
[22] See Section 19.8.1 General Streamability Rules in XSL Transformations 3.0, item 1.b.iii.A.I.
[23] See section 3.1.5.1 Evaluating Static and Dynamic Function Calls in the XPath 3.0 Recommendation.
[24] See page 4 of his paper Novatchev 2001.
[25] See section 3.6.1 Dependencies between Packages in XSLT 3.0 Working Draft.
[26] See Bug 25679 in W3C's BugZilla, which discusses the implications of allowing streamable stylesheet functions.
[27] See the resolution to Public XSLT Bug 25185 on how crawling posture is now allowed in atomizing contexts.
[28] See section 3.1.7 Inline Function Expressions in the XPath 3.0 Recommendation.
[29] See section 19.8.7.14 Streamability of Inline Function Declarations in the XSLT 3.0 Working Draft.
[30] See section 3.1.5 Static Function Calls, definition for Partial Function Application in the XPath 3.0 Recommendation.
[31] See section 19.8.7.12 Streamability of Function Calls in the XSLT 3.0 Working Draft.
[32] See section 3.1.6 Named Function References in the XPath 3.0 Recommendation.
[33] See section 19.8.7.13 Streamability of Named Function References in the XSLT 3.0 Working Draft.
[34] See section 3.2.2 Dynamic Function Call in the XPath 3.0 Recommendation.
[35] See section 19.8.7.9 Streamability of Dynamic Function Calls in the XSLT 3.0 Working Draft.
[36] The next version will probably be a Candidate Recommendation, considering that the current status is Last Call Working Draft, meaning that the specification is ready for implementors and only bugs found that come from implementation issues will be fixed. No new features will be added.
[37] Any streamable construct will work exactly the same in a non-streaming scenario. Functions or modes written with streaming in mind can be used with both streaming and non-streaming input without alterations.