Micropipelining in context
Pipelining may play a role in XSLT development on at least three levels: 1) internally,
within the XSLT processor; 2) between or across stylesheets; and 3) within a single
stylesheet. We describe these below by way of providing a broad context and overview,
but the
focus of this report is on only the last of the three, on pipelining within a single
XSLT
stylesheet, which we call micropipelining.
1. Pipelining within the XSLT processor
To the typical XSLT developer, pipelining within the XSLT processor is a black box.
Not
only can optimizations within an XSLT processor transform XSLT instructions into internal
operations that achieve the same results by following different execution logic, but
the
largely declarative XSLT programming paradigm means that the order employed by the
processor
to execute parts of a transformation is not entirely under the control of the XSLT
developer. Among other things, these properties enable an XSLT processor to improve
performance by executing operations in parallel, a feature with consequences that
sometimes
surprise XSLT developers who may be used to thinking within an imperative paradigm.
For
example, many new XSLT programmers have tried at some point to profile a stylesheet
by
inserting the current-dateTime()
or current-time()
function before
and after an operation of interest, only to be surprised when the operation appears
to take
no time at all! This happens because there is no promise in the XSLT processing model
that
functions will be executed in the order in which they appear in the XSLT code, and
for that
reason XSLT was designed to follow a functional programming paradigm here:
In XSLT 2.0 it is defined that multiple calls on current-dateTime()
and
the other two [time and date] functions will return the same result every time they
are
called within a single transformation. This means that you can’t call the function
at
the beginning and end of the transformation to measure the elapsed time. The reason
for
this rule is that XSLT is rather purist about being a strictly functional language,
and
in a strictly functional language, calling the same function twice with the same
arguments always returns the same result. This property makes life much simpler for
optimizers. [Kay 2009, p. 738]
The functional nature of XSLT means that although an XSLT developer might write code
that is amenable to pipelined execution, how the pipeline is executed is at least
partially
dependent on processor-internal code to which the XSLT developer has no direct access.
Michael Kay’s alliteratively titled You pull, I’ll push: on the polarity of
pipelines
, cited above, continues that:
For performance, the stages of a pipeline should communicate by means of streams of
events. Traditionally, this can be done either by writing components with pull polarity
(the consumer calls the supplier when data is required), or by components with push
polarity (the supplier calls the consumer when data is available). Performance problems arise when components with different polarity are mixed within
the same pipeline. [Kay 2009, emphasis added]
As important as this polarity is for performance, to the extent that it is determined
within the XSLT processor, rather than within the XSLT code, it is at least partially
outside the control of the XSLT developer. For that reason, in this XSLT-developer-focused
report, pipelining within the XSLT processor will not be discussed further.
2. Pipelining between and across XSLT stylesheets
Pipelining between or across XSLT stylesheets is much more under the control of the
XSLT
developer. A complex transformation might be broken down into simpler stages, where
each
stage is implemented in a separate XSLT stylesheet, and interim results are written
to disk
by one process and then read from disk by another. This strategy may make the individual
stylesheets easier to read because each may implement only a small and coherent aspect
of
the overall transformation. This approach also makes it easy to examine the intermediate
output during development, and to develop each step of the pipeline without distraction
by
what precedes or follows. But disk I/O is much slower than in-memory access, and the
cost of
writing the intermediate files is likely to compromise the efficiency of the transformation.
Saving intermediate stages to disk and processing them as separate transformations
also
entails a cost in re-launching the transformation engine; for example, running the
Java
implementation of Saxon has to create a new Java Virtual Machine (JVM) and then start
a new
Saxon process. Whether JVM or I/O or process initialization or other overhead deserves
consideration is likely to depend on the volume of data; augmenting the execution
time by an
order of magnitude may not matter to the developer if the overall time is still subjectively
brief.
The overhead of disk I/O can be avoided by using pipelining utilities, whether broader
purpose (such as Apache Cocoon [Cocoon]) or more narrowly XML specific
(XProc [XProc]). Pipelining tools avoid the overhead of disk I/O, although
whether they build complete intermediate in-memory structures or decompose (and perhaps
parallelize or stream) the transformation in a functional manner is implementation-specific.
These technologies are well know to the Balisage audience: XProc has been the subject
of six
presentations by six different authors or teams of authors at Balisage alone (see
https://www.balisage.net/Proceedings/topics/XProc.html), and Cocoon has also
made several appearances, including as part of the toolkit used to produce the Balisage
proceedings you are reading now
(https://www.balisage.net/Proceedings/vol17/acknowledgements.html).
Pipelining between or across XSLT stylesheets with technologies like Cocoon or XProc
does
not prohibit writing intermediate output to disk for examination during debugging,
but it
does not require it, and XProc in particular incorporates additional features that
may be
useful in complex transformations of XML documents (see
https://www.w3.org/TR/xproc/#pipeline-concepts). The focus of the current
report is on micropipelining, that is, on pipelines implemented within a single XSLT
stylesheet, and we will not explore XProc or other technologies for pipelining between
or
across stylesheets, except to note that whether to implement a pipeline within a single
stylesheet or between or across stylesheets is a separate consideration that lies
outside
the within-stylesheet scope of the present report.
3. Micropipelining: pipelining within a single XSLT stylesheet
As mentioned above, we use the term micropipelining to refer to pipelining within
a
single XSLT stylesheet. Micropipelines implemented within a single stylesheet can
always be
refactored and spread across multiple stylesheets, but in situations where the steps
are
conceptually part of what a human would consider a single operation, and where intermediate
stages are unlikely to be of interest except for debugging purposes during development,
combining them within a single XSLT stylesheet that performs that single operation
may make
for easier implementation and maintenance.
The application that motivated this exploration is connected to our use of XML
technologies to identify and analyze meter and rhyme in Russian verse. We presented
a report
on the identification of meter at Balisage 2015 [Birnbaum and Thorsen 2015], and in
order also to identify rhyme we need to convert normal Russian orthography to a phonetic
representation. As long as the place of stress is known, the phonetic conversion, although complex, turns out to be easier in Russian
than in English because Russian orthography, although not phonetic, is closer to the
phonetics of the language than is the case with English.
It is common in historical linguistics to represent language change in terms of the
application of ordered rules as a pipeline that accepts an initial form as input,
pipes it
(that is, passes it along) through a sequence of individual sound-change rules, and
generates transformed output. Constraints on the order in which rules apply are
traditionally described using alimentary and sanguinary metaphors, and those metaphors
emerge from the linguistic reality that sound change may happen at a certain point
in
history and then cease to operate, that is, cease to be a productive process:
-
Feeding order. The output of one rule creates new
input for another rule. For example, if in Rule A x → y
(read as x is transformed into y) and in Rule B y → z, Rule A feeds Rule
B by creating new input for it.
-
Bleeding order. The output of one rule modifies a
form that would otherwise have served as input for another rule. For example, if in
Rule
A z → q and in Rule B x → y / _
z (read as x is transformed into y when it precedes z), Rule A
is said to bleed Rule B because it reduces the application of Rule B by the destroying
the environment it requires.
-
Counter-feeding order. The output of one rule does
not create new input for another rule, but were the order of the two rules reversed,
they would observe a feeding relationship. For example, if in Rule A y → z and in Rule B x → y, the
two stand in a counter-feeding relationship, since Rule B creates what would have
been
new input into Rule A, except that it fails to do so because Rule A operates first,
and
is no longer active when the potential new input is created by Rule B.
-
Counter-bleeding order. The output of one rule does
not bleed forms that might otherwise have served as input for another rule, but were
the
order of the two rules reversed, they would observe a bleeding relationship. For
example, if in Rule A x → y / _ z and in Rule B
z → q, the rules observe a counter-bleeding order
because were the order reversed, Rule B would reduce the applicability of Rule A by
destroying the context it requires.
Note that feeding and bleeding are not opposites, and that some rules may not be
critically ordered with respect to others because their inputs, outputs, and conditioning
environments do not intersect with one another. These types of relationships among
rules are
important because of their role in directing or constraining linguistic change (see,
for
example, Kiparsky 1973). In the context of the present report, these
relationships also provide a vocabulary for discussing the interaction of XPath functions
and XSLT templates when those are used to implement transformations that model linguistic
processes. Feeding, bleeding, and their opposites are appropriate for modeling historical
language change because history imposes a real chronological order, but a similar
model that
uses similar terminology may also describe other linguistic phenomena, including the
synchronic derivation of surface forms from more abstract underlying forms. In the
present
instance, we employ this framework to describe, as a pipeline of individual rules,
the XSLT
implementation of a transformation of standard Russian orthography (which is not phonetic)
to a phonetic representation that can serve as the basis for identifying rhyme in
verse.
Micropipelining: a closer look
Background and context
When we began implementing in XSLT the machine-assisted identification and analysis
of
Russian rhyme, we were aware that pipelining was required because our algorithm for
converting natural Russian orthography (enhanced by information about stress, which
is not
part of normal Russian writing) was conceptualized as a series of rules of insertion,
deletion, and replacement, the order of which was constrained by feeding and bleeding
relationships. As we have been crafting bespoke solutions to these types of pipelining
tasks
over the years, we have wished for a tutorial that would describe the options, say
something
useful about the advantages and disadvantages of each, point out where they behave
similarly
or differently from one another, and perhaps make recommendations for deciding among
them in
the field. The principal goal of the present report is to fill the gap by describing
and
exploring the alternatives that we considered, including the one we eventually adopted
for
the Russian rhyme task.
Convenience variables
As the name implies, convenience variables are
variables created for the benefit of the programmer, rather than because they necessarily
improve the efficiency or operation of the program. The purpose of convenience variables
is
that typically they improve the legibility of the code, which makes it easier to develop
(one piece at a time) and maintain.
Variables that improve the efficiency of the transformation process itself are not
(or:
are not entirely) convenience variables (although they may also be convenient). For
example,
if we need to calculate the same result more than once in an XSLT stylesheet, performing
the
calculation only once and saving it in a variable that we can then reuse may reduce
the
amount of computation and improve the performance of the transformation. On the other
hand,
when we calculate a value that we will use only once, save it in a variable, and then
use
the variable, not only is there no similar saving, but we actually risk reducing the
efficiency of the transformation by imposing the overhead needed to create the variable
as
an intermediate structure. Within XSLT, this is especially the case when variables
are
declared without explicit typing using the @as
attribute, the omission of which
results in the expensive creation of a temporary in-memory document (that we may not
need)
to hold the value.
As an example of a convenience variable, consider an XSLT transformation to SVG where
we
need to calculate the x
and y
coordinates of an object and then
plot it. If those calculations are complex, packing them into the declaration of the
object
may compromise its transparency. Compare the version with convenience variables:
<xsl:variable name="xPos" as="xs:double" select="position() * $xScale"/>
<xsl:variable name="yPos" as="xs:double" select=". * $yScale"/>
<rect x={$xPos} y="-{$yPos}" width="{$width}" height="{$yPos}"/>
to one without:
<rect x={position() * $xScale} y="-{. * $yScale}" width="{$width}" height="{. * $yScale}"/>
The more complex the calculation, the more the legibility may be improved through
the
use of convenience variables.
Convenience variables may play a role in pipelining when the result of one function
is
passed to another. For example,
<xsl:value-of select="djb:func2(djb:func1($x))"/>
may be rewritten as:
<xsl:variable name="y" select="djb:func1($x)"/>
<xsl:value-of select="djb:func2($y)">
It is common, while developing and debugging pipelines, to insert, delete, alter,
or
rearrange the individual steps in order to fine-tune their interaction. Removing a
step, for
example, with nested functions means finding the function name, the open parenthesis
(which
is always immediately after the function name), and the closing parenthesis. This
last step
may be annoyingly difficult if the nesting is deep and if the functions have multiple
parameters; if they have only one, the closing parentheses are all bunched together
at the
end, and it doesn’t matter which one we remove, but if they are not all consecutive,
the
developer has to find exactly the right one. With convenience variables, removing
a step
requires not only removing the step itself, but also adjusting a variable reference
elsewhere. Consider first the nested version:
<xsl:sequence select="djb:func3(djb:func2(djb:funct1($input)))"/>
If we wish to remove func2()
from the pipeline, we remove the string
“func2(” and any one of the closing parentheses. Now consider the version with convenience
variables:
<xsl:variable name="first" select="djb:func1($input)"/>
<xsl:variable name="second" select="djb:func2($first)"/>
<xsl:sequence select="djb:func3($second)"/>
If we remove the middle line in the interest of removing djb:func2()
from
the pipeline, we can find the entire function easily, and we don’t have to look separately
for the closing parenthesis. But our modifications are not limited to just one line;
we must
also change the variable reference in the last line from $second
to
$first
. This is not an unmanageable burden, but it is an extra step, and an
extra opportunity for error.
The difference between nesting functions directly inside one another and nesting inside
one function a variable containing the result of evaluating another function is discussed
in
greater detail below. The distinction between convenience variables and those that
contribute to the quality of the code in ways that are not limited to human legibility
is
important because the costs and benefits of creating variables to hold intermediate
values
must take into consideration whether using the variable conveys advantages beyond
convenience, that is, beyond the legibility that supports ease of maintenance.
Nested functions
One of the most natural types of micropipelining involves the nesting of functions,
as
in the example earlier that combines normalize-space()
and
upper-case()
by nesting one inside the other. This simple nested structure
involves only two functions, each of which accepts only one parameter, but as the
number of
functions and the number of parameters grows, it is easy to lose one’s place in a
thicket of
parentheses. Furthermore, the normalize-space()
and upper-case()
functions do not participate in feeding or bleeding relationships, which means that
once we
have determined that we need to use them, there is no risk of nesting them in a way
that
produces the wrong output, and that therefore requires reordering. That matters because
part
of the challenge of multiple nested functions, and of the multiple pairs of parentheses
that
come with them, involves legibility during development. The fact that here the nesting
order
doesn’t matter obviates a potential need to transplant a contained function into a
containing position, and vice versa, because we inadvertently nested them incorrectly
the
first time.
Consider, for example, a situation using the Wordhoard TEI edition of Hamlet, where our task is to construct a comma-separated list of distinct speakers (from
the <speaker>
element children of the <sp>
speech
elements) in Act 4. We can retrieve a deduplicated list of all <speaker>
values with a single XPath function wrapped around a single path expression:
distinct-values(//body/div[4]//speaker)
. The catch is that this list includes
not only Rosencrantz
and Guildenstern
, both of whom speak
alone, but also Rosencrantz and Guildenstern
(as the string content of a
single <speaker>
element) where they speak in unison, and our result
should recognize that Rosencrantz and Guildenstern
is not a unique speaker.
It is possible to perform the operation in a single line using nested functions:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0"
xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of
select="distinct-values(tokenize(string-join(//body/div[4]//speaker/replace(., ' and ', ' '), ' '), '\s+'))"
/>
</xsl:template>
</xsl:stylesheet>
Here we pipe the output of the innermost replace()
function into
string-join()
, and then into tokenize()
, and then into
distinct-values()
. The order of functions has some flexibility, but the following constraints
apply:
-
string-join()
feeds tokenize()
-
tokenize()
feeds distinct-values()
-
replace()
may occur at any time before tokenize()
(that
is, before or after string-join()
), or we can eliminate
replace()
entirely and instead use a predicate to exclude the eventual
token with the value and
.
In favor of this strategy, nesting is the most direct possible representation of the
pipeline relationship, since the input parameters to each function are included between
its
parentheses and each function statement evaluates to its output. Against this strategy,
the
depth of the nesting and the fact that several of the functions require multiple parameters
greatly compromise the legibility, and thus the development and subsequent maintenance,
of
the code. Writing this code is messy and error-prone, and, recalling that code is
read much
more often than it is written, nobody would regard it as easy to read. We might try
to
improve the legibility through indentation:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0"
xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:value-of
select="distinct-values(
tokenize(
string-join(
//body/div[4]//speaker/replace(
.,
' and ',
' '
),
' '),
'\s+')
)"
/>
</xsl:template>
</xsl:stylesheet>
although most developers would not consider this very legible either. There is good reason that the functional programming language Lisp, known for
its deeply nested parenthesized constructions, has been glossed humorously as Lots of
Irritating Silly Parentheses
and in other equivalent ways.
We can improve the legibility by rewriting the code to use convenience variables:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0"
xpath-default-namespace="http://www.tei-c.org/ns/1.0">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:variable name="first" as="xs:string+"
select="//body/div[4]//speaker/replace(., ' and ', ' ')"/>
<xsl:variable name="second" as="xs:string" select="string-join($first, ' ')"/>
<xsl:variable name="third" as="xs:string+" select="tokenize($second, '\s+')"/>
<xsl:variable name="fourth" as="xs:string+" select="distinct-values($third)"/>
<xsl:sequence select="$fourth"/>
</xsl:template>
</xsl:stylesheet>
This improvement comes at a price, though. Much as the complexity of the compound
operations makes it easy for a developer inadvertently to nest the functions incorrectly
in
the nested syntax, it is also easy to order them incorrectly in the version that employs
convenience variables. A debugging fix that is simple to describe (revise the order
in which the functions are applied
, which might incorrectly suggest that this
could be mapped onto rearrange the lines in which the variables are created
)
turns out to be error prone because each line refers to two variable names, the one
it is
creating with its @name
attribute and the one that it is using to determine the
value, which is part of the value of its @select
attribute. Changes in the
order in which the functions are executed thus becomes far more complex and error-prone
than
dragging the lines into a different order. One way to conceptualize the challenge
is that
the variable declarations exist in a feeding relationship not just by virtue of the
order in
which they are declared (see immediately below about the XSLT prohibition against
using a
variable before it has been declared), but also by virtue of the way each declaration
uses a
value created by a different declaration.
Given that variables in XSLT cannot be redeclared, one might wonder why XSLT requires
that sibling variable declarations within a stylesheet be arranged in conformity with
the
order in which they are referenced. That is, one might wonder why XSLT prohibits ordering
sibling <xsl:variable>
elements in arbitrary order, much as it is
possible to apply templates within one template that references another regardless
of the
order in which the two are listed in the XSLT code. Whatever the reason for this
restriction, it means that rearranging the order in which the functions are applied
involves
not only changing the references among them, but also modifying the order of the
<xsl:variable>
elements so that the order of declaration follows the
order in which the values are used.
Function chaining
One reason that deeply nested functions with multiple parameters are difficult to
read
is that code is written from left to right, which tacitly encourages humans to read
it from
left to right, but the feeding order (and thus the execution) of nested functions
flows from
the inside out. This means that the function names flow from right to left (but not
starting
the far right of the line, which is filled with closing parentheses and possibly function
arguments), yet because functions may have multiple parameters, it is necessary to
look to
both the left and the right when moving out to the next level. In languages that support
object-oriented programming, the dotted method notation
avoids this clash because both the notation and the execution flow from left to right.
For
example, the following Python example returns the same results as the XSLT above:
names = ['King','Gertrude','King','Gertrude','King','Gertrude','King','Hamlet','Rosencrantz and Guildenstern',\
'Hamlet','Rosencrantz','Hamlet','Rosencrantz','Hamlet','Rosencrantz','Hamlet','Rosencrantz','Hamlet',\
'Rosencrantz','Hamlet','Rosencrantz','Hamlet','Guildenstern','Hamlet','King','Rosencrantz','King',\
'Rosencrantz','King','Rosencrantz','King','Hamlet','King','Hamlet','King','Hamlet','King','Hamlet','King',\
'Hamlet','King','Hamlet','King','Hamlet','King','Hamlet','King','Hamlet','King','Hamlet','King','Fortinbras',\
'Captain','Fortinbras','Hamlet','Captain','Hamlet','Captain','Hamlet','Captain','Hamlet','Captain','Hamlet',\
'Captain','Hamlet','Captain','Rosencrantz','Hamlet','Gertrude','Gentleman','Gertrude','Gentleman','Horatio',\
'Gertrude','Ophelia','Gertrude','Ophelia','Gertrude','Ophelia','Gertrude','Ophelia','Gertrude','Ophelia',\
'King','Ophelia','King','Ophelia','King','Ophelia','King','Ophelia','King','Gertrude','King','Gentleman',\
'Gertrude','King','Laertes','Danes','Laertes','Danes','Laertes','Gertrude','Laertes','King','Laertes','King',\
'Gertrude','King','Laertes','King','Laertes','King','Laertes','King','Laertes','King','Danes','Laertes',\
'Ophelia','Laertes','Ophelia','Laertes','Ophelia','Laertes','Ophelia','Laertes','Ophelia','Laertes','King',\
'Laertes','King','Horatio','Servant','Horatio','Sailor','Horatio','Sailor','Horatio','King','Laertes','King',\
'Laertes','King','Messenger','King','Messenger','King','Laertes','King','Laertes','King','Laertes','King',\
'Laertes','King','Laertes','King','Laertes','King','Laertes','King','Laertes','King','Laertes','King',\
'Laertes','King','Laertes','King','Laertes','King','Gertrude','Laertes','Gertrude','Laertes','Gertrude',\
'Laertes','King']
result = set(' '.join(names).replace(' and ', ' ').split())
print(result)
The set()
function is used to remove the duplicates and its input is
contained in the parentheses that immediately follow the function name, but otherwise
the
order of execution and the syntactic expression both flow from left to right using
dotted
method notation: from join()
(equivalent to XSLT string-join()
)
into replace()
(also called replace()
in XSLT) into
split()
(equivalent to tokenize()
in XSLT).
While dotted method notation is natural within an object-oriented model, the notation
is
also not inherently incompatible with functional programming paradigms. For example,
pipelines with dot notation when working within a functional model in Python are supported
by the PyFunctional package [PyFunctional], which makes creating
data pipelines easy by using chained functional operators
, e.g.:
seq(1, 2, 3).map(lambda x: x * 2).reduce(lambda x, y: x + y)
XPath 3.1 introduced the “arrow operator” (=>
) for this purpose, so that:
tokenize((normalize-unicode(upper-case($string))),"\s+")
can be rewritten as
$string => upper-case() => normalize-unicode() => tokenize("\s+")
[
XPath 3.1, §3.16] One limitation to this XPath operator is that the value that is
passed can be used only as the
first argument to the next
function, which means that traditional methods must be used when the value supplies
any
non-initial parameter.
Variables, templates, and functions
The description of nested functions above was based on functions from the core XPath
library, but much more complex operations can be embedded in user-defined functions,
which
can then be nested on a single line, similarly to the nesting in the basic XPath library
example. Indeed, a user-defined function can transform an entire input document into
an
entire intermediate document, optionally assigning the value of the function to a
variable,
and then apply another function to the intermediate document. If we create two (imaginary)
user-defined functions, one that translates a document from Russian into English and
another
that tokenizes the English translation and returns a report (perhaps an HTML table)
of the
distribution of Scrabble values of the words, the starting point of the transformation
might
look like:
<xsl:template match="/">
<xsl:variable name="intermediate" as="document-node()" select="djb:translate_into_english(.)"/>
<xsl:sequence select="djb:create_scrabble_values($intermediate)"/>
</xsl:template>
As was the case in the discussion of core XPath library functions above,
$intermediate
is a convenience variable, and the functions can,
alternatively, be nested directly:
<xsl:template match="/">
<xsl:sequence select="djb:create_scrabble_values(djb:translate_into_english(.))"/>
</xsl:template>
The two user-defined functions might call other functions, apply templates, or otherwise
perform operations of arbitrary complexity.
Insofar as functions and templates have similar properties (and are compiled to the
same
type of operations within the processor), instead of creating user-defined functions,
it is
also possible to use convenience variables with templates:
<xsl:template match="/">
<xsl:variable name="intermediate" as="document-node()">
<xsl:apply-templates select="."/>
</xsl:variable>
<xsl:apply-templates select="$intermediate"/>
</xsl:template>
Dimitre Novatchev provided real examples of this type of chaining in XSLT 1.0 and
2.0 in
a StackOverflow response in 2010. [Novatchev 2010]
The use of intermediate variables with either functions or templates neither requires
nor precludes serializing all or part of the intermediate results with
<xsl:result-document>
or <xsl:message>
if that is
needed for development or for production. But the use of templates with intermediate
variables for pipelining does exhibit one of the principal limitations discussed above
of
using functions with variables for pipelining: editing the pipeline requires, beyond
the
insertion, deletion, or rearrangement of lines, changing the value of the
@select
attribute when the intermediate variable is used. For example, in the
code above, if between the translation into English and the creation of the Scrabble
value
output we want to pluralize all words that are capable of being pluralized, not only
do we
need to create another variable, but we also need to change the value of the
@select
attribute of an <xsl:apply-templates>
elements in
a different line (modifications associated with the insertion are highlighted):
<xsl:template match="/">
<xsl:variable name="intermediate" as="document-node()">
<xsl:apply-templates select="."/>
</xsl:variable>
<xsl:variable name="pluralized" as="document-node()">
<xsl:apply-templates select="$intermediate"/>
</xsl:variable>
<xsl:apply-templates select="$pluralized"/>
</xsl:template>
Inserting or deleting a step requires, in addition to the insertion or deletion itself,
one adjustment to the value of a @select
attribute in a different line.
Rearranging steps requires more adjustments. There is also no easy way of changing
the
location where intermediate results are serialized, whether for development or for
production. And because the use of intermediate variables requires additional physical
lines
of code for legibility, it becomes harder to see the entire pipeline as an uninterrupted
sequence. What would be more robust, flexible, and maintainable would be a pattern
that has
the following properties:
-
Each step of the pipeline fits on a single short line of XSLT code, with no
wrappers, no intervening statements, and nothing else that intrudes into a bare-bones
list of steps.
-
Each step of the pipeline is entirely self-contained. Inserting, deleting, or
rearranging steps does not require any modification to any code pertaining to other
steps. This self-containment has two subcomponents:
-
The code that specifies the order of operations in the pipeline does not
mention or pass any values explicitly. This means that changing the sequence of
steps does not require changing variables in the arguments.
-
The code blocks that perform the pipeline operations all expect the same
parameters and return the same result types. This means that changing the sequence
of steps does not require any changes within any of the steps.
-
Intermediate results can be dumped at any time (or at more than one time) as a step
in the pipeline, whether for development or for production purposes. Dumping is a
part
of the pipeline that happens between transformative operations on the data; it is
not
part of any individual transformative operation within the pipeline.
-
Optional: Steps can be added to the list before the code that performs them has been
created, without requiring any adjustment to the parameters.
Nested template calls
As was noted above, functions and templates have similar capabilities, but because
the
nesting syntax is different, performing the same complex operation by using nested
template
calls instead of nested functions addresses some of the preceding desiderata. In the
example
below, we pass the original input, saved in the $russian-text
variable, as a
parameter into a translate-into-english
template, the return value of which
becomes the value of an $english-text
parameter, which we then pass into a
create-scrabble-values
template. That last template returns our
results:
<xsl:variable name="scrabble_values">
<xsl:call-template name="create-scrabble-values">
<xsl:with-param name="english-text" as="document-node()">
<xsl:call-template name="translate-into-english">
<xsl:with-param name="input" select="$russian-text" as="document-node()"/>
</xsl:call-template>
</xsl:with-param>
</xsl:call-template>
</xsl:variable>
This method addresses some, but not all, of the desiderata listed above. Its primary
advantage is that it keeps the parameters adjacent to the template calls that use
them,
which is an improvement over scattering them, as often happens with nested functions.
But
insertion, deletion, and especially rearrangement are not one-line operations, and
neither
is dumping interim results during development, and the hierarchy becomes harder to
follow as
the number of steps, and therefore the degree of indentation in pretty-printed output,
increases. Below we examine how the visitor pattern addresses these concerns.
The visitor pattern
The visitor pattern in an XSLT context
The visitor pattern is usually defined and
illustrated in an object-oriented context that may be difficult to map onto the XSLT
programming paradigm. James Sugrue provides some real-world analogies:
A real world analogy always helps with the understanding of a design pattern. One
example I have seen for the Visitor pattern in action is a taxi example, where the
customer […] orders a taxi, which arrives at his door. Once the person sits in [it],
the
visiting taxi is in control of the transport for that person.
Shopping in the supermarket is another common example, where the shopping cart is
your set of elements. When you get to the checkout, the cashier acts as a visitor,
taking the disparate set of elements (your shopping), some with prices and others
that
need to be weighed, in order to provide you with a total.
…
In summary, if you want to decouple some logical code from the elements that you’re
using as input, visitor is probably the best pattern for the job. [Sugrue 2010]
Because XSLT is not object-oriented, logical operations on objects (e.g., nodes of
various types, atomic values) are naturally decoupled from the objects themselves.
Michael
Kay, writing about the simulation of higher-order functions in XSLT 2.0, situates
the
visitor pattern in a functional context:
[Y]ou might write a function that does a depth-first traversal of a tree and
processes each node in the tree using a function that is supplied as an argument.
This
makes it possible to use one tree-walking routine to achieve many different effects
(you
might recognize this as the Visitor Pattern).
In XSLT (and XPath), functions are not first-class objects, so they cannot be passed
as arguments to other functions. However, the template-matching capability of
<xsl:apply-templates>
can be used to simulate higher order
functions.
The use of <xsl:apply-templates>
has been developed to a fine art
by Dimitre Notatchev in his FXSL library (http://fxsl.sourceforge.net/),
which provides many examples of the technique. [Kay 2008]
Defining the order of operations
The transformation is driven by a sequence of operations, each of which is represented
by an empty element in a custom namespace:
<xsl:variable name="operations" as="element()+">
<djb:prepareWords/>
<djb:lexical/>
<djb:proclitics/>
<djb:enclitics/>
<djb:tsa/>
<djb:palatalize/>
<djb:jot/>
<djb:romanize/>
<djb:finalDevoice/>
<djb:regressiveDevoice/>
<djb:regressiveVoice/>
<djb:palatalAssimilation/>
<djb:consonantCleanup/>
<djb:vowelReduction/>
<!--
diagnosticOutput writes the string to stderr as an <xsl:message>
move it around or comment it out, as appropriate
-->
<!--<djb:diagnosticOutput/>-->
<djb:stripSpaces/>
<djb:rhymeString/>
</xsl:variable>
Note that <djb:diagnosticOutput>
can be inserted wherever the
developer wishes to examine interim results, including in multiple locations.
Starting the transformation
The transformation is run once for each <line>
element in a poem.
It operates through sibling recursion, applying templates to the first member of the
$operations
variable defined in the example above and passing as parameters
the current line ($input
) and the sequence of remaining parameters
(remove($operations, 1)
):
<xsl:apply-templates select="$operations[1]" mode="operate">
<xsl:with-param name="input" select="."/>
<xsl:with-param name="remaining" select="remove($operations, 1)"/>
</xsl:apply-templates>
Note that we apply templates not to the input (the <line>
element),
but to the operation, and we pass along to the operation as parameters the input and
the
sequence of remaining operations. None of the operations refers to any other explicitly,
and none refers to any specific parameters, which means that inserting, deleting,
and
rearranging operations involves changing only the list of operations itself. Because
the
sibling recursion strategy will walk through the operations in order (see below),
one
operation feeds the next without having to know its identity.
Template matching
When we apply templates to an element in the sequence of operations, a single template
matches all elements in the custom namespace, and it dispatches the processing to
the
appropriate location within an <xsl:choose>
element. That single
template begins as follows:
<xsl:template match="djb:*" mode="operate">
<xsl:param name="input" as="xs:string" required="yes"/>
<xsl:param name="remaining" as="element()*"/>
<xsl:variable name="results" as="xs:string*">
<xsl:choose>
<!-- ******************************************* -->
<!-- djb:lexical: Idiosyncrasies in pronunciation (including -ogo) -->
<!-- ******************************************* -->
<xsl:when test="self::djb:lexical">
<xsl:sequence select="djb:lexical($input)"/>
</xsl:when>
<!-- ******************************************* -->
<!-- djb:proclitics: Merge proclitics with bases -->
<!-- ******************************************* -->
<xsl:when test="self::djb:proclitics">
<xsl:sequence select="djb:proclitic($input, 1)"/>
</xsl:when>
<!-- ******************************************* -->
<!-- djb:enclitics: Merge enclitics with bases -->
<!-- ******************************************* -->
<xsl:when test="self::djb:enclitics">
<xsl:sequence select="djb:enclitic($input, 1)"/>
</xsl:when>
<!-- ******************************************* -->
<!-- djb:tsa: Convert ть?ся$ to тса -->
<!-- ******************************************* -->
<xsl:when test="self::djb:tsa">
<xsl:sequence select="replace($input, 'ться$', 'тса')"/>
</xsl:when>
<!-- ******************************************* -->
<!-- djb:palatalize: Capitalize all palatalized consonants (including unpaired) -->
<!-- ******************************************* -->
<xsl:when test="self::djb:palatalize">
<xsl:variable name="result1" as="xs:string+">
<xsl:analyze-string select="$input"
regex="([бвгдзклмнпрстфх])([яеиёюЯЕИЁЮь])">
<xsl:matching-substring>
<xsl:sequence
select="concat(upper-case(regex-group(1)), regex-group(2))"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:sequence select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:sequence select="translate(string-join($result1, ''), 'чйщ', 'ЧЙЩ')"/>
</xsl:when>
The pipline operations are the only elements in the djb:
namespace, so
this template matches all of them, and nothing else. It accepts the two parameters
and
then creates a variable called $results
, inside of which it uses
<xsl:choose>
to identify the operation that is being processed. Some
of these operations (e.g., <djb:lexical>
,
<djb:proclitics>
, <djb:enclitics>
) call
user-defined functions (in a separate function library) to do their work. Others use
core
XPath library functions (e.g., <djb:tsa>
). Others are longer and
perform more complex processing (e.g., <djb:palatalize>
). Each
<xsl:when>
element, which processes a specific operation, returns a
sequence as the value of the (interim) $results
variable that wraps the
<xsl:choose>
element. How this variable contributes to the result of
the surrounding template is described below.
Unmatched operations
At different moments in the development we implemented different treatments of
operations that are not matched in the single processing template. One option is to
let
operations listed in the sequence but not associated with processing code pass their
logical input, unchanged, along to the next step in the pipeline. This is the simplest
form of a stub; the name of the operation is present, but the default behavior if
the
operation is not defined is to pass:
<!-- ******************************************* -->
<!-- Default to continue processing if operate step has no code-->
<!-- ******************************************* -->
<xsl:otherwise>
<xsl:apply-templates select="$remaining[1]" mode="visit">
<xsl:with-param name="input" select="$input"/>
<xsl:with-param name="remaining" select="remove($remaining, 1)"/>
</xsl:apply-templates>
</xsl:otherwise>
The <xsl:otherwise>
default applies templates to the next operation
in the sequence, passing along the input it received unchanged.
Alternatively, when the development is complete, we might prefer to raise a fatal
error if the sequence of operations includes any that have not been defined:
<!-- ******************************************* -->
<!-- Default to terminate if operate step has no code-->
<!-- ******************************************* -->
<xsl:otherwise>
<xsl:message terminate="yes">Unmatched visitor element <xsl:value-of
select="local-name()"/></xsl:message>
</xsl:otherwise>
One or the other of these two defaults should be commented out at all times.
Processing the next operation
Except for the <xsl:otherwise>
option that raises an error and
terminates, each operation creates a sequence that becomes the value of the
$results
variable that wraps the <xsl:choose>
element.
In some cases the value is a single string; in others it is a sequence of strings
(thus
the plural name). After the $results
variable has been constructed, it is used to
construct a $result
(singular), typed as a single string. The template then
passes that value to the next operation, using sibling recursion:
<xsl:variable name="result" as="xs:string" select="string-join($results, '')"/>
<xsl:apply-templates select="$remaining[1]" mode="#current">
<xsl:with-param name="input" select="$result"/>
<xsl:with-param name="remaining" select="remove($remaining, 1)"/>
</xsl:apply-templates>
Termination
The sibling recursion pattern removes each operation after it has been processed and
then passes the sequence of remaining operations along as a parameter. When the last
operation has been processed and removing it from the sequence of operations leaves
an
empty sequence, the pipeline has finished, and returns gracefully:
<xsl:if test="empty($remaining)">
<xsl:sequence select="$result"/>
</xsl:if>
Evaluation
The paradigm described here meets all of the desiderata enumerated earlier and
repeated here:
-
Each step of the pipeline fits on a single short line of XSLT code, with no
wrappers, no intervening statements, and nothing else that intrudes into a bare-bones
list of steps.
-
Each step of the pipeline is entirely self-contained. Inserting, deleting, or
rearranging steps does not require any modification to any code pertaining to other
steps.
-
The list of steps does not mention or pass any parameters explicitly. Parameter
passing is handled within the processing steps (whether functions or templates), and
changing the sequence of steps does not require changing the code that manages
parameters within any of the steps.
-
Intermediate results can be dumped at any time (or at more than one time) as a
step in the pipeline, whether for development or for production purposes. Dumping
is a
part of the pipeline that happens between transformative operations on the data; it
is
not part of any individual transformative operation within the pipeline.
-
Steps can be added to the list before the code that performs them has been
created, without requiring any adjustment to the parameters.
Adapting the visitor pattern to use <xsl:iterate>
The introduction of <xsl:iterate>
in XSLT 3.0 suggests an alternative
way to implement the visitor pattern. Consider the following example, which combines the processing pipeline and the
text to be transformed in a single XML document:
<example>
<pipeline xmlns="Pipeline">
<tokenize/>
<lower-case/>
<swap s1="fox" s2="dog"/>
<swap s1="quick" s2="lazy"/>
<pipeline>
<replace match="b" replace="z"/>
<replace match="o" replace="a"/>
</pipeline>
<replace match="lazy" replace="eager"/>
<capitalize/>
<reverse/>
<string-join separator="-"/>
</pipeline>
<source>
<div>
<p>The quick BROWN fox jumps OVER the lazy dog</p>
</div>
</source>
</example>
As in the original visitor pattern example, the pipeline steps are specified as
elements, in a custom namespace, to which templates are applied.
<pipeline>
elements can be nested; this has no effect on the execution
(the nesting is flattened), but it can simplify modular development and inclusion.
The
following XSLT stylesheet applies the pipeline steps in order using
xsl:iterate
:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="xs math"
xmlns:p="Pipeline" version="3.0">
<xsl:output method="xml" indent="true"/>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="/">
<xsl:apply-templates select="example/source" mode="#current">
<xsl:with-param name="pipeline" select="example/p:pipeline" tunnel="true"/>
</xsl:apply-templates>
</xsl:template>
<xsl:template match="text()[matches(., '\S')]">
<xsl:param name="pipeline" tunnel="true"/>
<xsl:call-template name="pipeline">
<xsl:with-param name="input" select="." as="xs:string"/>
<xsl:with-param name="pipeline" as="element()" select="$pipeline"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="pipeline" as="item()*">
<xsl:param name="input" as="item()*"/>
<xsl:param name="pipeline" as="element()"/>
<xsl:iterate select="$pipeline/*">
<xsl:param name="input" select="$input"/>
<xsl:on-completion>
<xsl:sequence select="$input"/>
</xsl:on-completion>
<xsl:next-iteration>
<xsl:with-param name="input" as="item()*">
<xsl:apply-templates select=".">
<xsl:with-param name="input" select="$input"/>
</xsl:apply-templates>
</xsl:with-param>
</xsl:next-iteration>
</xsl:iterate>
</xsl:template>
<xsl:template match="p:tokenize" as="xs:string*">
<xsl:param name="input" as="xs:string?"/>
<xsl:variable name="split" select="(@split, '\s+')[1]"/>
<xsl:sequence select="tokenize($input, $split)"/>
</xsl:template>
<xsl:template match="p:string-join" as="xs:string">
<xsl:param name="input" as="item()*"/>
<xsl:variable name="sep" select="(@separator, ' ')[1]"/>
<xsl:sequence select="string-join($input ! string(.), $sep)"/>
</xsl:template>
<xsl:template match="p:lower-case" as="xs:string*">
<xsl:param name="input" as="item()*"/>
<xsl:sequence select="$input ! string(.) ! lower-case(.)"/>
</xsl:template>
<xsl:template match="p:capitalize" as="xs:string*">
<xsl:param name="input" as="item()*"/>
<xsl:sequence
select="$input ! string(.) ! (upper-case(substring(., 1, 1)) || lower-case(substring(., 2)))"
/>
</xsl:template>
<xsl:template match="p:reverse" as="item()*">
<xsl:param name="input" as="item()*"/>
<xsl:sequence select="reverse($input)"/>
</xsl:template>
<xsl:template match="p:replace" as="xs:string*">
<xsl:param name="input" as="item()*"/>
<xsl:variable name="match" select="(@match, ' ')[1]"/>
<xsl:variable name="replace" select="(@replace, ' ')[1]"/>
<xsl:sequence select="$input ! string(.) ! replace(., $match, $replace)"/>
</xsl:template>
<xsl:template match="p:swap" as="xs:string*">
<xsl:param name="input" as="item()*"/>
<xsl:variable name="s1" select="@s1"/>
<xsl:variable name="s2" select="@s2"/>
<xsl:for-each select="$input">
<xsl:variable name="parts" as="xs:string*">
<xsl:analyze-string select="string(.)" regex="{$s1}|{$s2}">
<xsl:matching-substring>
<xsl:sequence
select="
if (. eq $s1) then
$s2
else
$s1"
/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:sequence select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:sequence select="string-join($parts)"/>
</xsl:for-each>
</xsl:template>
<xsl:template match="p:pipeline" as="item()*">
<xsl:param name="input" as="item()*"/>
<xsl:call-template name="pipeline">
<xsl:with-param name="input" select="$input"/>
<xsl:with-param name="pipeline" select="."/>
</xsl:call-template>
</xsl:template>
</xsl:stylesheet>
The output is:
<source>
<div>
<p>Fax-Quick-The-Aver-Jumps-Dag-Zrawn-Eager-The</p>
</div>
</source>
The <xsl:iterate>
strategy uses iteration over the children of the
<pipeline>
element instead of sibling recursion. The example above also
relies on separate templates for every element in the Pipeline
namespace
instead of using a single template and <xsl:choose>
, a decision that is
independent of the choice between iteration and recursion. The recursion implementation
relies on explicit surgery on the list of pipeline steps with the help of the XPath
remove()
function, as well as a somewhat indirect test for the end of the
pipeline, which uses the XPath empty()
function to check whether any pipeline
steps remain. The iterative approach may therefore feel more natural, since it uses
elements
with more transparent, self-documenting, names, such as
<xsl:next-iteration>
and <xsl:on-completion>
.
Nested calls to the replace()
functions
Use cases for nested replace()
functions
Developers who work with uncommon writing systems (ancient, medieval, minority, etc.)
often need to convert documents from one character coding to another. Sometimes the
writing system is the same but the character set is different, such as when pre-Unicode
legacy documents must be converted to Unicode. Sometimes the conversion is from one
writing system to another, such as when Cyrillic texts are Romanized for library catalog
purposes, or when linguistic field data in domain-specific notation are converted
to the
International Phonetic Alphabet (IPA).
If the mapping from one character representation to another is one to one, it can
be
implemented with a single translate()
function. The implementation of
translate()
has the advantage of not suffering from feeding or bleeding
conflicts. For example, translate($input,'AB','BA')
replaces A with B and B with A simultaneously, so that
old instances of A
that are translated to B
do not then wind
up feeding the translation of B
to A
, or vice versa.
But mappings that are one to many, many to many, and many to one cannot be managed
with translate()
because translate()
can only map exactly one
character to exactly one character (or to zero characters). For this reason, mappings
that
are not one-to-one are performed using the replace()
function, and if there
are many such replacements, the transformation requires a pipeline of
replace()
operations (which must be ordered in a way that is sensitive to
feeding and bleeding issues). Even if we bracket ordering constraints, the necessity
of
pipelining replace()
functions (in our work, often more than two dozen)
raises the legibility and maintainability issues discussed above. We avoid those issues
by
implementing a table-driven approach that separates the processing with
replace()
from the match and replacement values. The example below is drawn
from a Uyghur-to-IPA mapping project.
The mapping table
The mapping table consists of pairs of match and replacement strings, grouped as those
that are not one to one (<multiple>
) and those that are
(<unitary>
):
<pairs>
<multiple>
<pair>
<orth>ch</orth>
<ipa>ʧʰ</ipa>
</pair>
<pair>
<orth>gh</orth>
<ipa>ɣ</ipa>
</pair>
<pair>
<orth>ng</orth>
<ipa>ŋ</ipa>
</pair>
<pair>
<orth>sh</orth>
<ipa>ʃ</ipa>
</pair>
<pair>
<orth>zh</orth>
<ipa>ʒ</ipa>
</pair>
</multiple>
<unitary>
<pair>
<orth>a</orth>
<ipa>a</ipa>
</pair>
<pair>
<orth>e</orth>
<ipa>ɛ</ipa>
</pair>
<pair>
<orth>b</orth>
<ipa>b</ipa>
</pair>
<pair>
<orth>d</orth>
<ipa>d</ipa>
</pair>
<pair>
<orth>é</orth>
<ipa>e</ipa>
</pair>
<pair>
<orth>f</orth>
<ipa>f</ipa>
</pair>
<pair>
<orth>g</orth>
<ipa>g</ipa>
</pair>
<pair>
<orth>h</orth>
<ipa>h</ipa>
</pair>
<pair>
<orth>x</orth>
<ipa>χ</ipa>
</pair>
<pair>
<orth>i</orth>
<ipa>i</ipa>
</pair>
<pair>
<orth>j</orth>
<ipa>ʤ</ipa>
</pair>
<pair>
<orth>k</orth>
<ipa>k</ipa>
</pair>
<pair>
<orth>q</orth>
<ipa>q</ipa>
</pair>
<pair>
<orth>l</orth>
<ipa>l</ipa>
</pair>
<pair>
<orth>l</orth>
<ipa>ɫ</ipa>
</pair>
<pair>
<orth>m</orth>
<ipa>m</ipa>
</pair>
<pair>
<orth>n</orth>
<ipa>n</ipa>
</pair>
<pair>
<orth>o</orth>
<ipa>o</ipa>
</pair>
<pair>
<orth>ö</orth>
<ipa>ø</ipa>
</pair>
<pair>
<orth>p</orth>
<ipa>pʰ</ipa>
</pair>
<pair>
<orth>r</orth>
<ipa>r</ipa>
</pair>
<pair>
<orth>s</orth>
<ipa>s</ipa>
</pair>
<pair>
<orth>t</orth>
<ipa>tʰ</ipa>
</pair>
<pair>
<orth>u</orth>
<ipa>u</ipa>
</pair>
<pair>
<orth>ü</orth>
<ipa>y</ipa>
</pair>
<pair>
<orth>w</orth>
<ipa>w</ipa>
</pair>
<pair>
<orth>y</orth>
<ipa>j</ipa>
</pair>
<pair>
<orth>z</orth>
<ipa>z</ipa>
</pair>
<pair>
<orth>‘</orth>
<ipa>ˀ</ipa>
</pair>
<pair>
<orth>'</orth>
<ipa>ˀ</ipa>
</pair>
</unitary>
</pairs>
Processing mappings that are not one to one
Because some of the characters in the <multiple>
portion of the
mapping table also appear in the <unitary>
portion, bleeding
constraints require that mappings that are not one to one be processed before those
that
are, and that processing requires the replace()
function. This processing is
performed with a single user-defined function:
<xsl:function name="djb:multireplace" as="xs:string">
<!-- this function handles all replacements that aren't one-to-one -->
<xsl:param name="instring" as="xs:string"/>
<xsl:param name="pairs" as="element(pair)*"/>
<xsl:choose>
<xsl:when test="empty($pairs)">
<xsl:sequence select="$instring"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="result"
select="replace($instring, $pairs[1]/orth, $pairs[1]/ipa)"/>
<xsl:sequence select="djb:multireplace($result, remove($pairs, 1))"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
When the function is called for the first time on a string, $pairs
has
been declared as a sequence of all <pair>
elements that are children of
<multiple>
. The function (in the <xsl:otherwise>
clause)
recruits the first <pair>
elements in that sequence and uses its
<orth>
child as the match pattern and its <ipa>
child as
the replacement. It then removes the <pair>
it has just used from the head
of the sequence of <pair>
elements and calls itself recursively; when
there are no <pair> elements left to process, the recursion terminates and the last
string value is returned. The separation of the replace()
function from the
arguments on which it operates makes it easy for the developer to insert, delete,
or
rearrange the pairings in the mapping file.
Processing mappings that are one to one
Mapping that are one to one can be processed with a single instance of the
translate()
function:
<xsl:variable name="unitary-orth" select="string-join($unitary//orth,'')" as="xs:string"/>
<xsl:variable name="unitary-ipa" select="string-join($unitary//ipa,'')" as="xs:string"/>
<xsl:sequence select="translate($temp,$unitary-orth,$unitary-ipa)"/>
The table-driven design all but eliminates the opportunity for off-by-one errors when
aligning the individual characters in the match argument with those in the replacement
argument.
References
[Adams and Birnbaum 1997] Adams, Lawrence D. and
David J. Birnbaum. Text technology 7, 1 (1997): 1–17.
Corrected reprint available at
http://poetry.obdurodon.org/reports/adams-birnbaum.xhtml
[Birnbaum and Thorsen 2015] Birnbaum, David J.,
and Elise Thorsen. Markup and meter: Using XML tools to teach a computer to think about
versification.
Presented at Balisage: The Markup Conference 2015, Washington, DC,
August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference
2015. Balisage Series on Markup Technologies, vol. 15 (2015). doi:https://doi.org/10.4242/BalisageVol15.Birnbaum01.
https://www.balisage.net/Proceedings/vol15/html/Birnbaum01/BalisageVol15-Birnbaum01.html
[Cocoon] Apache Cocoon.
http://cocoon.apache.org/
[Gamma 1994] Gamma, Erich, Richard Helm, Ralph
Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented
software. Boston, MA: Addison-Wesley, 1994.
[Kay 2008] Kay, Michael. XSLT 2.0
and XPath 2.0 programmer’s reference, 4th edition. Indianapolis: Wiley/Wrox,
2008.
[Kay 2009] Kay, Michael. You pull, I’ll push: on
the polarity of pipelines.
Presented at Balisage: The Markup
Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup
Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Kay01.
https://www.balisage.net/Proceedings/vol3/html/Kay01/BalisageVol3-Kay01.html
[Kiparsky 1973] Kiparsky, Paul. Abstractness,
opacity, and global rules. In Three dimensions of
linguistic theory, O. Fujimura, ed, pages 57–86. Tokyo: TEC, 1973.
[Knuth 1974] Kunth, Donald E. Structured
programming with go to
statements.
Computing surveys 6(4): 261–301. doi:https://doi.org/10.1145/356635.356640.
http://web.archive.org/web/20130731202547/http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf
[Koenig 1995] Patterns and
antipatterns.
Journal of object-oriented programming 8(1): 46-48
(1995)
[McIlroy 1964] The origin of Unix
pipes.
Bell Labs document archive.
http://doc.cat-v.org/unix/pipes/
[Novatchev 2010] Novatchev, Dimitre. Response to
Is daisy chaining xslt an accepted practice?
http://stackoverflow.com/questions/4571956/is-daisy-chaining-xslt-an-accepted-practice
[Piez 2016] Piez, Wendell. Framing the problem:
building customized editing environments and workflows.
Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5,
2016. In Proceedings of Balisage: The Markup Conference
2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Piez01.
https://www.balisage.net/Proceedings/vol17/html/Piez01/BalisageVol17-Piez01.html
[Piez 2014] Piez, Wendell. Response to Ignoring
ambiguous
matches.
https://www.oxygenxml.com/archives/xsl-list/201402/msg00085.html
[PyFunctional] PyFunctional (Python package).
https://github.com/EntilZha/PyFunctional
[Saxonica <xsl:iterate>] xsl:iterate
. Saxon 9.8 Documentation.
http://www.saxonica.com/html/documentation/xsl-elements/iterate.html
[Sugrue 2010] Sugrue, James. Visitor pattern
tutorial with Java examples.
https://dzone.com/articles/design-patterns-visitor
[Why] Why is join() a string method instead
of a list or tuple method?
Part of the [Python] Design and
history FAQ.
https://docs.python.org/3/faq/design.html#why-is-join-a-string-method-instead-of-a-list-or-tuple-method
[XPath 3.1] XML Path Language (XPath) 3.1. W3C
Recommendation, 21 March 2017. https://www.w3.org/TR/xpath-31/
[XProc] XProc: An XML Pipeline Language. W3C
Recommendation, 11 May 2010. https://www.w3.org/TR/xproc/