Introduction
Class attributes, sometimes known by different names and with slightly different purposes,
are ubiquitous in
all common vocabularies. In DocBook they are called “role”, in TEI they are called
“type” or “rend”, in HTML and
DITA they are actually called “class”, and in JATS they are called “content-type”,
“list-type”, “sec-type”, etc.
With the exception of TEI’s @type attribute, they may contain not just a single value but a list of
space-separated tokens. TEI is special in that it distributes what goes into @class over (at least)
four attributes: @type, @subtype, @rend, and @rendition. In the generic attributes, for example @role,
@class, and @content-type, each of these tokens can mean anything. Sometimes (often) they influence
whether or how the element is displayed or converted. In TEI, this is mainly the purpose of the @rend
attribute.
There are typically no strict sets of available values. If there were, the vocabulary
might as well offer a
dedicated element with that name. (Customized schemas may restrict these values though.)
DITA, in this regard, is quite extreme, as it moves much of the semantics from the
element and attribute names
into class attribute tokens. Processing documents with many space-separated class
attribute tokens can incur
a significant performance penalty, as John Lumley presented at XML London in 2015 [Lumley Kay 2015].
Creating class attributes, on the other hand, is not particularly difficult or demanding
in terms of computing
power. So why this paper?
In his JATS-Con 2010 contribution [Piez 2010], Wendell Piez described two lanes of
“vertical customization” for what would later become the JATS Preview Stylesheets: one lane for CSS and the other
for XSLT customizations. He calls them vertical because both pile overridden or additional
rules upon
off-the-shelf basic rules. CSS adaptation is often the first customization method
at hand, he argues, but
a diverse repertoire of XSLT customizations is often needed when CSS means have been
exhausted.
Class tokens are the most widespread and versatile tools available when it comes to
applying CSS formatting
to content. Therefore generating class tokens with XSLT can be seen as the middle
ground between pure CSS styling
and more elaborate XSLT content transformations.
So one should assume that manipulating class attributes can be done without writing
too much code. This may be
true if “code” means “new code”, but, as we will see when examining popular XML to
HTML conversion stylesheets, it
often involves copying, and only slightly changing, sometimes large extents of existing
code, which can become a
maintenance burden.
This paper is primarily about writing customizable stylesheets with a particular focus
on “don’t repeat
yourself” or, more specifically, on avoiding a copy&paste approach of re-defining
large functions or templates
in the importing stylesheets. The techniques presented in this paper are not particularly
new, but apparently they
are underappreciated in this problem space. They all work with XSLT 2. They are about
fine-grained templates that
match in auxiliary, aspect-oriented modes, where one “aspect” can be “creating class
attributes.”
Most conversions from one content-centric XML application to another follow rule-based and sometimes
computational programming patterns, as described in Chapter 17 of Michael Kay’s XSLT and XPath 2.0 book
[Kay XSLT 2.0]. These are appropriate and versatile techniques to complete the task. However, it turns
out that templates or functions in these stylesheets are often too coarse-grained
to allow, for example, the
addition of a custom, computed token to the class list of a result element. We will
look at these templates or
functions and occasionally suggest tweaks to the off-the-shelf stylesheets so that
they offer better customization
hooks to the custom, importing stylesheets.
This is not so much about rekindling a push vs. pull approach discussion. It is taken
for granted that the
commonly used XSLT stylesheets here operate in push mode (the transformation is driven
by the source document). It
is more about offering finer-grained, context-dependent customization hooks for rule-based
push-mode templates or
computational functions/named templates.
The proposed hooks are almost exclusively transformations of the context element in a certain XSLT mode, for
example a mode that turns individual attributes into class attribute tokens and, by processing an
element’s attributes, turns the element itself into a class attribute with a token list.
The overall approach suggested in this paper is not limited to class attribute generation.
It can also be
useful, for example, when mapping element names between XML applications or in determining
whether a given element
is meant to be rendered inline.
Common Vocabularies and Conversion Stylesheets
We will examine in more or less detail how the class generation templates or functions
work for certain widely
used conversion stylesheets. We will in particular test whether it is easy to modify,
by means of
xsl:import
and overriding templates or functions, the class generation in certain ways:
- Add a token that represents the source element’s (local) name;
- make sure that this source element name token will be created even in the absence of the source
  attribute that will become a class attribute by default;
- add a token for a subtype that stems from another attribute (and potentially suppress processing this
  attribute otherwise);
- add a token that is somehow calculated by processing nodes relative to the context (example: a table row
  shall get a class token that toggles whether the row is hidden);
- for certain elements, suppress certain values of the class token that is produced by default.
DocBook to HTML
We transform this source document:
<section xmlns="http://docbook.org/ns/docbook" version="5.1">
<title>Title</title>
<para role="foo">Para</para>
</section>
using the XSLT 2.0 version of the DocBook stylesheets [DocBook XSLT 2.0].
The paragraph will be output as:
<p class="foo">Para</p>
If the input is a simpara, the output will be the same. Now suppose that you generate the HTML
for a browser-based XML editor (that actually edits HTML and transforms it back to DocBook) that offers
different context-dependent formatting controls for para and simpara. You want to add
a token with the element’s local name to the class list in order to invoke the appropriate controls. You look at
the template that happens to render both elements:
<xsl:template match="db:para|db:simpara">
<xsl:param name="runin" select="()" tunnel="yes"/>
<xsl:param name="class" select="()" tunnel="yes"/>
<!-- irrelevant parts left out -->
<p>
<xsl:sequence select="f:html-attributes(., @xml:id, $class)"/>
<xsl:copy-of select="$runin"/>
<xsl:apply-templates/>
</p>
</xsl:template>
The function f:html-attributes()
accepts a string $class
that will be added to the
class attribute tokens, which is good. So in the importing stylesheet, one can use
this template:
<xsl:template match="db:para|db:simpara">
<xsl:next-match>
<xsl:with-param name="class" as="xs:string" select="local-name()" tunnel="yes"/>
</xsl:next-match>
</xsl:template>
For structural divisions such as chapter, appendix, or section, nothing has to be adapted. The resulting
HTML element has both the local name and any @role of the source element as class tokens. The same holds
for admonitions such as caution.
<caution role="bar">
<para>Caution</para>
</caution>
will become
<div class="caution bar admonition">
<h3>Caution</h3>
<div class="admonition-body">
<p class="para">Caution</p>
</div>
</div>
Suppose that you want to add the content of the @condition
attribute to the resulting class
tokens. You will notice that
<caution role="bar" condition="foo">
<para>Caution</para>
</caution>
yields a result that is identical to the one we saw above. How can we make the token(s) in
@condition also appear in the class list of the resulting div?
The class attribute that is generated for admonition-like elements is populated by
these two nested function
calls:
<xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
f:html-extra-class-values(., 'admonition'))"/>
If we want to add the value of @condition to the class list, we could override the 15-line
admonition template:
<xsl:template match="db:note|db:important|db:warning|db:caution|db:tip|db:danger">
<xsl:choose>
<xsl:when test="$admonition.graphics">
<xsl:apply-templates select="." mode="m:graphical-admonition"/>
</xsl:when>
<xsl:otherwise>
<div>
<xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
f:html-extra-class-values(., 'admonition'))"/>
<xsl:call-template name="t:titlepage"/>
<div class="admonition-body">
<xsl:apply-templates/>
</div>
</div>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
with this one:
<xsl:template match="db:note|db:important|db:warning|db:caution|db:tip|db:danger">
<xsl:choose>
<xsl:when test="$admonition.graphics">
<xsl:apply-templates select="." mode="m:graphical-admonition"/>
</xsl:when>
<xsl:otherwise>
<div>
<xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
f:html-extra-class-values(., ('admonition', @condition)))"/>
<xsl:call-template name="t:titlepage"/>
<div class="admonition-body">
<xsl:apply-templates/>
</div>
</div>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
We are displaying the full templates here in order to give an impression of the redundancy
incurred. In the
author’s experience, copy&paste of lengthy functions or templates poses a significant
maintenance burden.
New features or bugfixes in the overridden code will not make it to the adapted code
unless someone takes the
time and compares/transfers the changes.
An alternative approach within the boundaries of the existing stylesheet design is to override the
f:html-extra-class-values() or f:html-attributes() functions. Since the context
element is the first argument for both, one could build in a switch that checks whether the context is an
admonition and then adds @condition to the resulting @class. Or one could make this
change for all contexts, but that might introduce undesired class attribute tokens in other places.
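To make the first variant concrete, here is a minimal sketch of such an override in an importing stylesheet. It assumes the two-argument signature of f:html-extra-class-values() that is reproduced in section “Making the DocBook to HTML Conversion More Extensible” and that the prefixes f, db, and xs are bound as in the imported stylesheets. Note that the original logic for @role and @revision has to be repeated; only the last xsl:sequence is new.
<xsl:function name="f:html-extra-class-values" as="xs:string?">
  <xsl:param name="node" as="element()"/>
  <xsl:param name="extra" as="xs:string*"/>
  <xsl:variable name="classes" as="xs:string*">
    <!-- logic repeated from the overridden function -->
    <xsl:if test="$node/@role">
      <xsl:sequence select="tokenize($node/@role, '\s+')"/>
    </xsl:if>
    <xsl:if test="$node/@revision">
      <xsl:sequence select="concat('rf-', $node/@revision)"/>
    </xsl:if>
    <xsl:sequence select="$extra"/>
    <!-- context-dependent switch: only caution contributes its @condition tokens;
         tokenize() returns the empty sequence for all other elements -->
    <xsl:sequence select="tokenize($node/self::db:caution/@condition, '\s+')"/>
  </xsl:variable>
  <xsl:if test="exists($classes)">
    <xsl:sequence select="string-join(distinct-values($classes), ' ')"/>
  </xsl:if>
</xsl:function>
Every further context-specific requirement would add another conditional line to this generic function, which is the kind of growth that the following paragraph advises against.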
Overriding a generic function or a named template in order to effect specific changes
for certain contexts
is not advisable, for the reason given above (maintainability). If the task at hand
is to process documents in a
context-dependent fashion, matching templates, rather than named templates or functions,
naturally suggest themselves. And there is a straightforward way that allows keeping
the current function call
mechanism without the need to bloat the functions’ bodies with context-dependent conditionals.
We’ll look at
that in section “Making the DocBook to HTML Conversion More Extensible”.
TEI to HTML
As hinted at in section “Introduction”, TEI differentiates between types (including subtypes) and rendering
information. When it comes to generating @class and @style attributes, the TEI XSL Stylesheets [TEI XSL]
primarily look at rendering information that may be declared at various locations in the document, and they add
the element’s local name to the class tokens by default. The strategy is described in the code
documentation for the named template makeRendition as follows:
Work out rendition. In order of precedence, we first look at @rend; if that does not exist, @rendition and
@style values are merged together; if neither of those exist, we look at default renditions in tagUsage; if
default is set to false, we do nothing; if default has a value, use that for @class; otherwise, use the
element name as a value for @class.
The template makeRendition will call a function, tei:processRend(), for processing what’s in @rend,
and in its absence it will look at @rendition and @style and use different functions
to compute the HTML attributes @class and @style.
Although the @rend attribute may contain arbitrary text, tei:processRend() has
very specific expectations about the @rend tokens that it will map to @class tokens.
If none of the mappings applies, the original token will be forwarded to the class attribute. However, it is
not possible to add computed class tokens unless one is willing to override
tei:processRend(), makeRendition, or the templates that invoke makeRendition. They are sometimes as compact as
<xsl:template match="tei:gloss">
<span>
<xsl:call-template name="makeRendition"/>
<xsl:apply-templates/>
</span>
</xsl:template>
but there might be many of them that need customizing.
Given the gloss
example input from the TEI P5 Guidelines [TEI gloss]:
We may define <term xml:id="tdpv" rend="sc">discoursal point of view</term> as <gloss
target="#tdpv">the relationship, expressed through discourse structure, between the
implied author or some
other addresser, and the fiction.</gloss>
The output contains <span class="gloss">….
If the input has a rend="foo" attribute on gloss, the output will be <span class="foo">….
In order to retain the token gloss in the span’s class list, one needs to override the template
in an importing stylesheet:
<xsl:template match="tei:gloss">
<span>
<xsl:call-template name="makeRendition">
<xsl:with-param name="auto" select="local-name()"/>
</xsl:call-template>
<xsl:apply-templates/>
</span>
</xsl:template>
For simple templates this overriding is acceptable. However, makeRendition
is called 70 times,
and some of the calling templates comprise more than 50 lines of code. So if overriding
makeRendition
globally is not an option, potentially much redundancy will be created because the
stylesheets lack granularity or customization hooks.
So far we have looked at the rendering attributes, but what about @type and @subtype? If they are used
on gloss, they don’t show up in the result. We can add them to the class list in the same customized
template that we used for local-name().
For other typed elements, such as TEI’s div, the @type attribute will be converted
to a @class attribute. However, the @subtype is missing in the resulting HTML div’s class list.
The classes of div-like elements will be created by a template named divClassAttribute. It will
call the already familiar template makeRendition with the (non-tunneling) default parameter set
to the value of the div’s @type attribute. One needs to redefine divClassAttribute
(24 lines of code) in order to add @subtype. If one wanted to add
it only for selected contexts, one would need to introduce conditionals in this otherwise
generic template,
blowing up the generic template even more—unless the template provided a hook that
allows context-aware creation
of the class token list.
At least these generic named templates make sure, for selected elements, that there
is a hook for overriding
default class attribute generation. The situation would be worse if class generation
relied on matching and
transforming existing @type
or @rend
attributes, when in their absence there wouldn’t
be a class attribute at all. But similar to the generic functions in section “DocBook to HTML”, if one
wants to supply non-default computed class tokens, one needs to either redefine possibly
many matching templates
from which the generic templates/functions are called or add conditional logic to
the generics.
JATS to HTML
The JATS Preview Stylesheets [JATS XSL] are less elaborate than the DocBook or
TEI stylesheets. They accept fewer parameters on invocation, they don’t support chunking,
etc. As Tony Graham
describes in his paper about customizing the XSL-FO rendering, this is deliberate.
“Deliberately not
supporting every possible style permutation hasn’t precluded the JATS Preview stylesheets
from supporting other
people customizing the stylesheets nor does it stop you from using the stylesheets
as a base for customized
output.” [Graham 2014]
This is exactly the customization by xsl:import and overriding templates that the current paper
is about. So how well do the JATS Preview Stylesheets fare?
There are two templates that process elements with @content-type:
<xsl:template match="p | license-p">
<p>
<xsl:if test="not(preceding-sibling::*)">
<xsl:attribute name="class">first</xsl:attribute>
</xsl:if>
<xsl:call-template name="assign-id"/>
<xsl:apply-templates select="@content-type"/>
<xsl:apply-templates/>
</p>
</xsl:template>
and
<xsl:template match="named-content">
<span>
<xsl:for-each select="@content-type">
<xsl:attribute name="class">
<xsl:value-of select="translate(.,' ','-')"/>
</xsl:attribute>
</xsl:for-each>
<xsl:apply-templates/>
</span>
</xsl:template>
The class tokens for named-content cannot be extended (for example, with a token
'named-content') without rewriting the complete template. That is not difficult at all,
but it still means replicating (more or less) complex functionality.
In the case of the template that matches JATS’s p element, there is an
<xsl:apply-templates select="@content-type"/>, which does nothing by default, so this serves
as a hook for custom processing in the importing stylesheets. If one added
<xsl:template match="p/@content-type">
<xsl:attribute name="class" select="."/>
</xsl:template>
in the customization, one would lose the 'first' token that is created by default for paragraphs
without a predecessor. So this is extra functionality that one needs to copy from the original template.
That’s not much redundancy because the code snippet is small, but it is still redundancy that will impede
maintenance.
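A customization that keeps the 'first' token has to reproduce the test of the imported template, for example along the following lines. This is only a sketch: it assumes an XSLT 2.0 importing stylesheet, extends the match pattern to license-p for symmetry with the imported template, and relies on the rule that a later attribute of the same name replaces an earlier one on the same result element.
<xsl:template match="p/@content-type | license-p/@content-type">
  <!-- the preceding-sibling test is duplicated from the imported template;
       '..' is the paragraph element that carries the attribute -->
  <xsl:attribute name="class" separator=" "
    select="((if (not(../preceding-sibling::*)) then 'first' else ()), string(.))"/>
</xsl:template>
If the condition for 'first' ever changes in the imported stylesheet, this copy has to be updated by hand.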
Apart from the different type attributes that JATS elements may have, almost every element may carry
a @specific-use attribute. It is somewhat similar to DocBook’s @condition attribute.
It may indicate, among other things, that the content is for a limited audience, for a specific output
format, etc. This kind of information might be used by a rendering process to filter out content, for instance.
But in the author’s experience, more often than not, what is in @specific-use is useful in the HTML
@class attribute, too. For example, it might be required that content marked specific-use="optional"
not be removed during rendering but rather be collapsed by CSS or JavaScript, based on the class token
“optional”.
However, there is no hook in the preview stylesheets that would allow customizations to include more tokens
in the @class attribute. Other attributes, such as @list-content or milestone/@rationale, are candidates
for inclusion in class token lists, too.
We didn’t see as many monolithic templates, underequipped with hooks, in the JATS Preview Stylesheets
as we saw in the TEI stylesheets, but there is room for improvement in the JATS matching templates, too.
Problem Summary
The problems with the HTML converters presented in this section fall into one or more
of these
classes:
- Where functions or named templates are responsible for creating class attributes:
  - They are sometimes complex.
  - Overriding them creates redundancy.
  - Hooks for context-aware processing are rarely provided.
- Where matching templates are responsible for transforming source elements:
  - They are sometimes complex.
  - Overriding them creates redundancy.
  - A requirement to create class attributes differently for many elements entails changing matching
    templates for many elements.
- Where matching templates are responsible for transforming source attributes:
  - If the respective source attribute is missing, no class attribute will be generated.
  - That makes it hard to generally add, for example, the source element’s name to the resulting class
    list.
Addressing the Problems (and some more)
Transform Elements in a Dedicated Class Attribute Mode
Yes, that’s proposed here as a fix for almost all issues listed in the previous section.
Let’s refactor the first template in section “JATS to HTML”:
<xsl:template match="p | license-p">
<p>
<xsl:apply-templates select="." mode="class-att"/>
<xsl:call-template name="assign-id"/>
<xsl:apply-templates/>
</p>
</xsl:template>
The conditional creation of the 'first' token has disappeared. Instead, the element itself is
transformed in a newly introduced mode called class-att.
The default template for all elements in this mode is:
<xsl:template match="*" mode="class-att" as="attribute(class)?">
<xsl:call-template name="make-class">
<xsl:with-param name="tokens" as="xs:string*">
<xsl:apply-templates select="@content-type, @list-type, @list-content,
@rationale, @sec-type, @specific-use" mode="#current"/>
</xsl:with-param>
</xsl:call-template>
</xsl:template>
By default, it transforms all kinds of …-type
attributes in the same class-att
mode. (This is not an exhaustive list of all possible attributes that may end up as
class tokens, just the
frequently occurring ones and some marginally important examples.)
By default, each of these attributes will become a string with the same value as the
attribute:
<xsl:template match="@*" mode="class-att" as="xs:string">
<xsl:sequence select="string(.)"/>
</xsl:template>
These strings, if there are any, will be joined into a
space-separated list of tokens in a newly created class attribute:
<xsl:template name="make-class" as="attribute(class)?">
<xsl:param name="tokens" as="xs:string*"/>
<xsl:if test="exists($tokens[normalize-space()])">
<xsl:attribute name="class" separator=" "
select="distinct-values($tokens[normalize-space()])"/>
</xsl:if>
</xsl:template>
Within this framework, it is now possible to selectively add the 'first' token to paragraphs that
lack predecessors, without the need to insert a conditional statement into the template that processes them in
default mode:
<xsl:template match=" p[not(preceding-sibling::*)]
| license-p[not(preceding-sibling::*)]"
mode="class-att" as="attribute(class)?">
<xsl:attribute name="class" separator=" ">
<xsl:sequence select="'first'"/>
<xsl:next-match/>
</xsl:attribute>
</xsl:template>
The xsl:next-match instruction will look for the next matching template in the same mode, be it
imported (lower import precedence) or be it in the same stylesheet with lower priority. In this case, it is
the template that matches all elements in class-att mode. This next-matching template will produce
a @class attribute whose value will be atomized and cast to a string. The current template will prepend
the string 'first' and then turn this sequence of strings into a space-separated value for a newly generated
@class attribute.
Using the powerful and elegant xsl:next-match instruction, the resulting class list can be
selectively extended. Suppose you want to include the name of license-p elements in the
@class attribute in order to be able to style them specially. (This is a contrived example because
former license-p elements will end up in what can be selected in CSS by div.metadata-chunk p,
so there is no need to style p.license-p by its own class. The fundamental utility of this
lightweight ex-post class token decoration should be evident at this point, though.) The template is:
<xsl:template match="license-p" mode="class-att" as="attribute(class)?">
<xsl:attribute name="class" separator=" ">
<xsl:sequence select="name()"/>
<xsl:next-match/>
</xsl:attribute>
</xsl:template>
The computed priority of this template, according to the XSLT specification [XSLT 3 priority],
is 0. The template that matches * has the computed priority −0.5, while the template that adds 'first'
has a predicate in the matching pattern and therefore its computed priority is +0.5. One would expect
that for a license-p, the first token will be 'first', followed by 'license-p', followed by the attribute
values of @content-type, @specific-use, etc. Transforming this input:
<license>
<license-p specific-use="bar baz" content-type="foo">© 2020 Jane Smith</license-p>
</license>
will indeed yield this output:
<div class="metadata-area">
<p class="metadata-entry"><span class="generated">License: </span></p>
<div class="metadata-chunk">
<p class="first license-p foo bar baz" id="d2e13">© 2020 Jane Smith</p>
</div>
</div>
(The value of @content-type precedes the value of @specific-use because of the
order-preserving sequence concatenation operator (comma) in "@content-type, @list-type, …, @specific-use".)
It is quite easy to filter out certain tokens of, for example, @specific-use:
<xsl:template match="license-p/@specific-use" mode="class-att" as="xs:string*">
<!-- xs:string* rather than xs:string?: more than one token may remain after filtering -->
<xsl:variable name="orig" as="xs:string?">
<xsl:next-match/>
</xsl:variable>
<xsl:sequence select="tokenize($orig)[not(. = 'baz')]"/>
</xsl:template>
Or if the tokens 'foo' and 'bar' shouldn’t appear in the class attribute no matter
where they came from:
<xsl:template match="license-p" mode="class-att" priority="1">
<xsl:variable name="orig" as="attribute(class)?">
<xsl:next-match/>
</xsl:variable>
<xsl:call-template name="make-class">
<xsl:with-param name="tokens" select="tokenize($orig)[not(. = ('foo', 'bar'))]"/>
</xsl:call-template>
</xsl:template>
Please note that if you are going to use this in XSLT 2, you might need to replace tokenize($orig)
with tokenize($orig, '\s+'). Recent versions of Saxon [Saxon], however, will accept
XPath 3.1 functions even if the stylesheet’s XSLT version is 2.0.
Also note, and this is an important thing to remember for XSLT novices, that if you use this priority 1
template in an importing stylesheet, you may safely omit the priority attribute (unless there are other
priority clashes you need to address). This is because templates that match the same items always take
precedence when they occur in importing stylesheets. They have a higher import precedence
[XSLT 3 precedence], which always trumps priority.
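As an illustration of the import precedence rule, an importing stylesheet might look like the following sketch. The file name jats-html-refactored.xsl is hypothetical; it stands for a refactored base stylesheet that is assumed to define the class-att mode and the named template make-class as described above. The two-argument form of tokenize() is used so that the sketch also runs on strict XSLT 2.0 processors.
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- hypothetical name of the refactored base stylesheet -->
  <xsl:import href="jats-html-refactored.xsl"/>

  <!-- no priority attribute: this template wins over all imported
       class-att templates because of its higher import precedence -->
  <xsl:template match="license-p" mode="class-att" as="attribute(class)?">
    <xsl:variable name="orig" as="attribute(class)?">
      <!-- runs the imported templates, including the ones that add
           'first' and 'license-p' -->
      <xsl:next-match/>
    </xsl:variable>
    <xsl:call-template name="make-class">
      <xsl:with-param name="tokens"
        select="tokenize($orig, '\s+')[not(. = ('foo', 'bar'))]"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>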
Although this solution uses a named template, too, this named template is not monolithic
at all. It merely
creates class attributes from tokens. Also the matching templates in this solution
are less “ambitious” and more
fine-grained than the original templates.
Making the DocBook to HTML Conversion More Extensible
Let’s modify the admonition template of section “DocBook to HTML”, using a newly introduced
m:extra-class-values
mode:
<xsl:template match="db:note|db:important|db:warning|db:caution|db:tip|db:danger">
<xsl:choose>
<xsl:when test="$admonition.graphics">
<xsl:apply-templates select="." mode="m:graphical-admonition"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="extra-class-values" as="xs:string*">
<xsl:apply-templates select="." mode="m:extra-class-values"/>
</xsl:variable>
<div>
<xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
f:html-extra-class-values(., $extra-class-values))"/>
<xsl:call-template name="t:titlepage"/>
<div class="admonition-body">
<xsl:apply-templates/>
</div>
</div>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
But that’s not the final state of optimization for greater maintainability. Let’s take this change back
and call an f:html-extra-class-values() function that takes the context node as its single argument:
<xsl:sequence select="f:html-attributes(., @xml:id, local-name(.),
f:html-extra-class-values(.))"/>
The function was previously declared as follows:
<xsl:function name="f:html-extra-class-values" as="xs:string?">
<xsl:param name="node" as="element()"/>
<xsl:sequence select="f:html-extra-class-values($node, ())"/>
</xsl:function>
<xsl:function name="f:html-extra-class-values" as="xs:string?">
<xsl:param name="node" as="element()"/>
<xsl:param name="extra" as="xs:string*"/>
<xsl:variable name="classes" as="xs:string*">
<xsl:if test="$node/@role">
<xsl:sequence select="tokenize($node/@role, '\s+')"/>
</xsl:if>
<xsl:if test="$node/@revision">
<xsl:sequence select="concat('rf-', $node/@revision)"/>
</xsl:if>
<xsl:sequence select="$extra"/>
</xsl:variable>
<xsl:if test="exists($classes)">
<xsl:sequence select="string-join(distinct-values($classes), ' ')"/>
</xsl:if>
</xsl:function>
The class attributes can only be overridden in a context-dependent way if a different $extra
argument is passed in each context. If we want to use the @condition tokens for the
@class attribute in caution but not in the other admonition elements, we need to
insert a conditional switch in the admonition template or clone this template and modify the clone only for
caution.
The following refactoring will replace the single-argument invocation and also implement
the current behavior
of the two-argument invocation for admonitions:
<xsl:template match="*" mode="m:extra-class-values">
<xsl:apply-templates select="@*" mode="#current"/>
</xsl:template>
<xsl:template match="db:note | db:important | db:warning | db:caution | db:tip | db:danger"
mode="m:extra-class-values" as="xs:string*">
<xsl:sequence select="'admonition'"/>
<xsl:next-match/>
</xsl:template>
<xsl:template match="@*" mode="m:extra-class-values"/>
<xsl:template match="@role" mode="m:extra-class-values" as="xs:string*">
<xsl:sequence select="tokenize(.)"/>
</xsl:template>
<xsl:template match="@revision" mode="m:extra-class-values" as="xs:string">
<xsl:sequence select="concat('rf-', .)"/>
</xsl:template>
<xsl:function name="f:html-extra-class-values" as="xs:string*">
<xsl:param name="node" as="element()"/>
<xsl:variable name="tokens" as="xs:string*">
<xsl:apply-templates select="$node" mode="m:extra-class-values"/>
</xsl:variable>
<xsl:sequence select="distinct-values($tokens[normalize-space()])"/>
</xsl:function>
This is more code than it was initially, but if additional class tokens are needed in certain
contexts, the additional templates in the importing stylesheet are much simpler:
<xsl:template match="db:caution/@condition" mode="m:extra-class-values">
<xsl:sequence select="tokenize(.)"/>
</xsl:template>
The important change that makes previously inflexible functions or named templates
versatile is to let a
template match in a dedicated mode from within the function or named template body.
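The same mode can also serve the last test case from section “Common Vocabularies and Conversion Stylesheets”, suppressing certain default tokens for certain elements. The following sketch for an importing stylesheet assumes the refactored templates above; the role value 'internal' is made up for illustration.
<xsl:template match="db:caution/@role" mode="m:extra-class-values" as="xs:string*">
  <xsl:variable name="orig" as="xs:string*">
    <!-- the imported @role template delivers the individual role tokens -->
    <xsl:next-match/>
  </xsl:variable>
  <!-- drop the hypothetical 'internal' token, keep all others -->
  <xsl:sequence select="$orig[not(. = 'internal')]"/>
</xsl:template>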
Improving the TEI to HTML Conversion
The monolithic template makeRendition in section “TEI to HTML” can be refactored in a
similar way. Inside the template, or inside other monolithic functions called from there, such as
tei:processRendition(), the context element will be transformed in a dedicated mode. In order to
indicate that the refactored function (or the named template) corresponds to the dedicated mode, the mode’s name
may be identical to the function name. This is no technical requirement, but the author recommends that you
follow this convention.
Functions that do other things than generating class attributes may be refactored in this way, too.
In particular tei:isInline(), a function that accepts an element as its argument and decides whether
this element is inline or block-level, can replace its 128 xsl:when branches with matching
templates in mode="tei:isInline" while still keeping the same function signature.
This way, elements with @rend values other than 'display' or 'block' can be declared block elements
in a customization, without redefining this 140-line function. This xsl:when clause:
<xsl:when test="tei:match(@rend,'display') or tei:match(@rend,'block')">false</xsl:when>
will become:
<xsl:template match="*[tei:match(@rend,'display') or tei:match(@rend,'block')]" mode="tei:isInline">
<xsl:sequence select="false()"/>
</xsl:template>
and can be extended with:
<xsl:template match="*[tei:match(@rend,'list-item')]" mode="tei:isInline">
<xsl:sequence select="false()"/>
</xsl:template>
in an importing stylesheet.
Note
In order to mimic the previous xsl:choose/xsl:when behavior, it might be necessary to
add explicit priorities to some of the matching templates in tei:isInline mode.
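A refactored tei:isInline() could then shrink to a thin wrapper around the mode, roughly as in the following sketch. The parameter type, the fallback value true(), and the priority value 10 are assumptions made for illustration; the real function’s default branch and the required priorities would have to be derived from the original xsl:choose.
<xsl:function name="tei:isInline" as="xs:boolean">
  <xsl:param name="element" as="element()"/>
  <xsl:apply-templates select="$element" mode="tei:isInline"/>
</xsl:function>

<!-- assumed fallback that stands in for the former xsl:otherwise branch -->
<xsl:template match="*" mode="tei:isInline" as="xs:boolean">
  <xsl:sequence select="true()"/>
</xsl:template>

<!-- an explicit priority reproduces the position of a former xsl:when
     branch that had to win over later, equally specific branches -->
<xsl:template match="*[tei:match(@rend,'display') or tei:match(@rend,'block')]"
  mode="tei:isInline" priority="10" as="xs:boolean">
  <xsl:sequence select="false()"/>
</xsl:template>
Callers keep invoking tei:isInline($element) as before; only the decision logic has moved into templates that an importing stylesheet can override one by one.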
Mapping Element Names
This is an example of a LaTeXML to TEI conversion. Most source elements can be mapped
to target elements in
a linear fashion. The mapping may be either coded into a function with many case switches,
or it can be done by
matching templates:
<xsl:template match="*" mode="latexml2tei">
<xsl:variable name="new-name" as="xs:string">
<xsl:apply-templates select="." mode="latexml2tei-new-name"/>
</xsl:variable>
<xsl:element name="{$new-name}">
<xsl:apply-templates select="." mode="latexml2tei-style"/>
<xsl:apply-templates select="@*" mode="#current"/>
<xsl:if test="self::p">
<xsl:apply-templates select="../@xml:id" mode="#current"/>
</xsl:if>
<xsl:apply-templates mode="#current"/>
</xsl:element>
</xsl:template>
Note
Creating a function somenamespace:latexml2tei-new-name()
that transforms the element argument in
mode="latexml2tei-new-name"
would make storing the template output in a variable
dispensable. We didn’t think about this when we wrote the stylesheet in 2018.
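The function mentioned in this note could be sketched as follows. The my prefix and its namespace URI are hypothetical, the LaTeXML namespace URI is assumed, and the three default templates at the end are assumptions that only serve to make the sketch run on its own; the xml:id handling for p is omitted.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="http://example.org/ns/my"
  xmlns="http://www.tei-c.org/ns/1.0"
  xpath-default-namespace="http://dlmf.nist.gov/LaTeXML"
  exclude-result-prefixes="xs my">

  <!-- Hypothetical wrapper function: its body is just the mode hook. -->
  <xsl:function name="my:latexml2tei-new-name" as="xs:string">
    <xsl:param name="elt" as="element(*)"/>
    <xsl:apply-templates select="$elt" mode="latexml2tei-new-name"/>
  </xsl:function>

  <!-- The function call in the attribute value template replaces the
       $new-name variable. -->
  <xsl:template match="*" mode="latexml2tei">
    <xsl:element name="{my:latexml2tei-new-name(.)}">
      <xsl:apply-templates select="." mode="latexml2tei-style"/>
      <xsl:apply-templates select="@*, node()" mode="#current"/>
    </xsl:element>
  </xsl:template>

  <!-- Assumed defaults: keep the local name, add no style attribute,
       copy source attributes. -->
  <xsl:template match="*" mode="latexml2tei-new-name" as="xs:string">
    <xsl:sequence select="local-name()"/>
  </xsl:template>
  <xsl:template match="*" mode="latexml2tei-style"/>
  <xsl:template match="@*" mode="latexml2tei">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```

The sketch has no rules in the default mode, so it would have to be invoked with latexml2tei as the initial mode. Element-specific rules in mode="latexml2tei-new-name" override the match="*" default by virtue of their higher default priority.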
Some sample templates in these modes:
<xsl:template match="enumerate" mode="latexml2tei-new-name">
<xsl:sequence select="'list'"/>
</xsl:template>
<xsl:template match="enumerate" mode="latexml2tei-style">
<xsl:attribute name="rend" select="'numbered'"/>
</xsl:template>
<xsl:template match="enumerate/item" mode="latexml2tei">
<xsl:apply-templates mode="#current"/>
</xsl:template>
Refactoring Monolithic Functions in a Hub to BITS Conversion
Another example of refactoring a monolithic function into something finer-grained is
taken from a Hub-to-BITS
conversion library [hub2bits]. (Hub XML is le-tex’s DocBook-derived intermediate XML
format. For this example, it can be assumed as equivalent to DocBook.)
One of several element name mapping functions, jats:part-submatter()
, returns,
for a given DocBook context element, the target BITS element name. (The namespace
prefix is jats
although the target format is BITS and although JATS doesn’t have a namespace anyway;
this is because this
library is also used for DocBook-to-JATS conversions, and we simply use the jats prefix, declared as
xmlns:jats="http://jats.nlm.nih.gov"
, for functions, keys, and modes related to any of
the JATS family vocabularies. We do this because XSLT function names need to be namespaced
and we love namespaces.
There, we said it.) Before refactoring, the function was defined as follows:
<xsl:function name="jats:part-submatter" as="xs:string">
<xsl:param name="elt" as="element(*)"/>
<xsl:choose>
<xsl:when test="name($elt) = ('title', 'info', 'subtitle', 'titleabbrev')">
<xsl:sequence select="'book-part-meta'"/>
</xsl:when>
<xsl:when test="name($elt) = ('toc')">
<xsl:sequence select="'front-matter'"/>
</xsl:when>
<xsl:when test="name($elt) = ('bibliography', 'glossary', 'appendix', 'index')">
<xsl:sequence select="'back'"/>
</xsl:when>
<xsl:when test="name($elt) = 'section' and $elt[matches(dbk:title/@role, $jats:additional-backmatter-parts-title-role-regex)]">
<xsl:sequence select="'back'"/>
</xsl:when>
<xsl:otherwise>
<xsl:sequence select="jats:book-part-body($elt/..)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
After refactoring, it’s as simple as:
<xsl:function name="jats:part-submatter" as="xs:string">
<xsl:param name="elt" as="element(*)"/>
<xsl:apply-templates select="$elt" mode="jats:part-submatter"/>
</xsl:function>
The templates in mode="jats:part-submatter"
have been created in order to provide the previous
functionality, only in a much more extensible way:
<!-- additional advantage over xsl:choose in the function body with test="name($elt) = ('title', …)":
all the flexibility of matching patterns -->
<xsl:template match="dbk:title | dbk:info | dbk:subtitle | dbk:titleabbrev" mode="jats:part-submatter" as="xs:string">
<xsl:sequence select="'book-part-meta'"/>
</xsl:template>
<!-- this way, we can handle front matter appendices quite elegantly: -->
<xsl:template match="dbk:toc | dbk:appendix[following-sibling::dbk:chapter | following-sibling::dbk:part]"
mode="jats:part-submatter" as="xs:string">
<xsl:sequence select="'front-matter'"/>
</xsl:template>
<xsl:template match="dbk:bibliography | dbk:glossary | dbk:appendix | dbk:index |
dbk:section[matches(dbk:title/@role, $jats:additional-backmatter-parts-title-role-regex)]"
mode="jats:part-submatter" as="xs:string">
<xsl:sequence select="'back'"/>
</xsl:template>
<!-- previously xsl:otherwise: -->
<xsl:template match="*" mode="jats:part-submatter" as="xs:string">
<xsl:sequence select="jats:book-part-body(..)"/>
</xsl:template>
The css:content
Template
We at le-tex have been using the approach presented in this paper for some years now
in our JATS/BITS→HTML,
Hub/DocBook→HTML, TEI→HTML, and Hub/DocBook→JATS/BITS conversions.
In the beginning we wrote templates to convert attributes in the css
namespace
(@css:font-weight
, @css:background-color
, etc. [CSSa]) into HTML @style attributes.
We then bundled them with other CSS-attributes-related mappings and wrappings to a template
called
css:content
[css:content]. It is called in order to transform attributes
and nodes for almost any element that is not metadata (the css
prefix and
xmlns:css="http://www.w3.org/1996/css"
is used for historic reasons, because we were dealing with
the @css:*
attributes primarily).
The only thing that this template doesn’t do is to map the source element name to
a target name and to
create the target element, or to unwrap the source element (and to ignore the generated
attributes in case of
unwrap).
The source element is transformed in class-att
mode in order to compute the class attribute.
The @css:*
attributes are transformed in a mode hub2htm:css-style-overrides
; all CSS
attributes that are not discarded by this mode will be put together, semicolon-separated,
into an HTML @style
attribute. (For the DocBook→JATS conversion, these remaining attributes will be
either discarded, transformed in another way, or copied verbatim if allowed by a tweaked
target schema.)
The additional steps that css:content
performs are
-
create remaining attributes (copied verbatim or transformed)
-
create wrapper elements (b
, i
, sub
, sup
, …)
-
create other elements from attributes (generate a[@id]
from def-item/@id
when going from JATS to HTML’s unwrapped dt
, dd
sequences, for example)
-
transform the nodes in the #current
mode
-
make sure that the attributes are written to the result before the other nodes.
Whether a given attribute should result in a wrapper element in the output is determined by transforming
the attributes
in the special mode css:map-att-to-elt
:
<xsl:template match="@css:font-style[. = ('italic', 'oblique')]" mode="css:map-att-to-elt" as="xs:string?">
<xsl:sequence select="$css:italic-elt-name"/>
</xsl:template>
The global variable $css:italic-elt-name
is defined as 'i'
in the HTML-generating
stylesheet and it is overridden to 'italic'
in the customization that is used to create JATS/BITS
from DocBook/Hub. If transformation of the attributes generates a sequence ('i', 'b')
, then the
transformed content will be wrapped like this: <b><i>content</i></b>
and the
wrapping-inducing CSS attributes will be removed from the attributes to be transformed.
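The wrapping step can be illustrated with a simplified sketch. This is not the actual css:content implementation: the named template css:wrap, the variable $css:bold-elt-name, the bold rule, and the span container are hypothetical names introduced for this example.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:css="http://www.w3.org/1996/css"
  exclude-result-prefixes="xs css">

  <xsl:variable name="css:italic-elt-name" as="xs:string" select="'i'"/>
  <!-- Hypothetical counterpart for bold text. -->
  <xsl:variable name="css:bold-elt-name" as="xs:string" select="'b'"/>

  <xsl:template match="@css:font-style[. = ('italic', 'oblique')]"
    mode="css:map-att-to-elt" as="xs:string?">
    <xsl:sequence select="$css:italic-elt-name"/>
  </xsl:template>
  <xsl:template match="@css:font-weight[. = 'bold']"
    mode="css:map-att-to-elt" as="xs:string?">
    <xsl:sequence select="$css:bold-elt-name"/>
  </xsl:template>
  <!-- All other attributes induce no wrapper element. -->
  <xsl:template match="@*" mode="css:map-att-to-elt" as="xs:string?"/>

  <!-- Hypothetical helper: the last name becomes the outermost wrapper,
       so ('i', 'b') yields <b><i>…</i></b>. -->
  <xsl:template name="css:wrap">
    <xsl:param name="names" as="xs:string*"/>
    <xsl:param name="content" as="node()*"/>
    <xsl:choose>
      <xsl:when test="empty($names)">
        <xsl:sequence select="$content"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="{$names[last()]}">
          <xsl:call-template name="css:wrap">
            <xsl:with-param name="names" select="$names[position() lt last()]"/>
            <xsl:with-param name="content" select="$content"/>
          </xsl:call-template>
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Collect the wrapper names in the token-generating mode, then wrap. -->
  <xsl:template match="*[@css:*]">
    <xsl:variable name="names" as="xs:string*">
      <xsl:apply-templates select="@css:*" mode="css:map-att-to-elt"/>
    </xsl:variable>
    <span>
      <xsl:call-template name="css:wrap">
        <xsl:with-param name="names" select="$names"/>
        <xsl:with-param name="content" as="node()*">
          <xsl:apply-templates/>
        </xsl:with-param>
      </xsl:call-template>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Since the order in which attributes are processed is implementation-dependent, a production version would presumably sort the generated names if a particular nesting order is required.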
This template makes heavy use of these dedicated token-generating modes for different
purposes. The
versatility of this approach is underlined by the fact that it could be adapted to
transformations between
different vocabularies with minimal customization, while giving full control to the
stylesheet customizer for
context-dependent class/style attribute creation and wrapper generation (not to speak
of the rest of the
transformation that happens in whatever mode is #current
, about which the stylesheet customizer
retains almost full control).
Final Thoughts
Functions or Named Templates?
Functions and named templates have been treated interchangeably in this paper so far.
It should be noted
though that functions should only be used when they help avoid redundancy in XPath
expressions (and only if
repeated evaluation of them in matching patterns won’t slow down template matching).
Many of the functions used in the DocBook (section “DocBook to HTML”) and TEI (section “TEI to HTML”) rendering stylesheets can be replaced with named templates, or, in the spirit of
this paper, with matching templates. If they need not be called in XPath expressions,
functions should rather be
written as named templates in the first place.
The reason is tunneling. We didn’t see much of tunneling in the examples in this paper.
On a non-public project, the author recently needed to filter out columns of tables
in BITS. The cells
should not be discarded, rather, a class token 'discarded'
should be added to them so that users
can toggle the display of these discarded cells. The column numbers were calculated
according to some criteria
taken from thead/th
and passed to the transformation of the whole table
as tunneled
integer parameters. A fact that not every XSLT developer knows: Even when switching
modes, from normal document
transformation to class-att
mode, tunneled parameters will be passed on. This made creation of the
'discarded'
tokens a very lightweight endeavor. This wouldn’t have been possible if the class
attributes had been created using functions, unless the cell matching templates caught
the tunneled parameter
and passed it to the function. This would have necessitated that the function accept
such a parameter, which is
unlikely for generic functions that create class attributes like the ones we have
seen.
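A reduced sketch of this setup follows. The selection criterion (@content-type = 'discard' on header cells) and the parameter name discarded-cols are hypothetical, the sketch assumes that class-att mode returns string tokens, and column spans are ignored for brevity.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- The table template computes the column numbers and tunnels them. -->
  <xsl:template match="table">
    <xsl:variable name="discarded-cols" as="xs:integer*"
      select="for $th in thead/tr/th[@content-type = 'discard']
              return count($th/preceding-sibling::*) + 1"/>
    <table>
      <xsl:apply-templates>
        <xsl:with-param name="discarded-cols" select="$discarded-cols" tunnel="yes"/>
      </xsl:apply-templates>
    </table>
  </xsl:template>

  <!-- Intermediate templates neither declare nor pass the parameter. -->
  <xsl:template match="thead | tbody | tr">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- The cell template switches to class-att mode without any
       xsl:with-param; the tunneled parameter travels along regardless. -->
  <xsl:template match="th | td">
    <xsl:copy>
      <xsl:variable name="tokens" as="xs:string*">
        <xsl:apply-templates select="." mode="class-att"/>
      </xsl:variable>
      <xsl:if test="exists($tokens)">
        <xsl:attribute name="class" select="$tokens" separator=" "/>
      </xsl:if>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Only the rule that needs the parameter declares it. -->
  <xsl:template match="th | td" mode="class-att" as="xs:string*">
    <xsl:param name="discarded-cols" as="xs:integer*" tunnel="yes"/>
    <xsl:if test="(count(preceding-sibling::*) + 1) = $discarded-cols">
      <xsl:sequence select="'discarded'"/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Only the first and the last template mention the parameter at all, which is what keeps the customization lightweight.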
Naming the Approach
Although it has been shown that there are more use cases for this design pattern than
creating class
attributes, one could call it “the class-att
approach.” Other candidates are “auxiliary modes
approach”, “micromode approach”, or “breakout mode approach.”
On the other hand, isn’t what this approach does just common sense? Writing monolithic
functions or
templates that lack context-dependent customization hooks might qualify as an antipattern,
but will doing the
opposite merit being called a pattern?
Maybe the xsl:apply-templates
instructions that calculate something small, like a string,
a token list, or an attribute, in a dedicated mode from within a formerly monolithic
function or template can be called “mode
hooks”, and the dedicated modes such as class-att
can be called “hook modes”. Then one XSLT
developer can tell another: “You should refactor this function so that
it only has a mode hook
inside, and then do the lifting in distinct matching templates in the hook mode.”
Caveats
Sometimes there are several levels of customization, and different XSLT developers
might be responsible for
maintaining these levels.
If the “hook mode” approach is chosen for a customizable stylesheet, then the people
who adapt (import) this
stylesheet need to take care not to mix other approaches with the hook mode rules.
They should avoid something like this:
<xsl:template match="license-p[@content-type = 'foo']">
<p class="license-foo">
<xsl:apply-templates/>
</p>
</xsl:template>
If you import their stylesheet and try to modify the resulting @class
in
mode="class-att"
, nothing will happen. XSLT developers might get frustrated if they cannot rely
on this hook mode mechanism because intermediate imports spoiled it. Then use of this
approach will erode more
and more in each customization level and in each new customization they create on
top of “mode hook” methodology
stylesheets. Therefore these auxiliary modes and the hooks for creating class attributes,
element names, etc.,
should be documented in the basic stylesheets.
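By contrast with the license-p template shown above, a customization that respects the hook might be sketched like this. It assumes that class-att mode returns string tokens; in a real setup, the first and the last template would typically already exist in the basic stylesheet, so the customization would contribute only the rule in the middle.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Element creation: the class attribute is computed via the mode hook. -->
  <xsl:template match="license-p">
    <p>
      <xsl:variable name="tokens" as="xs:string*">
        <xsl:apply-templates select="." mode="class-att"/>
      </xsl:variable>
      <xsl:if test="exists($tokens)">
        <xsl:attribute name="class" select="$tokens" separator=" "/>
      </xsl:if>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- The customization proper: only the token is declared here, so that
       further importing stylesheets can still override it in class-att mode. -->
  <xsl:template match="license-p[@content-type = 'foo']"
    mode="class-att" as="xs:string*">
    <xsl:sequence select="'license-foo'"/>
  </xsl:template>

  <!-- Assumed default: no class tokens. -->
  <xsl:template match="*" mode="class-att" as="xs:string*"/>

</xsl:stylesheet>
```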
References
[Lumley Kay 2015]
Lumley, John, and Kay, Michael. Improving Pattern Matching Performance in XSLT. XML London 2015.
https://www.saxonica.com/papers/xmllondon-2015jl.pdf. doi:https://doi.org/10.14337/XMLLondon15.Lumley01.
[Piez 2010] Piez, Wendell.
Fitting the Journal Publishing 3.0 Preview Stylesheets to Your Needs: Capabilities
and
Customizations. In: Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010.
Bethesda (MD): National Center for Biotechnology Information (US); 2010.
https://www.ncbi.nlm.nih.gov/books/NBK47104/
[accessed 2020-07-02].
[Graham 2014] Graham, Tony.
Formatting JATS: as easy as 1-2-3. In: Journal Article Tag Suite Conference (JATS-Con)
Proceedings 2013/2014. Bethesda (MD): National Center for Biotechnology Information
(US); 2014.
https://www.ncbi.nlm.nih.gov/books/NBK189779/ [accessed 2020-07-02].
[Kay XSLT 2.0] Kay, Michael. XSLT 2.0 and XPath 2.0 Programmer’s Reference,
4th edition. John Wiley & Sons, 2008.
[DocBook XSLT 2.0]
Tovey-Walsh, Norman, Kosek, Jiří, et al. DocBook XSLT 2.0 Stylesheets.
https://github.com/docbook/xslt20-stylesheets [accessed 2020-07-02].
[TEI XSL]
Rahtz, Sebastian, et al. TEI XSL Stylesheets.
https://github.com/TEIC/Stylesheets
[accessed 2020-07-02].
[TEI gloss] TEI Consortium. Reference page for <gloss>. In
P5: Guidelines for Electronic Text Encoding and Interchange.
https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-gloss.html [accessed 2020-07-02].
[JATS XSL] Various contributors.
JATS Preview Stylesheets.
https://github.com/ncbi/JATSPreviewStylesheets [accessed 2020-07-02].
[XSLT 3 priority] Default Priority for Template Rules. In: Kay, Michael (ed.). XSL Transformations
(XSLT) Version 3.0. W3C Recommendation 8 June 2017. https://www.w3.org/TR/xslt-30/#dt-default-priority [accessed 2020-07-02].
[Saxon] Saxonica. Saxon XSLT Processor.
http://www.saxonica.com/products/products.xml [accessed 2020-07-02].
[XSLT 3 precedence] Stylesheet Import. In: Kay, Michael (ed.). XSL Transformations
(XSLT) Version 3.0. W3C Recommendation 8 June 2017. https://www.w3.org/TR/xslt-30/#dt-import-precedence [accessed 2020-07-02].
[hub2bits] Imsieke, Gerrit, Pufe, Maren, et al. hub2bits XSLT/XProc library.
https://github.com/transpect/hub2bits/commit/7c45174 [accessed 2020-07-02].
[CSSa] Imsieke, Gerrit.
Conveying Layout Information with CSSa. In: XML Prague
Proceedings 2013.
https://archive.xmlprague.cz/2013/files/xmlprague-2013-proceedings.pdf#page=73 [accessed 2020-07-02].
[css:content] Imsieke, Gerrit, et al. hub2html XSLT/XProc library.
https://github.com/transpect/hub2html/blob/master/xsl/css-atts2wrap.xsl [accessed 2020-07-02].