Why?
In the current world of the web, much of the textual styling of XML-based documents is defined by declarations within Cascading Style Sheets (CSS) [CSS]. These encourage a model where XML elements are defined as members of classes, and the CSS stylesheets declare patterns to match elements of given type, class, position, ancestry, siblings and progeny, for which a set of values for properties (e.g. font, colour, …), pre- and post-ambles (numbering, text), layout type (block, none, inline, …) and possible geometry (size, positioning, …) are provided. Applicable patterns are chosen through a relatively simple specificity and supersession model. For many simple presentational views of XML documents an agent that implements CSS can produce acceptable results. Many XML dialects (e.g. SVG, XHTML) describe properties for many of their elements that are manipulable through CSS styling, as an alternative to or overriding direct attributive properties.
Different views of the same document can be projected by binding different CSS stylesheets. For example a 'table of contents' summary could be generated by suppressing display (display: none) of any element that is not a header or a descendant of a header. CSS works because for most cases it can provide a comparatively simple, and relatively easily understood, model that mere mortals can work with. But generally CSS has one major drawback for providing views of complex documents – it cannot, save for deletion or minor addition of textual/image content, alter the topology of the result tree, such as moving a caption element from the start to the end of the adjacent table.
Note
It is possible in CSS to indicate geometric requirements, through directives such as float:right or position:absolute, which can alter the presentational order of components, but these are intimately tied up with the CSS layout model, often supported in incompatible ways and usually require both craft and measures to support multi-browser robustness.
Another feature is that CSS styling is through inheritance – it is not universally possible to 'block' application within a subtree; only by introducing subtree-scope overriding rules (which will still define properties) can a section of a document be protected from other CSS rules. (Mechanisms to revert to a parent-inherited value or the default initial value are being defined in later CSS versions (3, 4), but browser support is inconsistent.)
In more complex document creation from XML sources, XSLT3.0 provides vastly greater flexibility, by being effectively a full-blown (and higher-order capable) programming language with an XML tree as one of its principal datatypes. It has an arbitrary capability of manipulating a document and styling the result (e.g. through properties described as element attributes).
Necessity is the Mother of Invention – over recent years the author has had several occasions where document styling was available in CSS, but an XML document being processed in a full-blown XSLT setting needed some of the CSS styling information applied within an XSLT execution. One option would be to manually construct equivalent XSLT libraries (templates, attributes etc.) which corresponded to the CSS intention and apply these to the source. This would of course be subject to coherence drift, with two styling masters having to be kept in synchronisation. Another option was to see whether some equivalent XSLT could be generated from the given CSS which would determine the same properties for the document in question, leaving the results principally as attributes on the result tree.
There are a number of cases where the effect of CSS styling might need to be evaluated before some final projection or display. Examples are:
-
When the effect of a particular CSS needs to be fixed on a subsection of a document, such that conflict with a wider scope CSS stylesheet is avoided. For example, an embedded SVG component within an XHTML page could be styled by its own CSS stylesheet. This could be in conflict with that used (or externally imposed) for the HTML page. In this case for document consistency we wish to fix the properties for the SVG. Similarly for example in the preparation of a paper for a conference that is styled by CSS, which needs to present embedded examples that are styled by another CSS[1] it can be helpful to project the localised effects completely, and keep the conference organisers from fiddling with the author's intent.
Note
Recent changes in the Balisage publication process, moving away from a final XHTML publication to one in PDF, have reduced some of the self-referential amusement this paper might offer the reader. However, if this had been presented at Balisage 2015, humour would have been restored.
-
The eventual processor for the document doesn't support CSS styling directly. For example XSL-FO doesn't support references to CSS resources, though it does use many attributive properties taken from the CSS2 model.
The goal of this work is to attempt to project as much of the effect of a CSS stylesheet onto an XML document as possible, leaving the effects as either grounded property values as attributes on relevant elements, or modest sections of added content.
Note
The author is well aware that CSS is evolving (descriptions of parts of CSS3 date back to 2001, the CSS2.2 spec is being continually re-drafted), handled to differing degrees of compliance by a variety of tools, and that extensions are numerous. This work is not intended to build a definitive CSS → XSLT converter (hence the title Approximate), but to show how a basic converter can be constructed and permit limited, but common, CSS styling to be projected in an XSLT environment. In the process, if I play somewhat fast and loose with the CSS standard(s), it is partly because in practice that is how CSS appears to be.
In this paper I will first present a very small SVG example styled by a CSS and show what an equivalent XSLT transform could be. Possible XSLT mechanisms that should support a variety of CSS rules are then discussed, followed by how the transform itself is generated. A larger example of its use (styling an ACM paper) is then presented. Finally I consider possible developments of the technique and draw some conclusions.
A Simple Example
Suppose we have a very simple SVG document:
Now we want to improve the attractiveness of the offering, by styling it with an externally defined CSS stylesheet:
(If you are viewing this directly in an HTML browser, where the SVG is referenced by an img element, as opposed to the mediaobject in the original DocBook source, then you won't see the pretty colours – this is one of the issues this work is addressing!)
SVG extends the CSS property vocabulary for its own specific presentational properties, such as fill. In line with the “don't change topology” philosophy of CSS, these do not cover alterations to geometry, i.e. we can't change the size of one of the rectangles through a CSS stylesheet. If we now reference this SVG document from within another, such as an HTML rendering of this paper, the CSS styling probably won't come through[2]. A much more robust approach would be to fix the styling on the SVG and embed it directly:
This paper is about how, and to what extent, this projection of a CSS stylesheet onto an XML document can be performed in an entirely XSLT execution environment.
Similar work
The problem of incorporating CSS into the XSL-FO mix has been addressed by a specific product – CSSToXSLFO (http://www.re.be/css2xslfo/index.xhtml), which is specifically targeted at generating styled XSL-FO structures corresponding to the box-model of layout implicit in CSS. Internally it uses the Flute Java-based CSS Parser to generate an internal (Java) rule structure. This structure is then applied across the (arbitrary) input XML document to create a DOM tree with additional namespaced attributes or elements (e.g. @css:text-align, css:before) which define the appropriate property as determined by the CSS. This intermediate document is then converted to suitable XSL-FO by a generic XSLT stylesheet matching these determined CSS directives attached to elements.
Externally my approach is perhaps similar, in that the CSS sheet is parsed into an action that determines CSS-defined styling properties and additional content, though these are used to overwrite any existing properties of the same name, and no further conversion is attempted. Internally it is quite different, developing a complete parse tree for the CSS (using regular expressions in XSLT), which is used to generate a suitable XSLT program.
The execution model
A single simple CSS rule declaration has the following generic form:
selector-pattern { property : value; property : value; … }
where for elements matching the selector pattern, values for the given style properties are declared. Such property values might be superseded or modified by properties defined in other rules with more specific matches to the given element. In this way, stylesheets can cascade their effects. For example a rule h1 {font-size: 130%;} in a generic stylesheet merely declares that h1 elements should be rendered in a 30% larger font than whatever is the default. A more specific CSS sheet can define what that default font size should be or even force a specific font size on the h1 element.
The selector patterns, whilst not as rich as XSLT's generic XPath patterns, do have enough expressiveness to cover ancestry, class membership and identity reference, sibling position, descendant contents, attribute values and so forth. Pseudo-element selectors permit styling portions of the tree that are not direct elements, such as adding preamble via the :before selector.
Most properties are simple stylistic scalar values (e.g. color:), but some relate to box-model display control (display:) and others declare additional content or counter variables, which are used to support numbering. Later versions have started to incorporate declarations about column placement, flow targets etc.
The precedence model between these CSS declarations is comparatively simple – a specificity model for a given property examines patterns which would affect it and scores them in a hierarchy on the numbers of identity attribute, other attribute and element name references in the pattern. Specific @style attributes win out and the universe of declarations is also split according to source (user agent, user and author) and an importance declaration (!important).
Simple (and approximate) XSLT model
There are a number of possible architectures that could be used. In this section I discuss one where there is some direct correspondence between the CSS rules and the XSLT templates in the equivalent stylesheet – the approach the author took originally. Later on I'll discuss some other possibilities for the overall design.
I'll start by examining a very simple case, using the example CSS stylesheet above.
According to the CSS specificity rules, when rule 2 is matched, then the @fill property would be set to 'blue' as it is more specific than the (unclassed) rule 1. Technically then we should compute the fill property for each element independently of the other properties, with reference to these rules.
However, if for the moment we make some assumptions about the sort of CSS stylesheets being employed, we can build a fairly simple XSLT transform with the following templates that would have the same effect:
(using a default XPath namespace for SVG, see section “Namespaces” ) which generates
    <svg xmlns="http://www.w3.org/2000/svg" width="250" height="150" fill="red" stroke="black">
      <g alignment-baseline="baseline" fill="red" stroke="black">
        <rect x="20" y="20" width="200" height="100" fill="red" stroke="black"/>
        <rect class="a" x="100" y="5" height="140" width="40" fill="blue" stroke="black" fill-opacity="0.5"/>
      </g>
    </svg>
which is the desired outcome.
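As a minimal sketch of templates of this kind, assume the CSS consisted of just two rules, * { fill: red; stroke: black; } and .a { fill: blue; fill-opacity: 0.5; }. That rule set is an assumption inferred from the output shown, and the priority values and the empty fallback template are illustrative choices rather than the converter's actual output.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/2000/svg">

  <!-- Copy each element with its own attributes, then let the "css" mode
       add or overwrite attributes before the children are processed. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="." mode="css"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Lowest-priority fallback: ends the xsl:next-match chain without output. -->
  <xsl:template match="*" mode="css" priority="0"/>

  <!-- Assumed rule 1:  * { fill: red; stroke: black; } -->
  <xsl:template match="*" mode="css" priority="1.001">
    <xsl:next-match/>
    <xsl:attribute name="fill">red</xsl:attribute>
    <xsl:attribute name="stroke">black</xsl:attribute>
  </xsl:template>

  <!-- Assumed rule 2:  .a { fill: blue; fill-opacity: 0.5; }
       Less specific rules run first via xsl:next-match, so the
       attributes written here are the ones that survive. -->
  <xsl:template match="*[tokenize(@class, '\s+') = 'a']" mode="css" priority="1.002">
    <xsl:next-match/>
    <xsl:attribute name="fill">blue</xsl:attribute>
    <xsl:attribute name="fill-opacity">0.5</xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

The empty priority-0 template matters in this sketch: without it the last xsl:next-match would fall through to the built-in rule for the css mode, which would descend into the element's children.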
We use XSLT templates to describe the entirety of the property applications within each CSS rule, rather than using the set of rules to look up a required property in turn. This works by using the feature that XSLT can overwrite the values of element attributes while still effectively constructing the element node itself rather than its children – the last written with a given name to the element head tag becomes the value in the result. This we arrange by ensuring that the lowest precedence rules fire first followed by the successively higher and higher precedence (specificity) rules in turn, giving every one of our template rules defined and different priorities and using the xsl:next-match instruction to chain through to the lower priorities for earlier evaluation.
This simple general technique makes the following assumptions which seem to be approximately good enough for many common uses of CSS[3]:
-
Evaluating matching rules for all properties en masse for a given element produces an equivalent end result to lookup of each relevant property through the rule structure in turn. It can be dependent on the fact that many final user agents will (should?) silently ignore additional extraneous (and usually attributive) information. For example the transform above, because of the * matching, would also add @fill and @stroke attributes to any desc or title elements (or their descendants) in an SVG document, even though such properties aren't defined for those meta-documental components within SVG[4]. How this problem might be alleviated will be described in a later section.
-
No use is made of relative properties, i.e. where a property changes dependent upon that of some parent. In this case each rule can be considered in isolation. Later I will discuss how this issue might be solved.
To generate these rules first we need to determine equivalent XSLT patterns that will match elements according to the CSS selector grammar. Some examples are shown in the following tables:
Table I
CSS | XSLT pattern |
---|---|
* | * |
element | element |
.class | *[tokenize(@class,'\s+')='class'] |
#id | *[@id='id'] |
E > F | E/F |
E F | E//F |
E ~ F | F[preceding-sibling::E] |
E + F | F[preceding-sibling::*[1]/self::E] |
Table II
CSS | XSLT pattern | Notes |
---|---|---|
pattern[attr] | XSLT pattern[exists(@attr)] | |
pattern[attr=value] | XSLT pattern[@attr = value] | A suitable typed binding of value to an XPath literal (e.g. 1, 2, 'three' ...) will be needed. |
pattern:nth-child(even) | XSLT pattern[position() mod 2 = 0] | |
pattern:nth-child(odd) | XSLT pattern[position() mod 2 = 1] | |
pattern:nth-child(digit) | XSLT pattern[position() = digit] | |
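As an illustration of the .class row of Table I, the following sketch turns a (possibly compound) class selector such as .a.b into a chain of predicates. The function name ex:class-predicates and its namespace are hypothetical, introduced only for this example; any helper used in the converter itself may well differ.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ex="http://example.org/css-sketch"
    exclude-result-prefixes="xs ex">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: one tokenize(@class, ...) predicate per class name. -->
  <xsl:function name="ex:class-predicates" as="xs:string">
    <xsl:param name="classes" as="xs:string"/>
    <xsl:sequence select="
        string-join(
          for $c in tokenize($classes, '\.')[. ne '']
          return '[tokenize(@class,''\s+'')=''' || $c || ''']',
          '')"/>
  </xsl:function>

  <!-- Run from the named initial template; no source document is needed. -->
  <xsl:template name="xsl:initial-template">
    <xsl:value-of select="'*' || ex:class-predicates('.a.b')"/>
  </xsl:template>

</xsl:stylesheet>
```

For the selector .a.b this produces a pattern with two predicates, one testing for the class a and one for the class b, so an element must carry both classes to match.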
@style property
As well as supporting grounded attributive properties (@fill, @stroke-width) many XML dialects permit a conglomerate @style attribute, whose value is a list of property-value pairs. Some dialects (e.g. XHTML) only permit direct document-borne styling through this means. To make this compound style attribute list, we can use substantially the same mechanism with the following modifications to the code:
We write the found attributes onto a temporary element H so that the last values win, then we sweep up all the attributes into the property/value list. (If we only want the @style property then the xsl:sequence select="$style-att" instruction can be suppressed. This could be controlled on a source-language property of the overall process.)
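A sketch of how such a modification might look is given below. Only the temporary element H and the variable $style-att are taken from the description above; the $holder variable, the two sample css-mode templates and their priorities are assumptions made so that the example is complete.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="*">
    <!-- All attributes produced by the CSS-derived templates,
         possibly containing several with the same name. -->
    <xsl:variable name="style-att" as="attribute()*">
      <xsl:apply-templates select="." mode="css"/>
    </xsl:variable>
    <!-- Writing them onto one temporary element keeps only the last value per name. -->
    <xsl:variable name="holder" as="element()">
      <H><xsl:sequence select="$style-att"/></H>
    </xsl:variable>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- Suppress the next instruction if only @style is wanted. -->
      <xsl:sequence select="$style-att"/>
      <xsl:if test="exists($holder/@*)">
        <xsl:attribute name="style"
            select="string-join($holder/@* ! (name() || ':' || string(.)), ';')"/>
      </xsl:if>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- Sample CSS-derived templates (assumed rules, for illustration only). -->
  <xsl:template match="*" mode="css" priority="0"/>

  <xsl:template match="*" mode="css" priority="1.001">
    <xsl:next-match/>
    <xsl:attribute name="fill">red</xsl:attribute>
  </xsl:template>

  <xsl:template match="*[tokenize(@class, '\s+') = 'a']" mode="css" priority="1.002">
    <xsl:next-match/>
    <xsl:attribute name="fill">blue</xsl:attribute>
    <xsl:attribute name="fill-opacity">0.5</xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```

The order of the property/value pairs in the resulting @style value follows the processor's attribute order, which is implementation-dependent; that should not matter since each property appears only once.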
Generated content
CSS selectors can also include qualifiers (pseudo-elements) that add (textual) content to a given element. This is often used to support list labelling or numbering of headers or running entities such as figures or tables. (Sadly this isn't a feature of SVG, so instead of illustrating with pictures, I'll have to use dull XHTML instead.) As a very simple example:
The :before and :after selectors effectively target the beginning and end respectively of the contained content of the given element, using the content property (which can be text, a source resource URI, smart quotes, counters (see later) or the value of an attribute on the source element) as the content to insert. [Technically in CSS3 pseudo-elements such as these must use a double-colon :: prefix to distinguish them from pseudo-classes, but :before and :after are permitted for backwards compatibility with CSS1 and CSS2. See Selectors Level 3.]
To process these we have a parallel set of templates that process the overall body of the element, collecting before and after content sections in turn, which are passed as tunnelled variables of xs:string? type. (Content is, as far as I'm aware, not additive, so more specific content overrules less specific.) The match patterns are the same as those used for any normal styling for that element.
Table III
    <xsl:template match="*" mode="body">
      <xsl:param name="before" tunnel="yes"/>
      <xsl:param name="after" tunnel="yes"/>
      <xsl:copy>
        <xsl:apply-templates select="@*" mode="#current"/>
        <xsl:apply-templates select="." mode="css"/>
        <xsl:value-of select="$before"/>
        <xsl:apply-templates mode="#current"/>
        <xsl:value-of select="$after"/>
      </xsl:copy>
    </xsl:template>

    <xsl:template priority="1.501" mode="body" match="h1">
      <xsl:param name="before" as="item()?" select="()" tunnel="yes"/>
      <xsl:variable name="content" as="item()*">
        <xsl:text>HEADING: </xsl:text>
        <xsl:text>‟</xsl:text>
      </xsl:variable>
      <xsl:next-match>
        <xsl:with-param name="before" select="($before,string-join($content))[1]" tunnel="yes"/>
      </xsl:next-match>
    </xsl:template>

    <xsl:template priority="1.502" mode="body" match="h1[tokenize(@class,'\s+')='vital']">
      <xsl:param name="before" as="item()?" select="()" tunnel="yes"/>
      <xsl:variable name="content" as="item()*">
        <xsl:text>VITAL: </xsl:text>
        <xsl:text>‟</xsl:text>
      </xsl:variable>
      <xsl:next-match>
        <xsl:with-param name="before" select="($before,string-join($content))[1]" tunnel="yes"/>
      </xsl:next-match>
    </xsl:template>

    <xsl:template priority="1.503" mode="body" match="h1">
      <xsl:param name="after" as="item()?" select="()" tunnel="yes"/>
      <xsl:variable name="content" as="item()*">
        <xsl:text>”</xsl:text>
      </xsl:variable>
      <xsl:next-match>
        <xsl:with-param name="after" select="($after,string-join($content))[1]" tunnel="yes"/>
      </xsl:next-match>
    </xsl:template>
Counters
CSS uses the same mechanism to declare, modify and interpolate the values of counters, which become important in styling sectioned documents.
When the counter names are the same as the elements they are counting, and same-name elements don't nest, an equivalent structure for the h2:before rule can be built in the XSLT by altering the content to:
    <xsl:variable name="content" as="item()*">
      <xsl:variable name="last.reset" select="()[last()]"/>
      <xsl:value-of select="count(if(exists($last.reset)) then preceding::h1[. >> $last.reset] else preceding::h1) + 0"/>
      <xsl:text>.</xsl:text>
      <xsl:variable name="last.reset" select="((preceding::h1))[last()]"/>
      <xsl:value-of select="count(if(exists($last.reset)) then preceding::h2[. >> $last.reset] else preceding::h2) + 1"/>
      <xsl:text>. </xsl:text>
      <xsl:text>“</xsl:text>
    </xsl:variable>
which counts back through the relevant elements to the last element which reset the given counter[5]. Obviously more complex code can be developed to handle arbitrary counter names, and within an XSLT3.0 environment accumulators may offer a more elegant solution.
Using accumulators as counters
XSLT3.0 introduced accumulators to support processing or retaining data from the source tree for later use within streamed processing. They have parallels with xsl:key, but rather more flexibility, and can be considered as a map from source tree nodes to an arbitrary value, which is computed by a walk through the tree computing a pre-descent and a post-descent value for each node. In the absence of any applicable rule that changes the value of the accumulator, the value remains constant. Thus we could compute the values of the h1 and h2 counters above (which increment by 1) with the following:
    <xsl:accumulator name="h1" initial-value="0">
      <xsl:accumulator-rule match="h1" phase="start" select="$value + 1"/>
    </xsl:accumulator>
    <xsl:accumulator name="h2" initial-value="0">
      <xsl:accumulator-rule match="h1" phase="start" select="0"/>
      <xsl:accumulator-rule match="h2" phase="start" select="$value + 1"/>
    </xsl:accumulator>
and we retrieve the value of the counters by accumulator-before('h1') and accumulator-before('h2') respectively. Note that the h2 accumulator has its value reset to 0 by any h1 element, as declared by its rule matching h1. CSS defines that numbering will nest for descendants using the same counter. This can be accommodated for a nested example like ol/li.. ol/li by using the accumulator to hold a stack of values and popping the stack on exit from the list:
    <xsl:accumulator name="list" initial-value="()">
      <xsl:accumulator-rule match="ol" phase="start" select="0, $value"/>
      <xsl:accumulator-rule match="ol" phase="end" select="if(ancestor::ol) then tail($value) else $value"/>
      <xsl:accumulator-rule match="li" phase="start" select="head($value) + 1,tail($value)"/>
    </xsl:accumulator>
This uses the post-descent phase of processing an ol to return to the previous level if it is within an outer ol and the value is now accessed with head(accumulator-before('list')).
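To show the heading counters in use, here is a small self-contained sketch that prefixes h1 and h2 elements with their numbers. It assumes the source elements are in no namespace, and the xsl:mode declaration (shallow copy, all accumulators applicable) is a choice made for this example rather than something prescribed above.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Copy everything by default and make the accumulators
       applicable to the principal source document. -->
  <xsl:mode on-no-match="shallow-copy" use-accumulators="#all"/>

  <xsl:accumulator name="h1" as="xs:integer" initial-value="0">
    <xsl:accumulator-rule match="h1" phase="start" select="$value + 1"/>
  </xsl:accumulator>

  <xsl:accumulator name="h2" as="xs:integer" initial-value="0">
    <!-- Each h1 resets the h2 count. -->
    <xsl:accumulator-rule match="h1" phase="start" select="0"/>
    <xsl:accumulator-rule match="h2" phase="start" select="$value + 1"/>
  </xsl:accumulator>

  <!-- h1: prefix the content with "n. " -->
  <xsl:template match="h1">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:value-of select="accumulator-before('h1') || '. '"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- h2: prefix the content with "n.m. " -->
  <xsl:template match="h2">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:value-of
          select="accumulator-before('h1') || '.' || accumulator-before('h2') || '. '"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because accumulator-before already includes the effect of the start-phase rule for the current node, no extra "+ 1" adjustment is needed when the value is read on the heading itself.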
Drawbacks, shortcomings and workarounds
The current technique of course falls well short of handling all CSS constructs, but there are some drawbacks and shortcomings that might be susceptible to different approaches:
Relative properties
Some numeric properties within a rule in CSS, especially font-size, can be defined to be relative to the default or already-determined value for that property, such as h2 {font-size: larger;} or .superscript {font-size: 50%;}. The current technique would merely write this as the attribute's value, which would probably be an (invalid) property value. In order to support this it will be necessary to determine any computed value of that property for the parent. In XSLT of course only the source tree can be examined directly by XPath – result trees need to be bound to local variables to examine those, and doing that from a child into the parent is tricky and fraught. This suggests that computed properties for the parent need to be passed down to child processing, probably in the form of tunnelled variables. A generic possibility is to pass down a childless, but attributed, instance of the altered element as a parameter for processing the children. In XSLT3.0 a map() would be a much more suitable mechanism, using something like:
    <xsl:template match="span[tokenize(@class, '\s+') = 'a']" priority="1.5" mode="css:properties" as="map(*)">
      <xsl:param name="css:properties" as="map(*)" tunnel="yes"/>
      <xsl:variable name="lower-props" as="map(*)">
        <xsl:next-match/>
      </xsl:variable>
      <xsl:variable name="new-props" as="map(*)">
        <xsl:map>
          <xsl:map-entry key="'color'" select="'blue'"/>
          <xsl:map-entry key="'font-size'" select="$css:properties('font-size') * 0.70"/>
        </xsl:map>
      </xsl:variable>
      <!-- 'use-last' lets entries in $new-props overwrite those in $lower-props;
           the default policy of map:merge keeps the first entry for a duplicate key -->
      <xsl:sequence select="map:merge(($lower-props, $new-props), map{'duplicates': 'use-last'})"/>
    </xsl:template>

    <xsl:template match="span[tokenize(@class, '\s+') = 'a']" priority="1.5" mode="css:apply" as="attribute()*">
      <xsl:variable name="props" as="map(*)">
        <xsl:apply-templates select="." mode="css:properties"/>
      </xsl:variable>
      <xsl:attribute name="color" select="$props('color')"/>
      <xsl:attribute name="font-size" select="$props('font-size')"/>
    </xsl:template>
where the mode css:properties collects a map of the properties (relying on map entries overwriting), which can involve properties from inheritance or less specific patterns, and css:apply looks up the requested appropriate values and returns them as attributes. Alternatively in XSLT3.0, accumulators, which compute across the tree, could use multiple rules to track the values. For example with the following accumulator:
    <xsl:accumulator name="font-size" initial-value="12">
      <xsl:accumulator-rule match="div[tokenize(@class, '\s+') = 'small']" select="10"/>
      <xsl:accumulator-rule match="span[tokenize(@class, '\s+') = 'a']" select="$value * 0.7"/>
    </xsl:accumulator>

accumulator-before('font-size') executed on a div.small span.a would yield the value 7 (the 10 set on entering the div.small, scaled by 0.7 on entering the span.a).
Inheritance
The property value keyword inherit can be used to declare that the value for the given property is the same as that for its parent. Thus * { color: inherit } means that all descendants of a node have the colour property of the closest ancestor that defines a colour[6]. Any solution to support relative properties would also be able to support such inheritance. Again using a tunnelled variable holding such inheritance values may be suitable.
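A minimal sketch of inheritance through a tunnelled parameter follows. It tracks a single property only, and it assumes the colour is already present as a grounded color attribute where one is defined; the parameter name inherited-color is hypothetical.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="*">
    <!-- Colour of the closest ancestor that defined one (empty at the top). -->
    <xsl:param name="inherited-color" as="xs:string?" tunnel="yes"/>
    <!-- The element's own colour wins; otherwise use the inherited value. -->
    <xsl:variable name="color" as="xs:string?"
        select="(@color/string(), $inherited-color)[1]"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:if test="exists($color)">
        <xsl:attribute name="color" select="$color"/>
      </xsl:if>
      <!-- Pass the resolved value down to all descendants. -->
      <xsl:apply-templates>
        <xsl:with-param name="inherited-color" select="$color" tunnel="yes"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Since the parameter is tunnelled, it passes through the built-in templates for intervening nodes unchanged, so each element sees the value resolved by its nearest element ancestor.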
Namespaces
Most CSS stylesheets are used in a namespace-ignorant manner: patterns match local names of elements. Of course XSLT is far from such ignorance and patterns should be suitably namespace-applicable. This can either be done by using the local name with wildcarded namespace (*:element-name) or, when the document is in a single namespace, declaring the default XPath namespace, which is used in the examples above. In CSS3 there are mechanisms to define default and prefixed namespaces (@namespace svg "http://www.w3.org/2000/svg";, svg|circle { ... }) so these could be detected and converted to XSLT trivially.
Style provenance and importance
Styling rules can come from a number of sources: the rendering agent, a user-imposed stylesheet, an author-declared stylesheet and also @style properties attached to a document element itself. In addition an important indicator can preclude further overriding of a given property. These would all have influence on the precedence order between rules and associated templates, though !important applies only to a property, not to a rule. This would require properties to be processed individually rather than, as at present, en masse.
Restriction to applicable elements
As suggested in section “Simple (and approximate) XSLT model”, the generated XSLT patterns are somewhat catholic, or even perhaps cavalier, in their applicability. Rather than an element of, say, type foo only requiring properties a and b, and in CSS looking through the ruleset to determine if any definitions for these two properties exist, the XSLT approach means that the rules impose themselves on the elements. Thus a CSS rule * {a:1; b:2; c:3;} would impose the (attributive) property c onto foo elements, even if that property were irrelevant to the foo type. In our SVG example above SVG meta-document elements such as desc or title could be so styled with irrelevant colouring or fonting. When tools behave as good XML citizens this shouldn't be a problem, though conceivably a property could be imposed (or overwritten) as an attribute that was not stylable through CSS according to the domain semantics.
If this is a problem, then a possible solution would be to exploit restrictions from any schema available for the target documents. The SVG schema for example restricts the permitted attributes on desc to the following: @id, @xml:base, @xml:lang, @xml:space, @class, @content and @style. Thus it would be possible to guard the application of other properties within a * { … } pattern with xsl:if test="not(self::desc)". A better and more coherent option would be to use the schema actively either at run time or at compile time. This is especially so given the structured nature of many schemas – in the case of the SVG schema large attributeGroups partition the sets of applicable properties into useful chunks.
Note
Sadly SVG 1.1 only has a DTD – the schema is from an earlier 2002 version.
Another drawback is that our CSS rules are applied regardless of whether the property involved is CSS-manipulable within the target language. For example we could write a rule:
rect.large { width: 200; }
which would be translated into the simple template:
    <xsl:template priority="..." mode="css" match="rect[tokenize(@class,'\s+')='large']">
      <xsl:next-match/>
      <xsl:attribute name="width">200</xsl:attribute>
    </xsl:template>
the consequence of which would be to set the width attribute of large rect elements. However in SVG, unlike for example in XHTML, no direct geometry properties are susceptible to CSS styling (see SVG's styling properties), and normally the directive would be ignored, so in our (XSLT-styled) case we'd get an incorrect, not just approximate, result. This could be solved by a target-language-specific filtering of the CSS rules. Note that using the SVG schema or DTD alone to determine applicability would not be sufficient – @width is a rather crucial component of svg:rect!
Constructing the XSLT transform
In this section I'll discuss how the XSLT transform to perform the CSS styling is constructed. As this tool was built up incrementally as needed, some parts are certainly not as complete as they eventually should be.
The first step of this process is to parse the CSS stylesheet(s) to produce an XML equivalent parse tree, from which code equivalents can be generated through normal XSLT push-processing methods. It should be possible of course to use some form of full-blown parser, perhaps one generated by Gunther Rademacher's excellent parser generator REx[7] [REx]. Another option would be to build on the example of Invisible XML described by Pemberton [Pemberton2013], which uses parsing CSS as an example. I'll return to these possibilities in the conclusions.
However, in the somewhat ad hoc environment within which this tool was originally used, and given the assumption that the complexity of the CSS stylesheets being processed is understood well and that content is friendly (e.g. strings don't contain { or }), compound use of regular expressions has sufficed so far.
CSS stylesheets can import other stylesheets through @import directives placed at the head of a file – the semantics of the collection are flat, so a recursive inclusion sweep through the implied URL network concatenating text will yield a collection of statements that are in the correct document order.
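Setting the @import sweep aside, the flat reading of a stylesheet's text into rules and property/value pairs can be sketched as below. This is a minimal illustration that assumes "friendly" content (no braces or semicolons inside strings), not the converter's own code; the parameter name css and the rules, rule and property element names are invented for the example.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output indent="yes"/>

  <!-- Sample CSS text; in practice this might be read with unparsed-text(). -->
  <xsl:param name="css" as="xs:string"
      select="'/* demo */ * { fill: red; stroke: black } .a { fill: blue; fill-opacity: 0.5 }'"/>

  <xsl:template name="xsl:initial-template">
    <!-- Strip comments; the "s" flag lets "." match across line breaks. -->
    <xsl:variable name="no-comments" select="replace($css, '/\*.*?\*/', '', 's')"/>
    <rules>
      <!-- One token per rule: split on the closing brace. -->
      <xsl:for-each select="tokenize($no-comments, '\}\s*')[normalize-space()]">
        <xsl:variable name="selector" select="normalize-space(substring-before(., '{'))"/>
        <xsl:variable name="body" select="substring-after(., '{')"/>
        <rule selector="{$selector}">
          <!-- One token per declaration: split on semicolons. -->
          <xsl:for-each select="tokenize($body, ';')[normalize-space()]">
            <xsl:variable name="name" select="normalize-space(substring-before(., ':'))"/>
            <xsl:variable name="value" select="normalize-space(substring-after(., ':'))"/>
            <property name="{$name}" value="{$value}"/>
          </xsl:for-each>
        </rule>
      </xsl:for-each>
    </rules>
  </xsl:template>

</xsl:stylesheet>
```

Run from the named initial template, this yields one rule element per CSS rule, each holding its selector text and one property element per declaration, which is roughly the raw material the selector parsing then works on.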
After removing comments, the text of the CSS is split into separate rules by tokenization against a \}\s* pattern. Further regular expression tokenization gives us for each rule the selector text and a sequence of property/value pairs. The selector text is then parsed using a set of regular expressions (operating within an xsl:analyze-string instruction) to produce a modified @match attribute containing a suitable XPath equivalent. For extensibility these regular expressions are bound to variables – the following section of the converter shows how they're used[8]:
    <xsl:variable name="elementIdP">\w+#[\w\-]+</xsl:variable>
    <xsl:variable name="idP">#[\w\-]+</xsl:variable>
    <xsl:variable name="elementClassP">\w+\.[\w\-_\.]+</xsl:variable>
    <xsl:variable name="classP">\.[\w\-_\.]+</xsl:variable>
    <xsl:variable name="regex" select="string-join(($elementIdP,$idP,$elementClassP,$classP),'|')"/>
    <pattern match="{.}">
      <xsl:attribute name="match">
        <xsl:analyze-string select="." regex="{$regex}">
          <xsl:matching-substring>
            <xsl:choose>
              <xsl:when test="matches(.,$elementIdP)">
                <xsl:value-of select="substring-before(.,'#')||'[@id='||$apos||substring-after(.,'#')||$apos||']'"/>
              </xsl:when>
              <xsl:when test="matches(., $idP)">
                <xsl:value-of select="'*[@id='||$apos||substring-after(.,'#')||$apos||']'"/>
              </xsl:when>
              <xsl:when test="matches(., $elementClassP)">
                <xsl:value-of select="substring-before(.,'.')||jwl:classPredicates(.)"/>
              </xsl:when>
              <xsl:when test="matches(., $classP)">
                <xsl:value-of select="'*'||jwl:classPredicates(.)"/>
              </xsl:when>
              ...
            </xsl:choose>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <xsl:value-of select="."/>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:attribute>
      <xsl:sequence select="$properties"/>
    </pattern>
These regular expressions are used twice – firstly within an overall match (when they have been joined as a union) and then individually in specific tests for particular components – the order of these choices can be used to match more specific cases before more general, such as a named element with class before a generic class. These pattern structures can contain not only the selector pattern and any implied properties, but also any other qualifiers, such as being a :before pseudo-element. As an example:
<pattern match="*">
  <xsl:attribute name="font-family">arial, helvetica, sans-serif</xsl:attribute>
</pattern>
<pattern match="body">
  <xsl:attribute name="counter-reset">h1</xsl:attribute>
</pattern>
<pattern match="h1">
  <xsl:attribute name="color">red</xsl:attribute>
</pattern>
<pattern match="h2">
  <xsl:attribute name="color">red</xsl:attribute>
</pattern>
<pattern match="h1" action="before">
  <xsl:attribute name="counter-increment">h1</xsl:attribute>
  <xsl:attribute name="content">counter(h1) ". " open-quote</xsl:attribute>
  <xsl:attribute name="counter-reset">h2</xsl:attribute>
</pattern>
<pattern match="h2" action="before">
  <xsl:attribute name="counter-increment">h2</xsl:attribute>
  <xsl:attribute name="content">counter(h1) "." counter(h2)". " open-quote</xsl:attribute>
</pattern>
<pattern match="h1" action="after">
  <xsl:attribute name="content">close-quote</xsl:attribute>
</pattern>
<pattern match="h2" action="after">
  <xsl:attribute name="content">close-quote</xsl:attribute>
</pattern>
(Note that these pattern
elements contain xsl:attribute
instructions, which will then be copied into the final templates of the transform.) We now have
a sequence of all the rules within
the CSS. They should then be sorted according to their CSS specificity rank (a four-level
vector), which can be calculated while the pattern
elements are being constructed, e.g.
p.footnote → <pattern match="p[tokenize(@class,'\s+')='footnote']" specificity="0,0,1,1">....
section#bibliography > p.header → <pattern match="section[@id='bibliography']/p[tokenize(@class,'\s+')='header']" specificity="0,1,1,2">....
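The conversion and ranking can be illustrated outside XSLT. The following is a minimal Python sketch, not the converter's own code: it assumes selectors built only from element names, #id, .class, and the child and descendant combinators, and the function names (class_predicates, convert_compound, convert_selector, rank_rules) are hypothetical. It reproduces the two patterns and specificity vectors shown above.

```python
import re

def class_predicates(classes):
    # One predicate per class token, in the form used by the patterns above.
    return "".join(f"[tokenize(@class,'\\s+')='{name}']" for name in classes)

def convert_compound(text):
    """Convert one compound selector such as 'p.footnote' or '#bert'.

    Returns the XPath step and its (ids, classes, elements) counts.
    """
    named = re.match(r"[\w\-]+", text)
    element = named.group(0) if named else "*"
    ids = re.findall(r"#([\w\-]+)", text)
    classes = re.findall(r"\.([\w\-]+)", text)
    step = element + "".join(f"[@id='{i}']" for i in ids) + class_predicates(classes)
    return step, (len(ids), len(classes), 1 if named else 0)

def convert_selector(selector):
    """Return (xpath_pattern, four-level specificity) for one selector."""
    # Splitting keeps '>' as a captured combinator; plain whitespace yields None.
    parts = re.split(r"\s*(>)\s*|\s+", selector.strip())
    xpath, total = "", [0, 0, 0]
    for index, part in enumerate(parts):
        if index % 2:                                  # combinator slot
            xpath += "/" if part == ">" else "//"
        else:                                          # compound selector slot
            step, counts = convert_compound(part)
            xpath += step
            total = [t + c for t, c in zip(total, counts)]
    # The leading 0 is the level reserved for inline style declarations.
    return xpath, (0, *total)

def rank_rules(selectors):
    """Sort converted selectors by specificity; ties keep source order."""
    return sorted((convert_selector(s) for s in selectors), key=lambda rule: rule[1])

for pattern, specificity in rank_rules(["section#bibliography > p.header", "p.footnote"]):
    print(specificity, pattern)
```

Because Python's sort is stable, rules of equal specificity stay in source order, which matches the CSS convention that a later rule wins a tie once priorities are assigned in ascending order.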
This sorted set of rule descriptions is then used to generate an XSLT template (as
shown in Figure 4) in mode css
for each one, with a successively increasing priority and a generic form of executing
lower priority matches first through an xsl:next-match
chain followed by writing the relevant attribute properties.
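As an illustration of this generation step, here is a minimal Python sketch that writes one template per ranked rule as text. It is an assumption-laden outline rather than the converter itself: make_templates, base and step are hypothetical names, the priority numbering is arbitrary, and whatever terminates the xsl:next-match chain at the lowest priority is not shown.

```python
from xml.sax.saxutils import escape, quoteattr

def make_templates(rules, base=1.0, step=0.001):
    """rules: list of (match_xpath, specificity, {property: value}) tuples.

    Returns XSLT template text, one template per rule, with priorities that
    rise with CSS specificity (ties keep source order).
    """
    ordered = sorted(rules, key=lambda rule: rule[1])
    templates = []
    for position, (match, _specificity, properties) in enumerate(ordered, start=1):
        attributes = "".join(
            f"\n  <xsl:attribute name={quoteattr(name)}>{escape(value)}</xsl:attribute>"
            for name, value in properties.items())
        templates.append(
            f'<xsl:template match={quoteattr(match)} mode="css" as="attribute()*" '
            f'priority="{base + position * step:.3f}">'
            # Lower-priority matches run first, then this rule's own properties.
            f"\n  <xsl:next-match/>{attributes}\n</xsl:template>")
    return templates

rules = [
    ("*[@id='bert']", (0, 1, 0, 0), {"stroke-width": "33"}),
    ("*[tokenize(@class,'\\s+')='alf']", (0, 0, 1, 0), {"stroke-width": "33"}),
]
print("\n".join(make_templates(rules)))
```

Since each template calls xsl:next-match before writing its own attributes, a property set by a more specific rule is written later and so overrides the same property from a less specific one.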
With the equivalent stylesheet formed, it can either be emitted as a result, to be
used externally, or within an XSLT3.0 environment, it can be used
as the source stylesheet for a transform()
function invocation to provide the styling to some candidate XML within the encompassing
stylesheet itself.
Additional features
Apart from selectors, property values and counters, CSS also defines a number of other styling features for which equivalent XSLT can be generated comparatively easily. These include:
text-transform |
CSS supports models for transforming the case of text content of an element with the text-transform property |
content: open-quote | close-quote |
The content property can contain reserved tokens for inserting quotation marks, which can nest, and whose non-default values are taken from a quotes property |
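The nesting behaviour of the quotation tokens can be sketched briefly. The Python below is an illustration under stated assumptions, not the paper's XSLT: resolve_quotes is a hypothetical name, the default quote pairs are arbitrary, and the content value is assumed to be already split into a flat sequence of tokens and literal strings. It follows the CSS 2 rules that the last pair is reused at deeper levels and that a close-quote at depth 0 is ignored.

```python
def resolve_quotes(tokens, quotes=("\u201c", "\u201d", "\u2018", "\u2019")):
    """Replace open-quote / close-quote tokens by marks chosen by nesting depth.

    quotes lists (open, close) pairs from the outermost level inwards.
    """
    pairs = list(zip(quotes[0::2], quotes[1::2]))
    depth = 0
    out = []
    for token in tokens:
        if token == "open-quote":
            out.append(pairs[min(depth, len(pairs) - 1)][0])
            depth += 1
        elif token == "close-quote":
            if depth == 0:
                continue                # unmatched close-quote: nothing rendered
            depth -= 1
            out.append(pairs[min(depth, len(pairs) - 1)][1])
        else:
            out.append(token)           # literal text passes through
    return "".join(out)

print(resolve_quotes(["open-quote", "A ", "open-quote", "B", "close-quote", "close-quote"]))
```

In an XSLT setting the depth would have to be carried through the recursive processing of the tree, since opening and closing marks come from different :before and :after patterns.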
A larger example
A more realistic example is the authoring/production process for a professional paper to be published eventually in PDF format. The publisher, in this case the ACM, provides a style guide, and templates in Word or LaTeX. In this case the paper source, much of which might be autogenerated (including self-laying-out SVG sections), was in neither of those formats, but in an extended and well-formed XHTML. The publication route, which had been developed over several years, involved a mapping from XHTML to an extended SVG/FO format (which included layout directives), then to a fully grounded SVG and thence to PDF with an SVG→PDF converter[9].
Originally the paper sources were created in a standard XML editor and the ACM styling was imposed through XSLT templates and attribute sets that were used to form up the equivalent document content and layout directives. This was reasonable, but the editing front-end was 'all angle brackets'. With the advent of styled authoring packages, such as that in Oxygen, where a document is viewed and edited through a CSS stylesheet, it became possible to edit the document with most elements styled reasonably – e.g. paragraphs have correct fonting, headings look similar to those that will be in the final document, including section numbering, lists are seen as lists etc.[10] and angle brackets are rarely seen!
It took little time to write the ACM-equivalent CSS for the main components of an XHTML report, so the paper authoring became much easier. Then we can use the techniques described above to ground the style of the XHTML according to the CSS, and remove the attribute-set styling from the XHTML→SVG/FO step. In the process of doing this of course, we increase the generality of the solution. Textural styling (as opposed to layout/geometric) can now be completely defined from the CSS.
This is a part of the original source, where there are some additional element declarations (embed supports adding multiple images, declaring their relative positioning and sizing), and attribute properties (@lay:float declares that this element, in this case this entire div, can always float forward (next page, next column...) if it won't fit in the remaining available space).
<div lay:float="always" lay:id="d48e871"> <figure id="sE2" height="free"> <embed src="simpleExtend-2.svg?type=tree simpleExtend-3.svg" xpath="(.//svg:g[@id='main']/*,./*)[1]" stack-offset="50" TYPE="image/svg-xml" gap-y="5" height="20" columns="2" box="no" lay:same.scale="true"/> </figure> <p class="caption">Figure 6. A continual variation of Figure 5</p> </div> <p>The retained instructions (<code>xsl:for-each</code>...) are considered to be a null element as far as layout is concerned, will not be displayed in an SVG viewer, have no influence on geometry and will be copied in place in the resulting resolved flow<sup class="fnIndex">8</sup>.</p> <p class="footnote" lay:footnote="footnote"> <sup class="fnIndex">8</sup>This is not so for topologically-modifying layouts, such as sort-by-size but any eventual (graphical) results from these instructions would be placed correctly on resolving the parent's layout.</p> <p>There are a small number of possible types of propagation that may be declared in an <code>@R:retain</code> attribute:</p> <ul> <li> <code>true</code> - execute the instruction and always retain.</li> <li> <code>while[<em>(expression)</em>], until[<em>(expression)</em>] </code>- execute the instruction; retain while/until the given XPath expression is true.</li> </ul>
These are some relevant sections from the ACM style CSS:
* { font-family:Times-Roman; font-size:9pt; }
p { padding-bottom:3pt; text-align:justify; hyphenate:true; }
dl,ul,ol { margin-left:12pt; }
li { text-align:justify; hyphenate:true; padding-bottom:3pt; }
code { font-family:Courier; font-size:8pt; font-weight:bold; }
sup { vertical-align:4pt; font-size:6pt; }
footnote { font-size:8pt; font-style:italic; }
And this is what it looks like when viewing the combination in Oxygen's Author mode.
When we transform the CSS into XSLT and apply it, we now get:
<div lay:float="always" lay:id="d48e871" style="font-family:Times-Roman;font-size:9pt"> <figure id="sE2" height="free" style="font-family:Times-Roman;font-size:9pt"> <embed src="simpleExtend-2.svg?type=tree simpleExtend-3.svg" xpath="(.//svg:g[@id='main']/*,./*)[1]" stack-offset="50" TYPE="image/svg-xml" gap-y="5" height="20" columns="2" box="no" lay:same.scale="true" style="font-family:Times-Roman;font-size:9pt"/> </figure> <p class="caption" style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt; text-align:center; hyphenate:true;margin-top:5pt;font-weight:bold">Figure 6. A continual variation of Figure 5</p> </div> <p font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt; padding-bottom:3pt;text-align:justify;hyphenate:true">The retained instructions ( <code style="font-family:Courier; font-size:8pt;font-weight:bold">xsl:for-each</code> ...) are considered to be a null element as far as layout is concerned, will not be displayed in an SVG viewer, have no influence on geometry and will be copied in place in the resulting resolved flow <sup class="fnIndex" style="font-family:Times-Roman;font-size:6pt;vertical-align:4pt">8</sup>.</p> <p class="footnote" lay:footnote="footnote" style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt;text-align:justify;hyphenate:true"> <sup class="fnIndex" style="font-family:Times-Roman;font-size:6pt;vertical-align:4pt">8</sup> This is not so for topologically-modifying layouts, such as sort-by-size but any eventual (graphical) results from these instructions would be placed correctly on resolving the parent's layout.</p> <p font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt;text-align:justify;hyphenate:true"> There are a small number of possible types of propagation that may be declared in an <code font-family="Courier" style="font-family:Courier;font-size:8pt;font-weight:bold">@R:retain</code> attribute:</p> <ul font-family="Times-Roman" 
style="font-family:Times-Roman;font-size:9pt;margin-left:12pt"> <li font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;text-align:justify; hyphenate:true;padding-bottom:3pt"> <code font-family="Courier" style="font-family:Courier;font-size:8pt;font-weight:bold">true</code> - execute the instruction and always retain.</li> ... </ul>
which then gets converted to SVG and ultimately a PDF looking like this:
Note
The observant will notice that the footnote styling alters. footnote was an artificial element used for authoring, whence the italics aided viewing in the styling editor. During a phase where such additional constructs are expanded to suitable XHTML equivalents, the footnote was converted into a sup index, placed at the same location, and a p @class="footnote" paragraph, containing the relevant index reference, immediately following the enclosing paragraph. Such footnote bodies are extracted during pagination to build from the bottom of a column.
Not only has this technique been used on a variety of professional papers, it has also been employed to generate PDF product data sheets for a well-known XSLT engine supplier, and indeed the entire styling of the author's PhD thesis.
Possible developments and Conclusion
This paper is reporting on work in progress and as such there are a number of areas which would need development for 'black box' use, some of which have been introduced earlier. The most pressing, providing robust parsing of the CSS, is reported in the next subsection.
Others that await development are:
-
Supporting important (!important) declarations – these need to be separated from the humdrum, which means that rules may need to be split into two declaration sections and those that are important ranked in precedence (and thus eventual XSLT template priority) above all that are not.
-
Providing a route for inheritance and relative property declarations. As implied earlier, a mechanism using tunneled variables (xsl:param name="font-size" tunnel="yes") could be used to pass through computed values for parent elements in the recursive processing of the tree, from which the relative value for the child can be computed. This will be complicated further by the need to process in multi-unit arithmetic (150% * 14pt), in a discrete universe (larger x-small) or to establish the base values in the absence of any CSS stylesheet setting (e.g. HTML font size 3).
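The arithmetic involved in the second item can be sketched as follows. This is a Python illustration only, under loudly stated assumptions: the point sizes attached to the size keywords, the 1.2 step ratio and the function name resolve_font_size are all hypothetical, and only pt, % and em values are handled. The parent's computed value is passed in explicitly, standing in for what a tunnel parameter would carry.

```python
import re

SIZE_KEYWORDS = ["xx-small", "x-small", "small", "medium", "large", "x-large", "xx-large"]
# Hypothetical point sizes for the keywords; a real agent chooses its own table.
KEYWORD_POINTS = dict(zip(SIZE_KEYWORDS, [7.0, 8.0, 10.0, 12.0, 14.0, 18.0, 24.0]))
STEP_RATIO = 1.2   # assumed scaling when 'larger'/'smaller' applies to a plain length

def resolve_font_size(declared, parent):
    """Compute a child's font size from its declaration and the parent's value.

    parent and the result are (points, keyword-or-None) pairs.
    """
    parent_points, parent_keyword = parent
    if declared in KEYWORD_POINTS:
        return KEYWORD_POINTS[declared], declared
    if declared in ("larger", "smaller"):
        shift = 1 if declared == "larger" else -1
        if parent_keyword is not None:
            # Discrete case: move one place along the keyword scale, clamped.
            index = SIZE_KEYWORDS.index(parent_keyword) + shift
            index = max(0, min(index, len(SIZE_KEYWORDS) - 1))
            keyword = SIZE_KEYWORDS[index]
            return KEYWORD_POINTS[keyword], keyword
        factor = STEP_RATIO if shift > 0 else 1 / STEP_RATIO
        return parent_points * factor, None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(%|em|pt)", declared)
    if match is None:
        raise ValueError(f"unsupported font-size value: {declared!r}")
    number, unit = float(match.group(1)), match.group(2)
    if unit == "pt":
        return number, None
    if unit == "%":
        return parent_points * number / 100, None
    return parent_points * number, None          # em: multiple of the parent size

print(resolve_font_size("150%", (14.0, None)))
print(resolve_font_size("larger", (8.0, "x-small")))
```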
Robust CSS parsing
Using a robust CSS parser (operating entirely in XSLT perhaps, or as a CSS/SAC-based XSLT extension function) producing an XML tree would be advantageous and certainly needed for any full-scale automated production use. As an example, using an approximate CSS declaration grammar along with a grammar for the selectors, written in EBNF, we can use REx to generate an appropriate parser and obtain a raw parse tree for a CSS stylesheet, where element names are those of the appropriate productions.
These trees tend to be large, as noted by Pemberton, and reduced forms are much more convenient. As in Lumley2014 it is a simple XSLT task to refine these trees to the essentials, by discarding most one-child elements in favour of their children, and migrating many of the 'operators' to attributes. So for example we can parse the following into the accompanying trees[11].
Table IV
div.fred > pre.code { font-size: 70%; } .alf, #bert { stroke-width: 33; } div#fred svg|p.code.emphasis { font-size: 25pt; } |
||
<ruleset> <selectors_group> <selector> <element name="div"> <class name="fred"/> </element> <combinator type=">"/> <element name="pre"> <class name="code"/> </element> </selector> </selectors_group> <proplist> <declaration property="font-size"> <percentage value="70"/> </declaration> </proplist> </ruleset> |
<ruleset> <selectors_group> <selector> <class name="alf"/> </selector> <selector> <id id="bert"/> </selector> </selectors_group> <proplist> <declaration property="stroke-width"> <number value="33"/> </declaration> </proplist> </ruleset> |
<ruleset> <selectors_group> <selector> <element name="div"> <id id="fred"/> </element> <combinator type="space"/> <element name="p" prefix="svg"> <class name="code"/> <class name="emphasis"/> </element> </selector> </selectors_group> <proplist> <declaration property="font-size"> <dimension value="25" unit="pt"/> </declaration> </proplist> </ruleset> |
Note
The ideas of Pemberton on defining a grammar which specifically marks what is to be written to the final parse tree, and how, are quite attractive, as they would automate the refinement phase, which is currently written in (simple) XSLT. It might even be tractable to use an Invisible XML grammar definition and project it into two forms – an EBNF to use for a parser generator, and a set of instructions (XSLT templates?) for parse tree refinement.
From these structures it is comparatively easy to either form the same pattern structures as shown earlier, or produce a new template-generation strategy. Here are an intermediate form and the resultant XSLT structures:
Table V
<ruleset pattern="div[tokenize(@class,'\s+')='fred'] /pre[tokenize(@class,'\s+')='code']" specificity="0,0,2,2">
  <selector pattern="div[tokenize(@class,'\s+')='fred'] /pre[tokenize(@class,'\s+')='code']" specificity="0,0,2,2">
    <element name="div" pattern="div[tokenize(@class,'\s+')='fred']" specificity="0,0,1,1">
      <class name="fred" pattern="[tokenize(@class,'\s+')='fred']" specificity="0,0,1,0"/>
    </element>
    <combinator type=">" pattern="/"/>
    <element name="pre" pattern="pre[tokenize(@class,'\s+')='code']" specificity="0,0,1,1">
      <class name="code" pattern="[tokenize(@class,'\s+')='code']" specificity="0,0,1,0"/>
    </element>
  </selector>
  <proplist>
    <declaration property="font-size" xpath="'70%'"/>
  </proplist>
</ruleset>
<ruleset pattern="*[tokenize(@class,'\s+')='alf']" specificity="0,0,1,0">
  <proplist>
    <declaration property="stroke-width" xpath="33"/>
  </proplist>
</ruleset>
<ruleset pattern="*[@id='bert']" specificity="0,1,0,0">
  <proplist>
    <declaration property="stroke-width" xpath="33"/>
  </proplist>
</ruleset>
<ruleset pattern="div[@id='fred'] //svg:p[tokenize(@class,'\s+')='code'] [tokenize(@class,'\s+')='emphasis']" specificity="0,1,2,2">
  <proplist>
    <declaration property="font-size" xpath="'25pt'"/>
  </proplist>
</ruleset> |
<xsl:template match="*[tokenize(@class,'\s+')='alf']" mode="body.css" as="attribute()*" priority="1.506"> <xsl:next-match/> <xsl:attribute name="stroke-width" select="33"/> </xsl:template> <xsl:template match="div[tokenize(@class,'\s+')='fred'] /pre[tokenize(@class,'\s+')='code']" mode="body.css" as="attribute()*" priority="1.507"> <xsl:next-match/> <xsl:attribute name="font-size" select="'70%'"/> </xsl:template> <xsl:template match="*[@id='bert']" mode="body.css" as="attribute()*" priority="1.508"> <xsl:next-match/> <xsl:attribute name="stroke-width" select="33"/> </xsl:template> <xsl:template match="div[@id='fred'] //svg:p[tokenize(@class,'\s+')='code'] [tokenize(@class,'\s+')='emphasis']" mode="body.css" as="attribute()*" priority="1.509"> <xsl:next-match/> <xsl:attribute name="font-size" select="'25pt'"/> </xsl:template> |
For the first rule the appropriate properties for the children (suitable pattern, CSS specificity) are shown – these are then combined to make the generic pattern and compound specificity for the rule. These rules can then be sorted according to their specificity to produce a declared priority order for the resulting XSLT templates. The second rule has been split into two copies as their patterns have differing specificity and hence must be supported by XSLT rules operating at differing priorities.
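The combination of child properties into a rule-level pattern and specificity, and the splitting of a selector group, can be sketched in Python over the refined trees of Table IV. This is an illustrative outline using the standard xml.etree.ElementTree module, not the XSLT actually used; the function names (qualifier, selector_pattern, split_ruleset) are hypothetical.

```python
import xml.etree.ElementTree as ET

def qualifier(node):
    """Predicate text and (ids, classes, elements) counts for a class or id node."""
    if node.tag == "class":
        return f"[tokenize(@class,'\\s+')='{node.get('name')}']", (0, 1, 0)
    if node.tag == "id":
        return f"[@id='{node.get('id')}']", (1, 0, 0)
    raise ValueError(f"unexpected node: {node.tag}")

def add(left, right):
    return tuple(a + b for a, b in zip(left, right))

def selector_pattern(selector):
    """Combine the children of one selector into (pattern, specificity)."""
    pattern, counts = "", (0, 0, 0)
    bare = True                      # the next bare qualifier needs a leading '*'
    for node in selector:
        if node.tag == "combinator":
            pattern += "/" if node.get("type") == ">" else "//"
            bare = True
        elif node.tag == "element":
            prefix = node.get("prefix")
            pattern += (prefix + ":" if prefix else "") + node.get("name")
            counts = add(counts, (0, 0, 1))
            for child in node:
                text, extra = qualifier(child)
                pattern += text
                counts = add(counts, extra)
            bare = False
        else:
            text, extra = qualifier(node)
            pattern += ("*" if bare else "") + text
            counts = add(counts, extra)
            bare = False
    return pattern, (0, *counts)

def split_ruleset(ruleset):
    """One (pattern, specificity) pair per selector in the rule's group."""
    return [selector_pattern(s) for s in ruleset.find("selectors_group")]

SECOND_RULE = ET.fromstring("""
<ruleset>
  <selectors_group>
    <selector><class name="alf"/></selector>
    <selector><id id="bert"/></selector>
  </selectors_group>
  <proplist>
    <declaration property="stroke-width"><number value="33"/></declaration>
  </proplist>
</ruleset>""")

for pattern, specificity in split_ruleset(SECOND_RULE):
    print(specificity, pattern)
```

Each selector in the group gets its own pair, so the two halves of the second rule can be ranked independently before templates are generated.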
Generic program translation
In a more general sense, this work is an example of a generic technique whereby portions of a program P, written in some language Y, are used within some other program written in a language Z, by equivalence conversion of P→Pz. The success of this would be subject to a number of factors:
-
The degree of similarity in executional semantics of Y and Z. In this case CSS and XSLT push mode both work in a pattern-directed manner, with a definable conflict resolution model[12], a similar source tree → result tree execution and a common target data type, so most of the main semantics of CSS can be simulated closely by XSLT equivalent models.
When the languages are not in such close agreement, the ease with which an interpreter for sections of Y can be constructed in Z becomes important. For example a program defined as a simple non-cyclic spreadsheet could be interpreted in XSLT by casting it as some XML data structure coupled with an iterative tree-walker that uses a calculator function.
-
The extent (and accuracy) of the semantics of Y required within P. When these are large, significant problems may result. For example, if P requires accurate error handling, then it may well prove intractable to provide the same in the Z environment. In our case, CSS's execution error handling is basically 'ignore', so this doesn't cause a problem, and we're not pushing the edges of the CSS envelope. Equally we are prepared to 'push the CSS property onto the target', rather than 'pull the property from the CSS', giving the possibility of extraneous styling, on the basis that final user agents will ignore such minor infractions, permitting a much simpler and more generic model.
-
The need for execution performance. In general such equivalent-code simulations will always be slower than native implementations, sometimes very much slower.
-
The skill and craft of the program developer, especially where some feature requires a workaround to be supported.
-
The ease with which language Y can be parsed from code in Z. This can range from hand-crafted code (such as using regular expressions as described in section “Constructing the XSLT transform”) to using a specifically targeted parser.
The real advantage of using XSLT for such conversion is that once a parse tree has been established in XML, it is a uniform push-mode process to build other necessary data structures and ancillary models, which can then either be used to generate directly equivalent XSLT or be used by suitable interpreters to execute the required semantics of P. These intermediate data structures can be examined easily (they serialize).
Conclusion
This paper has described using a program written in another language (CSS) as a component within an XSLT execution environment, by cross-conversion. The method was originally rather rough-and-ready and used for a single document purpose but has progressively been refined until it might be suitable for (some) semi-production situations, especially in the hands of craftsmen. Though the similarity of the major semantics of CSS and some parts of XSLT (pattern invocation, property projection) meant the necessary XSLT execution model was reasonably simple, there is still potential in the technique for other situations. The use of an XML-centric processing model starting from the source language parse tree and progressing on to intermediate data structures, coupled with XSLT3.0's rich toolbox, makes this field not only productive, but fun. I encourage others to try it.
Acknowledgements
I must thank the several organisations (ACM, University of Nottingham, Saxonica) who've posed document generation challenges that motivated and exercised this need to employ CSS where it wasn't supported. I'm also grateful for the insightful and supportive comments from the Balisage reviewers who've encouraged me to increase the robustness of the method, both in terms of using a proper CSS parse and in addressing more complex issues, especially in style inheritance.
References
[CSS] Bos, Bert et al, Editors. Cascading Style Sheets Level 2 Revision 2 (CSS 2.2) Specification. World Wide Web Consortium, Editors' Draft, 29 March 2016. [online] http://dev.w3.org/csswg/css2/
[CSS3] Çelik, Tantek et al, Editors. Selectors Level 3. World Wide Web Consortium, 29 September 2011. [online] http://www.w3.org/TR/css3-selector/
[Lumley2013] Lumley, John. Functional, Extensible, SVG-based variable documents. In Proceedings of the 2013 ACM symposium on Document engineering, pp 131-140. doi:https://doi.org/10.1145/2494266.2494274. [online] http://dl.acm.org/citation.cfm?doid=2494266.2494274
[Lumley2014] Lumley, John. Analysing XSLT Streamability. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). doi:https://doi.org/10.4242/BalisageVol13.Lumley01. [online] http://www.balisage.net/Proceedings/vol13/html/Lumley01/BalisageVol13-Lumley01.html
[LumleyPhD] Lumley, John. Documents as Functions. University of Nottingham, PhD Thesis. June 2012. [online] http://eprints.nottingham.ac.uk/12631/
[Pemberton2013] Pemberton, Steven. Invisible XML. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:https://doi.org/10.4242/BalisageVol10.Pemberton01. [online] http://www.balisage.net/Proceedings/vol10/html/Pemberton01/BalisageVol10-Pemberton01.html
[REx] Rademacher, Gunther. REx Parser Generator. [online] http://www.bottlecaps.de/rex/
[XPath3.0] Robie, Jonathan, Chamberlin, Don, Dyck, Michael, and Snelson, John, Editors. XML Path Language (XPath) 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-30/
[XPath.FO] Kay, Michael, Editor. XQuery and XPath Functions and Operators 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-functions-30/
[XSLT2.0] Kay, Michael, Editor. XSL Transformations (XSLT) Version 2.0 (Second Edition). World Wide Web Consortium, 23 January 2007. [online] http://www.w3.org/TR/xslt20/
[XSLT3.0] Kay, Michael, Editor. XSL Transformations (XSLT) Version 3.0. World Wide Web Consortium, 2 October 2014. [online] http://www.w3.org/TR/xslt-30/
[1] Lector, si exemplum requiris, circumspice
[2] Or if it does, it requires various forms of black art or eldritch wizardry.
[3] At least the half-dozen or so the author has needed, in production of professional papers (ACM, Balisage, XMLLondon ...), product marketing documents and his own PhD.
[4] The author argues in Lumley2013 that such 'good XML citizen' behaviour by processing tools is important for the production of compound document processing architectures. Note that if the content of an SVG desc element contained markup elements (such as in the XHTML namespace) this styling might well affect those components.
[5] Reusing the same XSLT variable name in sibling positions is legal – the scope for each is along a following-sibling::*/descendant-or-self::* compound axis, until overridden.
[6] For some XML dialects this might be true by default. SVG has default inheritance for textural styling, even after CSS style application.
[7] Indeed one of the samples for the generator is for CSS selectors.
[8] In XSLT3.0 a set of map() bindings for these regular expressions might increase the coherence.
[9] These papers were all reporting on a highly extensible and highly adaptable variable document architecture. Not only was it more coherent and robust to use the architecture itself to form the papers, for example supporting in-document evaluation of examples, it was also incumbent on the authors to demonstrate that the architecture was capable of producing documents to a high professional standard. Close visual inspection of the resulting PDF can tell that it came from neither the Word nor the LaTeX templates, but the variation is less than between those two forms themselves.
[10] Much as this paper is being authored using the Balisage CSS.
[11] The author is one of those who prefers keeping properties in attributes when they are singular rather than as text values of elements.
[12] Other target languages that might have similar tendencies include Prolog, Mathematica, the Haskell/ML family of functional languages and SNOBOL descendants.