Lumley, John. “Approximate CSS Styling in XSLT.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). https://doi.org/10.4242/BalisageVol17.Lumley01.
Balisage: The Markup Conference 2016 August 2 - 5, 2016
A Cambridge engineer by background, John Lumley created the AI group at Cambridge
Consultants in the early 1980s and then joined HPLabs Bristol as
one of its founding members. He worked there for 25 years, managing and contributing
in a variety of software/systems fields, latterly specialising in
XSLT-based document engineering, in which he subsequently gained a PhD in early retirement.
He is currently helping develop the Saxon XSLT processor
for Saxonica and consulting on various aspects of using XSLT.
This paper discusses transforming a CSS stylesheet into an XSLT transform that projects
an approximation of the styling from
the CSS onto a target XML document. It was developed during several XSLT-based projects
involving multi-dialect XML documents, where there was a need
either to evaluate CSS properties for another external tool, such as in an HTML →
XSL-FO → PDF pipeline, or where a document styling needed to be
fixed for embedding in another document, such as examples in professional papers. The paper
presents examples, explains the general
architecture of the generated XSLT transform, discusses how that transform is itself
constructed from the CSS stylesheet and outlines the strengths and
weaknesses and some of the directions in which the tool could be developed. It is
approximate in that it only supports some of the core CSS features,
assumes the user is skilled in the art and is working with CSS stylesheets that are understood and visible, and that the
execution speed
of the CSS projection is not an issue. Nevertheless, in the author's experience the ability to mix CSS
styling into the XSLT
researcher's toolbox has proved to be of some utility.
In the current world of the web, much of the textural styling of XML-based documents is defined by declarations within Cascading Style
Sheets (CSS) CSS. These encourage a model where XML elements are defined as members of classes, and
the CSS stylesheets declare patterns
to match elements of given type, class, position, ancestry, siblings and progeny,
for which a set of values for properties (e.g. font, colour …), pre- and
post-ambles (numbering, text), layout type (block, none, inline.…) and possible geometry
(size, positioning...) are provided. Applicable patterns are
chosen through a relatively simple specificity and supersessional model. For many
simple presentational views of XML documents an agent who implements CSS
can produce acceptable results. Many XML dialects (e.g SVG, XHTML) describe properties
for many of their elements that are manipulable through CSS
styling, as an alternative to or overriding direct attributive properties.
Different views of the same document can be projected by binding different CSS stylessheets.
For example a 'table of contents' summary could be
generated by suppressing display (display : none) of any element that is not a header or a descendant of a header. CSS works because
for most
cases it can provide a comparatively simple, and relatively easily understood, model
that mere mortals can work with. But generally CSS has one major
drawback for providing views of complex documents – it cannot, save for deletion or
minor addition of textual/image content, alter the topology of the
tree result, such as moving a caption element from the start to the end of the adjacent
table.
Note
It is possible in CSS to indicate geometric requirements, through directives such
as float:right or position:absolute,
which can alter the presentational order of components, but these are intimately tied
up with the CSS layout model, often supported in incompatible
ways and usually require both craft and measures to support multi-browser robustness.
Another feature is that CSS styling is through inheritance – it is not universally
possible to 'block' application within a subtree; only by
introducing subtree-scope overriding rules (which will still define properties) can
a section of a document be protected from other CSS rules. (Mechanisms
to revert to a parent-inherited value or the default initial value are being defined
in later CSS versions (3, 4), but browser support is inconsistent.)
In more complex document creation from XML sources, XSLT3.0 provides vastly greater flexibility, by being effectively a full-blown
(and higher-order capable) programming language with an XML tree as one of its principal
datatypes. It has an arbitrary capability of manipulating a
document and styling the result (e.g. through properties described at element attributes.)
Necessity is the Mother of Invention – over recent years the author has had several occasions where document styling was
available in
CSS, but an XML document being processed in a full-blown XSLT setting needed some
of the CSS styling information applied within an XSLT execution. One
option would be to manually construct equivalent XSLT libraries (templates, attributes
etc.) which corresponded to the CSS intention and apply these to
the source. This would of course be subject to coherence drift having to keep two styling masters in synchronisation. Another option was to
see whether some equivalent XSLT could be generated from the given CSS which would detemine the same properties for the document in
question, leaving the results principally as attributes on the result tree.
There are a number of cases where the effect of CSS styling might need to be evaluated before some final projection or display.
Examples are:
When the effect of a particular CSS needs to be fixed on a subsection of a document,
such that conflict with a wider scope CSS stylesheet is
avoided. For example, an embedded SVG component within an XHTML page could be styled
by its own CSS stylesheet. This could be in conflict with that
used (or externally imposed) for the HTML page. In this case for document consistency
we wish to fix the properties for the SVG. Similarly for example
in the preparation of a paper for a conference that is styled by CSS, which needs
to present embedded examples that are styled by
another CSS[1] it can be helpful to project the localised effects completely, and keep the conference
organisers from fiddling with the author's intent.
Note
Recent changes in the Balisage publication process, moving away from a final XHTML
publication to one in PDF, has reduced some of the
self-referential amusement this paper might offer the reader. However, if this had
been presented at Balisage 2015, humour would have been
restored.
The eventual processor for the document doesn't support CSS styling directly. For
example XSL-FO doesn't support references to CSS resources,
though it does use many attributive properties taken from the CSS2 model.
The goal of this work is to attempt to project as much of the effect of a CSS stylesheet
onto a an XML document as possible, leaving the effects as
either grounded property values as attributes on relevant elements, or modest sections
of added content.
Note
The author is well aware that CSS is evolving (descriptions of parts of CSS3 date
back to 2001, the CSS2.2 spec is being continually re-drafted),
handled to differing degrees of compliance by a variety of tools, and that extensions
are numerous. This work is not indented to build a definitive CSS
→ XSLT converter (hence the title Approximate), but show how a basic converter can be constructed and permit limited, but common,
CSS
styling to be projected in an XSLT environment. In the process, if I play somewhat
fast and loose with the CSS standard(s), it is partly because in
practice that is how CSS appears to be.
In this paper I will first present a very small SVG example styled by a CSS and show
what an equivalent XSLT transform could be. Possible XSLT
mechanisms that should support a variety of CSS rules are then discussed, followed
by how the transform itself is generated. A larger example of its use
(styling an ACM paper) is then presented. Finally I consider possible developments
of the technique and draw some conclusions.
A Simple Example
Suppose we have a very simple SVG document:
Now we want to improve the attractiveness of the offering, by styling it with an externally
defined CSS stylesheet:
(If you are viewing this directly in an HTML browser, where the SVG is referenced
by an img element, as opposed to the
mediaobject in the original DocBook source, then you won't see the pretty colours – this is one
of the issues this work is addressing!)
SVG extends the CSS property vocabulary for its own specific presentational properties,
such as fill. In line with the don't
change topology philosophy of CSS, these do not cover alterations to geometry, i.e. we can't change
the size of one of the rectangles through
a CSS stylesheet. If we now reference this SVG document from within another, such
as an HTML rendering of this paper, the CSS styling probably won't
come through[2]. A much more robust approach would be to fix the styling on the SVG and embed it
directly:
This paper is about how, and to what extent, this projection of a CSS stylesheet onto
an XML document can be performed in an entirely XSLT execution
environment.
Similar work
The problem of incorporating CSS into the XSL-FO mix has been addressed by a specific
product – CSSToXSLFO.
http://www.re.be/css2xslfo/index.xhtml which is specifically targetted to generated styled XSL-FO structures corresponding
to the box-model
of layout implicit in CSS. Internally it uses the Flute Java-based CSS Parser to generate
an internal (Java) rule structure. This structure is then
applied across the (arbitrary) input XML document to create a DOM tree with additional
namespaced attributes or elements (e.g.
@css:text-align, css:before) which define the appropriate property as determined by the CSS. This intermediate
document is
then converted to suitable XSL-FO by a generic XSLT stylesheet matching these determined
CSS directives attached to elements.
Externally my approach is perhaps similar, in that the CSS sheet is parsed into an
action that determines CSS-defined styling properties and
additional content, though these are used to overwrite any existing properties of the same name, and no further conversion is
attempted. Internally it is quite different, developing a complete parse tree for
the CSS (using regular expressions in XSLT), which is used to generate a
suitable XSLT program.
The execution model
A single simple CSS rule declaration has the following generic form:
where for elements matching the selector pattern, values for the given style properties
are declared. Such property values might be superseded or
modified by properties defined in other rules with more specific matches to the given
element. In this way, stylesheets can cascade their effects. For
example a rule h1 {font-size: 130%;} in a generic stylesheet merely declares that h1 elements should be rendered in a 30% larger
font than whatever is the default. A more specific CSS sheet can define what that
default font size should be or even force a specific font size on the
h1 element.
The selector patterns, whilst not as rich as XSLT's generic XPath patterns, do have
enough expressiveness to cover ancestry, class membership and
identity reference, sibling position, descendant contents, attribute values and so
forth. Pseudo-element selectors permit styling portions of the tree
that are not direct elements, such as adding preamble via the :before selector.
Most properties are simple stylistic scalar values (e.g. color:), but some relate to box-model display control (display:)
and others declare additional content or counter variables, which are used to support
numbering. Later versions have started to incorporate declarations
about column placement, flow targets etc.
The precedence model between these CSS declarations is comparatively simple – a specificity
model for a given property examines patterns which would
effect it and score them in a hierarchy on the numbers of identity attribute, other
attribute and element name references in the pattern. Specific
@style attributes win out and the universe of declarations is also split according to source
(user agent, user and author) and an
importance declaration (!important).
Simple (and approximate) XSLT model
There are a number of possible architectures that could be used. In this section I
discuss one where there is some direct correspondence between the
CSS rules and the XSLT templates in the equivalent stylesheet – the approach the author
took originally. Later on I'll discuss some other possibilities
for the overall design.
I'll start by examining a very simple case, using the example CSS stylesheet above.
According to the CSS specificity rules, when rule 2 is matched,
then the @fill property would be set to 'blue' as it is more specific than the (unclassed) rule
1. Technically then we should compute the
fill property for each element independently of the other properties, with reference
to these rules.
However, if for the moment we make some assumptions about the sort of CSS stylesheets
being employed, we can build a fairly simple XSLT transform
with the following templates that would have the same effect:
(using a default XPath namespace for SVG, see section “Namespaces” ) which generates
We use XSLT templates to describe the entirety of the property applications within
each CSS rule, rather than using the set of rules to look-up a
required property in turn. This works by using the feature that XSLT can overwrite
the values of element attributes while still effectively constructing
the element node itself rather this its children – the last written with a given name
to the element head tag becomes the value in the result. This we
arrange by ensuring that the lowest precedence rules fire first followed by the successively
higher and higher precedence (specificity) rules in turn,
giving every one of our template rules defined and different priorities and using
the xsl:next-match instruction to chain through to the
lower priorities for earlier evaluation.
This simple general technique makes the following assumptions which seem to be approximately
good enough for many common uses of CSS[3]:
Evaluating matching rules for all properties en masse for a given element produces an equivalent end result to lookup of
each relevant property through the rule structure in turn. It can be dependent on
the fact that many final user agents will (should?) silently
ignore additional extraneous (and usually attributive) information. For example the
transform above, because of the * matching, would also add
@fill and @stroke attributes to any desc or title elements (or their descendants) in an SVG
document, even though such properties aren't defined for those meta-documental components
within SVG[4]. How this problem might be alleviated will be described in a later section.
No use is made of relative properties, i.e. where a property changes dependent upon
that of some parent. In this case each rule can be
considered in isolation. Later I will discuss how this issue might be solved.
To generate these rules first we need to determine equivalent XSLT patterns that will
match elements according to the CSS selector grammar. Some
examples are shown in the following tables:
Table I
Some basic CSS pattern equivalents
CSS
XSLT pattern
*
*
element
element
.class
*[tokenize(@class,'\s+')='class']
#id
*[@id='id']
E > F
E/F
EF
E//F
E - F
F[preceding-sibling::E]
E + F
F[(preceding-sibling::*)[1]/self::E]
Table II
Some CSS predication pattern (selector) equivalents
CSS
XSLT pattern
Notes
pattern[attr]
XSLT pattern[exists(@attr)]
pattern[attr=value]
XSLT
pattern[@attr=value]
A suitable typed binding of value to an XPath literal (e.g. 1, 2, 'three' ...) will be needed.
pattern:nth-child(even)
XSLT pattern[position() mod 2 = 0]
pattern:nth-child(odd)
XSLT pattern[position() mod 2 = 1]
pattern:nth-child(digit)
XSLT pattern[position() = digit]
@style property
As well as supporting grounded attributive properties (@fill, @stroke-width) many XML dialects permit a conglomerate
@style attribute, whose value is a list of property-value pairs. Some dialects (e.g. XHTML)
only permit direct
document-borne styling though this means. To make this compound style attribute list,
we can use substantially the same mechanism with the following
modifications to the code:
We write the found attributes onto a temporary element H so that the last values win, then we sweep up all the
attributes into the property/value list. (If we only want the @style property then the xsl:sequence select="$style-att"
instruction can be suppressed. This could be controlled on a source-language property
of the overall process.)
Generated content
CSS selectors can also include qualifiers (pseudo-elements) that add (textual) content
to a given element. This is often used to support list
labelling or numbering of headers or running entities such as figures or tables. (Sadly
this isn't a feature of SVG, so instead of illustrating with
pictures, I'll have to use dull XHTML instead.) As a very simple example:
The :before and :after selectors target effectively the beginning and end respectively of the contained
content of the
given element, using the content property (which can either be text, source resource URI, smart quotes, counters (see later)
or the value of an attribute on the source element) as the content to insert. [Technically
in CSS3 pseudo-elements such as these must use a double-colon
:: prefix to distinguish from pseudo-classes, but :before and :after are permitted in backwards compatibilty
for CSS1 and CSS2. See Selectors Level 3].
To process these we have a parallel set of templates that process the overall body
of the element, collecting before and after content sections in
turn being passed as tunnelled variables of xs:string? type. (Content is as far as I'm aware not additive, so more specific content
overrules less specific.) The match patterns are the same as those used for any normal styling for that element.
CSS uses the same mechanism to declare, modify and interpolate the values of counters,
which become important in styling sectioned documents.
When the counter names are the same as the elements they are counting, and same-name
elements don't nest, an equivalent structure for the
h2:before rule can be built in the XSLT by altering the content to:
which counts back through the relavent elements to the last element which reset the
given counter[5]. Obviously more complex code can be developed to handle arbitrary counter names,
and within an XSLT3.0 environment accumulators may offer a
more elegant solution.
Using accumulator as counters
XSLT3.0 introduced accumulators to support the processing or retaining data from the
source tree for later use within streamed processing. They
have parallels with xsl:key, but rather more flexibility, and can be considered as a map from source tree nodes
to an arbitrary value,
which is computed by a walk through the tree computing a pre-descent and a post-descent
value for each node. In the absence of any applicable rule
that changes the value of the accumulator, the value remains constant. Thus we could
compute the values of the h1 and h2
counters above (which increment by 1) with the following:
and we retrieve the value of the counters by accumulator-before('h1') and accumulator-before('h2') respectively. Note
that the h2 rule gets its value reset to 0 by any h1 element, as declared by the h1 rule. CSS defines that
numbering will nest for descendants using the same counter. This can be accomodated
for a nested example like ol/li.. ol/li by using the
accumulator to hold a stack of values and popping the stack on exit from the list:
This uses the post-descent phase of processing an ol to return to the previous level if it is within an outer
ol and the value is now accessed with head(accumulator-before('list')).
Drawbacks, shortcomings and workrounds
The currrent technique of course falls well short of handling all CSS constructs,
but there are some drawbacks and shortcomings that might be
susceptible to different approaches:
Relative properties
Some numeric properties within a rule in CSS, especially font-size, can be defined
to be relative to the default or
already-determined value for that property, such as h2 {font-size: larger;} or .superscript {font-size: 50%;}
The current technique would merely write this as the attribute's value, which would
probably be an (invalid) property value. In order to support this it
will be necessary to determine any computed value of that property for the parent.
In XSLT of course only the source tree can be examined directly by XPath – result trees need to be bound to local variables
to
examine those, and doing that from a child into the parent is tricky and fraught.
This suggests that computed properties for the parent need to be
passed down to child processing, probably in the form of tunnelled variables. A generic
possibility is to pass down a childless, but attributed,
instance of the altered element as a parameter for processing the children. In XSLT3.0
a map() would be a much more suitable mechanism,
using something like:
where the mode css:properties collects a map of the properties (relying on map entries overwriting), which can
involve properties from
inheritance or less specific patterns, and css:apply looks up the requested appropriate values and returns as attributes. Alternatively
in
XSLT3.0, accumulators, which compute across the tree, could use multiple rules to
track the values. For example with the following accumulator:
accumulator-before('font-size') executed on a div.small span.a would yield the value 10.
Inheritance
The property value keyword inherit can be used to declare that the value for the given property is the same as that
for its parent.
Thus * { color: inherit } means that all descendants of a node have the colour property of the closest ancestor
that defines a colour[6]. Any solution to support relative properties would also be able to support such inheritance.
Again using a tunneled variable holding such
inheritance values may be be suitable.
Namespaces
Most CSS stylesheets are used in a namespace-ignorant manner: patterns match local names of elements. Of course XSLT is far from such
ignorance and patterns should be suitably namespace-applicable. This can either be
done by using the local name with wildcarded namespace
(*:element-name) or when the document is in a single namespace, declaring the default XPath namespace,
which is used in the examples above. In CSS3 there are mechanisms to define default
and prefixed namespaces (@namespace svg
"http://www.w3.org/2000/svg";, svg|circle { ... }) so these could be detected and converted to XSLT trivially.
Style provenance and importance
Styling rules can come from a number of sources: the rendering agent, a user-imposed
stylesheet, an author-declared stylesheet and also attached to
@style properties on a document element itself. In addition an important indicator can preclude further overriding of a
given property. These would all have influence on the precedence order between rules
and associated templates, though !important applies
only to a property, not to a rule. This would require properties to be processed individually
rather than as at present,
en masse.
Restriction to applicable elements
As suggested in section “Simple (and approximate) XSLT model”, the generated XSLT patterns are somewhat catholic, or even perhaps cavalier, in
their
applicability. Rather than an element of, say, type foo only requiring properties a and b, and in CSS looking
through the ruleset to determine if any definitions for these two properties exist,
the XSLT approach means that the rules impose themselves on the
elements. Thus a CSS rule * {a:1; b:2; c:3;} would impose the (attributive) property c onto foo elements, even if
that property were irrelevant to the foo type. In our SVG example above SVG meta-document elements such as desc or
title could be so styled with irrelevant colouring or fonting. When tools behave as good XML citizens this shouldn't be a
problem, though conceivably a property could be imposed (or overwritten) as an attribute
that was not stylable through CSS according to the domain
semantics.
If this is a problem, then a possible solution would be to exploit restrictions from any schema
available for the target
documents. The SVG schema for example restricts the permitted attributes
on desc to the following: @id, @xml:base,@xml:lang, @xml:space, @class, @content and @style. This it would be
possible to guard the application of other properties within a * { … } pattern with xsl:if test="not(self::desc)". A better
and more coherent option would be to use the schema actively either in the run-time
or the compile time. This is especially so given the structured
nature of many schemas – in the case of the SVG schema large attributeGroups partition the sets of applicable properties into useful chunks.
Note
Sadly SVG 1.1 only has a DTD – the schema is from an earlier 2002 version.
Another drawback is that our CSS rules are applied regardless of whether the property
involved is CSS-manipulable within the target language. For
example we could write a rule:
rect.large {
width: 200;
}
which would be translated into the simple template:
the consequence of which would be to set the width attribute of large rect elements. However in SVG, unlike for example in XHTML, no
direct geometry properties are susceptible to CSS styling (see SVG's
styling properties), and normally the directive would be ignored, so in our (XSLT-styled) case we'd
get an incorrect, not just approximate,
result. This could be solved by a target-language-specific filtering of the CSS rules.
Note that using the SVG schema or DTD alone to determine
applicability would not be sufficient – @width is a rather crucial component of svg:rect!
Constructing the XSLT transform
In this section I'll discuss how the XSLT transform to perform the CSS styling is
constructed. As this tool was built up incrementally as needed, some
parts are certainly not as complete as they eventually should be.
The first step of this process is to parse the CSS stylesheet(s) to produce an XML
equivalent parse tree, from which code equivalents can be generated
through normal XSLT push-processing methods. It should be possible of course to use
some form of full-blown parser, perhaps one generated by Gunther
Rademacher's excellent parser generator REx REx[7]. Another option would be to build on the example of Invisible XML described by Pemberton Pemberton2013
which uses parsing CSS as an example. I'll return to these possibilities in the conclusions.
However, in the somewhat ad hoc environment within which this tool was originally used, and given the assumption
that the
complexity of the CSS stylesheets being processed is understood well and that content
is friendly (e.g. strings don't contain
{ or }), compound use of regular expressions has sufficed so far.
CSS stylesheets can import other stylesheets through @import placed at the directives at the head of a file – the semantics of the
collection are flat, so a recursive inclusion sweep through the implied URL network
concatentating text will yield a collection of statements that are in
the correct document order.
After removing comments, the text of the CSS is split into separate rules by tokenization
against a \}\s* pattern. Further
regular expression tokenization gives us for each rule the selector text and a sequence
of property/value pairs. The selector text is then parsed using a
set of regular expressions (operating within an xsl:analyze-string instruction) to produce a modified @match attribute
containing a suitable XPath equivalent. For extensibility these regular expressions
are bound to variables – the following section of the converter shows
how they're used[8]:
These regular expressions are used twice – firstly within an overall match (when they
have been joined as a union) and then individually in specific
tests for particular components – the order of these choices can be used to match
more specific cases before more general, such as a named element with
class before a generic class. These pattern structures not only can contain the selector pattern and any implied properties,
but also any
other qualifiers, such as being a :before pseuo-element. As an example:
(Note that these pattern elements contain xsl:attributeinstructions, which will then be copied into the final templates of the transform.) We now have
a sequence of all the rules within
the CSS. They should then be sorted according to their CSS specificity rank (a four-level
vector), which can be calculated while the pattern
elements are being constructed, e.g.
This sorted set of rule descriptions is then used to generate an XSLT template (as
shown in Figure 4) in mode css
for each one, with a successively increasing priority and a generic form of executing
lower priority matches first through an xsl:next-match
chain followed by writing the relevant attribute properties.
With the equivalent stylesheet formed, it can either be emitted as a result, to be
used externally, or within an XSLT3.0 environment, it can be used
as the source stylesheet for a transform() function invocation to provide the styling to some candidate XML within the encompassing
stylesheet itself.
Additional features
Apart from selectors, property values and counters, CSS also defines a number of other
styling features for which equivalent XSLT can be generated
comparatively easily. These include:
text-transform
CSS supports models for transforming the case of text content of an element with the
property text-transform that supports four
options (lowercase | uppercase | capitalize | none). This is supported by passing any text-transform property that has
been determined when processing an element's body as a parameter to the match for
text() nodes, and altering (some) characters of the
text string value if required.
content: open-quote | close-quote
The content property can contain reserved tokens for inserting quotation marks, which
can nest, and whose non-default values are taken from a
property quotes which contains a list of open and close pairs for successive nesting levels. (Currently
these are merely mapped to “
and ” respectively, with no nesting considered.)
A larger example
A more realistic example is the authoring/production process for a professional paper
to be published eventually in PDF format. The publisher, in this
case the ACM, provides a style guide, and templates in Word or LaTeX. In this case
the paper source, much of which might be autogenerated (including
self-laying-out SVG sections) was in neither of those formats, but in an extended
and well-formed XHTML. The publication route, which had been developed
over several years, involved a mapping from XHMTL to an extended SVG/FO format (which
included layout directives), then to a fully grounded SVG and thence
to PDF with a SVG→PDF converter[9].
Originally the paper sources were created in a standard XML editor and the ACM styling
was imposed through XSLT templates and attribute sets that were
used to form up the equivalent document content and layout directives. This was reasonable,
but the editing front-end was all angle
brackets. With the advent of styled authoring packages, such as that in Oxygen, where a document
is viewed and edited through a CSS stylesheet,
it became possible to edit the document with most elements styled reasonably – e.g.
paragraphs have correct fonting, headings look similar to those that
will be in the final document, including section numbering, lists are seen as lists
etc[10] and angle brackets are rarely seen!
It took little time to write the ACM-equivalent CSS for the main components of an
XHTML report, so the paper authoring became much easier. Then we can
use the techniques described above to ground the style of the XHTML according to the
CSS, and remove the attribute-set styling from the XHTML→SVG/FO step.
In the process of doing this of course, we increase the generality of the solution.
Textural styling (as opposed to layout/geometric) can now be
completely defined from the CSS.
This is a part of the original source, where there are some additional element declarations
(embed supports adding multiple images,
declaring their relative positioning and sizing), and attribute properties (@lay:float declares that this element, in this case this entire
div, can always float forward (next page, next column..) if it won't fit in remaining available space.)
<div lay:float="always" lay:id="d48e871">
<figure id="sE2" height="free">
<embed src="simpleExtend-2.svg?type=tree simpleExtend-3.svg" xpath="(.//svg:g[@id='main']/*,./*)[1]"
stack-offset="50" TYPE="image/svg-xml" gap-y="5" height="20" columns="2" box="no" lay:same.scale="true"/>
</figure>
<p class="caption">Figure 6. A continual variation of Figure 5</p>
</div>
<p>The retained instructions (<code>xsl:for-each</code>...) are considered to be a null element
as far as layout is concerned, will not be displayed in an SVG viewer, have no influence on
geometry and will be copied in place in the resulting resolved flow<sup class="fnIndex">8</sup>.</p>
<p class="footnote" lay:footnote="footnote">
<sup class="fnIndex">8</sup>This is not so for topologically-modifying layouts,
such as sort-by-size but any eventual (graphical) results from these instructions
would be placed correctly on resolving the parent's layout.</p>
<p>There are a small number of possible types of propagation
that may be declared in an <code>@R:retain</code> attribute:</p>
<ul>
<li>
<code>true</code> - execute the instruction and always retain.</li>
<li>
<code>while[<em>(expression)</em>], until[<em>(expression)</em>] </code>- execute the instruction;
retain while/until the given XPath expression is true.</li>
</ul>
These are some relavent sections from the ACM style CSS:
And this is what it looks like when viewing the combination in Oxygen's Author mode.
When we transform the CSS into XSLT and apply it, we now get:
<div lay:float="always" lay:id="d48e871" style="font-family:Times-Roman;font-size:9pt">
<figure id="sE2" height="free" style="font-family:Times-Roman;font-size:9pt">
<embed src="simpleExtend-2.svg?type=tree simpleExtend-3.svg"
xpath="(.//svg:g[@id='main']/*,./*)[1]" stack-offset="50" TYPE="image/svg-xml"
gap-y="5" height="20" columns="2" box="no" lay:same.scale="true"
style="font-family:Times-Roman;font-size:9pt"/>
</figure>
<p class="caption" style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt; text-align:center;
hyphenate:true;margin-top:5pt;font-weight:bold">Figure 6. A continual variation of Figure 5</p>
</div>
<p font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;
padding-bottom:3pt;text-align:justify;hyphenate:true">The retained instructions (
<code style="font-family:Courier; font-size:8pt;font-weight:bold">xsl:for-each</code>
...) are considered to be a null element as far as layout is concerned, will not be displayed
in an SVG viewer, have no influence on geometry and will be copied in place in the resulting resolved flow
<sup class="fnIndex" style="font-family:Times-Roman;font-size:6pt;vertical-align:4pt">8</sup>.</p>
<p class="footnote" lay:footnote="footnote"
style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt;text-align:justify;hyphenate:true">
<sup class="fnIndex" style="font-family:Times-Roman;font-size:6pt;vertical-align:4pt">8</sup>
This is not so for topologically-modifying layouts, such as sort-by-size
but any eventual (graphical) results from these instructions would be placed correctly
on resolving the parent's layout.</p>
<p font-family="Times-Roman"
style="font-family:Times-Roman;font-size:9pt;padding-bottom:3pt;text-align:justify;hyphenate:true">
There are a small number of possible types of propagation that may be declared in an
<code font-family="Courier" style="font-family:Courier;font-size:8pt;font-weight:bold">@R:retain</code>
attribute:</p>
<ul font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;margin-left:12pt">
<li font-family="Times-Roman" style="font-family:Times-Roman;font-size:9pt;text-align:justify;
hyphenate:true;padding-bottom:3pt">
<code font-family="Courier" style="font-family:Courier;font-size:8pt;font-weight:bold">true</code>
- execute the instruction and always retain.</li>
...
</ul>
which then gets converted to SVG and ultimately a PDF looking like this:
Note
The observant will notice that the footnote styling alters. footnote was an artificial element used for authoring, whence the italics
aided viewing in the styling editor. During a phase where such additional constructs
are expanded to suitable XHTML equivalents, the
footnote was converted into a sup index, placed at the same location and a p @class="footnote", paragraph,
containing the relevant index reference, immediately following the enclosing paragraph.
Such footnote bodies are extracted during
pagination to build from the bottom of a column.
Not only has this technique been used on a variety of professional papers, it has
also been employed to generate PDF product data sheets for a
well-known XSLT engine supplier, and indeed the entire styling of the author's PhD
thesis.
Possible developments and Conclusion
This paper is reporting on work in progress and as such there are a number of areas which would need development for black
box use, some of which have been introduced earlier. The most pressing, providing robust
parsing of the CSS, is reported in the next subsection.
Others that await development are:
Supporting important (!important) declarations – these need to be separated from the humdrum, which means that rules
may need to be
split into two declaration sections and those that are important ranked in precedence
(and thus eventual XSLT template priority) above all that are
not.
Providing a route for inheritance and relative property declarations. As implied earlier,
a mechanism using tunneled variables (xsl:param
name="font-size" tunnel="yes") could be used to pass through computed values for parent elements in the recursive
processing of the tree,
from whence the relative value for the child can be computed. This will be complicated
further by the need to process in multi-unit arithmetic
(150% * 14pt), in a discrete universe (larger x-small) or establishing what are the base values in the absence of any CSS
stylesheet setting (e.g HTML font size 3).
Robust CSS parsing
Using a robust CSS parser (operating entirely in XSLT perhaps, or as a CSS/SAC-based
XSLT extension function) producing an XML tree would be
advantageous and certainly needed for any full-scale automated production use. As
an example, using an approximate CSS declaration grammar along with a
grammar for the selectors, written in EBNF, we can use REx to generate an appropriate
parser and obtain a raw parse tree for a CSS stylesheet, where
element names are those of the appropriate productions.
These trees tend to be large, as noted by Pemberton and reduced forms are much more
convenient. As in Lumley2014 it is a simple
XSLT task to refine these trees to the essentials, by discarding most one-child elements
in favour of their children, and migrating many of the
operators to attributes. So for example we can parse the following into the accompanying trees[11].
The ideas of Pemberton on defining a grammar which specifically marks what is to be
written to the final parse tree, and how, is quite attractive
as it would automate the refinement phase, which is currently written in (simple) XSLT. It might even
be tractable to use an
Invisible XML grammar definition and project it into two forms – an EBNF to use for a parser generator,
and a set of instructions
(XSLT templates?) for parse tree refinement.
From these structures it is comparatively easy to either form the same pattern structures
as shown earlier, or produce a new template-generation
strategy. Here are an intermediate form and the resultant XSLT structures:
For the first rule the appropriate properties for the children (suitable pattern,
CSS specificity) are shown – these are then combined to make the
generic pattern and compound specificity for the rule. These rules can then be sorted
according to their specificity to produce a declared priority
order for the resulting XSLT templates. The second rule has been split into two copies
as their patterns have differing specificity and hence must be
supported by XSLT rules operating at differing priorities.
Generic program translation
In a more general sense, this work is an example of a generic technique whereby portions
of a programP, written in some language Y are used within some other program written in a language Z
, by equivalence conversion of P→Pz. The success of this would be subject to a number
of factors:
The degree of similarity in executional semantics of Y and Z. In this case CSS and XSLT push mode both
work in a pattern-directed manner, with a defineable conflict resolution model[12]with a similar source tree → result tree execution and a common target data type,
so most of the main semantics of CSS
can be simulated closely by XSLT equivalent models.
When the languages are not in such close agreement, the ease by which an interpreter
for sections of Y can be constructed
in Z becomes important. For example a program defined as simple non-cyclic spreadsheet
could be interpreted in XSLT by casting
as some XML datastructure coupled with an iterative tree-walker that uses a calculator
function.
The extent (and accuracy) of the semantics of Y required within P. When these are large, significant
problems may result. For example, if P requires accurate error handling, then it may well prove intractable to provide the
same
in the Z environment. In our case, CSS's execution error handling is basically ignore, so this doesn't cause a
problem, and we're not pushing the edges of the CSS envelope. Equally we are prepared
to push the CSS property onto the target,
rather than pull the property from the CSS, giving the possibility of extraneous styling, on the basis that final user agents
will
ignore such minor infractions, permitting a much simpler and more generic model.
The need for execution performance. In general such equivalent-code simulations will
always be slower than native implementations, sometimes
very much slower.
The skill and craft of the program developer, especially where some feature requires
a workaround to be supported.
The ease with which language Y can be parsed from code in Z. This can range from hand-crafted code
(such as using regular expressions as described in section “Constructing the XSLT transform”) to using a specifically targeted parser.
The real advantage of using XSLT for such conversion is that once a parse tree has
been established in XML, it is a uniform push-mode
process to build other necessary data structures and ancilliary models which can then
either be used to generate directly equivalent XSLT or be used by
suitable interpreters to execute the required semantics of P These intermediate data structures can be examined easily (they
serialize).
Conclusion
This paper has described using a program written in another language (CSS) as a component
within an XSLT execution environment, by cross-conversion.
The method was originally rather rough-and-ready and used for a single document purpose
but has progressively been refined until it might be suitable
for (some) semi-production situations, especially in the hands of craftsmen. Though
the similarity of the major semantics of CSS and some parts of XSLT
(pattern invocation, property projection) meant the necessary XSLT execution model
was reasonably simple, there is still potential in the technique for
other situations. The use of an XML-centric processing model starting from the source
language parse tree and progressing on to intermediate data
structures, coupled with XSLT3.0's rich toolbox, makes this field not only productive,
but fun. I encourage others to try it.
Acknowledgements
I must thank the several organisations (ACM, University of Nottingham, Saxonica) who've
posed document generation challenges that motiviated and
exercised this need to employ CSS where it wasn't supported. I'm also grateful for
the insightful and supportive comments from the Balisage reviewers
who've encouraged me to increase the robustness of the method, both in terms of using
a proper CSS parse and in addressing more complex issues
especially in style inheritance.
References
[CSS] Bos, Bert et al, Editors. Cascading Style Sheets Level 2 Revision 2 (CSS 2.2) Specification. World
Wide Web Consortium, Editors' Draft, 29 March 2016. [online] http://dev.w3.org/csswg/css2/
[CSS3] Çelik, Tantek et al, Editors. Selectors Level 3. World Wide Web Consortium, 29 September 2011.
[online] http://www.w3.org/TR/css3-selector/
[XPath3.0] Robie, Jonathan, Chamberlin, Don, Dyck, Michael, and Snelson, John, Editors. XML Path Language
(XPath) 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-30/
[XPath.FO] Kay, Michael, Editor. XQuery and XPath Functions and Operators 3.0. World Wide Web
Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-functions-30/
[XSLT2.0] Kay, Michael, Editor. XSL Transformations (XSLT) Version 2.0 (Second Edition). World Wide
Web Consortium, 23 January 2007. [online] http://www.w3.org/TR/xslt20/
[2] Or if it does, it requires various forms of black art or eldritch wizardry.
[3] At least the half-dozen or so the author has needed, in production of professional
papers (ACM, Balisage, XMLLondon ...), product marketing
documents and his own PhD.
[4] The author argues in Lumley2013 that such good XML citizen behaviour by processing tools is important for
the production of compound document processing architectures. Note that if the content
of an SVG desc element contained markup
elements (such as in the XHTML namespace) this styling might well effect those components.
[5] Reusing the same XSLT variable name in sibling positions is legal – the scope for
each is along a
following-sibling::*/descendant-or-self::* compound axis, until overridden.
[6] For some XML dialects this might be true by default. SVG has default interitance for
textural styling, even after CSS style application.
[7] Indeed one of the samples for the generator is for CSS selectors.
[8] In XSLT3.0 a set of map() bindings for these regular expressions might increase the
coherence.
[9] These papers were all reporting on a highly extensible and highly-adaptable variable
document architecture. Not only was it more coherent and
robust to use the architecture itself to form the papers, for example supporting in-document
evaluation of examples, it was also behoven on the
authors to demonstrate that the architecture was capable of producing documents to
a high professional standard. Close visual inspection of the
resulting PDF can tell it came from neither the Word or LaTeX templates, but the variation
is less than between those two forms themselves.
[10] Much as this paper is being authored using the Balisage CSS.
[11] The author is one of those who prefers keeping properties in attributes when they
are singular rather than as text values of elements.
[12] Other target languages that might have similar tendencies include Prolog, Mathematica,
the Haskell/ML family of functional languages and
SNOBOL descendants.
Robie, Jonathan, Chamberlin, Don, Dyck, Michael, and Snelson, John, Editors. XML Path Language
(XPath) 3.0. World Wide Web Consortium, 08 April 2014. [online] http://www.w3.org/TR/xpath-30/
Kay, Michael, Editor. XSL Transformations (XSLT) Version 2.0 (Second Edition). World Wide
Web Consortium, 23 January 2007. [online] http://www.w3.org/TR/xslt20/