Introduction
The standard method for displaying a TEI document on the web is to either pre-transform it to HTML using XSLT or to dynamically transform it to HTML, again usually with XSLT or possibly XQuery. This method, while obviously totally viable, does tend to enshrine a particular workflow and division of responsibilities that may not be optimally flexible. We will present an alternative approach that permits a lighter-weight development workflow and uses more standard web technologies.
The TEI Stylesheets and (though to a lesser extent, usually) more specialized XSLT conversions are large, complex software packages in their own right. Their development and maintenance requires experienced XSLT developers. Much though we might regret it, this is no longer a widespread skill. Even very sophisticated XSLT transforms usually involve discarding or substantially reformatting some data in the source. That XSLT is capable of this is obviously a strength, but we would argue there is a subtle pressure towards rewriting your data model for presentation rather than making use of it. Moreover, since XSLT has become a niche specialty, projects are more likely to either just use an existing package for transformation or rely on inexperienced developers customizing such a package. More subtly, the way XSLT transforms divorce the presentation of texts from the work done to model them means there is a disjunction between the (often intensive in TEI) intellectual labor of encoding texts and the presentation of those texts online. Browsers today are dynamic and powerful rendering engines, so why not apply them more directly to the TEI’s data model?
CETEIcean (sɪˈti:ʃn)[1] was developed to support a lightweight TEI presentation workflow that requires neither pre-display document transformation nor any complex server-side architecture. All that is needed to use it is a web server. The browser does all of the work. CETEIcean is an ECMAScript (JavaScript) library that reads in TEI XML and converts it to HTML Custom Elements. The converted source can then be rendered using standard CSS and JavaScript techniques. CETEIcean supplies a method, calld "behaviors" to add decorations—or even make widgets out of—TEI elements. An obvious advantage to using JavaScript and CSS is that people with these skills are much easier to find, making both initial development and ongoing support simpler. But beyond these mundane questions, we have found that having the TEI’s data model directly available makes some interesting techniques possible.
Because it does not rely on a transformation step, the use of CETEIcean means that a distributed development workflow can be used. The Digital Latin Library (DLL)[2] is a new initiative to publish digital critical editions of Latin texts. We have been using GitHub Pages and CETEIcean to work on pre-publication texts. A “stub” HTML file is placed on our GitHub Pages site that points to the raw XML in a separate GitHub repository. When the stub file is loaded in a browser, it fetches the source file and displays it. What this means in practice is that editors working on the XML source have only to push to GitHub to be able to see how their TEI document will look in a web browser. The editor does not need to be able to run their own XSLT transform, nor is there any setup required beyond a very simple HTML file being placed in the GitHub Pages repo. This “push to publish” workflow means it is very simple to check your own or others’ work. If the encoder and the editor are not the same person, reviewing the encoder’s work is trivial. Moreover, because the reviewer is looking at a 1::1 representation of the source, errors are more likely be easy to troubleshoot, as they haven’t gone through a transformation process that might accidentally suppress or obfuscate them.[3]
The “push to publish” workflow is also useful in the classroom; we have adopted it
to
teach TEI as part of the Introduction to Digital Studies in the Arts and Humanities,
a
graduate course at the Maryland Institute for Technology in the Humanities. Students
are able
to focus on learning how to encode texts with the TEI and are not required to learn
XSLT to
preview or publish their work. By simply adopting an HTML template equipped with CETEIcean,
they are able to engage with issues related to digital publication with a lower learning
curve. With the 1::1 representation of the TEI source, students are able to use CSS
to style
their encoding directly. This kind of workflow is of course possible using XSLT 1.0
with an xml-stylesheet
processing instruction in the source XML or using
(non-standard) JavaScript-based XSLT transforms, but we think CETEIcean permits more
flexibility than the processing instruction method and that JavaScript is more powerful
than XSLT 1.0.[4]
Implementation
CETEIcean converts all TEI elements to an “HTML” equivalent with a tei-
prefix. TEI attributes, which mainly have no namespace, are simply copied over. The
@xml:lang
and @xml:id
attributes are converted to their HTML
equivalents @lang
and @id
. Data attributes are used to preserve
namespace information, original element name, and whether the element is empty. For
most
elements, this is sufficient, but for TEI elements which have HTML equivalents, more
work is
necessary. TEI has a couple of constructs roughly equivalent to HTML <a href>
for example, namely <ref target>
and <ptr target/>
(the former
represents linked text with one or more targets and the latter a bare pointer with
no text).
While linking behaviors can be applied to these post-conversion, those links do not
have the
full status of HTML links—browsers do not preview the link URL on hover, for example.
CETEIcean handles this differently depending on whether the browser in question handles
registering Custom Elements.[5] Recent builds of Chrome and Safari support the feature. In
Firefox, the support is experimental.
In browsers with Custom Elements support, the additional behaviors are applied in
the
element’s constructor. So when the element is added to the DOM, it gets additional
features.
TEI <ptr>
elements, for example, get an <a href>
element
inserted inside them, with the @href and the content set to the @target of the
<ptr>
. Tables are another case where HTML and CSS can have a hard time with
non-HTML table elements (the new CSS Grid Layout module may resolve this issue), so
TEI tables
have their content hidden and replaced with HTML table elements.
In browsers which have not yet implemented Custom Elements, the same effects are achieved
using a fallback method which calls the same behavior function. A baseline set of
element
behaviors is defined for CETEIcean, but these may be redefined or extended by the
addition of
custom behaviors. Behaviors are simply JavaScript objects which list element handlers
which
match the TEI element name. These handlers may either be JavaScript functions or arrays
containing one or two strings. The latter are automatically converted to functions
which
insert the content of the array into the element, prefixing or wrapping its content.
We might
define a behavior for the TEI <add>
element, for example, which would wrap its
contents in the Leiden Convention markers for text added to a document.
The latter method provides a useful alternative to using CSS content to decorate
elements, as that produces text which is not selectable in the browser.
TEI in the Browser
CETEIcean is not the first solution to permit TEI in the browser. TEI Boilerplate
(http://dcl.ils.indiana.edu/teibp/)
uses an XSLT script to wrap the TEI document in HTML and convert those TEI elements
that have
specific HTML equivalents, such as <ref>
and <ptr>
discussed
above. The transformation is triggered by an “xsl-transform” instruction which has
to be added
to the source file, and because browser-based XSLT capabilities stalled at XSLT 1.0,
it is
limited to that version of the language. The now-deprecated Saxon-CE implementation[6] supported XSLT 2.0, and its replacement, Saxon-JS[7], supports XSLT 3.0 and provides significant performance improvements. Saxon-JS
requires a commercial version of Saxon to compile the stylesheets into JavaScript,
however,
which means it is not really viable in an open source environment. As a matter of
policy, the
TEI Consortium won’t distribute software that requires users to purchase a software
license in
order to use it.
Moreover, since web technologies like CSS and Javascript are now capable of delivering a browser-based TEI and since they have become a de facto requirement, even in an environment where XSLT or XQuery are transforming the TEI source to HTML before delivery, perhaps it is fair to argue that we should simply use them and skip the XSLT step. XML technologies (XSLT and XQuery) are arguably the most effective at manipulating XML and CETEIcean is not meant to replace them for every operation. Rather, it offers a lightweight, yet fully-featured, solution for publishing text encoded with XML. Occasionally, however, transformation and DOM manipulation can be essential to publication. While performing transformations on the server may be the most effective solution in those cases, JavaScript is still capable of performing complex DOM operations. For example, the Shelley-Godwin Archive (S-GA)[8] displays TEI documents that encode the mise-en-page of manuscript material, that is, the XML identifies zones of writing (main, marginal, etc) and authorial revisions (deletions, additions, and their location on the page). In the TEI, additions in the margin are encoded separately from the main zone of writing, but in display they need to be positioned near the line where they logically belong. If we were using XSLT to transform TEI to HTML, we may position the marginal text in a way that makes it easier to render; in this case, we use JavaScript to determine the position based on the TEI encoding and adjust the CSS dynamically (see the result in Fig 2)[9].
The DLL’s edition viewer does some DOM manipulation to generate a traditional apparatus
criticus from the TEI <app>
elements in a text. A source text with inline
elements (see Figure 3) is displayed as text (Figure 4) plus apparatus (Figure 5).
This kind
of transformation is fairly standard, and it would not be unusual to see it done with
XSLT
instead. What is unique about the DLL viewer is its use of the TEI’s model of textual
variation to produce a dynamic apparatus. Besides the traditional appearance of the
app.
crit., the viewer also generates widgets which manipulate the edition’s DOM (which
isomorphically presents the TEI’s model), and thereby allow a reader to make decisions
about
what should appear in the reading text. The TEI models textual variation by placing
the
variants in parallel inside an <app>
element. The variant to appear in the
main text is placed in a <lem>
and any additional variants go in
<rdg>
elements. The DLL viewer’s widgets allow readers to change any
<rdg>
into a <lem>
(and the <lem>
into a
<rdg>
), promoting that reading to the main text. The page’s CSS takes care
of the rendering, as <lem>
s are displayed and <rdg>
s are not.
In this way, the edition’s readers can try out different versions of the text and
see how the
affect its flow and meaning. This makes for a much more powerful and intuitive critical
edition than is possible in print (or static HTML). Obviously, something similar could
be done
with an HTML version converted from TEI with XSLT, but having the affordances of the
TEI model
directly available in the browser makes it easier for a developer to see how that
model can be
leveraged.
Limitations and Solutions
There are, of course, cases where one might want some radical transformation of one’s
source encoding. The natural inclination (and CETEIcean’s default) in rendering a
TEI document
in a browser is to make the header invisible, for example. After all, it’s the text
that’s
meant to be read. But a number of projects put important information in the
<teiHeader>
, and they might reasonably want to display that in the rendered
document. Things like critical apparatuses are an example where a natural display
form must be
created by extracting information from the text and reformatting it. CETEIcean by
itself does
not help particularly with these cases, although it is quite possible to accomplish
the goal
with Javascript, as the S-GA and DLL do.
A more important concern is what to do about search engines. Google, for example,
used to
make some attempt to index pages rendered using AJAX calls, where the page content
isn’t
actually present in the page source, but is fetched at load time, but deprecated this
practice
in 2015. The DLL’s pre-publication editions would never have been indexed anyway,
as their
source is fetched from the raw.githubusercontent.com
domain, which excludes all
web crawlers. For purposes of the DLL workflow, Google-invisibility is actually an
advantage,
as it means pre-publication materials can be open, but at the same time not very discoverable,
and will not be in competition with the eventual publication. But when those editions
are
published, we definitely want them to be searchable. The DLL’s solution at publication
time is
to deliver a partially-converted file—i.e. A TEI file already converted to HTML Custom
Elements. The rendering is still done using CETEIcean, CSS, and additional JavaScript,
but the
source is indexable by search engines. An XSLT or XQuery conversion to a Custom Elements
format is trivial to write—the core of the DLL implementation is 30 lines of XQuery,
for
example. An alternative method would be to embed the TEI XML source in the HTML page
and let
CETEIcean load that, though it is hard to know what a search engine might make of
such a
Chimera. It is worth noting that (anecdotally) TEI Boilerplate does not fare well
with search
engines either.
Conclusion
CETEIcean does not provide a complete replacement for XSLT and XQuery as a means for publishing TEI on the web, but we do think it is a viable alternative for some projects, and is particularly useful in situations where a quick view of work in progress is needed. It especially shines in situations where the TEI’s model of the text can be usefully leveraged to allow interesting dynamic functionality in the browser. The isomorphism of CETEIcean documents to their TEI sources also means it will support things like robust annotation (since annotation targets should be able to be trivially mapped to the source documents) and in-browser editing (because we can easily turn the HTML back into TEI). In sum, it seems like this approach has a great deal of potential and starts to get us out of the cul-de-sac our XSLT dependency had put us in.
[1] https://github.com/TEIC/CETEIcean. CETEIcean is released under a BSD 2-clause license (see https://opensource.org/licenses/BSD-2-Clause).
[3] Examples are available at https://digitallatin.github.io/viewer/calpurnius.html and https://digitallatin.github.io/viewer/balex.html. Both examples pull their source text from a separate repository on GitHub.
[4] See below for a discussion of proprietary XSLT 2.0 and 3.0 browser-based implementations.
[5] See https://caniuse.com/#feat=custom-elementsv1 for a chart of current browser support for Custom Elements v1.
[6] Saxon CE, http://www.saxonica.com/ce/index.xml.
[7] Saxon-JS, http://www.saxonica.com/saxon-js/index.xml.
[9] S-GA has been using a project-specific JavaScript solution for publishing this material and is in the process of switching to CETEIcean.