Caton, Paul, and Miguel Vieira. “The Kiln XML Publishing Framework.” Presented at XML In, Web Out: International Symposium on sub rosa XML, Washington, DC, August 1, 2016. In Proceedings of XML In, Web Out: International Symposium on sub rosa XML. Balisage Series on Markup Technologies, vol. 18 (2016). https://doi.org/10.4242/BalisageVol18.Caton01.
Paul Caton has worked in digital humanities for two decades. Beginning as
Electronic Publications Editor for the Women Writers Project he went on to hold posts
with
the TEXTE Project at the National University of Ireland, Galway and with the INKE
Project
at the University of Victoria in British Columbia before going to the Centre for Computing
in the Humanities at King's College, London in 2010. Now a Research Analyst in the
recently-formed King's Digital Laboratory he works on multiple projects in both analytical
and development roles. His research interests include the representation of text by
formal
models and by markup languages; ontologies of personal relations; and models of
transcription.
Miguel Vieira is Kiln project manager and one of the developers. He has worked in
the
digital humanities area as a developer/software engineer for more than ten years.
He is
currently a software engineer/technical coordinator at the recently-formed King's
Digital
Laboratory, where he is responsible for the research projects' technical architecture
and for managing the development team. His research interests include analysing and modelling
humanities and unstructured data, natural language processing, machine learning, data
visualisation, and linked data.
Kiln, previously known as xMod, is an open source multi-platform framework for building
and deploying complex websites whose source content is primarily in TEI/XML. It brings
together various independent software components into an integrated whole that provides
the
infrastructure and base functionality for such sites. Separation of roles is central
to
Kiln's design, allowing people with different backgrounds, knowledge and skills to
work
simultaneously on the same project without interfering with one another’s work. Developed
and maintained at King’s College London it has been used to generate more than 50
websites
for digital humanities research projects which have very different source materials
and
customised functionality.
Kiln[1], previously known as xMod,[2] is an open source multi-platform framework for building and deploying complex
websites whose source content is primarily in TEI/XML. It brings together various
independent
software components into an integrated whole that provides the infrastructure and
base
functionality for such sites. Kiln has two competing design goals: to support the
development
of unique, complex web applications; and to provide an out-of-the-box system suitable
for a
single non-technical person to publish a TEI-based site. The former requires the
customisability of every default component and the flexibility to integrate external
components as necessary; the latter requires a large amount of built-in behaviour
that can be
easily tweaked in isolation, and excellent documentation. Kiln’s documentation includes
a
tutorial showing how to customise each of the major elements of a site, as required
beyond the
provided defaults. Separation of roles is central to Kiln's design, allowing people
with
different backgrounds, knowledge, and skills to work simultaneously on the same project
without interfering with one another’s work.
Kiln is the latest iteration of work begun by a team at the Centre for Computing in
the
Humanities (CCH) - later the Department of Digital Humanities (DDH) - at King’s College,
London (KCL). Further development and maintenance of Kiln is now under the auspices
of the
recently-formed King's Digital Laboratory at KCL. Over the past years and over several
versions, Kiln has been used to generate more than 50 websites for digital humanities
research
projects[3] which have very different source materials and customised functionality.
Origins of Kiln
Kiln originated at CCH around 2004 as a framework called xMod. CCH was collaborating
with
academic partners in text-based projects where primary sources were encoded using
markup from
the TEI Guidelines. From this work three things became clear:
1. Even for relatively simple, straightforward digital resources academics needed to come to CCH/DDH to have the resource built.
2. Multiple projects shared a core set of requirements.
3. In most cases, as well as the core set of requirements projects also had their own very particular and often quite complex needs.
And from them arose corresponding needs:
1. The need to give non-technical people a way to set up a basic digital resource that allowed web page display of XML-encoded source files.
2. The need to set up for ourselves a quick, dependable, consistent way of getting the core requirements dealt with to maximize time available for the project-specific work.
3. The need to be able to meet the particular requirements while still using the shared approach to basics.
Those needs pointed towards the best solution being a framework that enables a
phased approach, where the initial phase involves quickly and easily setting up a
basic
digital resource which displays texts and offers basic browse and search functionality.
Subsequent phases could involve any or all of the following: customizing the look
and feel;
expanding the browse and search capability; integrating with other things to create
a larger
whole (eg. having a CMS front end). This approach avoids making users choose between
"simple"
and "complex" versions of a framework.
Kiln - the principal components
Given the needs we described above, Apache Cocoon[4] is a natural choice to sit at the heart of Kiln because the Cocoon
sitemap+pipeline system is very flexible and powerful. At the basic level it is easy
to create
default paths and behaviours which are available to users after a few simple steps,
thus
meeting needs (1) and (2). Then, if desired, we can set up processing sequences of
increasing
complexity and/or granularity which supplement the defaults rather than replacing
them -
thereby satisfying need (3).
The Solr search platform[5] is a good complement to Cocoon. At the basic level we can have a simple indexing
pipeline to provide a free text search facility (see next section, below). As our
needs become
more complex - when, for example, we might want to incorporate into the index data
from
non-primary sources such as authority files, bibliographies, and so on - we can use
Cocoon's
aggregation mechanism to bring the disparate sources together and channel them into
a single
indexing transformation. By using internal Cocoon URLs we can pre-process some of
the
secondary sources and channel the output into the aggregation. And because Solr has
numerous
faceting features built-in we can easily add faceted browsing functionality to the
resource;
indeed this step is so straightforward that even a site admin with relatively modest
technical
knowledge can implement it.
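To make the faceting step concrete, the following Python sketch builds the kind of request a faceted browse page would ultimately send to Solr. The parameters q, rows, wt, facet, facet.field and facet.mincount are standard Solr query parameters. The base URL and the field names "author" and "document_type" are assumptions made for illustration; real field names depend on the project's Solr schema. In a Kiln site the request would presumably be issued from a Cocoon pipeline rather than from Python.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, text, facet_fields, rows=10):
    """Return a Solr select URL asking for matching documents plus facet counts."""
    params = [
        ("q", text),               # free text query
        ("rows", str(rows)),       # number of hits to return
        ("wt", "json"),            # response format
        ("facet", "true"),         # switch faceting on
        ("facet.mincount", "1"),   # hide facet values with no hits
    ]
    # One facet.field parameter per field to be counted.
    params.extend(("facet.field", name) for name in facet_fields)
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical Solr location and field names, for illustration only.
url = build_facet_query(
    "http://localhost:9999/solr", "letter", ["author", "document_type"]
)
print(url)
```

Adding another facet to the browse display then amounts to adding one more field name to the list, provided that field is populated by the indexing transformation.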
Earlier versions of Kiln - then named xMod - used XML databases such as eXist for
storing
and indexing structured textual data and queried the databases using XQuery requests.
Performance issues with XML databases led to our adopting Solr, and since then we have had no
compelling use case for XQuery, so it is not included in the 'off-the-shelf' Kiln
package.
Kiln comes bundled with the Jetty web application server[6] thus allowing Kiln to be a completely standalone application (beyond the user
having the Java language installed on their machine). For a larger-scale production
environment it is also easy to install Kiln as a WAR in, for example, an Apache
Tomcat[7] setup.
The last main component of Kiln is the Sesame RDF framework[8]; we say more about our use of it below.
One small convenience which this component set offers is that the parts that a user
with
limited technical knowledge might want to tinker with - ie. Cocoon sitemaps; Solr
schema; XSLT
stylesheets - are all in XML and the user is at least likely to be familiar and comfortable
with its syntax and rules (assuming they are also responsible for the TEI content
files).
Kiln - as the user sees it
After downloading it or cloning it from GitHub, the user can start Kiln from the command
line with ./build.sh (there is a .bat version for Windows) to use the built-in Jetty
server or
alternatively can associate it with an existing Tomcat server. The default port is
9999 and on
going there the user will see the default home page (Figure 2). Obviously this is
intended to
serve only as a place holder until the user 'finds their feet' and feels confident
enough to
begin shaping the site themselves. To that end the page offers suggestions about next
steps
and has a link to the online documentation which includes a tutorial that walks users
through
initial setup and common tasks.
Users can see a barebones display of their XML texts simply by adding them under ROOT/
as
content/xml/tei/*.xml. The 'Texts' link which is already present in the navigation
bar brings
up an index list of files available, showing for each file some simple metadata extracted
as
part of the pipeline processing for the 'Texts' URL. This is default behaviour so
the index
list is always current without the user having to restart the server.
The most common need users have after being able to view their texts is for search
capability. To enable this, users go to the Admin page, where a button runs a Solr
indexing process (Figure 3); when it completes, they can perform simple text searches over the
XML files.
Beyond this point a user wanting more advanced features such as faceted browsing will
have
to start editing application files. While the documentation provides guidance and
the steps
are not particularly complex, we do expect the user here to be at least comfortable
with XML
configuration files and XSLT stylesheets.
Kiln's templating system
Where TEI-encoded XML files constitute the main source content, an XSLT transformation
remains the crucial gateway through which content must pass as it is fetched from
the back end
to be displayed on the front end. The approach we adopt to this part of the site workings
is
guided by the needs outlined earlier. Ideally:
It should be clear to non-technical users how displays are assembled.
There are commonly required types of displays, so these should be 'pre-assembled'
and offered by default.
We also want to be able to adapt/add to/replace defaults to provide project-specific
displays.
In addition to the desiderata just listed, we know that very often the person
with the skills to write templates that find and handle parts of the source XML is
not the
person with the skills to organise the output into a functional, ergonomic, and aesthetically
pleasing display - so we want our approach to allow for that. As far as is possible
the XSLT
specialist and the UI/UX specialist should be able to do their respective work concurrently
and independently. Kiln handles these concerns by using a distinctive XSLT-based templating
system. To show how this system works we'll follow a request for an XML source file
to display
as an HTML page.
A request for texts/**.html goes to ROOT/sitemaps/main.xmap, where it is matched by a template.
That template:
Firstly, creates an aggregate, which includes this: <map:part
label="tei" src="cocoon://internal/tei/preprocess/{1}.xml" />
That preprocess call runs the XML through two stylesheets under
ROOT/kiln/stylesheets/tei with the aim of
identifying some known potentially troublesome features in the source XML
and 'preparing the ground' for the final display stylesheet to deal with
them.
One aggregates the content of div elements that are linked via
"next" and "prev" attributes. Supposing an input markup structure like
so:
<body>
  <div xml:id="div_1" next="#div_2">
    <p>content of div 1</p>
  </div>
  <div xml:id="incidental">
    <p>intervening unwanted content</p>
  </div>
  <div xml:id="div_2" prev="#div_1">
    <p>content of div 2 that continues div 1</p>
  </div>
</body>
the output structure would be:
<body>
  <div xml:id="div_1" next="#div_2">
    <p>content of div 1</p>
    <anchor xml:id="div_2"/>
    <p>content of div 2</p>
  </div>
  <div xml:id="incidental">
    <p>intervening unwanted content</p>
  </div>
</body>
The other moves pagebreak markers that occur in certain structural
contexts into a different structural context (to stop what should be a
single block display being broken up); and adds some kiln-namespaced attributes:
To block level elements saying whether or not they contain
only inline material.
To link elements saying whether or not they are nested
inside another link element.
These attributes allow for allocation of CSS class
markers that will help adjust the display formatting according to
context.
Secondly, runs a transform on the aggregate with this call:
<map:transform src="cocoon://_internal/template/tei.xsl" />.
The important thing here is that tei.xsl does not exist as an actual
stylesheet. Instead, the cocoon URL pattern is matched in /ROOT/kiln/sitemaps/main.xmap as
"_internal/template/**.xsl" by a template which:
Looks for a template XML file which matches the wildcard value (in this
case it would be "tei", so it looks for tei.xml).
On the template XML file it runs: <map:transform
type="xinclude"/> to grab anything referenced with an xinclude.
So for example it acts on <xi:include href="base.xml"/>
to bring in a template file which sets up the overall default HTML page
framework.
Then on the template XML file it runs: <map:transform
src="../stylesheets/template/inherit-template.xsl" />;
inherit-template.xsl is an actual stylesheet which creates what is
effectively a virtual stylesheet as its output - and that output functions
as 'tei.xsl', applies the templates defined within itself, and thereby
completes the transformation call that originated with
<map:transform
src="cocoon://_internal/template/tei.xsl"/>.
Finally the output from 'tei.xsl' is serialized as HTML by a
<map:serialize/> instruction. (Note that the default type of
serializer is set to be HTML in sitemap.xmap, so no @type is specified.)
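The first preprocessing step described above, which folds divs linked by "next" and "prev" attributes into one, can be sketched outside XSLT. The Python below is not Kiln's stylesheet. It is a minimal illustration of the same transformation using the standard library, and it assumes un-namespaced elements as in the sample markup (real TEI files would be in the TEI namespace). The function name merge_linked_divs is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def merge_linked_divs(body):
    """Fold each div reached via @next into the div that starts the chain."""
    by_id = {d.get(XML_ID): d for d in body.iter("div") if d.get(XML_ID)}
    parent_of = {child: parent for parent in body.iter() for child in parent}

    for head in list(body.iter("div")):
        # Only start from the first div of a chain: it has @next but no @prev.
        if head.get("prev") or not head.get("next"):
            continue
        seen = {head.get(XML_ID)}
        target = head.get("next")
        while target:
            part_id = target.lstrip("#")
            part = by_id.get(part_id)
            if part is None or part_id in seen:
                break  # dangling or circular reference: stop following the chain
            seen.add(part_id)
            # Mark where the continuation began, then move its content across.
            head.append(ET.Element("anchor", {XML_ID: part_id}))
            head.extend(list(part))
            parent_of[part].remove(part)
            target = part.get("next")
    return body


SOURCE = (
    '<body>'
    '<div xml:id="div_1" next="#div_2"><p>content of div 1</p></div>'
    '<div xml:id="incidental"><p>intervening unwanted content</p></div>'
    '<div xml:id="div_2" prev="#div_1"><p>content of div 2 that continues div 1</p></div>'
    '</body>'
)

merged = merge_linked_divs(ET.fromstring(SOURCE))
print(ET.tostring(merged, encoding="unicode"))
```

As in the output structure shown earlier, the first div keeps its "next" attribute, the continuation is replaced by an anchor carrying its identifier, and the intervening div is left where it was.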
The processing sequence outlined above allows a designer to shape the structure of
output
web pages by putting HTML directly into the template XML files. They don't need to
know
anything about writing XSLT templates because they never need to edit an XSLT stylesheet.
An
inheritance system based on named blocks means all parts of a page can be customized
and at
different levels of granularity. Each template file has as its first element
<kiln:parent>, with an XInclude child that brings in the base.xml
template. This template is a hierarchical structure of named <kiln:block>
elements which by itself supplies all the necessary elements of a web page. The idea
is that
the calling template (tei.xml in our example) declares named <kiln:block>s
each of which overrides part-or-all of the equivalent <kiln:block> in
base.xml. If the named <kiln:block> in the calling template has a
<kiln:super> as its first child element, that imports all the content and
functionality of the corresponding named block in the parent template. The user can
then add
elements according to what they wish to override from the parent. With this templating
mechanism overriding can occur from a very granular level all the way up to the top-level
<kiln:block name="html">. This means the user can easily create a page
that looks different in almost every way from their other pages but that is still
a regular
page as far as the framework is concerned.
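The override rule for named blocks can be modelled in a few lines. The Python below is a simplified sketch of the idea only, not of inherit-template.xsl: blocks are plain Python objects rather than XML, the sentinel SUPER stands in for <kiln:super>, and overrides are limited to flat lists of strings. All names in it (Block, SUPER, render) are hypothetical.

```python
from dataclasses import dataclass, field

SUPER = object()  # stands in for <kiln:super/> in an overriding block


@dataclass
class Block:
    """A named block in the parent template; children are strings or Blocks."""
    name: str
    children: list = field(default_factory=list)


def render(block, overrides):
    """Render a parent block, applying same-named overrides from the child template."""
    # What the parent template would produce for this block on its own.
    inherited = "".join(
        render(child, overrides) if isinstance(child, Block) else child
        for child in block.children
    )
    override = overrides.get(block.name)
    if override is None:
        return inherited  # block not mentioned by the child: keep the default
    # Overridden: SUPER pulls in the inherited content, everything else replaces it.
    return "".join(inherited if item is SUPER else item for item in override)


# A stand-in for base.xml: nested named blocks with default content.
base = Block("html", [
    "<html>",
    Block("head", ["<head><title>", Block("title", ["Kiln"]), "</title></head>"]),
    Block("body", ["<body>default</body>"]),
    "</html>",
])

# A stand-in for tei.xml: extend the title, replace the body outright.
overrides = {
    "title": [SUPER, ": Texts"],
    "body": ["<body>custom</body>"],
}

page = render(base, overrides)
print(page)
```

The "title" override starts with SUPER and so keeps the parent's content before adding to it, whereas the "body" override omits SUPER and replaces the default entirely. Overriding the outermost "html" block without SUPER would replace the whole page in the same way.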
Kiln introspection
Another distinctive feature of Kiln is the ability to see the back end workings via
the
front end. Browser tools such as Firebug can give a lot of information about the current
HTML
page but do not usually reveal how the page got that way. Non-technical
site owners normally have only a limited grasp of the back-end workings, and if the
site is
complex with multiple sitemap/pipeline files in play then even developers can have
a tedious
time identifying templates and stylesheets responsible for producing a particular
page. As a
development and debugging aid Kiln allows users to view relevant aspects of the processing
mechanism via three different access routes:
Match for URL - in a search text field the user
specifies a root-relative URL from the site - eg. text/myfile.html - and the search
returns the associated sitemap template
Match by ID - the user is given a list of sitemap
template identifier strings - eg. "local-tei-display-html" - and clicks on the name
to
see the template
Templates by file name - the user is given a list
of XML template files - eg. tei.xml - and can click on a link to see (via view
source) the relevant XSLT stylesheet
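The "Match for URL" lookup rests on Cocoon's wildcard patterns, in which * matches any run of characters other than "/" and ** matches any run of characters including "/", with the matched portions available as {1}, {2} and so on. The Python sketch below imitates that lookup for a small, hand-written table of matchers. It is an illustration under stated assumptions, not Kiln's introspection code: the pairing of the identifier "local-tei-display-html" with the pattern texts/**.html is assumed from the examples in this paper, and the second table entry is invented.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a Cocoon-style wildcard pattern into a compiled regex."""
    pieces = []
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            pieces.append("(.*)")      # may cross path separators
        elif part == "*":
            pieces.append("([^/]*)")   # stays within one path segment
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


# (identifier, pattern, source) triples, checked in order.
MATCHERS = [
    ("local-tei-display-html", "texts/**.html",
     "cocoon://internal/tei/preprocess/{1}.xml"),
    # Hypothetical entry, included only to show a single-segment wildcard.
    ("example-search-html", "search/*.html",
     "cocoon://internal/search/{1}.xml"),
]


def find_matcher(url):
    """Return (identifier, resolved source) for the first matching pattern."""
    for matcher_id, pattern, src in MATCHERS:
        match = wildcard_to_regex(pattern).fullmatch(url)
        if match:
            # Substitute {1}, {2}, ... with the corresponding wildcard values.
            resolved = re.sub(
                r"\{(\d+)\}", lambda ref: match.group(int(ref.group(1))), src
            )
            return matcher_id, resolved
    return None


print(find_matcher("texts/letters/myfile.html"))
```

Because matchers are tried in order and the first hit wins, a more specific pattern placed earlier in the table takes precedence over a more general one placed later.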
RDF / Linked Open Data
Kiln includes the Sesame[9] framework for processing and handling RDF data. The framework is composed of two
web applications, a server web application (openrdf-sesame) to store, parse and infer
over RDF
data, and a client web application (openrdf-workbench) to make queries over the data.
Sesame is built into Kiln via a set of Cocoon pipelines. By default there are pipelines
for generating and adding RDF to the Sesame store, and pipelines for making SPARQL
queries, in
the sitemap file ROOT/sitemaps/rdf.xmap, which makes use of the basic operations
- to add, remove, and query the triple store - defined in the internal sitemap
kiln/sitemaps/sesame.xmap. Because the RDF requirements are very distinct
across different projects, the default XSLT for adding content to the triple store
is
basically a placeholder meant to be extended and customised as required. The Kiln
tutorial[10] includes a sample XSLT for converting TEI documents into RDF statements, and that
can be used as a guide for further customisation work.
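To give a sense of what such a conversion involves, the Python sketch below turns the title and author recorded in a TEI header into N-Triples statements. It is not the XSLT from the Kiln tutorial. The choice of the Dublin Core "title" and "creator" properties, the document URI, and the sample file are all assumptions made for illustration; a real project would choose its own vocabulary and URIs.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core elements namespace


def literal(text):
    """Normalise whitespace and escape a string as an N-Triples literal."""
    text = " ".join(text.split())
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def tei_to_ntriples(tei_xml, doc_uri):
    """Return N-Triples lines describing the title and author of a TEI file."""
    root = ET.fromstring(tei_xml)
    title_stmt = "tei:teiHeader/tei:fileDesc/tei:titleStmt/"
    triples = []
    for tei_name, dc_name in (("title", "title"), ("author", "creator")):
        for el in root.iterfind(title_stmt + "tei:" + tei_name, TEI):
            value = "".join(el.itertext()).strip()
            if value:
                triples.append(f"<{doc_uri}> <{DC}{dc_name}> {literal(value)} .")
    return triples


# Invented sample document and URI.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>A "sample"  letter</title>
    <author>Jane Example</author>
  </titleStmt></fileDesc></teiHeader>
  <text><body><p>Dear reader</p></body></text>
</TEI>"""

for line in tei_to_ntriples(SAMPLE, "http://example.org/texts/sample"):
    print(line)
```

Statements in this form could then be added to a triple store; in Kiln that step is handled by the pipelines described above.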
The main reasons to include an RDF framework with Kiln were to promote the publishing
of
linked data and also to increase the interoperability between the projects implemented
with
Kiln.
Future for Kiln
When CCH staff first began to shape xMod, they did so to meet specific needs which
they
felt were not being met by any free, open-source XML-to-webpage application available
at the
time. For the technically competent willing to 'get their hands dirty' on the server
side
there was Apache's AxKit[11], but this mod_perl module was not designed with convenience for non-specialists in
mind (see, for example, Eric Morgan's account of trying to use it in Morgan 2005.)
The most similar framework in the digital humanities field - the California Digital Library's
eXtensible Text Framework (XTF)[12] - was equally in its infancy at the time. Other existing frameworks such as TUSTEP
("TUebingen System of Text Processing Programs")[13] were more specialist in focus - designed to help scholarly editors produce
editions - and without the same concern for enabling a website from XML-encoded source
files.
Today the landscape is somewhat different, with more applications available that are
designed
and documented with the non-technical user in mind.[14]
Most of the development work that produced Kiln in its current form was done via grant
funding that ended two years ago. However, KDL still allocates time to maintain and
further
develop Kiln. Plans for the future include the possibility of a replacement for Cocoon,
due to
the lack of active development in Cocoon and also the direction that Cocoon is currently
heading - its build process has become a lot more complicated with the later versions
and it
would not be possible to package a default version to be used in Kiln without off-loading
technical work to the users. One possible future step would involve using XProc to
handle the
XML pipeline operations currently performed by Cocoon, but this would require an extensive
codebase change for which (in the immediate future at least) KDL does not have resources
to spare.[15] Another thing we would
like to explore is adding modular extensions that could be easily 'made live' by the
user and
that would orient the functionality towards a particular content type, for example
source
files encoded according to the EpiDoc[16] guidelines. Tighter integration out-of-the-box with CMS frameworks is also a
desideratum, as most project websites involve information pages, image galleries,
etc. that
are often more conveniently handled by such frameworks. Whatever is to come in the
future,
Kiln remains the most important tool that KDL uses to build XML-based online resources.
[2] The name change reflected a major rewrite of the code. 'Kiln' was chosen to call to
mind a container into which 'raw' source materials go and from which, after processing,
'finished'
materials emerge.
[3] A list of some of these projects is available at http://kiln.readthedocs.org/en/latest/projects.html. Note that due to the nature
of humanities grant funding a majority of these project sites remain as they were
at point
of launch. KDL can undertake to keep sites running for an agreed period (usually five
years from the end of the funding period) and fix bugs if caused by system
updates/upgrades, but upgrading a project site to use a later version of Kiln usually
depends upon the project partners acquiring extra funding.
Morgan, Eric Lease. "Creating and managing XML with open-source software." Library Hi Tech, Vol. 23, Iss. 4 (2005), pp. 526-540.
https://doi.org/10.1108/07378830510636328