How to cite this paper
Viglianti, Raffaele. “Encoding document and text in the Shelley-Godwin Archive.” Presented at Symposium on Cultural Heritage Markup, Washington, DC, August 10, 2015. In Proceedings of the Symposium on Cultural Heritage Markup. Balisage Series on Markup Technologies, vol. 16 (2015). https://doi.org/10.4242/BalisageVol16.Viglianti01.
Symposium on Cultural Heritage Markup
August 10, 2015
Balisage Paper: Encoding document and text in the Shelley-Godwin Archive
Raffaele Viglianti
Research Programmer at the Maryland Institute for Technology in the
Humanities
Copyright © 2015 by the author. Used with permission.
Abstract
The Shelley-Godwin Archive uses TEI to encode
manuscript text from two perspectives: one focused on the document and one focused
on the text. This short presentation addresses issues of adopting stand-off markup
as a technique for the project's encoding goals.
The Shelley-Godwin Archive (S-GA) is a project involving
the Maryland Institute for Technology in the Humanities (MITH) and the Bodleian, British,
Huntington, Houghton, and New York Public Libraries that began in 2011 and has now
completed
two phases of work. In October 2013, the project released a beta version of its online
reading environment containing high-resolution images and accompanying TEI-encoded
transcriptions of the three surviving manuscript notebooks containing Mary Shelley’s
drafts
of Frankenstein, or, The Modern Prometheus. In Summer 2015,
the project released a faster and more stable online reading environment together
with
images and transcriptions of three manuscript notebooks in Percy Shelley’s hand containing,
amongst other works, the fair copy of the dramatic poem Prometheus
Unbound.
With the project’s latest phase completed, we are now planning future research and
development. This short presentation will describe the next phase of work on our markup
scheme, which will embrace stand-off markup.
The design of the TEI markup scheme for S-GA coincided with the addition of the new
"document-focused" elements to the TEI in the release of P5 version 2.0.1. This encoding
approach switches focus from text as communicative act or linguistic content to text
as sign
on some physical support. This approach enables, for example, rigorous description
of often
complicated sets of additions, deletions, and emendations. The S-GA scheme primarily
follows
this approach; however, the archive is also meant to include clear "reading texts"
for those
readers who are primarily interested in the final state of each manuscript, which
requires
representing two different hierarchies, one documentary and one textual.
Our solution prioritizes the documentary hierarchy over the textual one and relies
on an
automatic process to convert the “document-focused” encoding into a “text-focused”
one.
While some transformations can be handled heuristically, others are explicitly encoded
with pairs of the general-purpose <milestone> and <anchor> elements, for example to
indicate the start and end of a paragraph or of a verse of poetry.
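The conversion from milestone pairs to textual units can be illustrated with a minimal Python sketch. It is not the archive's actual pipeline. It assumes, purely for illustration, that a paragraph opens with a <milestone> carrying unit="tei:p" and a spanTo attribute that names the xml:id of the closing <anchor>. The sample fragment is hypothetical, omits the TEI namespace for brevity, and is not the archive's transcription.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document-focused fragment: paragraphs cross physical lines.
DOC = """<zone>
  <line><milestone unit="tei:p" spanTo="#p1-end"/>It was on a dreary night</line>
  <line>of November.<anchor xml:id="p1-end"/><milestone unit="tei:p" spanTo="#p2-end"/>The rain</line>
  <line>pattered dismally.<anchor xml:id="p2-end"/></line>
</zone>"""


def events(elem):
    """Yield ('start', element) and ('text', string) items in document order."""
    yield ("start", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def paragraphs(root):
    """Collect the text between each paragraph milestone and its matching anchor."""
    result, current, end_id = [], None, None
    for kind, value in events(root):
        if kind == "text":
            if current is not None:
                current.append(value)
        elif value.tag == "milestone" and value.get("unit") == "tei:p":
            # Assumed convention: spanTo points at the closing anchor.
            current, end_id = [], value.get("spanTo", "").lstrip("#")
        elif value.tag == "anchor" and current is not None and value.get(XML_ID) == end_id:
            # Normalize whitespace left over from the line-by-line layout.
            result.append(" ".join("".join(current).split()))
            current = None
    return result


for paragraph in paragraphs(ET.fromstring(DOC)):
    print(paragraph)
```

Even in this small case the textual unit has to be reassembled from a flat stream of events, because the paragraph boundaries do not nest inside the <line> elements.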
This has proved effective for the Frankenstein
manuscripts. When working on Prometheus Unbound, however,
we found that the approach does not scale well. Shelley’s manuscripts complicate matters
in
two ways: there are additional dramatic and poetic textual structures, and there are
fragments of other works interspersed with the fair copy of Prometheus
Unbound. Authoring a valid encoding and processing it for publication have
both become vexed processes.
Our next phase of development will focus on separating the hierarchies more neatly
by
creating parallel documents for the documentary and textual hierarchies. To avoid
redundancy, we do not intend to encode the text twice, as is done, for example, in the
Digitale Faustedition project, one of the very first
projects that successfully adopted the new document-focused vocabulary of the TEI.
Rather,
the S-GA primary encoding will remain the documentary one, while a parallel document
will
encode textual structures and use stand-off pointers to pull character data from the
documentary encoding. Instead of over-loaded milestone elements, regular TEI elements
will
encode paragraphs, poetic structures, etc., thus simplifying, we argue, the authoring
and
processing of the encoding. Targeting character data in an XML document now seems
achievable, particularly given the recent revision of the TEI Pointer scheme proposed and
implemented by Hugh Cayless (presented at the TEI Members Meeting in 2014).
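To make the idea of pulling character data through pointers concrete, here is a minimal Python sketch under stated assumptions. The pointer syntax `#string-range(ID,OFFSET,LENGTH)` is a simplified stand-in loosely modeled on the TEI string-range() scheme, not a conformant implementation. The two sample documents, their identifiers, and the function names are hypothetical, and the TEI namespace is omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical documentary encoding: physical lines, identified by xml:id.
DOCUMENTARY = """<surface>
  <line xml:id="l1">Monarch of Gods and Daemons, and</line>
  <line xml:id="l2">all Spirits But One, who throng</line>
</surface>"""

# Hypothetical textual encoding: verse lines hold pointers, not character data.
TEXTUAL = """<lg>
  <l n="1"><ptr target="#string-range(l1,0,32)"/><ptr target="#string-range(l2,0,11)"/></l>
  <l n="2"><ptr target="#string-range(l2,12,19)"/></l>
</lg>"""

# Simplified pointer form (an assumption): element id, offset, length.
POINTER = re.compile(r"#string-range\((\w+),(\d+),(\d+)\)")


def resolve(target, lines_by_id):
    """Return the characters a simplified string-range pointer selects."""
    match = POINTER.fullmatch(target)
    if match is None:
        raise ValueError(f"unsupported pointer: {target}")
    line_id = match.group(1)
    offset, length = int(match.group(2)), int(match.group(3))
    return lines_by_id[line_id][offset:offset + length]


def reading_text(documentary_xml, textual_xml):
    """Build verse lines by resolving every pointer against the documentary text."""
    doc = ET.fromstring(documentary_xml)
    lines_by_id = {el.get(XML_ID): "".join(el.itertext()) for el in doc.iter("line")}
    verses = []
    for verse in ET.fromstring(textual_xml).iter("l"):
        pieces = [resolve(p.get("target"), lines_by_id) for p in verse.iter("ptr")]
        verses.append(" ".join(pieces))
    return verses


for verse in reading_text(DOCUMENTARY, TEXTUAL):
    print(verse)
```

In this sketch the first verse draws on two physical lines and the second starts partway through a line, which is the kind of mismatch between hierarchies that the parallel textual document is meant to express with ordinary elements.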
This short presentation hopes to stir conversation about two main topics. The first is
the authorship of stand-off markup, or how to simplify the creation of pointers to
character data in XML documents. We envisage a web-based tool to visually select target
elements and strings in order to build precise pointer expressions (e.g. in XPointer).
We created an early prototype of such a tool, called coreBuilder, which is currently used
in another project to create a stand-off critical apparatus. The second topic is updating
pointers to changeable sources. Given our goal of making S-GA participatory, we expect
the primary documentary TEI encoding to be changeable; what would be necessary to update
the pointers in the secondary textual encoding? At this point, we speculate that
automatically monitoring a versioning system may provide the information necessary to
update the pointers. This is also a necessary step towards our goal of enabling
user participation and curation in the archive.
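As a rough illustration of the second topic, the following Python sketch shifts an (offset, length) pointer after its source text changes. It rests on several assumptions: the standard-library difflib module stands in for the change information a versioning system might provide, pointers are plain character offsets into a single text node, and the function name is hypothetical. It is a sketch of one possible approach, not a proposal the project has adopted.

```python
from difflib import SequenceMatcher


def shift_pointer(old_text, new_text, offset, length):
    """Map (offset, length) from old_text onto new_text.

    Returns the updated (offset, length), or None when the targeted
    characters were themselves altered and a human should review the pointer.
    """
    start, end = offset, offset + length
    matcher = SequenceMatcher(None, old_text, new_text, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only an unchanged block that fully contains the target is safe to follow.
        if tag == "equal" and i1 <= start and end <= i2:
            return (start - i1 + j1, length)
    return None


old = "all Spirits But One, who throng"
inserted = "and all Spirits But One, who throng"   # text added before the target
altered = "all Spirits But Two, who throng"        # the target itself was edited

print(shift_pointer(old, inserted, 12, 19))
print(shift_pointer(old, altered, 12, 19))
```

The design choice here is conservative: a pointer is moved automatically only when its whole target survives unchanged, and any edit that touches the target is flagged for manual review rather than guessed at.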