How to cite this paper
Beshero-Bondar, Elisa Eileen. “Rebuilding a Digital Frankenstein by 2018: Reflections toward a Theory of Losses and
Gains in Up-Translation.” Presented at Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions, Washington, DC, July 31, 2017. In Proceedings of Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions. Balisage Series on Markup Technologies, vol. 20 (2017). https://doi.org/10.4242/BalisageVol20.Beshero-Bondar01.
Up-Translation and Up-Transformation: Tasks, Challenges, and Solutions
July 31, 2017
Balisage Paper: Rebuilding a Digital Frankenstein by 2018
Reflections toward a Theory of Losses and Gains in Up-Translation
Elisa Eileen Beshero-Bondar
A scholar of British Romanticism and hybrid literary genres, Dr. Beshero-Bondar earned
a PhD in English Literature from Penn State University in 2003, and afterwards took
a post teaching literature at the University of Pittsburgh at Greensburg. Her publications
include a book about women Romantic epoists, titled Women, Epic, and Transition in British Romanticism, published by the University of Delaware Press in 2011, and articles in Literature Compass, ELH (English Literary History), Genre, and Philological Quarterly on the poetry of Robert Southey, Mary Russell Mitford, and Lord Byron in context
with 18th- and 19th-century views of revolution, world empires, natural sciences,
and theater productions.
Since earning tenure in 2011, she has applied herself adventurously to the building
of digital editions and digital research projects, such as studying associations
among physical and mythical locations in epic poetry, and teaching undergraduates
the XML family of languages in the context of designing research projects. She is
the director of the Digital Mitford project, whose two-fold mission is:
- to produce the first comprehensive scholarly edition of the works and letters of Mary
Russell Mitford, and
- to share knowledge of TEI XML and other related humanities computing practices with
all serious scholars interested in contributing to the project.
In keeping with the second goal, she hosts an annual coding school at Pitt-Greensburg
each May or June to teach TEI, regular expression up-conversion of documents, XPath,
schema design, and XSLT or XQuery, based on the interests and background of registrants.
Other digital projects in which she is enmeshed include
a Bicentennial Frankenstein project to up-convert the 1990s electronic Frankenstein edition by 2018, and
Amadis In Translation, which applies XML to quantify, categorize, and study alterations made by Robert
Southey in translating an early modern Spanish text of Amadis de Gaule. She was elected
to serve on the TEI Technical Council from 2016 to 2017.
Copyright © 2017 by the author. Used with permission.
Abstract
Digital editions were young in the 1990s, and the expansive possibilities of hypertext
in that decade sharply distinguish early digital editions from the productions of
our moment. The accessibility and simplicity of early HTML code made for innovative
experiments with the size of a page
and the way one might handle displays of variants, before diffing
tools like the Versioning Machine and Juxta came to define how we usually imagine
the digital comparison of texts.
This paper investigates the serious problems and vexing potentialities of up-translation
when standards change, concentrating on work underway on a Bicentennial Frankenstein
project. Our project is to produce a new, freshly collated digital edition in TEI
based on the Frankenstein texts digitized by Romantic Circles, and incorporating a little-known publication
of 1823 together with 1818 and 1831 versions currently represented. Readers in the
past century are likely to have encountered either the 1818 or the 1831 edition but
not the 1823, and we think that folding this text into our collation may help us to
understand more about when and about how gradually some of the major alterations in
the 1831 text (for example, to Victor Frankenstein's family members and the compression
and reduction of a chapter in part I) occurred. The three print editions will be compared
in parallel, and we will incorporate pointers to the Shelley-Godwin Archive’s edition
of the MS Notebooks.
To prepare the collation we returned to the simplest original form of the current
edition, the Pennsylvania Electronic Edition prepared by Stuart Curran and Jack Lynch
in the 1990s. Exploring that edition exposes the ambitious intellectual scope of early
web editions and raises important questions about how we built editions then vs. now.
We do not build a new edition to replace the earlier work, which yet lives
and is available on the web. But our work is a fresh start, not a seamless integration.
This particular project’s encounter with an impressive early hypertext edition raises
more general questions worthy of reflection towards theorizing the up-transformation
process:
- How do we understand the relationships among generations of digital editions?
- What aspects of the old hypertext editions (or editions in formats not consistent
with our own) transcend or exceed the structures we currently consider sustainable?
What perspective might a thorough review of the first still extant hypertext editions
contribute to our scholarly editing practice now?
Table of Contents
- Overview of the Bicentennial Frankenstein Project
- The Dream of 90s Hypertext: The Pennsylvania Electronic Edition
- Stages in Up-Translating the Pennsylvania Electronic Edition
  - Decisions for preserving and eliminating markup in plain text versions
  - Stages for processing the altered PA EE HTML to produce plain text editions
- XSLT for Translation of Up-Translated HTML to Pseudo-Marked Text
- Collation Discoveries: Flattening and Chunking Again
- Reflections toward a theory of up-translation
Overview of the Bicentennial Frankenstein Project
In the fall of 2016, a team of researchers from Carnegie Mellon University, the University
of Pittsburgh, and Maryland Institute for Technology in the Humanities (MITH) joined
together with a goal to prepare an updated and improved digital edition of Mary Shelley's
Frankenstein, conformant with the current TEI P5 standard. In October 2016, Elisa Beshero-Bondar met
with Raffaele Viglianti and Rikk Mulligan to discuss a strategy for updating the Frankenstein edition on Romantic Circles. Viglianti and Beshero-Bondar subsequently met over Skype with Neil Fraistat and
Dave Rettenmaier and agreed that by May of 2018 we would prepare a new edition to
update the one currently published on Romantic Circles, MITH’s refereed website on scholarly editions from the British Romantic era.
Fraistat indicated that currently Frankenstein is by far the most clicked-on
text in the Romantic Circles archive, so this is a site of high visibility. We initially
agreed to improve the existing edition by collating the 1818 and 1831 editions, which
are currently saved in two separate files and compared by viewing in Juxta Commons.
We aimed to improve the precision of the collation by processing with CollateX software, which automates the location of alignments and deltas in multiple versions
of a document. Processing plain text versions of the separate 1818 and 1831 edition
files with CollateX would provide the basis for a thorough overhaul of the TEI encoding
in the Romantic Circles edition, because it would output a single document holding
variations tagged according to the TEI P5 critical apparatus markup. The document
output by CollateX is plain text that holds critical apparatus tagging, and represents
a first phase of up-translation to a new and improved TEI document. The next stage
would be to apply auto-tagging with regular expressions to reconstruct the body
of the edition in TEI as a single document from which multiple editions can be generated
for reading and for genetic study of textual variation over time.
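To make the target of this first phase concrete, the following toy sketch in Python shows what variations tagged with TEI critical apparatus elements can look like for two short witnesses. It is not CollateX: it uses the standard library's difflib as a stand-in aligner, it handles only two witnesses, and the function name and the witness sigla (#f1818, #f1831) are assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from xml.sax.saxutils import escape


def toy_collate(wit_a, wit_b, sigla=("#f1818", "#f1831")):
    """Align two witnesses word by word; wrap each difference in <app>/<rdg>.

    A stand-in for a real collation engine: the sigla are hypothetical.
    """
    a, b = wit_a.split(), wit_b.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared text passes through unchanged.
            out.append(escape(" ".join(a[i1:i2])))
        else:
            # Any insertion, deletion, or substitution becomes one <app>.
            rdg_a = escape(" ".join(a[i1:i2]))
            rdg_b = escape(" ".join(b[j1:j2]))
            out.append(
                f'<app><rdg wit="{sigla[0]}">{rdg_a}</rdg>'
                f'<rdg wit="{sigla[1]}">{rdg_b}</rdg></app>'
            )
    return " ".join(out)


print(toy_collate("I saw the creature", "I beheld the creature"))
```

A real collation of three or more witnesses needs a proper multi-witness aligner; the sketch only illustrates the shape of the output that the later auto-tagging stage would build on.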
Though our initial goal was to revise and improve the existing edition on Romantic
Circles, we realized that we had an opportunity to produce a much richer edition,
if we could incorporate a rarely studied 1823 edition of the novel into our collation.
Incorporating this edition requires OCR of a Google Books scanned edition and careful
correction to prepare a text file formatted consistently with the files representing
the 1818 and 1831 editions. Finally, we would include pointers into the Shelley-Godwin Archive edition of the manuscript notebook drafts of the novel.
As we began analyzing the XML code underlying the Romantic Circles edition, we discovered
inconsistencies with the semantic standard of TEI. List elements were used to hold
paragraphs, for example, and we learned that the TEI had been generated from an XSLT
process of translation from a previous digital edition, the Pennsylvania Electronic
hypertext edition produced in the 1990s. TEI elements appeared to have been selected
for their correlation to HTML presentational elements, but we were mystified about
the decisions to organize chapters of a novel thus:
<p rend="emph">Chapter 18</p>
<p> </p>
<list type="simple">
<item>
<p>DAY after day, week after week, passed away on my
return to Geneva; and I
could not collect the courage to recommence my
work. I feared the vengeance of the disappointed
fiend,
yet I was unable to overcome my repugnance to the
task which was enjoined me. I found that I could not
compose a female without again devoting several
months to profound study and laborious disquisition.
I had heard of some discoveries having been made by
an English
philosopher, the knowledge of which was material
to my success, and I sometimes thought of obtaining
my father's consent to visit England for this
purpose; but I clung to every pretence of delay, and
shrunk from taking the first step in an undertaking
whose immediate necessity began to appear less
absolute to me. A change indeed had taken place in
me: my health, which had hitherto declined, was now
much restored; and my spirits, when unchecked by the
memory of my unhappy promise, rose proportionably. My
father saw this change with pleasure, and he turned
his thoughts towards the best method of eradicating
the remains of my melancholy, which every now and
then would return by fits, and with a devouring
blackness overcast the approaching sunshine. At these
moments I took refuge in the most perfect
solitude. I
passed whole days on the lake alone in a little
boat, watching the clouds, and listening to the
rippling of the waves, silent and listless. But the
fresh air and bright sun seldom failed to restore me
to some degree of composure; and, on my return, I met
the salutations of my friends with a readier smile
and a more cheerful heart.</p>
</item>
<item>
<p>It was after my return from one of these rambles
that my father, calling me aside, thus addressed
me:—</p>
</item>
<item> I remembered also the
necessity imposed upon me of either journeying to
England, or entering into a long correspondence with
those philosophers of that country, whose knowledge and
discoveries were of indispensable use to me in my
present undertaking. The latter method of obtaining the
desired intelligence was dilatory and unsatisfactory:
besides, I had an insurmountable aversion to the idea
of engaging myself in my loathsome task in my father's
house, while in habits of familiar intercourse with
those I loved. I knew that a thousand fearful accidents
might occur, the slightest of which would disclose a
tale to thrill all connected with me with horror. I was
aware also that I should often lose all self-command,
all capacity of hiding the harrowing sensations that
would possess me during the progress of my unearthly
occupation. I must absent myself from all I loved while
thus employed. Once commenced, it would quickly be
achieved, and I might be restored to my family in peace
and happiness. My promise fulfilled, the monster would
depart for ever. Or (so my fond fancy imaged) some
accident might meanwhile occur to destroy him, and put
an end to my
slavery for ever.</item>
Occasionally multiple p elements representing paragraphs in the novel might appear inside a list item,
and occasionally the p element was missing within the item. More strange was the markup
of poetry:
<lg>
<l>
"The
sounding cataract<lb/>
Haunted him like a passion: the tall rock,<lb/>
The mountain, and the deep and gloomy wood,<lb/>
Their colours and their forms, were then to
him<lb/>
An appetite; a feeling, and a love,<lb/>
That had no need of a remoter charm,<lb/>
By thought supplied, or any interest<lb/>
Unborrow'd from the eye"*
</l>
</lg>
On inquiry, we learned something of the history of the TEI preparation of the file:
the problem was posed by a need to convert a web 1.0 hypertext edition into TEI
as a curatorial decision, designed to be readily published in the numbered segments
we see visible on the Romantic Circles site. Numbering must have been accomplished
by transformation from these hybrid faux-TEI list elements into HTML.
In October 2016 we experimented with text extracted directly from the Romantic Circles
1818 and 1831 editions, and processed these with CollateX. The experience showed us
worrisome problems at the level of the text: angle brackets poised in the text and
other anomalies such as misnumbering and missing italics. We decided that perhaps
the transformation that the digital Frankenstein had undergone in 2009 for Romantic
Circles republication was not kind to the text and that to work with a reliable foundation
for our bicentennial edition, we should return to its origin in the early 1990s hypertext
Pennsylvania Electronic Edition. We also determined that we had better proof check
the texts of all documents against originals. What we may have considered an up-transformation
to meet new web standards in 2009 appears to us on close inspection to have potentially
damaged a cleaner earlier encoding.
The Dream of 90s Hypertext: The Pennsylvania Electronic Edition
The Pennsylvania Electronic Edition (PAEE) represents a much more extensive scholarly
effort than what is rendered of it on its supposedly updated version on Romantic Circles.
While neither edition renders the 1823 text, whose publication was supervised by William
Godwin, the PAEE prepared a table of hand-collected variants indicating how the 1823
edition differs from the 1818 and 1831. Meanwhile, what appears of the better-known
early (1818) and late (1831) publications is rendered side-by-side in old-fashioned
(long since deprecated) HTML frames built from 238 separate HTML files (each usually
representing a few paragraphs) of the 1818 and 249 files of the longer 1831 novel.
The particulation of files represents an editorial method of juxtaposition of tiny
pieces, in keeping with the hypercard format of early hypertext books. The early editors
(Stuart Curran and Jack Lynch of the University of Pennsylvania) took care to produce
a highly legible, color-coded collation in hypertext, and while dates of preparation
or publication are not clear in the files, a short web publication about the edition by Curran from November 1994 in Penn Magazine indicates the production was well underway at that moment, with plans for release
of a CD-ROM edition and non-profit production on the web. The goal appears to have
been to reach high school and college students, and to place before them (in Curran’s
words) a convenient repository of otherwise widely scattered scholarly and critical materials
through which they are given opportunities to browse, to read in a non-linear
fashion and discover for themselves a rich store of contexts and, effectively, a way
to read the 1818 and 1831 editions together rather than just one or the other. Curran
described his aim as an assault on print-bound habits of reading:
Multiply these possibilities by the large number of ancillary texts, and you have
a sense of what an assault this technology portends on a normative, atomistic conception
of the act of reading. One doesn’t, it is true, exactly curl up with a good book here.
Rather, one is faced with dozens of possibilities at once, literally replicating the
ways intertextual allusions play against and within any literary work of dimension
and intellectual ambition.
Curran clearly had an idea that his edition was breaking new ground: “We are hoping to create here for the first time an electronic variorum,”
and perhaps he did not at that moment realize that the work he and Jack Lynch were
putting into the project to encode variants with HTML was being replicated by the Text
Encoding Initiative, which in the same year (1994) released a draft of its P3 Guidelines,
a draft which contained its first modeling of apparatus criticus markup with the elements
app and rdg and their corresponding attributes. The first half of the 1990s saw intense
drafting for the TEI in defining a standard tagset in SGML, and, ironically, Curran’s
venture with HTML seems to have placed his effort in a parallel universe, perhaps
due to its emphasis on immediate sharing and distribution. Duplicating the efforts
of the TEI in SGML, and bound by the semantic limitations of another specific instantiation
of SGML in HTML 1.0 to represent descriptively the variations between two texts, the
PAEE Frankenstein editors pioneered their own system of pseudomarkup that appears
in a third frame running beneath the 1818 and 1831 windows, as visible on a representative page.
The pseudomarkup applies a system of square brackets and angle brackets to render
variants inline together in the document.
What is immediately evident is the precision and care taken in the first edition,
even in its apparent lack of awareness of the TEI as it was developing an alternate
SGML form in the mid 1990s. Also evident, ironically, is the net loss of information
in the translation of the PAEE into TEI for Romantic Circles in 2009, where the difficulty
of negotiating between HTML 1.0 and the XML standard of TEI appears to have been strained
in favor of expressing the presentation view of the texts while removing the more
adventurous aspects of pseudomarkup. While the editorial annotations were transferred from
the HTML to the XML syntax, no effort was made in the conversion of 2009 to apply
the apparatus criticus of TEI to curate the handiwork in pseudomarkup of the original
PAEE editors.
Looking back on origins of our electronic Frankenstein monster, we see something of
a history of strained relations between “chunky” hypertext books and “creamy”
TEI, which in those days applied SGML in favor of the semantics of document hierarchies
to show interrelationships. The PAEE attempted an apparatus criticus in preparing tiny hypercard chunks
in HTML 1.0 frames as an interface that prioritized a study of variation, and even
coded that variation in markup of their own that used angle brackets, square brackets,
boldface, and italics to provide through the web browser interface a synoptic view
of a variorum edition. Fortunately the edition is still served from its original University
of Pennsylvania URL, apparently unchanged over the years, but in a time of generational
transference marked by rapid aging of electronic texts prepared in non-standard ways,
the old edition’s fragility and ambition are worth contemplating now. In 2017-2018,
we preparers of a new edition (just like our predecessors) are standing on the proverbial
shoulders of giants, and if we are single-minded about preparing the documents in
a format we can readily process and publish, we stand potentially to lose the scholarship
and the impressive mass of paratext surrounding that first edition. We underestimate
these early editions to our peril (or to our potential cultural impoverishment). Indeed,
the PAEE contains hundreds of paratext documents, including an impressive corpus of
other literary texts in its “Works Included in this Edition”
bordering on and relevant to Frankenstein, as well as an impressive array of “Contexts”
pages covering religious, mythical, geographic, and scientific topics. Whether we can help
preserve the vision of the first editors in interlinking contextual materials with
their critical variorum edition, or whether we should try to do so, are open questions
at this stage in our work on the Bicentennial Frankenstein project. Our own team,
itself pressed for time and finding a need to invent a new expression of Frankenstein’s contexts, is not likely to curate the paratextual assemblage of documents in the
PAEE, as we have prioritized for the bicentennial moment to reconstruct the electronic
variorum and add more to it than available in the original.
Stages in Up-Translating the Pennsylvania Electronic Edition
In December 2016, I harvested the nearly 500 individual HTML files of the PAEE representing
the 1818 and 1831 editions and began a painstaking process of up-translation, with
the following major stages in view:
- converting the hundreds of hypercard documents of the PAEE from HTML 1.0 into two
separate plain text documents representing the 1818 and 1831 editions;
- up-converting these to a simple form of XML to prepare them for automated collation
with CollateX, software designed to compare multiple documents and locate and mark
their points of deviance and output them tagged according to TEI’s critical apparatus
markup;
- preparing from carefully proofed OCR a digitized 1823 text, formatted in XML for collation
with the 1818 and 1831 editions;
- processing the three XML documents with CollateX to produce an XML text, to be up-converted
into TEI;
- maintaining the three separate documents to process with an additional set of witnesses
prepared from the Shelley-Godwin Archive edition of the 1816-1818 manuscript notebooks
of Frankenstein.
At the time of this writing in July 2017, we are beginning work on the last two stages
of this. That is, we have successfully prepared a fresh automated collation of the
1818, 1823, and 1831 printed editions of Frankenstein, and we are planning how to:
- re-process the collation after we prepare a similarly formatted unified file of the
thousands of manuscript notebook files from the Shelley-Godwin Archive
- build a complete TEI edition from the collation
Beyond these goals of moving from chunks to large unified files for our base texts,
we of course also face the challenge of how to share the edition with readers, not
only scholars of the novel and its time period but also enthusiasts of the Frankenstein
narrative, who may, if we are successful, learn something new about its textual genetics:
how this text transformed over time. That last goal is out of scope of this paper,
as our current project is basically to create a tractable semantically meaningful
TEI markup building anew with a start in the PAEE files.
My concentration for this paper is on a stage of what is completed already, and my
sustained interaction with the PAEE hypertext edition. We first decided to process
a predictable plain text with pseudomarkup to preserve information from the markup,
and we decided that each stage of our up-conversion would produce a distinct and re-usable
edition in its own right, whether in plain text, or a preliminary stage of XML for
collation, or ultimately the collated edition in TEI P5. In this context, the PAEE
files take on multiple afterlives in plain text and XML forms.
At the time of preparation, we were uncertain whether plain text or simple XML formatting
was best suited for collation processing, and in the course of that processing we
discovered indeed that XML was preferable since the collation software could be programmed
to read and process and ignore particular elements. The pseudomarkup in our plain
text, however, served (and continues potentially to serve) as a viable intermediary
format, trivially easy to up-translate into XML, and co-existing alongside its hierarchically
organized kin.
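As a hedged illustration of how little work that up-translation needs, the following Python sketch turns one pseudo-marked unit line into a simple XML paragraph. The element and attribute names (p, hi, rend) and the function name are assumptions chosen for the example; the project's actual intermediate XML may differ.

```python
import re
from xml.sax.saxutils import escape


def pseudo_to_xml(line):
    """Turn one pseudo-marked unit line into a simple XML paragraph.

    The output vocabulary (p, hi, @rend) is hypothetical.
    """
    # Escape &, <, > first so the text is safe inside XML.
    text = escape(line.strip())
    # _italic_ becomes <hi rend="italic">...</hi>
    text = re.sub(r"_([^_]+)_", r'<hi rend="italic">\1</hi>', text)
    # [small caps] becomes <hi rend="smallcaps">...</hi>
    text = re.sub(r"\[([^\]]+)\]", r'<hi rend="smallcaps">\1</hi>', text)
    # A line wholly wrapped in {braces} is treated as a centered paragraph.
    centered = re.fullmatch(r"\{(.*)\}", text)
    if centered:
        return f'<p rend="center">{centered.group(1)}</p>'
    return f"<p>{text}</p>"


print(pseudo_to_xml("I read _Paradise Lost_ with wonder."))
print(pseudo_to_xml("{CHAPTER I.}"))
```

Because each symbol maps to exactly one kind of markup, the conversion is a handful of substitutions; the harder editorial question of whether an italic marks a title or emphasis is left untouched, as it is in the plain text.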
We began, then, by producing plain text from the old PAEE hypercards. The following
documents my decisions and actions taken on the 1990s files to prepare them as plain
text editions, prior to collation.
Decisions for preserving and eliminating markup in plain text versions
- Using regex find-and-replace strategies, we prepared altered versions of the PA EE
HTML files to reproduce simpler forms consistent with current XHTML 5 standards.
- In the PA EE some elements (like `<p>` and `<br>`) were not given close tags, while others were, making the code difficult to process
with XSLT. Close tags were applied and the files were simplified to carry only the
title page, prefacing material, and text of the novel.
- The elements holding navigational information in the PA EE were excluded. This is
because the PA EE texts were prepared as 238 and 250 separate HTML files (for the
1818 and 1831 editions) in order to manually align them in small chunks
as a means to compare them visually in HTML frames. Since our edition is uniting
these hundreds of chunks into a single document, we will prepare new navigational
elements at a later stage after we have prepared our new TEI edition and are ready
to produce a new reading view.
- Renaming files and directories: The PA EE files were stored in three separate directories
for each edition, associated with volumes 1, 2, and 3 of the 1818 edition, and the
1831 files were given names to assist with pairing them with associated chunks of
the 1818 edition inside HTML frames. Since we need to process the files all together
to output a single text of the 1818 and of the 1831 novel, we flattened the hierarchy:
We removed the volume directories and held each edition’s set of 238 and 250 files
respectively in its own directory. The files were renamed carefully to number their
sequence in assembling the text, and to simplify their association with the text’s
structural layers: the opening material, the Walton letters that frame the text at
its beginning and end, and the internal chapters.
- Eliminating hyperlinked editorial annotations: We decided we must simply represent
the nineteenth-century editions and that we cannot at this time properly curate the
PAEE edition itself, so we did not port the links to editorial annotations coded in
the PA EE. To ease the collation and up-conversion process later, we are preserving
information from the presentation markup of the PA EE texts: its rendering of italics,
square brackets, and centered text.
- In the PA EE there is no distinction between italics for titles and italics for emphasized
words. Because the asterisk is used to signal footnotes in the text, we use the underscore
(`_`) instead to mark off italicized text of any kind.
- Square brackets (`[ ]`) are placed around text marked as small caps. (We have commented out the one instance
in the 1831 PA EE HTML in which square brackets were used to hold a normalized variant
of a word, to suppress that from the output.)
- Centered text is marked between curly braces: `{ }`
Note: some center tagging, such as in header tags, was lost in the conversion process
and should be restored as we proof the texts.
- Each unit of PA EE HTML texts marked with a structural element to indicate line break
(`<br>`) or paragraph (`<p>`) is produced as a unit line in the plain text. Thus, an entire paragraph appears
as a single line. Every unit line is followed by two newline characters.
- Documentation is generated at the head of the text files inside commented text marked
with hashes (`# `), to indicate the derivation of the documents from the PA EE and
to document the rendering decisions above.
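The rendering decisions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's XSLT: it assumes italics arrive as i or em elements and centered text as center elements, it uses the standard library's html.parser, and it omits small caps because the way the PA EE marked them is not described here. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class PseudoMarkup(HTMLParser):
    """Flatten simple HTML into unit lines carrying pseudo-markup symbols."""

    # Assumed mapping from presentational tags to (open, close) symbols.
    SYMBOLS = {"i": ("_", "_"), "em": ("_", "_"), "center": ("{", "}")}
    # Tags that start a new unit line in the plain text.
    UNIT_BREAKS = {"p", "br"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.UNIT_BREAKS:
            self.parts.append("\n\n")
        elif tag in self.SYMBOLS:
            self.parts.append(self.SYMBOLS[tag][0])

    def handle_endtag(self, tag):
        if tag in self.SYMBOLS:
            self.parts.append(self.SYMBOLS[tag][1])

    def handle_data(self, data):
        # Source line wrapping inside a paragraph collapses to single spaces.
        self.parts.append(re.sub(r"\s+", " ", data))

    def text(self):
        units = (u.strip() for u in "".join(self.parts).split("\n\n"))
        # Every unit line is followed by two newline characters.
        return "".join(u + "\n\n" for u in units if u)


def html_to_pseudo(html):
    parser = PseudoMarkup()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<center>CHAPTER I.</center><p>I read <i>Paradise\nLost</i> with wonder.</p>"
print(html_to_pseudo(sample))
```

All other tags, including links to editorial annotations and navigational elements, simply fall away while their text content is kept, which mirrors the decision to represent the nineteenth-century text rather than the PA EE apparatus.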
Stages for processing the altered PA EE HTML to produce plain text editions
The following processing stages would certainly have been better accomplished entirely
with XSLT, because after the first step, they rely on a complex series of find and
replace operations that might have been better accomplished and documented if all
handled with XSLT string-processing functions. The basic idea, though, is to prepare
a consistently formatted set of documents that can reliably be compared with collation
software.
The conversion process relies on an XSLT transformation to prepare plain text from
the updated HTML, running it over directories of the hundreds of files. The stylesheet
is prepared to process from a collection organized unambiguously by filename and to
output a single file. Filenames were prefaced by a number so that they are processed
in sequential order. The XSLT is featured in the following section and linked here in our GitHub repository.
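The reason for the numeric filename prefixes can be shown with a short Python sketch. It assumes hypothetical filenames such as 1_title.txt and 10_chapter3.txt and sorts on the integer value of the prefix, since a plain alphabetical sort would place 10 before 2. The function names and the .txt suffix are assumptions for illustration; this is not the project's XSLT.

```python
import re
from pathlib import Path


def numeric_prefix(name):
    """Return the leading integer of a filename such as '12_letter1.txt'."""
    match = re.match(r"(\d+)", name)
    if match is None:
        raise ValueError(f"filename has no numeric prefix: {name}")
    return int(match.group(1))


def in_reading_order(names):
    """Sort filenames by the integer value of their prefix."""
    return sorted(names, key=numeric_prefix)


def assemble(directory, suffix=".txt"):
    """Concatenate already-converted chunks into one text, in reading order."""
    paths = {p.name: p for p in Path(directory).iterdir() if p.suffix == suffix}
    return "\n\n".join(
        paths[name].read_text(encoding="utf-8").strip()
        for name in in_reading_order(paths)
    )


print(in_reading_order(["10_chapter3.txt", "2_letter1.txt", "1_title.txt"]))
```

Zero-padding the prefixes (001, 002, and so on) would make alphabetical and numeric order agree, which is a common alternative when the processing tool offers no control over sort order.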
Open the output in Text Wrangler (recently superseded by BBEdit) and in oXygen, and
work on the following:
- In Text Wrangler (or BBEdit), remove line breaks (option in the Text menu). This ensures
that any text preceded by just one newline character is pulled into the preceding
line, which unites the content of each paragraph inside a single line.
- In oXygen, with regex find and replace, eliminate instances of more than two newline
characters `\n`, but ensure that two newlines appear between each line.
- Add `\n\n` after VOLUME, LETTER, PREFACE, and CHAPTER headings and the Introduction heading
in the 1831 edition. Search for `(PREFACE|VOLUME|LETTER|CHAPTER)\s+[IVXLC]+\.*`.
Also check and restore newlines in letter headings.
- In Text Wrangler (or BBEdit), educate the quotes: This produces curly apostrophes and quotes from the straight quotes of
the PA EE.
- Regularize white spaces using Find & Replace in oXygen, using the `\h` regex to indicate white space inside a line. Replace any instances of `\h\h`
with a regular space.
- Convert double hyphens (`--`) to em dashes (—).
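The manual steps above could be scripted so that they are repeatable and documented. The Python sketch below is one possible arrangement, not a record of what was actually run: the steps are slightly reordered so that each substitution sees clean input, Python's re module has no `\h`, so `[ \t]` stands in for horizontal white space, the quote education is a simple heuristic rather than the editor's built-in command, and the 1831 Introduction heading and the letter headings would still need checking by hand. The function name is hypothetical.

```python
import re

# Heading pattern taken from the search expression given above.
HEADING = re.compile(r"(PREFACE|VOLUME|LETTER|CHAPTER)\s+[IVXLC]+\.*")
# Characters after which a straight quote is assumed to be an opening quote.
OPENERS = r"(^|[\s(\[{_\u2014-])"


def clean_plain_text(text):
    # Pull any line preceded by just one newline up into the preceding line.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Put two newlines after each structural heading.
    text = HEADING.sub(lambda m: m.group(0) + "\n\n", text)
    # Collapse runs of horizontal white space and trim it at line edges.
    text = re.sub(r"[ \t]{2,}", " ", text)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)
    # Reduce three or more newlines to exactly two.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Convert double hyphens to em dashes.
    text = text.replace("--", "\u2014")
    # Educate quotes (heuristic): opening after white space or an opener,
    # closing everywhere else; apostrophes fall out as closing single quotes.
    text = re.sub(OPENERS + '"', "\\1\u201c", text, flags=re.M)
    text = text.replace('"', "\u201d")
    text = re.sub(OPENERS + "'", "\\1\u2018", text, flags=re.M)
    text = text.replace("'", "\u2019")
    return text


raw = 'CHAPTER I. I am by\nbirth a  Genevese--and "proud" of it.\n\n\n\nMy father\'s name.'
print(clean_plain_text(raw))
```

The quote heuristic will mis-handle elisions that begin a word (such as 'tis), so output from a sketch like this would still need proofing against the page images.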
XSLT for Translation of Up-Translated HTML to Pseudo-Marked Text
Our translation of the PAEE into XML involves a significant intermediary step that
might be considered a destination in its own right: a not-so-plain text file featuring
pseudo-markup that preserves information from the markup. Our XSLT produces a header
of meta-information about the edition we are preparing, with an explanation of symbols.
This edition of the 1818 and 1831 text files might well be a valuable output in its
own right, prior to their collation. Preparing this text format and its pseudomarkup
establishes the basis for our new preparation of a plain text edition for the OCR'd
1823 edition. All three editions must be prepared in the same plain text format to
ensure a precise and accurate machine collation into a single file.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<!--2016-12-28 ebb: Prepared to process from a collection organized unambiguously by filename and output a single file. Filenames were prefaced by a number to process in sequential order.-->
<xsl:strip-space elements="*"/>
<xsl:output method="text" encoding="UTF-8"/>
<!--ebb: Uncomment one of the following lines to process the appropriate edition, either 1818 or 1831.-->
<!--<xsl:variable name="paEdition" select="collection('../frankenTexts_HTML/PA_Electronic_Ed/1818_ed')"/>-->
<xsl:variable name="paEdition" select="collection('../frankenTexts_HTML/PA_Electronic_Ed/1831_ed')"/>
<xsl:template match="/">
<xsl:text>********************************************************************************
# FRANKENSTEIN; OR, THE MODERN PROMETHEUS
## The Pittsburgh Bicentennial Edition
### INTRODUCTORY NOTE ON THE TEXT:
This is a plain text edition of the </xsl:text><xsl:value-of select="($paEdition//head[1]/tokenize(title, ', ')[2])[1]"/> edition of _Frankenstein; or, the Modern Prometheus_ by Mary Shelley <xsl:text>prepared for the Frankenstein Bicentennial project, which commemorates the 200th anniversary of the first published edition of this novel in 1818.
</xsl:text>
<xsl:text>Frankenstein; or, the Modern Prometheus: Pittsburgh Bicentennial Edition is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. <!--ebb: Check with project team. Do we want this to be a free culture license, meaning we permit commercial uses of this work? If so, change this to read:
Frankenstein; or, the Modern Prometheus: Pittsburgh Bicentennial Edition is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
-->
</xsl:text>
<xsl:text>Date this text was produced: </xsl:text><xsl:value-of select="current-dateTime()"/><xsl:text>.
</xsl:text>
<xsl:text>This edition is part of the Pittsburgh research team's contribution to the Bicentennial Frankenstein Project, and is prepared by Elisa Beshero-Bondar of the University of Pittsburgh at Greensburg with assistance from Rikk Mulligan of Carnegie Mellon University. We are grateful for consultation from Wendell Piez, David J. Birnbaum, and Raffaele Viglianti, as well as Neil Fraistat and Dave Rettenmaier. This edition's stages of development are stored and documented in the Pittsburgh_Frankenstein GitHub repository: https://github.com/ebeshero/Pittsburgh_Frankenstein/ .
We have produced this plain text edition for two purposes:
1) To prepare for automated collation of the 1818, 1823, and 1831 editions of _Frankenstein_ using CollateX, in order to generate a TEI XML document that stores the variations of these texts.
2) To provide a reliable digital base text of each edition tractable for future projects.
</xsl:text>
<xsl:text>This plain text edition is one of two, representing the 1818 and 1831 editions of the novel. This pair of editions is based on the Pennsylvania Electronic Edition of _Frankenstein; or, the Modern Prometheus_ by Mary Shelley, edited by Stuart Curran and assisted by Jack Lynch, located at http://knarf.english.upenn.edu/ and hereafter referred to as PA EE. Elisa Beshero-Bondar and Rikk Mulligan *are correcting* these texts against photo facsimiles of the 1818 and 1831 texts.
* We will alter the previous sentence in this header when this phase of proof-checking is completed.
</xsl:text>
<xsl:text>Our plain text edition preserves the rendering of italics, square brackets, and centered text from the PA EE HTML texts.
* In the PA EE there is no distinction between italics for titles and italics for emphasized words. Because the asterisk is used to signal footnotes in the text, we use the underscore (`_`) instead to mark off italicized text of any kind.
* Square brackets (`[ ]`) are placed around text marked as small caps. (We have commented out the one instance in the 1831 PA EE HTML in which square brackets were used to hold a normalized variant of a word, to suppress that from the output.)
* Centered text is marked between percent symbols: `% %`.
* Each unit of PA EE HTML texts marked with a structural element to indicate line break (`<br>`) or paragraph (`<p>`) is produced as a unit line in the plain text. Thus, an entire paragraph appears as a single line. Every unit line is followed by two newline characters.
</xsl:text>
<xsl:text>Note for later processing: In the PA EE of this text, there are </xsl:text><xsl:value-of select="count(distinct-values($paEdition//body//a/@href))"/> encoded links, each pointing to an editorial annotation.
<xsl:text>********************************************************************************</xsl:text>
<xsl:apply-templates select="$paEdition//body"/>
</xsl:template>
<xsl:template match="br">
<xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="p">
<xsl:apply-templates/><xsl:text>
</xsl:text>
</xsl:template>
<!-- <xsl:template match="text()">
<xsl:apply-templates select="normalize-space(.)"/>
2016-12-28 ebb: normalize-space() causes problems: too much tightening up of the output so words are run together, also when applied at <p> template, child nodes aren't processed.
Using regex and Text-Wrangler on the output file to remove its excess lines.
</xsl:template>-->
<xsl:template match="i">
<xsl:text>_</xsl:text><xsl:apply-templates/><xsl:text>_</xsl:text>
</xsl:template>
<xsl:template match="small">
<xsl:text>[</xsl:text><xsl:apply-templates/><xsl:text>]</xsl:text>
</xsl:template>
<xsl:template match="center">
<xsl:text>%</xsl:text><xsl:apply-templates/><xsl:text>%</xsl:text>
</xsl:template>
</xsl:stylesheet>
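The pseudomarkup conventions documented in the stylesheet's header (underscores around italics, square brackets around small caps, percent signs around centered text, one unit per line) are regular enough to be turned back into angle-bracket markup mechanically. The following Python sketch illustrates one way this might be done. It is not the project's own procedure, and the element and attribute names (`xml`, `p`, `hi`, `rend`) are placeholders assumed for the example.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical target markup: the element and attribute names are assumptions.
RULES = [
    (re.compile(r"_(.+?)_"), r'<hi rend="italic">\1</hi>'),       # _italics_
    (re.compile(r"\[(.+?)\]"), r'<hi rend="smallcaps">\1</hi>'),  # [small caps]
    (re.compile(r"%(.+?)%"), r'<hi rend="center">\1</hi>'),       # %centered%
]

def line_to_xml(line):
    """Turn one unit line of pseudomarkup into a <p> element (as a string)."""
    xml = escape(line.strip())  # protect &, < and > found in the text itself
    for pattern, replacement in RULES:
        xml = pattern.sub(replacement, xml)
    return f"<p>{xml}</p>"

def text_to_xml(plain):
    """Convert a plain text body in which every unit occupies one line."""
    units = [u for u in plain.split("\n") if u.strip()]  # drop the blank lines
    doc = "<xml>\n" + "\n".join(line_to_xml(u) for u in units) + "\n</xml>"
    ET.fromstring(doc)  # raises ParseError if the result is not well-formed
    return doc

sample = "%CHAPTER I.%\n\nI am by birth a _Genevese_; and my [family] is old.\n\n"
converted = text_to_xml(sample)
```

A sketch like this treats every underscore, bracket, or percent sign as pseudomarkup, so literal uses of those characters, overlapping spans, and the asterisks that signal footnotes would all need separate handling.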
Collation Discoveries: Flattening and Chunking Again
At the time of this writing, two members of our team have completed proof-checking the
plain text files generated by the above stages of processing against photofacsimiles
of the nineteenth-century editions. We have undone the normalized spellings of the
PAEE to restore the original spellings of the nineteenth-century texts, and have corrected
transcription errors, improving what we can. We also labored painstakingly to correct
a plain text file of the 1823 text generated by OCR with ABBYY FineReader. We discovered
it was best to convert these files to XML (easily done with a series of Find and Replace
operations from pseudomarkup to angle brackets). In May 2017, we processed our first
collation of the documents, and discovered in the process that we could output plain
text tables to align the outputs side by side, as well as XML output, both of which
are useful. We also discovered two things that have caused us to return to revise
substantially the documents we had prepared:
-
Collating multiple versions of a novel is highly demanding of processing power: the
process lags considerably and may not properly locate alignments and deviations if the
entire novel is processed at once. I had to prepare my XML for chunking again, with
larger, roughly chapter-sized chunks rather than the one or two paragraphs of hypercard
chunking from the PAEE. That meant dispensing with structuring an elaborately hierarchical
document prior to machine collation: the highest level of hierarchical organization
below the document node is a paragraph, and the structural components of the novel
(chapter, volume, and letter divs) are signalled with milestone-style elements.
Milestone elements are used to signal the boundaries of the pieces to be processed
by CollateX.
-
Once we had processed the collations, we easily saw in CollateX's plain text output
tables many more errors to check and correct against our photofacsimiles.
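The flattened structure described in the first point can be illustrated with a small Python sketch. It assumes a flat file whose root element holds only paragraphs and empty milestone elements. The element names (`xml`, `milestone`, `chunk`) and the `unit` and `n` attributes are hypothetical stand-ins rather than the project's actual vocabulary. Each milestone of the chosen unit starts a new chunk that could then be handed to the collation software separately.

```python
import xml.etree.ElementTree as ET

# Invented sample: a flat document with milestones instead of nested divs.
FLAT = """<xml>
<milestone unit="chapter" n="1"/>
<p>First paragraph of chapter one.</p>
<p>Second paragraph of chapter one.</p>
<milestone unit="chapter" n="2"/>
<p>First paragraph of chapter two.</p>
</xml>"""

def chunk_at_milestones(xml_text, unit="chapter"):
    """Split the children of the root into chunks, one per milestone."""
    root = ET.fromstring(xml_text)
    chunks = []
    current = None
    for child in root:
        if child.tag == "milestone" and child.get("unit") == unit:
            # A milestone of the chosen unit opens a new chunk.
            current = ET.Element("chunk", {"n": child.get("n", "")})
            chunks.append(current)
        if current is None:
            # Anything before the first milestone goes into a "front" chunk.
            current = ET.Element("chunk", {"n": "front"})
            chunks.append(current)
        current.append(child)
    return chunks

chunks = chunk_at_milestones(FLAT)
labels = [c.get("n") for c in chunks]
```

Because no chunk boundary ever has to cut through a nested element, the same function applies unchanged to editions whose volume and chapter divisions differ.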
The irony of this is that, just as we thought we might be finished with stitching up
the body of the Frankenstein Creature, we discovered a necessity to break it into
pieces again in order to facilitate
collation. Collation outputs are necessarily multiple now, and must be processed again
to stitch them together. Collation is now a recurring process, helping us to note
corrections still to be made in our base texts. The plain text and simple XML files
we have prepared now serve as a sort of ur-text
still leading to our goal of preparing a new synoptic TEI document combining the
three texts. From the process, we begin to see some new flexibility and value in flattened,
chunkable XML documents.
The need to work on the collation in small units is further exacerbated by our analysis
of the work ahead: working the manuscript notebooks into the collation with the
print editions. At first we thought that the ur-text
file would be an XML document containing angle bracket markup in the form of critical
apparatus tags, thus:
<p>I am by birth a
<app>
<rdg wit="#c56"><ptr target="http://url.shelley-godwin"/></rdg>
<rdg wit="#p1818">Genevese</rdg>
<rdg wit="#p1823">Scotsman</rdg>
<rdg wit="#p1831">Martian</rdg>
</app>.
</p>
This encoding (of a nonsensical and nonexistent sample passage) demonstrates our first
plan of 2016 for interweaving three text files together with pointers into the Shelley-Godwin
Notebooks, which combine text and image and cannot be rendered as text documents.
Our team member at MITH and my colleague on the TEI Technical Council, Raffaele Viglianti,
first planned to encode linkages to associated manuscript notebook facsimile pages
in the Shelley-Godwin Archive. The pointers would lead in a published edition to links
from our edition into related passages of the draft notebooks. However, we discovered
a major problem with this plan on examining the output of our collation of the 1818,
1823, and 1831 editions:
-
First, the collation output is far too complicated to process easily by hand.
-
Perhaps more significantly, the notebooks are themselves a variant edition in their
own right, and it would perhaps be more efficient to process their texts with automatic
collation in parallel with the print editions we have been preparing.
This raises a fresh set of challenges for our project. The notebook XML is chunked quite
finely, with a separate file for each page, using diplomatic TEI markup that prioritizes
the description of each page. Line breaks are particularly vexing because, where words
in the notebooks break at the ends of lines, there are no consistently reliable symbols
to indicate how they are joined on the next line.
Our plan is to pull out the line elements, use the existing markup to locate paragraphs,
and preserve information about insertions and deletions in the TEI of the Shelley-Godwin
Notebooks. The new version of XML we produce would then be more or less compatible
with the editions we prepared of the 1818, 1823, and 1831 texts for the purposes of
collation, working in some additional information from the diplomatic edition and
finding a way to signal that information in the critical apparatus output. We would
need to stitch the thousands of notebook pieces into the larger chunks at alignment points we can
identify across all of the documents. This will undoubtedly prove the most challenging
stage of our work so far.
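A rough Python sketch of the first step in this plan follows, under several loudly stated assumptions: the page sample is invented, it carries no TEI namespace, and the element names (`surface`, `zone`, `line`, `add`, `del`) are assumed for illustration rather than taken from the notebook files themselves. The sketch keeps the insertion and deletion wrappers, drops all other markup, and records each original line end as an empty `lb` milestone so that words broken across lines can be rejoined in a later pass. Locating paragraphs is left out entirely.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Invented sample page; real notebook files will differ (namespaces, nesting).
PAGE = """<surface>
<zone type="main">
<line>I am by birth a <del>Scotsm</del></line>
<line><add>Genevese</add> and my family</line>
</zone>
</surface>"""

KEEP = {"add", "del"}  # revision markup to preserve (assumed element names)

def flatten_line(elem):
    """Serialize an element's content, keeping only <add>/<del> wrappers."""
    out = [escape(elem.text or "")]
    for child in elem:
        inner = flatten_line(child)
        if child.tag in KEEP:
            out.append(f"<{child.tag}>{inner}</{child.tag}>")
        else:
            out.append(inner)  # unwrap any other element, keep its text
        out.append(escape(child.tail or ""))
    return "".join(out)

def flatten_page(page_xml):
    """Join all lines of a page into one stream, marking line ends with <lb/>."""
    root = ET.fromstring(page_xml)
    lines = [flatten_line(line).strip() for line in root.iter("line")]
    return "<lb/>".join(lines)

flat_page = flatten_page(PAGE)
```

Keeping the line ends as milestones rather than guessing at word joins defers the hyphenation problem instead of solving it, which seems the safer choice when the notebooks offer no consistent signal.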
Reflections toward a theory of up-translation
The Bicentennial Frankenstein project’s encounter with an impressive early hypertext
edition raises more general questions worthy of reflection towards theorizing the
up-transformation process:
-
How do we understand the relationships among generations of digital editions?
Our experience urges caution with hasty reappropriation or automated methods in up-translating
dated documents. Careful document analysis and an assessment of the vision of the
original edition will challenge the would-be up-translator to find some way to respect
the vision and scope of a dated electronic edition.
-
What aspects of the old hypertext editions (or editions in formats not consistent
with our own) transcend or exceed the structures we currently consider sustainable?
What perspective might a thorough review of the first still extant hypertext editions
contribute to our scholarly editing practice now?
The fragmentation of the early Frankenstein into nearly 500 pieces represents a particulated
vision of collation that made difficulties for our up-translation process, and we
did not anticipate that we would need to chunk the documents yet again, and yet once
more (for the manuscript notebooks) in our
own software-assisted collation.
The survival of a particulated, chunky Frankenstein edition is the most remarkably persistent feature of our work thus far. We have been
discovering that large documents with entrenched hierarchies are difficult to process
in portions. When we prepare chunks or segments of text for collation, we need to
make sure their start and end points are aligned in some way, so we look for those
moments of alignment and set milestone units as signal posts. But to produce tractable
files, cutting through volume divs (for example) raises problems, particularly when
those structural units of hierarchy are not consistent in the editions being compared.
What have we learned? We discover that deep-nested hierarchies create problems when
we need to compare different editions of the same text whose hierarchies are not aligned.
We discover that readily fragmentable output may be preferable for indicating points
of intersection. We consider that the process of up-conversion and up-translation
might best produce multiple formats of output, where plain text and XML co-exist.
On the path toward our goal of producing a synoptic TEI edition, we find it desirable
to share both a hierarchic document that stores information about comparison and
separate edition files of each document. The imposition of structural and
crit-apparatus hierarchy represents one interpretation of our documents, but that
will not be the only one worth preserving over generations. Apparently, we need simple,
granular documents, too.
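The coexistence of a synoptic apparatus and simple per-edition files can be illustrated with a short Python sketch. It reuses the nonsensical sample passage given earlier, reduced to its three print witnesses, and reads one witness's text back out of the apparatus. The function names are hypothetical, and the sketch assumes un-namespaced `app` and `rdg` elements with one witness pointer per `wit` token.

```python
import xml.etree.ElementTree as ET

# The paper's nonsensical sample, reduced to the three print witnesses.
SAMPLE = """<p>I am by birth a
<app>
<rdg wit="#p1818">Genevese</rdg>
<rdg wit="#p1823">Scotsman</rdg>
<rdg wit="#p1831">Martian</rdg>
</app>.
</p>"""

def witness_text(elem, wit):
    """Collect the text of one witness, choosing its reading in each <app>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "app":
            for rdg in child.findall("rdg"):
                if wit in rdg.get("wit", "").split():
                    parts.append(witness_text(rdg, wit))
        else:
            parts.append(witness_text(child, wit))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

def normalize(text):
    """Collapse runs of whitespace left over from the XML layout."""
    return " ".join(text.split())

paragraph = ET.fromstring(SAMPLE)
text_1818 = normalize(witness_text(paragraph, "#p1818"))
text_1831 = normalize(witness_text(paragraph, "#p1831"))
```

Under these assumptions the hierarchic document can regenerate a simple text for each witness, but only the text: whatever markup the separate edition files carry beyond the readings would still have to be preserved in those files.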