How to cite this paper
Hajo, Cathy Moran. “The sustainability of the scholarly edition in a digital world.” Presented at International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML, Montréal, Canada, August 2, 2010. In Proceedings of the International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML. Balisage Series on Markup Technologies, vol. 6 (2010). https://doi.org/10.4242/BalisageVol6.Hajo01.
International Symposium on XML for the Long Haul: Issues in the Long-term Preservation of XML
August 2, 2010
Balisage Paper: The sustainability of the scholarly edition in a digital world
Cathy Moran Hajo
Associate Editor, Margaret Sanger Papers
New York University
Cathy Moran Hajo is the Associate Editor and Assistant Director of the Margaret Sanger
Papers, a scholarly editing project located at NYU. With the Sanger Papers, she has
published three volumes of The Selected Papers of Margaret Sanger, a two-series microfilm edition, and two electronic publications. She has worked
as a documentary editor for over twenty years, specializing in the publication of
historical materials in digital form, and participating in scholarly conferences and
meetings on digital issues.
Cathy is a Past President of the Association for Documentary Editing. Dr. Hajo received
her PhD from NYU in 2006. She is the author of several articles on documentary editing,
most recently, “Scholarly Editing in a Web 2.0 World” (Documentary Editing, 2009)
and “Last Words: Documenting the End of Lives” (Documentary Editing, Fall 2006).
In addition to her work with the Sanger Project, she is the author of Birth Control on Main Street: Organizing Clinics in the United States, 1916-1939 (U. of Illinois Press, 2010).
Abstract
Scholarly editions must be used for generations; by nature they require a stable long-term
publication format. Some editors have eagerly embraced digital editing and XML, but
many more editors remain unconvinced that digital publications can last as long as
printed books. Community standards and DTDs for editions have not been widely adopted
and editors lack consensus about what a digital edition should be. XML's stability
and sustainability are critical to efforts to go beyond “the book” and to develop
new ways of presenting texts and scholarly commentary. To build 21st century editions,
we need tools to make XML encoding easier, to encourage collaboration, to exploit
social media, and to separate transcriptions of texts from the editorial scholarship
applied to them.
Scholarly editors have long been invested in the creation of long-lasting and sustainable
publications. Whether they create complex multi-year projects that rely on cooperative
teamwork or develop short-term solo projects, editors understand that their work will
be consulted for years to come. The expense of locating, selecting, transcribing,
annotating and publishing historical documents could not be maintained if these editions
were not built to last. Editors have developed practices and policies to ensure that
their readers can confidently rely upon their versions of important historical manuscripts.
This care has always extended to the stable publication formats editors chose, whether
in letterpress or microform. When editors turn to digital publication, sustainability remains of critical importance.
Scholarly editions are also committed to bringing primary sources to broader audiences.
Editors take to heart Thomas Jefferson's words: “…let us save what remains:
not by vaults and locks which fence them from the public eye and use, in consigning
them to the waste of time, but by such a multiplication of copies, as shall place
them beyond the reach of accident.” By publishing edited works, we take fragile and unique archival manuscripts and make
them available in research libraries where scholars and students can access them.
Printed editions do not reach everyone. They are expensive to purchase, and most libraries
do not carry complete sets of all of them, but given the technology of the time, printed
volumes certainly served to disseminate and preserve historically important materials.
The advent of digital publication challenges editors to expand their reach, to move
beyond the ivory towers of research libraries to high schools, town libraries and
even to the comfort of private homes. This monumental expansion of our audience forces
us to rethink how we edit documents and how a truly accessible edition should behave.
Editors have always had to balance costs against their ability to preserve and disseminate
documents. Never has this been a more difficult task than at present, when the cost
of creating long-term quality digital editions often prohibits editors from offering
them freely to the public. Digital publishing permits unprecedented access possibilities,
but it is a fragile medium that is susceptible to obsolescence. Should we create an
edition that is as sustainable as digital text can be, but might not be widely accessible,
or should we create a widely accessible edition that might not last a long time?
Neither option is acceptable. The reality of funding for scholarly editing in the
21st century is this: it is difficult enough to raise the funds to create the content
of these editions. Adding the technical specialization needed to render these texts
in well-formed XML is beyond the capabilities of many editing projects. Some editors,
too, are reluctant to embrace the notion of providing free and open access to their
editions and support subscription-based access to digital editions. Arguing
that their print volumes were never free of charge, many editors seek to generate
income to help offset project costs or seek royalties for their intellectual work.
Federal agencies prefer, or in some instances demand, that the editions they fund
be produced using XML, but they do not provide sufficient guidance or tools to help
editors comply. After fifteen years of being told that markup is the gold standard
for digital publications, we get it. We know that we need to use XML on our digital
texts, but I would say that the best adjective to describe our adoption of XML might
be “reluctant.” This is especially true of historical editors as opposed to literary
editors. Few historical editors have embraced digital publication's promise, and
most have treated digital publication as an add-on to their central work of publishing
volumes.
Aside from our goals of accessibility and sustainability, digital publication questions
some of editing's guiding premises. I don't believe that many editors, even those
working with digital text, have really explored or thought through these issues.
To date, most digital editions still closely resemble the books they are based upon.
As we finish digitizing older publications and begin to construct more and more born-digital
editions, we have an opportunity to redefine the edition, the editor and the editing
project. While none of us can peer into a crystal ball and know what is yet to come,
I expect that the edition of the 21st century will differ substantially from the editions
of the 20th century.
The transcription is central to the work of the scholarly editor. Faced with the problem
of how to make primary sources useable and accessible in a print world, editors decided
to transcribe them. Publishing paper-based facsimiles was too expensive and while
microfilm editions offered images, they could not easily include annotation or much
editorial intervention. To render all the complexities of their texts, editors developed
typographical mechanisms to record changes of hand or pen, additions, insertions,
deletions and margin notes. Their familiarity with the texts enabled them to read
and transcribe difficult handwriting, account for variants of a text and provide historical
context that makes the manuscript come alive. For most editions the transcription is the text. Once it is proofread or verified as accurate, it is the
center around which the project orbits, and if transcriptions are questioned or found
wanting, the edition is quickly discredited and falls out of use. As the capability to provide high-quality images over the internet has increased
dramatically, editors need to grapple with the idea that transcription might have
a different role in the future.
Another agreed upon principle of scholarly editing is that the editor's work should
be as unobtrusive and objective as possible, which in part contributes to the edition's
long shelf-life. Despite the fact that selecting texts and drafting annotation can
be intensely subjective tasks, editors try to keep interpretation and historical arguments
to a minimum. Our role is to present the most important documents with factual annotation
that enables the reader to understand the text and reach his or her own conclusions.
More often than not, the editor who feels compelled to weigh in on current historiography
or the controversial issues of the day does so outside the edition, in a journal article,
monograph or biography. At projects that employ more than one editor, the work of each is subsumed into one
consistent editorial voice. We credit editors on the title page rather than crediting each scholar for the portions
of the edition he or she created. In this way editions are truly collaborative
and a rare example of team-based humanities research. As digital scholarship allows
us to build larger and more far-flung collaborations, editors need to decide whether
the current mode of attribution should continue.
Finally, editors agree that because their work is so labor intensive, it must be done
well. It will be a long time before anyone has the opportunity to do the work again.
The existence of a previous published edition on a topic forces the editor to explain
why the existing publication is so flawed that the time and money need to be expended
to redo it. One of the promises of the digital age is the ease of re-purposing objects
once they have been digitized. When editors can continually revise and enhance their
editions, will these projects ever truly end? Will we develop mechanisms by which
we can allow other scholars to try different approaches on our texts? Will they need
our permission?
Editions are not created in a vacuum. We rely on more than sixty years of tradition
and previous work to guide our steps. But not all our inspiration comes from the
print editions of the past. We are not just content creators but also users of digital resources. We have seen the work done by archivists, public historians,
and digital historians to digitize primary sources and to bring them to new and larger
audiences. We also see new ways of researching and collaborating with both experts
and the public and we want to try these out in our editions. Editors need to decide
how many of our tried-and-true methods still remain valuable, and what aspects
of the new technology would be best incorporated into our editions. Digital historians
seek to do more than just enable searches of electronically rendered texts; they want
to encourage research and collaboration, to use data mining, visualizations, and other
computer-aided tools to analyze texts on a much larger scale than once possible. Sometimes
I am not sure that we recognize the power of conducting a simple Google Book search
for a string of text—we are searching more than ten million books in a few seconds,
a task unthinkable before digital publication. This kind of computing power can fundamentally
change the way we do research as well as how we formulate a feasible research question.
The problem that historians will face from this time onward is how to deal with an
abundance of sources, not how to overcome a dearth. So any method that we can develop to divide our editions into smaller,
useful portions will be valuable. While digital historians are only a small minority
of the profession at present, in just a generation or two, all historians will be digital historians. Editors need to keep an eye on the trends
in this emerging field, to educate themselves in the technology, and learn to adapt
their editions to meet the needs of this important group of stakeholders.
Scholarly editors have been hearing about the benefits of XML markup for fifteen years
but that does not mean that most are comfortable with the idea. Again, I am talking
primarily about editors of historical documents, rather than those working on literature.
English departments and linguistic scholars adopted markup languages and computer
aided text analysis early and have led the way in the development of XML, especially
the Text Encoding Initiative guidelines that focus on humanities texts. Digital
editions like the Women Writers Project, the Whitman Archive, the Willa Cather Archive
and the William Blake Archive have benefitted from close associations with humanities
computing centers to create sites that rely on robust encoding of complex manuscript
material.
Historians have not been as quick to embrace markup language and I am not entirely
certain why. We are more reluctant to put the time into mastering XML or the programming
languages needed to search and display XML texts effectively. We balk at the amount
of technology we need to learn, seeing time spent on it as time spent away from our
documents and our research. It may also be that XML provides greater immediate benefit
for literary research than it does for historical research. The Willa Cather Archive
offers text visualization and analysis tools that enable readers to search for word
frequencies, create word clouds and concordances. These are useful tools, to be sure,
but not the first ones that a historian might apply to a set of documents. We might prefer to locate all the mentions of a person, organization, place, subject
or idea. We want to find the appropriate documents and study them through close reading.
I think that at some level we resist the idea that a computer could ape the way that
we attack documents. If our print editions are any clue, we invest the most time in
creating detailed subject indexes that organize the documents into important categories.
While historians are interested in the process of document creation, tracing variants
and versions over time, they are more interested in assigning content to the text.
Encoding these contextual relationships is a sort of cross between annotation and
indexing, something that might be done slightly differently by each project depending
on the interests of an editor or the specific documents being published. Believing
each project's needs and processes to be unique, historical editors
have not really united to develop the digital tools that would lead to new ways of
looking at texts, either within or across editions.
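As a rough illustration of how content encoding can do some of the work of a subject or name index, here is a minimal Python sketch that gathers every tagged personal name into an index keyed by an authority identifier, so that different surface forms of one name point to a single entry. The fragments, document ids, and key values (`sanger.m`, `ellis.h`) are invented; `persName` and its `key` attribute follow TEI conventions, but the helper `build_name_index` is hypothetical and not taken from any particular edition.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample fragments. The same person appears under different surface
# forms; each mention carries one hypothetical authority key in @key.
DOCS = {
    "doc-a": '<p>A letter to <persName key="sanger.m">Mrs. Sanger</persName>.</p>',
    "doc-b": ('<p>Signed <persName key="sanger.m">M. S.</persName>, enclosing a note '
              'from <persName key="ellis.h">Havelock Ellis</persName>.</p>'),
    "doc-c": "<p>No tagged names in this one.</p>",
}


def build_name_index(docs):
    """Map each authority key to the sorted ids of the documents that mention it."""
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        for el in ET.fromstring(xml_text).iter("persName"):
            key = el.get("key")
            if key:  # mentions without an authority key are skipped
                index[key].add(doc_id)
    return {key: sorted(ids) for key, ids in sorted(index.items())}


if __name__ == "__main__":
    for key, ids in build_name_index(DOCS).items():
        print(key, ids)
```

Run as a script, this prints `ellis.h ['doc-b']` and `sanger.m ['doc-a', 'doc-b']`. The editorial judgment lives in the choice of key, which is exactly the part that differs from project to project.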
We don't have a good model in use today for a sustainable XML edition with which we
could develop a shared conception of digital editing. There are a lot of silo projects
that have been developed for specific sets of documents and that are not general
enough to serve all editions. I think that a lot of these idiosyncratic editions
have good ideas, but often do not have sufficient infrastructure to ensure longevity.
The XML-based edition we are building at the Margaret Sanger Papers is an example
of such a specific application. The Project's focus is on Margaret Sanger, the 20th
century American birth control activist. Our entire archive consists of slightly
under 100,000 documents, published on microfilm. Only about one percent of these
documents were selected, transcribed and annotated for our four-volume print edition. For our digital edition, we did not want to repeat the work that we did with our
book edition. Instead we wanted to explore the capabilities of XML encoding, and
text searching. We selected Sanger's articles and speeches, six hundred texts in
all, as the best test for digital publication. Few of these documents were included
in our book edition because of their length and the fact that they were somewhat repetitive—the
vast majority dealing with some aspect of birth control. Even though all of them
were accessible within the microfilm, they were not searchable in that format. We
believed that these documents would best benefit from the ability to search text and
the addition of metadata. We began this project in 2003, but without dedicated funding
for it, progress has had to be slow. The beta site contains about three hundred documents
now, searchable by text, date, title, format, publication venue, and by subject.
We worked with staff at New York University's Humanities Computing Group to set up
the initial search and display and as we complete the edition we will add additional
searches for the names of organizations, people, places, titles of books, and text
quoted by Sanger. All these are already tagged in the texts. Our XML encoding is
not as complex as many others I have seen, but we feel that the combined search capabilities
of text and metadata offer something different from either our book or microfilm
editions. It does not offer annotation in its classic form, but allows in-depth research
of Sanger's ideas, audiences, and changes in her arguments over time.
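What searching "by text and metadata" can mean in practice is sketched below in Python, using only the standard library. The record structure (a `doc` root carrying `id`, `date` and `format` attributes, with `persName`, `orgName` and `placeName` tags in the text) is a simplified assumption for illustration, not the Sanger Papers' actual DTD, and both records are invented.

```python
import xml.etree.ElementTree as ET

# Two invented records in a simplified, hypothetical markup (not the project's DTD).
DOCS = [
    '<doc id="sample-001" date="1925" format="article">'
    '<p>On the <orgName>American Birth Control League</orgName> in '
    '<placeName>New York</placeName>, citing <persName>Havelock Ellis</persName>.</p>'
    '</doc>',
    '<doc id="sample-002" date="1931" format="speech">'
    '<p>A radio talk on family limitation.</p>'
    '</doc>',
]


def plain_text(el):
    """Flatten an element's text content and collapse runs of whitespace."""
    return " ".join("".join(el.itertext()).split())


def search(docs, phrase=None, tag=None, name=None, **metadata):
    """Return ids of documents meeting every criterion supplied.

    phrase      -- case-insensitive substring of the full text
    tag, name   -- an element such as persName whose text equals name
    **metadata  -- exact-match filters on root attributes, e.g. format="speech"
    """
    hits = []
    for xml_text in docs:
        root = ET.fromstring(xml_text)
        if any(root.get(key) != value for key, value in metadata.items()):
            continue
        if phrase and phrase.lower() not in plain_text(root).lower():
            continue
        if tag and name not in [plain_text(el) for el in root.iter(tag)]:
            continue
        hits.append(root.get("id"))
    return hits
```

For example, `search(DOCS, format="speech")` returns `["sample-002"]`, and `search(DOCS, tag="persName", name="Havelock Ellis")` returns `["sample-001"]`. The point of the sketch is that a tagged name is found as a name, not merely as a string that happens to occur in the text.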
Ours is a concise digital edition drawn from a closely connected group of texts that
share a common format and purpose. The content encoding that we have done required
an understanding of the texts, knowledge of the references made, and the ability to
construct detailed subject entries that provide meaningful intellectual divisions
between six hundred documents that might all fall under a handful of Library of Congress
Subject Headings. If we wanted to expand this edition to include correspondence or
other kinds of documents, we would have to revise our tagging scheme as well as our
display interface. We could not use a system like this to digitize all one hundred
thousand documents in our microfilm edition because it would be too time-consuming.
We do not have the staff to transcribe, content-encode, and index
such a large number of documents. It is not even easy for us to make
small adjustments to the edition's design because the editorial staff did not create
the programs that search and display the texts.
Our situation is similar to that of many other projects that don't have access to digital
text experts at a humanities text center. Our original project was encoded using
a slightly amended version of the Model Editions Partnership DTD for TEI P4, created
with the help of Matthew Zimmerman of NYU's Humanities Computing Group. Sustainability has become a concern for us. Since our edition began, the TEI has
introduced P5 and NYU dissolved its Humanities Computing Group. What we have right
now works, but we need to decide whether to spend the time and money that it will
take to update our encoding to comply with TEI P5 or stay with the older version.
If we do choose to migrate to P5, we would need to redevelop our encoding policies,
resolve the differences in the texts already encoded, and recreate the search and
display interfaces to work with the new encoding. While these tasks might not be
difficult for those who work with TEI, XSLT and PHP every day, for us it will require
either raising the funds to hire a programmer, or spending many, many nights laboriously
working our way through web tutorials and a small library of the Complete Idiot's Guides!
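For a sense of scale, the Python sketch below performs only the most mechanical part of a P4-to-P5 migration: P5 places every element in the TEI namespace (`http://www.tei-c.org/ns/1.0`), renames the P4 root element `TEI.2` to `TEI`, and uses `xml:id` where P4 used `id`. Everything else (project customizations such as an amended Model Editions Partnership DTD, renamed elements and attributes, and the search and display code) is outside this sketch, and the function name `p4_to_p5_skeleton` is hypothetical. The TEI Consortium has published its own P4-to-P5 conversion stylesheets, which would be a better starting point than hand-written code.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"               # TEI P5 namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serializes as xml:id


def p4_to_p5_skeleton(p4_xml):
    """Illustrative first pass only: rename the root, add the namespace,
    and move @id to @xml:id. A real migration must also map every element
    and attribute that changed between P4 and P5; this does not attempt that.
    """
    root = ET.fromstring(p4_xml)
    for el in root.iter():
        local = "TEI" if el.tag == "TEI.2" else el.tag
        el.tag = "{%s}%s" % (TEI_NS, local)
        if "id" in el.attrib:
            el.set(XML_ID, el.attrib.pop("id"))
    ET.register_namespace("", TEI_NS)  # emit TEI as the default namespace
    return ET.tostring(root, encoding="unicode")
```

ElementTree does not fetch external DTDs, so this assumes a P4 file whose entity references have already been resolved; `register_namespace` also changes module-wide state, which is tolerable in a one-off conversion script but worth knowing.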
These kinds of predicaments tempt editors like us, pressed for time, to consider farming
out XML encoding to a consultant or their publisher. There can be a danger in going
that way because I don't think that anyone knows better than we do how people can
use our texts. If we don't master XML encoding we won't be able to participate fully
in decisions made on how the texts should be tagged, nor will we be able to fully
explore the possibilities of digital editing. And I think that we need to always be
thinking about how to make better editions, even if it means breaking with some of
our traditions. For example, I think that when we convert a print edition to digital
form, we should be describing the manuscripts, not trying to replicate the organization
or structure of a specific volume. When we cling to our older formats, I think that
we limit the possibilities for redefining the way we do things with an eye to the
capabilities of digital publishing.
A case in point is the University of Virginia's Rotunda digital imprint. Rotunda
is the fastest-growing source for digital historical editions, best known for its
ambitious project to digitize the massive multi-volume Founding Era projects, including
both volumes previously published in paper and those still to be published. The Founding
Fathers projects offered a “perfect storm” of a test for digital publication. Their
shared geographical, chronological, and subject focus provides a strenuous test of
how XML can integrate research not only across volumes, but across editions as well.
The Founders had hundreds of print volumes, only some of which were in any kind of
digital form, which meant embarking on a large-scale legacy conversion project, while
at the same time creating a workable platform for the continued production of volumes.
Finally, it offered the challenge of developing a way to combine the editor's main
access tool—the index—across large collections of volumes and editions. To date,
the combined Founding Era collection includes six editing projects, almost 90,000
discrete documents and almost 850,000 index references.
Putting the American Founding Era Collection online is a massive undertaking, but
one that doesn't really serve as a model for developing new digital editions. Perhaps
because it was led by an academic press, the Collection seems wedded to the idea that
readers want to see a digital version of the old print edition. Yes, it provides
text searching across the entire collection, but the organizing principle is not the
document but the edition and the volume. Granted, merging the work of so many different
editorial projects is no simple task, as even slight differences in editorial approach
or transcription styles can result in unexpected cross-edition search results. Despite
its lush appearance and useful hyperlinks to texts mentioned within each edition,
this digital publication still feels very much like using a book, or a series of books,
rather than an integrated digital collection. Some examples:
-
Each edition is organized by the original print series and volume. Documents retain
the original print volume's page breaks, rather than that of the manuscripts they
describe.
-
Because of the topical overlap in editions, the same document can appear in more than
one edition—for example a letter written by George Washington to James Madison will
appear both in the Washington Papers and the Madison edition. Though each edition
includes its own internal hyperlinks to related texts, there is no link to take the
readers between the two versions of an identical text.
-
The text searches that tie together the six editions are rudimentary. In the body
of the texts, XML encoding has been used sparingly, no doubt because of the cost of
converting all those back volumes. When, as often is the case in scholarly editions,
a portion of the text is bracketed to indicate the editor's regularization or uncertainty,
the brackets are not removed from the text searches. Thus, if one edition used brackets
in a phrase and the other did not, the text search would not find all instances of
that document. Searches also do not always locate variant spellings of the same word,
though the Collection does employ stemming. Most documentary editions use a literal
transcription policy that seeks to capture the text as written, with misspellings
and abbreviations rendered as is. This isn't usually confusing when read by a human,
but it becomes problematic when we rely on a computer to read the text.
-
Consolidating the indexes of multi-volume editions that were published over more than
fifty years is a daunting proposition. Indexing styles change over time and with
historiographical trends. Rotunda has made available consolidated indexes for the
Adams, Washington and Jefferson editions thus far. The indexes are created by coding
in hyperlinks from the index to the volume and page number, hence the decision to
retain the original pagination. Right now you can only search one index at a time
and it is not clear if there will be an effort to merge them.
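The bracket and spelling problems described above are usually handled by indexing a normalized copy of each transcription while still displaying the literal one. A minimal Python sketch of that idea follows; the variant-spelling table is hypothetical and tiny, and stripping every square bracket is a deliberate simplification, not how any particular edition works.

```python
import re

# Hypothetical table of abbreviations and variant spellings; a real edition
# would build its own list from its transcription policy.
VARIANTS = {"recd": "received", "honble": "honorable", "wch": "which"}


def search_key(text):
    """Normalize a literal transcription for searching (not for display).

    1. Drop editorial square brackets but keep the letters inside them,
       so "rec[eive]d" and "received" index the same way.
    2. Fold case and split into alphanumeric tokens (punctuation ignored).
    3. Map known variant spellings to a standard form.
    """
    text = text.replace("[", "").replace("]", "")
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(VARIANTS.get(tok, tok) for tok in tokens)


def phrase_matches(phrase, transcription):
    """True if the normalized phrase occurs as whole tokens in the transcription."""
    return f" {search_key(phrase)} " in f" {search_key(transcription)} "
```

With this, `phrase_matches("received your letter", "I have rec[eive]d your letter")` and `phrase_matches("received your letter", "I have recd. your Letter")` are both true. The simplification also shows why encoding matters: blind bracket stripping would index an editorial insertion such as "[illegible]" as if the author had written it. If interventions were encoded with TEI elements such as `<supplied>`, `<unclear>`, or `<sic>`, a search system could decide element by element what to include instead of guessing from punctuation.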
Most of the questions raised by the American Founding Era Collection's digitization
are ones that every editor would have to address when trying to make the conversion
to digital form. It is made that much more complicated by the number of projects and
legacy print volumes. The decision to maintain the organizational structure of the
volumes limits the functionality that new volumes of the Founding Era or other editions
can have if they want to join Rotunda. Of the six editions represented in the American
Founding Era, only one was “born digital.” The Dolley Madison Digital Edition, when
used as a standalone product, presents a more website-like document display. To the
left of the text are links to biographical identifications of names mentioned, keywords
assigned to the text, and places mentioned. Each of these is linked to fuller annotations,
short biographical studies and descriptions of the places. A summary of the document
is used on search results pages to help the reader decide which documents to open.
This digital edition doesn't offer the indexing depth found in the Washington, Adams
or Madison indexes, but it does provide a more useable and flexible digital text.
When the Dolley Madison Edition is searched as part of the American Founding Era Collection,
however, we only see the transcription. None of the annotation is visible. All the
reader gets is a somewhat cryptic link at the bottom of the text that advises the
reader to “See this document in the standalone Dolley Madison Digital Edition.”
I don't mean to be overly critical of Rotunda's American Founding Era project. Rotunda
is the only organization even making the attempt to tackle the issue of digitizing
legacy editions. I don't think that Rotunda is claiming to be developing the next
generation of digital editions, but because they are actively seeking new projects,
their approach has the possibility of becoming a de facto standard for digital editions.
I worry that its very literal approach to these print volumes may inhibit the development
of more ambitious digital editions. Rotunda's light encoding of the texts and its
limited search options do not maximize the capabilities of XML encoding. Rotunda
editions are likely to always resemble print publications. The platform also lacks
the capacity to include social media tools that digital humanists and web-savvy readers
seek to foster collaboration and reader engagement. And in order to cover its costs,
Rotunda charges a subscription fee for access to the collection. To be fair, our
small, intensive, and understaffed digital edition doesn't provide a good model either,
with little in the way of support staff or capability for migration to newer data
formats.
So where do we go from here? We need to do some hard thinking about how digital editions
should look and how we can sustain them. Digital media presents a number of challenges
to the way that editors think about their texts and how they prepare them for the
public. Thinking and talking about some of these and trying to see how XML might
affect our decisions may help developers to anticipate future needs.
-
Images of manuscripts are now easier to digitize and serve over the web. If editors
can link their transcriptions to images of the manuscript, will that change the role
of the transcription? How many people really want to see the image? How many readers
really want to see all those strikeouts, additions, false starts and other complications
of a text? While we could provide two different views of a transcription using stylesheets,
do we need to do it? Could most of the people interested in the complexity of the
original be served by looking at a good digital image? Could we then default to a
regularized transcription that would be easier to read and more accurate as the base
of text searches? Perhaps we can encode links to the document image in places where
the user might want to consult it. I am sure that some editors and some users would
not feel comfortable doing this, but I think that it is an option worthy of a trial.
It might not be appropriate for complex literary editions, but for many historical
editions, it might serve well.
-
The idea that once an edition was done it was unlikely to be done again is not a product
of the digital age. Once text is digitized, particularly when using markup like XML,
it becomes far easier to re-purpose it, run it through text analysis tools, add new
levels of encoding, and open up the possibility that other scholars might find new
uses for our old editions. At some point, a scholar can go back to the American Founding
Era Collection and encode those variant spellings, or create a version that ignores
brackets when searching. Someone might even want to try to tackle creating a comprehensive
index. The chances of this happening go directly to the question of sustainability;
because these texts were encoded in XML, they should be useable, so long as the scholars
are allowed to use them.
-
How will the editor's job as annotator change as more and more materials are made
available on the web? In days past, the editors' subject specialization and familiarity
with hard-to-find primary and secondary sources ensured high quality annotation. No
single scholar could dedicate the hours and years that long-term editing projects
do to their subjects. But now, the availability of more and more web-based resources
means that many once hard-to-find sources are readily available to the average reader.
Should the editor still summarize a book when he can link directly to it on Google
Books? Do we need to provide a short biographical identification when we can add
a link to an entry in the online American National Biography, the individual's obituary in the New York Times, or, heaven forbid, his entry in Wikipedia? I don't actually think that links can
replace all kinds of annotation, but with many kinds of facts easier to find every
day, editors should question how they annotate documents. As the digital edition reaches
further outside its boundaries for annotation, it may start to resemble a web site
more than a book. With the ability to use the rest of the World Wide Web as linkable
resources, will editions begin to resemble an ever expanding “life and times” of the
subject, limited only by the questions asked by the researcher, and the paths they
choose to take?
-
Following up on the idea that annotation may change, it strikes me that content encoding
might replace at least some annotation and indexing tasks. If instead of spending
time annotating, we can use our expertise to encode links to other documents, to names,
organizations, and topics, and spend more time creating in-depth indexing entries,
we may be able to provide as good a service to the readers as we now do by conducting
annotation research. Annotation will not disappear entirely, but the editor will be freed
to focus on the difficult concepts that the average reader might not be able to find
for herself.
-
One of the promises of digital publication is that it will make collaboration easier.
One can see that an editing team in the 21st century might not need to reside in the
same city or same continent. Figuring out how we can use cloud computing to construct
digital editions, and looking into how we might credit contributions may help attract
collaborators who have a skill or specialization we lack. One of the main problems
with digital scholarship, especially of a collaborative nature, is the inability to
easily cite such work on vitae or resumes. This can dissuade some academics from participating
in team-based research as they build tenure portfolios. Can we develop new systems
where portions of the edition are credited, such as translations, annotations and
metadata?
-
How will social media networks affect editing and XML? Web 2.0 tools are increasing
in sophistication and enabling large numbers of people from all walks of life to participate
in the creation of editions. One could conceive of a digital edition constructed
as a wiki by volunteers who locate, digitize, transcribe, research, and proofread
historical texts. Such a wikidition could grow either incrementally or exponentially
depending on its ease of use, general interest, and word of mouth. If it should become
even a hundredth as popular as Wikipedia has, one could see a large and diverse collection
of materials taking shape outside of the control of editors and scholars. Blogging
software has been used to present diaries such as that of Samuel Pepys, on a site that
encourages readers to comment on individual entries or provide more formal annotation
in a companion digital encyclopedia. Investigations of the feasibility of using crowd sourcing for transcription and annotation
are currently underway at the Papers of the War Department, an image-based digital
edition sponsored by George Mason University. If any of these experiments take off, how will we preserve the digital editions they
create? Can one export a wiki or a blog post into an XML format for long-term preservation? Can we develop XML-capable wikis and blogs that retain their ease of use?
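Some platforms already emit XML of their own (MediaWiki's Special:Export page is one example), but in a platform-specific schema. As a minimal sketch, assume a blog post has already been pulled out of its platform as a plain Python dict with hypothetical field names. The entry and its reader comments might then be mapped onto TEI-like elements (div, head, date, p, note) for preservation. The mapping, the post_to_xml name, and the sample data are illustrative assumptions only.

```python
import xml.etree.ElementTree as ET

def post_to_xml(post):
    """Serialize one blog post (a dict with hypothetical keys) as a TEI-like <div>."""
    entry = ET.Element("div", {"type": "entry", "n": post["id"]})
    ET.SubElement(entry, "head").text = post["title"]
    # Keep the machine-readable date in @when and a display form as element text.
    ET.SubElement(entry, "date", {"when": post["date"]}).text = post["date"]
    for paragraph in post["body"]:
        ET.SubElement(entry, "p").text = paragraph
    # Reader comments become notes, so they stay attached to the entry they discuss.
    for comment in post["comments"]:
        note = ET.SubElement(entry, "note", {"type": "comment", "resp": comment["author"]})
        note.text = comment["text"]
    # ElementTree escapes characters such as & and < on output.
    return ET.tostring(entry, encoding="unicode")

# Hypothetical sample data, not taken from any real site.
sample_post = {
    "id": "entry-001",
    "title": "Diary entry",
    "date": "1915-06-01",
    "body": ["First paragraph of the transcription.", "Second paragraph."],
    "comments": [{"author": "reader_1", "text": "Context on A & B."}],
}

print(post_to_xml(sample_post))
```

Storing comments as notes inside the entry, rather than in a separate file, is one design choice among several; it keeps the conversation and the text together if only one file survives.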
-
Will editors eventually include their project research files and databases in their
editions? Only a small amount of the research conducted by editing projects makes
it into the footnotes of their published editions. Should we share these research
files, open up project chronologies, genealogies, image files, and name databases?
Should we blog about research queries that come into our offices and about our own
research undertaken for the edition? Should we somehow provide our readers with the
experience of working in our editorial offices, where libraries, vertical files, word
processing files and databases are all pressed to the service of one topic? If we
do, can we easily convert these kinds of work files to XML, or should we be seeking
the development of an XML Office Suite that can handle our ongoing needs and also
make them sustainable and accessible in the long haul?
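Structured work files are probably the easiest to convert. Assuming a project's name database can be exported as CSV with the hypothetical columns shown, a few lines of standard-library Python can turn each row into a TEI-style person record (listPerson, person, persName, surname, forename, birth, death). The column names, keys, and the names_to_xml function are assumptions for this sketch, and the third row is invented to show how an empty date is skipped.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

# Hypothetical export from a project's name database; column names are assumptions.
NAMES_CSV = """key,surname,forename,birth,death
goldman_emma,Goldman,Emma,1869,1940
sanger_margaret,Sanger,Margaret,1879,1966
doe_jane,Doe,Jane,1890,
"""

def names_to_xml(csv_text):
    """Turn each CSV row into a <person> record, skipping empty date fields."""
    list_person = ET.Element("listPerson")
    for row in csv.DictReader(io.StringIO(csv_text)):
        person = ET.SubElement(list_person, "person", {XML_ID: row["key"]})
        name = ET.SubElement(person, "persName")
        ET.SubElement(name, "surname").text = row["surname"]
        ET.SubElement(name, "forename").text = row["forename"]
        for field in ("birth", "death"):
            if row[field]:  # an unknown date is omitted rather than left empty
                ET.SubElement(person, field, {"when": row[field]})
    return ET.tostring(list_person, encoding="unicode")

print(names_to_xml(NAMES_CSV))
```

Word-processing files are much harder to convert this cleanly, since their structure is largely visual rather than explicit.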
Will we develop new ways of thinking about documents if we look outside of our traditions
and perhaps if we look beyond what XML offers at present? Looking at other applications
like GIS (Geographic Information Systems) for inspiration on how to organize information
might be instructive. It strikes me that the way a GIS map is constructed, from
discrete layers and kinds of data that can be selected in any combination by the
user, makes for an interesting model for digital editions. If the transcription and
linked image served as the “map,” with interpretation, annotation, and metadata organized
as stand-off encoding, many people could share the transcription but be free to add
their own interpretative layer. A letter written by radical anarchist Emma Goldman
to Margaret Sanger while Goldman was on a speaking tour in Portland in 1915 would
interest both the Sanger and Goldman editing projects. The Goldman project might
focus its annotation on Goldman's doings and ideas, whereas the Sanger editors might
see the letter as evidence of Goldman's mentoring role in these years. Other interested
parties could also use the letter, adding their own comments, contextualization, and
interpretation to it. For example, a staff member at an Oregon historical society
could use the letter in an online exhibit on the radicalism of Portland at the time.
He could link the places mentioned to maps or historical photographs of the city.
A genealogist might simply comment on a passing reference to her great-grandfather,
adding a link to her web-based genealogy. Each of these users of the text would
be adding to its meaning in different ways, all of which could enrich a reader's experience
of the letter. How can we create this kind of document in a way that allows any user
to add their annotation, and also allows users to see any combination of these
annotations simply by selecting them with a mouse click?
I don't know if XML can do this, but what I am getting at is that if we don't keep
thinking creatively about how we might present these documents, we will end up producing
digital replicas of old book editions. If we don't continue to evolve and improve,
we run the risk that other kinds of digital publication, perhaps those that are not
as long-lasting, will become more popular because they have better functionality.
XML was created to render in digital form the publications and scholarship that we
were already producing in print. So it is good at doing that, and perhaps not as
good at representing less structured arrangements of texts. That doesn't mean that
we shouldn't figure out ways to make XML do what we need it to do, it just means that
it might be harder.
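A very small version of the layered model can at least be sketched. Assume the shared transcription is divided into segments with xml:id values, and that each project keeps its own stand-off layer of notes pointing at those segments through a target attribute. Choosing layers then becomes a matter of merging the selected files. Everything here (the layer element, the project names, the render function, and the placeholder segment text, which is not a real letter) is hypothetical, and a real edition would need far finer-grained pointers than whole segments.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Shared base transcription; segment text is placeholder, not a real letter.
BASE = """<text><seg xml:id="s1">[segment 1]</seg>
<seg xml:id="s2">[segment 2]</seg>
<seg xml:id="s3">[segment 3]</seg></text>"""

# Each party keeps its own stand-off layer; names and contents are hypothetical.
LAYERS = {
    "goldman_project": '<layer><note target="#s2">Goldman annotation</note></layer>',
    "sanger_project": '<layer><note target="#s1">Sanger annotation</note>'
                      '<note target="#s2">Sanger annotation</note></layer>',
    "genealogist": '<layer><note target="#s3">Family link</note></layer>',
}

def render(base_xml, layers, selected):
    """Attach notes from the selected layers to the segments they point at."""
    segments = {seg.get(XML_ID): seg.text for seg in ET.fromstring(base_xml).iter("seg")}
    view = {seg_id: [] for seg_id in segments}
    for name in selected:
        for note in ET.fromstring(layers[name]).iter("note"):
            seg_id = note.get("target").lstrip("#")
            if seg_id in view:  # ignore pointers to segments that do not exist
                view[seg_id].append((name, note.text))
    # Return segments in transcription order with whatever notes were selected.
    return [(seg_id, segments[seg_id], view[seg_id]) for seg_id in segments]

for seg_id, text, notes in render(BASE, LAYERS, ["goldman_project", "genealogist"]):
    print(seg_id, text, notes)
```

Here the mouse click described above is reduced to a list of layer names: the Goldman and genealogist layers appear, and the Sanger layer stays hidden without being altered. The base transcription is never edited by any layer, which is what lets many parties share it.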
Since we don't know, and cannot know, where digital history and digital editing will
go in the years to come, how can we ensure that the work we are doing now will last
five years, never mind fifty? Sustainability is the capacity to endure change, and
if we can say one thing about technology, it is that it is constantly changing. There
are a few simple ways to ensure the longest life for our work. One is to make high-quality
content. If the legacy volumes of the Founding Fathers projects were not seen
as a valuable resource, there would be no great effort made to preserve them. Because
they have lasting value, efforts were made and money spent to keep them viable and
accessible. The other best practice is to do what you can to make it easy for the
next generation to preserve your work. I don't think we have to promise that it will
always last, only that it will last until the next generation of technology comes
along. At that point, if the value of the work isn't there, it won't be preserved.
If it is, it should not be that difficult to migrate it. So don't develop your own
markup language unless you are truly a genius, and if you do, make sure to share it
with the world. Pay attention to what the digital humanists of the day are using
and advising. If we stick with the educated pack when it comes to data formats, we
can reasonably expect that the tools will be there to preserve our work. In short,
that means we should use XML.
But we are not the only ones with a responsibility for making our texts last. A big
part of determining whether or not a format is sustainable is whether it achieves
buy-in from those it seeks to serve. As I said, most editors know that they need
to use XML to create their digital editions, but that doesn't mean they really want
to. We need better tools and encoding environments to win over editors and other
content providers. We need increased and sustained educational offerings, practical
examples, and templates that can help the greatest number of content providers, whether
they be editors, archivists, scholars, or students, to put XML-encoded manuscript
material online, and we need to make available the programs and stylesheets that will make
these texts display clearly and will take advantage of the encoding to generate valuable
searches. It behooves us to master XML encoding in order to take a creative part
in the development of digital editions. If we hand XML encoding over to consultants
or to our publishers, we are unlikely to get the kind of rich encoding that can substitute
for annotation or indexing.
XML encoding is expensive. Even well-funded providers like Rotunda do not have the
manpower to create in-depth encoding. It is no surprise that some of the best digital
editions are coming out of universities that have dedicated digital humanities centers.
Virginia, Nebraska, and Brown have built expertise and tools in XML encoding that
benefit their affiliated projects. George Mason University's Center for History
and New Media has taken a different approach, fostering image-based projects that
use their open-source Omeka software, which relies upon form-based creation of Dublin
Core metadata for each object. The costs of adopting XML for your edition at an institution
that is not engaged with digital humanities are high. Educational opportunities on
the web as well as through in-person workshops are wonderful resources, but they don't
replace the kind of extended aid one can get from experts or consultants.
Lacking funding to build up a national program of educational resources on XML, we
need to foster more communication among XML users working with digital texts, archives,
literature, and museums. We can learn from each other and share with them the special
experience that we have in publishing historical texts in easy-to-read forms.
Ultimately, the best thing that we can do to ensure the long-term sustainability of
our editions is to engage more fully with XML developers and our colleagues working
with similar materials. Those of us who use XML ought to encourage our peers to
become more involved and truly master the capabilities and limitations that the format
has to offer. We need to pursue joint projects and consortia to fund this development.
If we can build some simple tools that can get editors started on encoding their manuscript
material and displaying it on the web, we will have come a long way towards ensuring
that both the XML format and the work of historical editors will be around for the
long haul.