How to cite this paper
Lukehart, Peter M. “The Journey of The History of the Accademia di San Luca, c. 1590-1635: Documents from
the Archivio di Stato di Roma into and out of XML.” Presented at Balisage: The Markup Conference 2018, Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21 (2018). https://doi.org/10.4242/BalisageVol21.Lukehart01.
Balisage: The Markup Conference 2018
July 31 - August 3, 2018
Balisage Paper: The Journey of The History of the Accademia di San Luca, c. 1590-1635: Documents from the Archivio
di Stato di Roma into and out of XML
Peter M. Lukehart
Associate Dean and Project Leader
Center for Advanced Study in the Visual Arts, National Gallery of Art
Peter M. Lukehart is Associate Dean at the Center for Advanced Study in the Visual
Arts, National Gallery of Art (2001-present). His recent publications include “Nuda
veritas: The Afterlife of Michelangelo’s Indecorous Figures in the Last Judgment,”
in After 1564: Death and Rebirth of Michelangelo in Late Sixteenth-Century Rome, ed. Marco Simone Bolzoni, Furio Rinaldi, and Patrizia Tosini (Rome, 2016), and “The
Practice and Pedagogy of Drawing in the Accademia di San Luca,” in Lernt Zeichnen! Techniken zwischen Kunst und Wissenschaft, 1525-1925, ed. Maria Heilmann, Nino Nanobashvili, Ulrich Pfisterer, and Tobias Teutenberg,
exh. cat. (Munich, 2015). He has a longstanding interest in the education and incorporation
of artists in the early modern period. His publications on this subject include contributions
to the exhibition catalogue Taddeo and Federico Zuccaro: Artist-Brothers in Renaissance Rome (J. Paul Getty Museum, 2007) and to The Artist’s Workshop, published under his editorship in the Studies in the History of Art series at the
National Gallery of Art (1993). He also served as editor of the Accademia Seminars
(2009), for which he wrote the introduction and an essay “Visions and Divisions in
the Early History of the Accademia di San Luca.” He is project director for an online
research database entitled “The History of the Accademia di San Luca, c. 1590-1635:
Documents from the Archivio di Stato di Roma” (www.nga.gov/accademia/).
Copyright ©2018 Board of Trustees, National Gallery of Art, Washington
Abstract
When we first undertook the creation of a research database of documents concerning
the early history of the Accademia di San Luca in Rome (one of the first artists’
academies in Europe and the model for most subsequent institutions worldwide), we
were fully committed to following the guidelines of the Text Encoding Initiative (TEI:
http://www.tei-c.org/). Based principally on early modern documents in Latin and Italian, the project seemed
perfectly tailored for the rich organizing and searching capabilities TEI provides.
It also promised to be a future-proof and sustainable platform, anchored as it was
in XML. From its launch in 2010 until about 2014, the site performed very well and
it served tens of thousands of international researchers. What we had not anticipated,
however, was that our bespoke website depended very much on the knowledge and expertise
of our web architect, who had created a hybrid of TEI that allowed for automatization
of tag creation and for the site to interact with content on other areas of the National
Gallery of Art’s (NGA) website (under whose umbrella we function). The web architect’s
untimely death in 2010 put the project in a precarious position vis-à-vis making corrections
and updates, as the coding was not documented. Each change required hiring consultants
at high cost. Adding to our predicament, the NGA changed platforms for its website
and henceforth required that all projects conform to its web content management program (CQ,
now AEM, both Adobe products). Any outliers were responsible for providing all of
their own maintenance and consulting needs, an expensive and time-consuming prospect.
Faced with these compelling challenges, we decided to yield to forces beyond our control
and migrated the entire website from TEI to HTML, which took over a year—from mid-2014
to late 2015. Good news accompanies this tale of loss of our foundation in XML principles:
members of the team can now add content without any training in TEI; there is greater
interoperability with the other areas of digital content on the NGA’s website; and
we are able to benefit from the NGA’s participation in the International Image Interoperability
Framework (IIIF) initiative.
(SLIDE 1) I want to thank Tommie Usdin for the invitation to speak today. As a former
student in one of her intensive summer programs, I learned a great deal about the
basic elements of XML, and this early training helped guide the structuring of our research
database, The History of the Accademia di San Luca, c. 1590–1635: Documents from the Archivio
di Stato di Roma, from its conception. I am doubly grateful that she invited a traitor to the cause
to speak about our reasons for migrating from XML to HTML. Taking a page from the
Helsinki lexicon, perhaps what I should initially have said was: I couldn’t imagine that we wouldn’t use XML in perpetuity.
Somewhere in that muddle of verb tenses and double negatives lies the tale I have
to tell today.
Before engaging in the content of this conference on mark-up, it would be useful to
know something about this project and its ambitions. Drawing from the original statutes,
the proceedings of meetings, the ledger books, as well as notarial and court records,
The Early History of the Accademia di San Luca, c. 1590-1635 brings together a large number of previously unpublished documentary materials concerning
one of the first artists’ academies in Europe and the model for most subsequent institutions
for the teaching of art worldwide for four centuries. Conceived as two complementary
tools for researchers and scholars of early modern Europe, the database of documentation
on the website (https://www.nga.gov/accademia/en/intro.html) and the printed volume of interpretive studies, The Accademia Seminars: The Early History of the Accademia di San Luca in Rome, c.
1590-1635 (Washington: National Gallery of Art, 2009), shed light on the foundation, operation,
administration, and financial management of the fledgling academy from its foundation
in 1593 to its consolidation as a teaching institution with its own titular church
designed by Pietro da Cortona around 1635.
(SLIDE 2) In 2007, when we first undertook the creation of a research database of
documents concerning the early history of the Accademia di San Luca in Rome, we were
fully committed to following the guidelines of the Text Encoding Initiative (TEI:
http://www.tei-c.org/). Based principally on early modern documents in Latin and Italian, the project
seemed perfectly tailored for the rich organizing and searching capabilities TEI provides.
It also promised to be a future-proof and sustainable platform, anchored as it was
in XML, a non-proprietary, text-based format. Our first step was to secure funding
for the project, which we did thanks to a large grant from the Getty Foundation in
Los Angeles. One of our first tasks was to hire a consultant for the encoding of
the documents. Colleagues suggested that we contact David Seaman, a pioneer and longstanding
advocate of TEI, which was one of the wisest steps we could have taken. With David’s
help, we created a small team of two art historians, a paleographer who made the official
transcriptions of all the documents identified by the team, a text encoder, and a
principal investigator (me) responsible for overseeing all aspects of the project.
For its part, the National Gallery of Art (my employer) provided technical support
from the Web team (principally responsible for formatting and styling the pages) and
IT support in the form of our code writer and web architect, Richard (Ric) Foster.
Ric, like David, was able not only to accomplish large quantities of work against
tight deadlines, but also to innovate. Ric wrote the code (largely as Perl scripts
[this being the early aughts]) that allowed us to automate the tagging of personal
names, dates (it was David’s bright idea to make all dates machine readable—maddening
for the Europeans, but logical for the coders and the database), and places. This
process, which shaved weeks—if not months—off the text mark-up, took a huge burden
away from the text encoder on our team who then had to clean up the reduced number
of items that remained: implied names (repeated first names without the surname attached)
and places (assuming locations within Rome by contextual clues); words that went over
multiple lines or folios; people identified only by their title; etc. The encoder
was also responsible for marking up key words, document type, and notary. These terms
were too idiosyncratic to automate.
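To give a sense of what this kind of automated tagging involves, here is a minimal Python sketch that wraps dates in a TEI `date` element with a machine-readable `when` attribute. It is not Ric's Perl code, which was never documented; the month table, the regular expression, the function name `tag_dates`, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical lookup of modern Italian month names. Real documents would also
# need Latin forms, abbreviations, and period spelling variants.
MONTHS = {
    "gennaio": 1, "febbraio": 2, "marzo": 3, "aprile": 4,
    "maggio": 5, "giugno": 6, "luglio": 7, "agosto": 8,
    "settembre": 9, "ottobre": 10, "novembre": 11, "dicembre": 12,
}

# Matches dates written as day, month name, four-digit year ("14 novembre 1593").
DATE_RE = re.compile(
    r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b",
    re.IGNORECASE,
)


def tag_dates(text: str) -> str:
    """Wrap each recognized date in <date when="YYYY-MM-DD">...</date>."""
    def wrap(match):
        day, month, year = match.group(1), match.group(2), match.group(3)
        iso = f"{int(year):04d}-{MONTHS[month.lower()]:02d}-{int(day):02d}"
        return f'<date when="{iso}">{match.group(0)}</date>'
    return DATE_RE.sub(wrap, text)


# Invented sample text, not a quotation from the archive.
sample = "Adi 14 novembre 1593 in Roma."
tagged = tag_dates(sample)
print(tagged)
# Adi <date when="1593-11-14">14 novembre 1593</date> in Roma.
```

The original wording stays visible to the reader while the `when` value gives the database a sortable, searchable form, which is the trade-off described above. Implied names and contextual places do not reduce to a pattern like this, which is presumably why they were left to the encoder.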
Since I am addressing serious XML users, I wanted to share several images of the workflow
that Ric created for the project (SLIDE 3). The first slide shows the Site Content
Production and Processing. The left side addresses conversion of the transcriptions from MS Word files (Accademia team), through XHTML to rough
TEI XML (IT data processing) to manual clean-up of the TEI XML (Accademia team) to
the development folder (Accademia team). On the right, the processing side, the Accademia team input the second-draft TEI files, the bibliographies, person
IDs, and image files, all in XLS. When completed, these files would trigger the application
of supporting XML. And finally, we arrive at the promotion to the Web, where the content and search apparatus are displayed in HTML; the underlying
TEI mark-up was always available as a clickable asset for any researcher who chose
to view it. In addition to the names, places, keywords, document types, notaries,
and dates, there was also metadata concerning the authors of the content and mark-up;
the date it was last worked on; and the source in the archives in Rome. [I should
mention, too, that this metadata is not preserved in the current Accademia website;
we have to indicate it in the HTML. It is no longer embedded in the documents, which
is a loss.]
The second slide (SLIDE 4) is the one I used to take on the road to share with my
fellow art historians so that they could get a general sense of the process of production,
conversion, processing, and promotion to the Web. It served as a Reader’s Digest version. With a grant from the Samuel H. Kress Foundation in 2010, the year the site
was launched, we made presentations to researchers in Rome, Florence, Pisa, Genoa,
Paris, London, Oxford, Cambridge, and Toronto, as well as New York, Washington, Los
Angeles, and Chicago (these latter domestic events took place over several years at
the Center for Advanced Study in the Visual Arts’ [CASVA’s] expense).
From its launch in 2010 until about 2014, the publicly accessible site performed very
well and it served tens of thousands of international researchers. What we had not
anticipated, however, was that our bespoke website depended very much on the knowledge
and expertise of our web architect, who had created a hybrid of TEI that allowed for
the automatization of tag creation just described and for the site to interact with
content on other areas of the National Gallery of Art’s (NGA) website (under whose
umbrella we function). Ric Foster’s untimely death in 2010 put the project in a precarious
position vis-à-vis making corrections and updates, as the coding was not documented.
For the first two years we were fortunate not to have any major incidents; by 2012-2013,
we wanted to integrate corrections to the transcriptions; add new documents, bibliography
and images; and create a mapping function using geographic information systems. Each
change to the site, addition of content, upgrade, or new feature required hiring consultants
at high cost. Beyond the expense, we also had to find tech support who were able
to understand the web architect’s coding and create fixes. We did locate one consultant
who was able to accomplish this, but the cost limited the number of times we could
call on him for assistance.
Adding to our predicament, the National Gallery of Art had changed platforms for its
own website and was henceforth requiring that all projects conform with its web content
management program: first Adobe Communiqué 5 (CQ), now Adobe Experience Manager (AEM).
The motivation was, in part, to find a content management program that would allow
staff members to create and update their own pages independently and with minimal
training, and, in part, to deploy one program across web, on-site, and mobile devices.
Any outliers were henceforth responsible for providing all of their own maintenance
and consulting needs, an expensive and time-consuming prospect as we knew painfully
well. Faced with these compelling challenges (gun to head, is the way I analogized
it), we decided to yield to forces beyond our control and thus migrated the entire
website from TEI to HTML, which took over a year—from mid-2014 to late 2015.
Which brings us to the second part of my talk, where I will summarize the process
of migration. My colleagues in IT suggested that I use the word “recreate”
rather than “migrate,” since we did not have the luxury of converting all or really
any of the marked-up text into CQ. Instead, every one of the 1,300 names, as well
as hundreds of terms, places, document types, notaries, and dates had to be hand tagged.
The documents were still the source of content for our website; however, they were
no longer the database itself. In describing this process, I need to offer my deepest
gratitude to the Accademia team: Silvia Tita, Courtney Tompkins, Chelsea Cole, and
Benjamin Zweig (this summer Hannah Segrave has helped to create this brief history
of the migration). They did the lion’s share of the work I will now illustrate.
The first slides show the workflow envisioned by the Gallery’s IT staff in tandem
with the consultants/developers from Ukraine who were building the architecture for
our new site. In these slides (SLIDES 5-7 showing steps 1-4), you see that one of
the most essential aspects of the site was that it is bilingual (English and Italian),
which was easily mirrored in TEI. In CQ some content could be copied by the developers
across the two languages, but the rest had to be entered by hand by the team. (SLIDE
6) Adobe AEM considers English and Italian versions completely independent ….
From the broken English (SLIDE 7) that follows I am relatively certain that this
description was created by the Ukrainian team. What our team provided for the developers
were storyboards that showed what we wanted the pages to look like and what kind of
faceted searching we envisioned. If anyone would like to review these slides, I am
happy to share them, but to save time I will mention only highlights.
(SLIDE 8) shows the Translation of authored content.
(SLIDE 9) The English pages were considered the templates for the Italian pages.
The former were most often copied to become the basis for the latter. (SLIDE 10:
select Accademia tags) Here, the tags intended for searching also had to be translated
using the Tag Manager.
Customized tags (SLIDE 11) had to be created by the developers, and our team then
authored and moved the content to the appropriate categories.
At this point, it is important to mention a political exigency we had not counted
on: in the midst of our data migration we learned that our Ukrainian developers lived
in the contested region of Crimea just as it was being annexed by Russia. You can
imagine our concern both for the safety of our new-found colleagues and for the future
of our site. Amazingly, some team members moved in with families in Russia; others
found ways to remain online and working in Crimea. In the end, we did not fall behind
on our timeline. The geopolitical implications of the annexation of Crimea are another
matter.
(SLIDE 12: Person pages) One of the most valuable aspects of our website is that it
provides the names of 1300 artists, church officials, government officials, patrons,
landlords, tenants, and other inhabitants of the city of Rome in the late sixteenth
and seventeenth centuries. The following slide shows how a Person page (SLIDE 13)
was created in CQ. Again, the Accademia team provided the desired structure and the
developers created the architecture for its production. Our team then provided the
relevant items for searching.
One of the advantages of working on the same platform as the NGA website is that we
are able (SLIDE 14: step 3) to populate the images on the Person pages with content
already in TMS (the collections management system used by the Gallery; that is, paintings,
drawings, prints, photographs, sculpture, etc.). Further (SLIDE 15), we are easily
able to integrate images from the Digital Asset Management system (from our own collections
and those of the Gallery). Similarly (SLIDE 16), we can access the Gallery’s library
catalogue to link bibliographic references to books in the collection (when the book
is outside of our system, the link goes to WorldCat).
I have reserved discussion of the migration of the document page to the end, because
the documents (SLIDE 17) and the digital images of them did depend on the naming protocols
that we established in the 1.0 version of the site. (SLIDE 18) We reused the titles
and the document id numbers; the repository name, number, and date could therefore
be populated from the metadata. (SLIDE 19) As you can see, the summaries and all the
tags had to be recreated by hand, using the language and organization of the original
site. (SLIDE 20) The text of the transcription, by contrast, could easily be copied
(but without the links and mark-up, which had to be deleted). (SLIDE 21) Footnotes
in CQ are no longer embedded, and there is no mouseover in the text; rather, they
are identified by the location at the bottom of the transcription page(s). It is
not an ideal solution, but it is a workable one.
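The step of copying a transcription without its mark-up can be pictured with a short Python sketch. It is offered only as an illustration under stated assumptions: the fragment is invented, the TEI P5 namespace and the `key` attribute value are assumed rather than taken from the project's files, and the helper names `plain_text` and `tagged_names` are hypothetical. The first function returns the bare text that could be pasted into a new page; the second lists the names whose tags would then have to be recreated by hand.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace (assumed; the project's own files may have differed).
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Invented fragment for illustration only.
fragment = (
    f'<p xmlns="{TEI_NS}">Io <persName key="p001">Person A</persName> '
    'in <placeName>Roma</placeName>, '
    '<date when="1593-11-14">14 novembre 1593</date>.</p>'
)


def plain_text(xml_string):
    """Return the text of a TEI fragment with every tag removed."""
    root = ET.fromstring(xml_string)
    # itertext() yields element text and tails in document order.
    text = "".join(root.itertext())
    # Collapse runs of whitespace left by line breaks in the source.
    return " ".join(text.split())


def tagged_names(xml_string):
    """List the contents of each persName element in document order."""
    root = ET.fromstring(xml_string)
    return [el.text for el in root.iter(f"{{{TEI_NS}}}persName")]


print(plain_text(fragment))    # Io Person A in Roma, 14 novembre 1593.
print(tagged_names(fragment))  # ['Person A']
```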
Good news accompanies this tale of the loss of our foundation in XML principles: one
of the most significant benefits is that members of the team can now add content (SLIDE
22 new doc and transcription by paleographer Roberto Fiorentini) without learning
XML or TEI. As we have seen, there are templates for the new site that are easily
mastered, and team members can upload new documents, create tags, add images, bibliography,
and the like after a few days of training in CQ/AEM.
In addition, there is greater interoperability with the other areas of digital content
on the NGA’s website: the newly migrated website provides faceted search components
that allow the user to explore the documents by using names, keywords, document types,
places, notaries, and year dates, just as the original site did (SLIDE 23: screen
capture). Although the structure of the data is different, search results remain as
accurate and complete as before, while including significant enhancements. For example,
researchers can now either select a single category for searching or combine guided
searches in up to six categories. Searchable names (now numbering around 1,300) (SLIDE
24: Screen capture) include those of artists and artisans as well as individuals constituting
a wide swath of the population of Rome who transacted business with members of the
Accademia.
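The idea of combining guided searches across categories can be sketched in a few lines of Python. This is a toy model, not the AEM search component: the `Document` class, the facet names, and the three records are invented, and a document is assumed to match only when it satisfies every selected category.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    # Maps a facet category (e.g. "name", "place") to its set of values.
    facets: dict = field(default_factory=dict)


def search(documents, **criteria):
    """Return the documents that match every supplied facet (logical AND)."""
    return [
        doc for doc in documents
        if all(value in doc.facets.get(category, set())
               for category, value in criteria.items())
    ]


# Invented records; names, types, and years are placeholders.
docs = [
    Document("doc-1", {"name": {"Person A"}, "document_type": {"statute"}, "year": {1593}}),
    Document("doc-2", {"name": {"Person A", "Person B"}, "document_type": {"lease"}, "year": {1634}}),
    Document("doc-3", {"name": {"Person B"}, "document_type": {"lease"}, "year": {1635}}),
]

single = search(docs, name="Person A")
combined = search(docs, name="Person B", document_type="lease", year=1635)
print([d.doc_id for d in single])    # ['doc-1', 'doc-2']
print([d.doc_id for d in combined])  # ['doc-3']
```

Each added category narrows the result set, which is the behavior a researcher sees when moving from a single-category search to a combined one.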
The site now provides pages for all of the individuals mentioned in the documents,
including references and links to the documents in which their names appear, with
a new feature that indicates the role or roles that they played in Roman society and/or
the Accademia, if retrievable. For well-known artists or artists who contributed significantly
to the life of the Accademia, the site now incorporates artists’ pages that include
not only links to the documents in which they are named but also selected bibliographies,
related images, and in some cases portraits. The site’s original features have been
completely updated and re-edited to correct errors and inconsistencies as well as
to incorporate new information. Bibliographies are linked either to the catalog of
the National Gallery of Art Library or to WorldCat (SLIDE 25) so that researchers
can access complete bibliographic information for every reference.
Most of the works of art originally represented were from the collection of the National
Gallery of Art, with about a dozen from other museums that house paintings from the
Samuel H. Kress Collection (by special agreement with the Samuel H. Kress Foundation).
Hundreds more related works of art by academicians from museums throughout the world
have been added in the two years since the migration. We have carefully curated these
images from institutions that are now making them freely accessible to the public
(such as the NGA, the Metropolitan Museum of Art, the J. Paul Getty Museum, and the
Yale University Art Gallery, among others); thus, they are of the highest quality
and resolution and there are no restrictions on viewing and downloading them.
In addition, the Accademia project team completed the creation of a mapping feature
that allows researchers to locate places mentioned in the documents on four historic
maps of Rome. Once again, we have benefited from the NGA’s participation in the International
Image Interoperability Framework (IIIF) initiative. Briefly, there are a growing number
of museums, universities, and libraries that are making high-definition images freely
accessible to the public. This initiative offers researchers and scholars high quality
images that can be compared, zoomed for fine-grained analysis, written on (with texts
or notes), and layered. I have cued a (SLIDE 26: screencast) screencast of our maps,
dating from Étienne Dupérac in 1577 to Antonio Tempesta in 1593, to Giovanni Maggi
in 1625, to Giovanni Battista Falda in 1676, so that you can see how we are taking
advantage of these new image technologies. Since the time that the screencast was
made last summer, we have added map pins and mouseovers with short entries; a series
of five longer essays; bibliographies; comparative images; and soon there will be
destination links to rare guidebooks from the sixteenth to the eighteenth centuries
from the NGA’s library.
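For readers unfamiliar with IIIF, zooming into a detail of a map comes down to requesting a region of the image through a structured URL. The sketch below follows the URL pattern of the IIIF Image API (identifier, region, size, rotation, quality, format). The server address and the identifier are placeholders and do not point to the NGA's actual image service, and the helper name `iiif_image_url` is hypothetical.

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Pattern: {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    The identifier is percent-encoded so that any slashes it contains are not
    mistaken for path separators.
    """
    encoded = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Placeholder server and identifier, for illustration only.
BASE = "https://example.org/iiif"

whole_map = iiif_image_url(BASE, "maps/tempesta-1593")
# A 512 x 512 pixel detail starting at x=1024, y=2048, delivered 512 pixels wide.
detail = iiif_image_url(BASE, "maps/tempesta-1593",
                        region="1024,2048,512,512", size="512,")

print(whole_map)
# https://example.org/iiif/maps%2Ftempesta-1593/full/max/0/default.jpg
print(detail)
# https://example.org/iiif/maps%2Ftempesta-1593/1024,2048,512,512/512,/0/default.jpg
```

Because every participating institution answers the same kind of request, a viewer can compare and layer images from different collections without custom code for each source.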
Finally, migration to the platform used by the National Gallery of Art website (www.nga.gov) will ensure the long-term sustainability and extensibility of The History of the Accademia di San Luca, c. 1590‒1635: Documents from the Archivio
di Stato di Roma. With apologies for preaching to the choir among the digitally savvy, we can never
take long-term maintenance and support for granted.
I have a short coda that provides an afterlife for the original site and the documents
marked up in TEI. For the first few months after the migration of the site from XML
to HTML, the old site and new site ran parallel to one another, with a short text
on the original site explaining that it had been de-commissioned and that a newer
version was now available. For a brief time, the old site automatically re-directed
users to the new site. After six months, the original site was taken off public view
and placed onto the NGA intranet. This decision was made following several discussions
between the Accademia team and the Technical Services (TS) department about how to
preserve the original site. First, TS tried to make a copy of the original site using
the Heritrix archiving software, but it was only partially successful, as images, XML files, and
the site’s Cascading Style Sheets (CSS) could not be properly retrieved. Then the decision
was made to place it on the NGA’s intranet as a stopgap until such time as a long-term
archiving solution could be reached. We went back and forth in discussion about releasing
the XML data by means of GitHub or some other such source (with the caveat that the data was now outdated). The
Accademia project team is concerned about and committed to the long-term preservation
of the NGA’s digital projects, which constitute an important part of the institution’s
history and its mission to educate. In the end, we could not find a long-term means
of archiving a wide swath of the website; instead, we made an 8-minute screencast
of the principal pages (landing page; description of the project and site; search
page with sample searches; and all the other major components of the site: images,
bibliography, team members, funders, and partners). We have also decided to share the
text-encoded data with researchers or organizations who have a legitimate scholarly
interest. (SLIDE 27: screencast)
We have focused in this video on the components of the original site and the centrality
of TEI to the structure and retrievability of its contents, highlighting typical searches
by artist’s name, key term, and date, among others. Together with making available
the encoded data, we hope in this way to document our own early history with regard
to this successful, if regrettably short-lived, engagement with XML mark-up.
Acknowledgements
The author acknowledges support from Benjamin Zweig, Robert H. Smith Postdoctoral
Research Associate; Silvia Tita, Research Associate; and Hannah Segrave, Summer Graduate
Intern on the project described in this paper. He also thanks Veronica Ikeshoji-Orlati,
the incoming Robert H. Smith Postdoctoral Research Associate, for her help with technical
and editorial matters.