
Cultural Heritage Markup:
Using Markup to preserve, understand, and disseminate cultural heritage materials
a Balisage pre-conference symposium

Monday, August 10, 2015

Monday 7:30am - 9:00am

Registration & Continental Breakfast

Pick up your conference badge outside the White Flint Amphitheater and join us for a light breakfast.

Monday 9:00am - 9:10am

Introductions, Greetings, and Announcements

Monday 9:10am - 9:30am

Introduction to Cultural Heritage Markup

Hugh Cayless, Duke University

Cultural heritage materials are remarkable for their complexity and heterogeneity. This often means that when you've solved one problem, you've solved one problem. Arrayed against this difficulty, we have a nice big pile of tools and technologies with an alphabet soup of names like XML, TEI, RDF, OAIS, SIP, DIP, XIP, AIP, and BIBFRAME, coupled with a variety of programming languages and storage and publishing systems. All of our papers today address in some way the question of how you deal with messy, complex, human data using the available toolsets, and how those toolsets have to be adapted to cope with our data. How do you avoid having your solution dictated by the tools available? How do you know when you're doing it right? Our speakers are all trying, in various ways, to reconfigure their tools or push past those tools' limitations, and they are going to tell us how they're doing it.

Monday 9:30am - 10:15am

Integrating Digital Epigraphies (or, if you think the 19th century was bad, try living in the 20th)

Josh Sosin, Duke University

IDEs (Integrating Digital Epigraphies) aims to provide core infrastructure for the field of Greek epigraphy (the study of texts carved on stone) by supporting annotation across an array of disparate digital resources. Epigraphy was born in the early to mid 19th century and has been productive ever since. Perhaps a million Latin and Greek inscriptions are known today. These objects are often badly preserved, physically removed from their original context, or even lost; many are repeatedly re-published, emended, joined to other fragments, re-dated, and re-provenanced, and not only do they lack a single unambiguous identification system, but many thousands are known by multiple, competing, and badly controlled bibliographic shorthands. They are unstable in many senses. Print publication of inscriptions in the late 19th century and throughout the 20th is marked by considerable descriptive rigor. In the generation straddling the 20th and 21st centuries, scholars developed a rich variety of digital epigraphy tools. But in all cases these were descendants of previous print resources and entailed significant suppression of the semantic richness that was the (albeit loosely controlled) norm in print publication. In a way, then, much of our effort is devoted to creating a framework that allows users to re-infuse a suite of late 20th-century tools with the 19th-century scholarly sensibility (and even the very data!) that long informed print epigraphy.

Monday 10:15am - 10:45am

Coffee Break

Monday 10:45am - 11:30am

Specifying a TEI-XML based format for aligning text to image at character level

Alexei Lavrentiev, CNRS & Université de Lyon / ICAR Research Lab; Yann Leydier, Université de Lyon, INSA-Lyon & CNRS / LIRIS Research Lab; & Dominique Stutzmann, CNRS / Institut de Recherche et d'Histoire des Textes [Paper] [EPUB]

When working with a transcription of a manuscript, a reader may wish to navigate from the transcription to the appropriate location in the image of the page it transcribes, or from an image location to its transcription. Such interfaces require that the text and image be aligned at word or even character level. The Oriflamms Research Project has developed a TEI-based XML format for recording character-level text/image alignment information, suitable for use by automated alignment software. The input transcription must mark page, column, and line boundaries; image recognition software analyses the page image into columns and lines and then aligns the individual characters in the transcription with the marks in the image. The format addresses complications produced by abbreviations, transpositions in the manuscript, marginalia, and annotations in the transcription. The format was developed in the context of work on the ontology of letter-forms in medieval manuscripts; the result will be useful in many kinds of fine-grained image/text alignment.
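
The paper sets out the format itself; as a rough, hypothetical sketch of the stock TEI machinery involved (facsimile zones addressed from the transcription via @facs pointers; the element names below are standard TEI, not necessarily the Oriflamms schema), character-level alignment might look like this:

    <TEI xmlns="http://www.tei-c.org/ns/1.0">
      <!-- teiHeader omitted for brevity -->
      <facsimile>
        <surface xml:id="page1">
          <graphic url="page1.jpg"/>
          <!-- one zone per line, then one per character (pixel coordinates) -->
          <zone xml:id="line1"   ulx="210" uly="180" lrx="1750" lry="260"/>
          <zone xml:id="line1c1" ulx="210" uly="180" lrx="248"  lry="260"/>
        </surface>
      </facsimile>
      <text>
        <body>
          <p><pb facs="#page1"/><lb facs="#line1"/><c facs="#line1c1">I</c>n principio erat uerbum</p>
        </body>
      </text>
    </TEI>

The hand-marked page, column, and line boundaries give the alignment software its starting grid; it can then fill in the character-level zones automatically.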

Monday 11:30am - 12:15pm

Three ways to enhance the interoperability of cross-references in TEI XML

Joel Kalvesmaki, Dumbarton Oaks [Paper] [EPUB]

Systems are 'interoperable' if each can work with products of the other with minimal external intervention. Semantic interoperability (exchange of underlying meaning, not just syntax) is the goal. Currently supported TEI cross-reference mechanisms are typically not interoperable without extensive human intervention. I offer three practical ways to make standard TEI cross-references more semantically interoperable. The first is the deployment of Canonical Text Services URNs. The second is informal agreements among communities to adopt shared Schematron rules. Both of these methods can be implemented right now; the barriers are practical, not technical. The third method is stand-off markup based on the Text Alignment Network, a planned TEI-friendly XML format for the interchange of aligned texts.
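
As a hypothetical illustration of the first two methods (the URN and rule below are invented for this program note, not taken from the paper), a cross-reference carrying a Canonical Text Services URN, and a community-agreed Schematron rule enforcing that convention, might look like this:

    <!-- a TEI cross-reference resolved by a CTS URN rather than a local ID -->
    <ref cRef="urn:cts:greekLit:tlg0012.tlg001:1.1">Il. 1.1</ref>

    <!-- a shared Schematron rule requiring CTS URNs in @cRef -->
    <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
      <ns prefix="tei" uri="http://www.tei-c.org/ns/1.0"/>
      <pattern>
        <rule context="tei:ref[@cRef]">
          <assert test="matches(@cRef, '^urn:cts:')">Cross-references
            must carry a Canonical Text Services URN.</assert>
        </rule>
      </pattern>
    </schema>

Any project that validates against the same rule can consume its partners' cross-references without human intervention, which is the point of the informal agreement.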

Monday 12:15pm - 1:15pm

Lunch

Monday 1:15pm - 2:00pm

Data transforms, patterns and profiles for 21st century Cultural Heritage

Uche Ogbuji & Mark Baker, Zepheira [Paper] [EPUB]

In the early part of the twenty-first century, it's a near certainty that your local library will provide web access, free wifi, or both. Go! Search the web therein for information about books or music or film. You will be unsurprised by the profound lack of libraries amongst the search results, despite the fact that interesting, useful, perhaps even important reference materials related to your search are freely available a moment's walk from where you sit. Libraries were early adopters of computer technologies, so it is supremely ironic that most core library data has been left behind as we moved into the information age.

Libhub and BIBFRAME attempt to address this problem. They strive to make it simpler and easier to convert from legacy library information formats such as MARC/XML to widely adopted web formats such as HTML and Linked Data. Providing libraries with a fast, flexible way to make their catalogs more idiomatically "on the web" may help restore them to the place at the center of information in human affairs that they enjoyed for centuries.
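
As a loose sketch of the kind of conversion involved (the URIs are placeholders and the property names merely illustrate the 2015-era BIBFRAME vocabulary; this is not a normative mapping), a MARC/XML record fragment might be restated as Linked Data roughly like this:

    <!-- source: a fragment of a MARC/XML bibliographic record -->
    <record xmlns="http://www.loc.gov/MARC21/slim">
      <datafield tag="100" ind1="1" ind2=" ">
        <subfield code="a">Melville, Herman.</subfield>
      </datafield>
      <datafield tag="245" ind1="1" ind2="0">
        <subfield code="a">Moby-Dick.</subfield>
      </datafield>
    </record>

    <!-- target: the same data as BIBFRAME-style RDF/XML -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:bf="http://bibframe.org/vocab/">
      <bf:Work rdf:about="http://example.org/works/moby-dick">
        <bf:title>Moby-Dick</bf:title>
        <bf:creator rdf:resource="http://example.org/agents/melville-herman"/>
      </bf:Work>
    </rdf:RDF>

Once catalog data takes this shape, each work and agent has a web-addressable URI that search engines can crawl and other sites can link to.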

Monday 2:00pm - 2:45pm

Introducing the UK National Archives digital records metadata vocabulary

Robert Walpole [Paper] [EPUB]

To support the transition from records on parchment and paper to records that are born digital, the UK National Archives has developed a comprehensive architecture that includes physical storage, mechanisms for receiving and delivering documents, and a system of metadata to govern all parts of the operation. The National Archives are now receiving both documents from digitization processes and records from computerized processes; these are being stored in a dark archive with live disk caching of files. Locating records in a petabyte-scale system must depend on effective cataloging that complies with the requirements of the Open Archival Information System model.

Recognizing that its archives must serve the long term, the National Archives has chosen to implement its metadata architecture in XML. Simply using Dublin Core is not sufficient, and there is more than one way of representing RDF information in XML. The Archives have built their own ontology for a triple store and a particular schema for their RDF/XML, with the hope of creating a SPARQL interface to make the Archives' content available to the public. The metadata itself is human-created, in hopes that it will be truly useful to future readers and aid in keeping alive the information in storage for its owners, the people of the UK.
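
As a purely hypothetical sketch (the tna: namespace and its property names are invented here for illustration; the Archives' actual ontology differs), a record description mixing Dublin Core terms with a bespoke vocabulary might be serialized as RDF/XML like this:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dcterms="http://purl.org/dc/terms/"
             xmlns:tna="http://example.org/tna-ontology#">
      <rdf:Description rdf:about="http://example.org/records/unit-42">
        <dcterms:title>Departmental correspondence, 1998-2003</dcterms:title>
        <dcterms:created>2003-11-15</dcterms:created>
        <!-- properties Dublin Core cannot express call for a bespoke ontology -->
        <tna:closureStatus>Open</tna:closureStatus>
        <tna:heldBy>The National Archives, Kew</tna:heldBy>
      </rdf:Description>
    </rdf:RDF>

Loaded into the triple store, descriptions of this kind are what a public SPARQL endpoint would query.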

Monday 2:45pm - 3:15pm

Coffee Break

Monday 3:15pm -

Short Talks

In this session, short presentations will be followed by discussion.

  • To those who startle at innovation...

    Laura Randall, NCBI/NLM/NIH [Paper] [EPUB] [Slides and materials]

    A continuous publication model is a feature of modern electronic journal publishing, but it is not a new idea. How do we integrate a 200-year-old continuous print publication model into a modern digital journal publishing system?

  • EAGLE and EUROPEANA: architecture problems for aggregation and harmonization

    Pietro Maria Liuzzo, Ruprecht-Karls-Universität Heidelberg [Paper] [EPUB]

    What is a Cultural Heritage Object? And how can one harmonize different models from different projects when aggregating their data? How can we provide enough structure to help users navigate this complexity?

  • Encoding western and non-western names for Ancient Syriac authors

    Nathan P. Gibson, Vanderbilt Divinity School; Winona Salesky, Independent Consultant for Syriaca.org; David A. Michelson, Vanderbilt Divinity School [Paper] [EPUB]

    For the Syriaca.org project, we marked up author names in two different scripts of Syriac with different vowel sets, as well as in English, Arabic, and sometimes French, German, or Latin. We had to figure out how to represent name parts, tag them for language, and visualize them when some read right-to-left and some left-to-right. (A minimal sketch of this kind of multilingual name encoding appears after this list.)

  • Duplicitous Diabolos: Parallel witness encoding in quantitative studies of Coptic manuscripts

    Amir Zeldes, Georgetown University [Paper] [EPUB]

    When encoding parallel witnesses in a corpus, how can we manage duplication so that users searching the corpus aren't presented with inflated results? How do we choose what to mark as primary and what as redundant? The devil is in the details.

  • Encoding document and text in the Shelley-Godwin Archive

    Raffaele Viglianti, MITH - University of Maryland [Paper] [EPUB]

    The Shelley-Godwin Archive prioritizes the documentary hierarchy over the textual one, using standoff markup to encode the latter. Can the creation of this kind of standoff encoding be made simpler, and how do we manage change to the source texts when our markup is outside them? (A sketch of document-first encoding with a standoff layer appears after this list.)

  • Divide and conquer: can we handle complex markup simply?

    Robin La Fontaine, DeltaXML [Paper] [EPUB]

    Recent advances in comparison and merge tools for XML suggest new approaches to handling multiple, even overlapping hierarchies such as often surface in cultural heritage markup. Would this be useful?
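
Two of the short talks lend themselves to quick illustrations. For the Syriac names problem, the stock TEI mechanism is one <persName> per form, distinguished by @xml:lang; a minimal hypothetical sketch (the name forms are illustrative and this is not Syriaca.org's actual encoding) might be:

    <person xml:id="ephrem">
      <persName xml:lang="en">Ephrem the Syrian</persName>
      <persName xml:lang="la">Ephraem Syrus</persName>
      <!-- BCP 47 script subtags can separate West Syriac (Syrj) from
           East Syriac (Syrn) forms, which differ in vowel pointing -->
      <persName xml:lang="syr-Syrj">ܐܦܪܝܡ</persName>
      <persName xml:lang="ar">أفرام</persName>
    </person>

Text direction then becomes a display-layer concern: the rendering code can key right-to-left layout off the language tags rather than off anything in the markup structure.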
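
For the Shelley-Godwin approach, a hypothetical sketch of document-first encoding with a standoff textual layer (generic TEI genetic-edition elements; the <ptr> linking shown is one simple option, not the Archive's exact mechanism) might be:

    <!-- documentary hierarchy: the page as a physical object -->
    <sourceDoc xmlns="http://www.tei-c.org/ns/1.0">
      <surface xml:id="p1">
        <zone xml:id="p1-z1">
          <line xml:id="p1-l1">It was on a dreary night of November</line>
        </zone>
      </surface>
    </sourceDoc>

    <!-- standoff textual hierarchy, pointing back into the document -->
    <text>
      <body>
        <p><ptr target="#p1-l1"/></p>
      </body>
    </text>

Because the textual layer holds only pointers, any correction to the documentary transcription risks silently invalidating it; that maintenance problem is part of what the talk takes up.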