Balisage 2026 Program

Monday, August 3, 2026

Monday 9:00 9:15 EDT

Welcome to Balisage 2026

Conference logistics, tips for attendees, and other getting started messages.

Monday 9:15 9:45 EDT

There is Nothing So Practical as a Good Theory (Reflection)

B. Tommie Usdin, Mulberry Technologies

“There is nothing so practical as a Good Theory” has been the tag line for “Balisage: The Markup Conference” since it started in 2008. With this as the guiding principle behind a conference about markup, Balisage has been a mix of case studies, updates on specifications, tutorials, discussions of best (and sometimes worst) practice, and conversations about how things should and/or do work. We have explored markup; tools used to create, manipulate and store marked-up content; markup-related information management principles; and formal and mathematical models of markup. Some of our content is solidly grounded in reality and the practices of daily production of documents; some can best be described as imaginative and fantastical. Each of these realms informs the other, often in unpredictable but rewarding ways.

Monday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

Designing a Notation Using ixml

Steven Pemberton, CWI, Amsterdam

The ixml language was originally designed to allow un-marked-up textual documents to be treated as if they were XML documents with markup. Although the original intent of ixml was to allow text documents to be treated as XML and then further processed using XML tools, it is possible to work in the other direction: if you have an XML document type, you can use ixml to design a textual representation for it. Take, for example, XForms embedded in an HTML document. Each of the components is well defined, so corresponding text can be developed, with appropriate ixml rules to generate the corresponding output. With a gradual approach to components, even complex documents can be produced.

Monday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Track Changes Support for Automated Regex-Based Text Replacement in ContentXML

Caleb Clauset, Typefi Systems

Preserving the editorial history of a document (recording who changed what, and when) is a staple requirement in many environments. Under the hood, tracked deletions and insertions require one to slice XML trees and insert anchors. What happens when you need to apply global changes to such a file, across and inside slices and anchors? When the replacements have been applied, the new round of changes needs to be integrated into the editorial history, and there must be a fair accounting of whether the changes were made to the original text or to one of the insertions. We will take a look at the track changes architecture of Orion Smart Replace, a plugin for the Typefi publishing platform, and, through its three-phase operation (unwrap, replace, rewrap), we will consider the technical challenges involved in managing character offsets and nesting tracked changes.

Monday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

Migrating Ebook Backlists with a Pragmatic XProc-Based Approach

Martin Kraetke, le-tex publishing services GmbH

Since the ratification of the European Accessibility Act (EAA) in 2024, many publishers are sitting on a ticking time bomb: their ebook backlists. There are real issues in the gap between past production standards and the new legal requirements. Can XML transformation help? Not as much as you would think. Even when HTML files of the ebook are available, converting them into XML is not straightforward.

My breakthrough thought was that transforming an EPUB 2.0 directly into EPUB 3.0 would be far easier than attempting to use XML as a transitional format. While there are some real limitations, XProc and the XML framework transpect seem to be good tools to orchestrate the unpacking and repacking of EPUBs and manage the mix of XML and binary files, file references, links, and all the other components involved. The software library repub is designed to address issues in outdated ebook formats. While repub cannot eliminate every barrier, it helps improve compliance and increases the likelihood that an ebook will pass accessibility validation. Over 10,000 ebooks have been converted with repub in the past few months. Success!

Monday 13:30 14:00 EDT

Sponsor Presentation

Sponsor presentation, details to be announced

Monday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

XSpec for XProc, and XProc for XSLT

Amanda Galtman

You set out to build a software pipeline to do important, complex tasks. You want your pipeline to be capable and sturdy over the long haul, and you don't want to chase bugs constantly or have them chase you. In cases like this, proactive testing is not a luxury but a requirement. XML's premier pipeline language, XProc, has finally converged with XML's premier testing language, XSpec. We will see how to write test suites for XProc using the XSpec vocabulary. As the pipeline steps are checked, we'll also peek into the way processors behave and discover how to leverage their features.

Monday 14:45 15:15 EDT (+ Q&A 15:15 - 15:30)

Synthesis and Sustainability: The Evolution of Markup and Toolsets in the Women Writers Project

Julia Flanders &
Ash Clark, both of Northeastern University

In 1988, the Women Writers Project (WWP) started working with markup systems. In 1999, the WWP began working with XML publication tools. In both cases, markup and tools, the WWP has treated the work as active research: not only seeking to build a stable, sustainable working system, but also keeping pace with new developments and exploring their implications for research on early women’s writing. The intertwined history and co-evolution of these two sets of practices within the WWP’s nearly 40 years contributes a valuable perspective on scholarly markup in the humanities. We explore the evolution of the WWP’s markup and publication systems. We consider the reciprocal pressure that tools and markup exert on each other, and the ways in which markup responds to changing tools and technologies.

Monday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Validation-Driven Automation in a Federated XML Ingest Pipeline: The NCBI Bookshelf Case

Kin Ng,
Lisandro Gonzalez &
Stacy Lathrop, all of the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

The NCBI Bookshelf of the National Library of Medicine ingests and makes publicly accessible biomedical books and reports from a variety of sources in many formats and of varying quality. Ensuring data integrity across this content has required substantial manual intervention. We have designed and implemented a validation-driven automation framework for Bookshelf that supports scalable conversion, ingestion, processing, and release of content within a federated architecture. The system integrates rule-based validation, workflow orchestration, and identifier-based reconciliation across multiple systems, including content management, XML processing pipelines, PubMed indexing, and Open Access dataset services. To fix dirty and inconsistent data, we use layered validation (applied at multiple stages in the lifecycle), including Schematron and business-rule enforcement. The pipeline supports multiple conversion and ingestion pathways, including direct XML submission, PDF-to-XML conversion via tagging vendors, and Word-based authoring workflows. These workflows converge on a common processing model governed by validation rules and state transitions tracked in an external workflow system. Validation serves not only as a quality assurance mechanism but as a central organizing principle for workflow automation.

Monday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

We Just Can't Have Nice Things (Reflection)

Alex Miłowski

When given a new toy, we may play with it too hard or in a way not intended, and it just breaks, and is irreplaceable. But sometimes, breaking things leads to insights and innovations. We're tempted to get rid of the things that break often because they are “bad” or “poorly designed.” Often, they're just nice things we've failed to curate into better things. We have had so many nice new things . . . HTML, XML, SVG, MathML, XSL-FO . . . that have been broken somewhere, some time. What technologies will sustain value for society? We just can't just have nice things — we have to make them.

Monday 17:15

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Tuesday, August 4, 2026

Tuesday 8:00 9:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Tuesday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

The XSLT/XPath/XQuery 4.0 Standards: a High-Level Perspective (Reflection)

Michael Kay, Saxonica

Versions 4.0 of XSLT, XPath, and XQuery are well advanced, and they promise clear advantages for developers. What is the big picture? Not simply new concepts such as JNodes and records, and dozens of new functions (although these are significant), but the place of 4.0 in a long journey that began with 1.0. Even that journey needs to be seen in a wider view, against the backdrop of what standards are, who backs them, who changes them, who implements them, who uses them, and to what end. Reflect, not only on the major changes that 4.0 promises to bring to the XML terrain, but on the place of that XML terrain within the shifting plate tectonics of our day.

Tuesday 9:45 10:30 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Tuesday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Transcriptional implicature

C. M. Sperberg-McQueen, Black Mesa Technologies LLC,
Claus Huitfeldt, University of Bergen, &
Yves Marcoux, École de bibliothéconomie et des sciences de l'information, Université de Montréal

Many people transcribe many materials in many ways; universal transcription practice is elusive: for every generalization we find exceptions. Is everything in the exemplar transcribed? Not necessarily. Does everything in the transcript reproduce some word or character in the exemplar? Not necessarily. Many scholarly editions account for transcription variations in an explicit statement of transcription practice. Such statements typically describe deviations from the usual practice, but rarely the ways in which the edition exemplifies usual practice. By transcriptional implicature for a given community, we mean the things, suggested or entailed by the rules of transcription, that members of that community may find unnecessary to mention explicitly. We propose a way of accounting for common practices while also making sense of variations in practice, and we sketch a formal approach to transcription in hopes that it will inspire future work.

Tuesday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

It’s Complicated: Holistic Approaches for Considering Complexity in XML Documents

Sarah Connell and Syd Bauman, both of Northeastern University / Library / DSG / WWP

"Complex" : adjective. (1) Composed of two or more parts. (2) Hard to separate, analyze, or solve.

What does "complex" mean, in practice? If you and I agree that one XML file is more complex than another, upon what basis do we agree, and can that be quantified? Are the parts I see as significant the same ones you see as significant? How much weight should be given to hierarchical depth versus the number of siblings versus the type and content of attributes? And how do we even start to fit namespaces into that equation? But mere counting oversimplifies a problem that is rooted in operational complexity, a quality that can be understood only from the perspective of XML users.

Tuesday 13:30 14:00 EDT

Sponsor Presentation

To be announced

Sponsor presentation; details to be announced

Tuesday 14:00 14:30 (+ Q&A 14:30 - 14:45)

Late-breaking News

Reserved for a late-breaking presentation

Tuesday 14:45 15:30 (+ Q&A 15:30 - 15:45)

Late-breaking News

Reserved for a late-breaking presentation

Tuesday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Graceful Diagramming with SVG plus AVTs

Steven J. DeRose

SVG is a mature, reliable XML vocabulary for vector graphics, but it is very unlike other XML applications because it lacks semantic structure and the ability to bind components into logical units. MVG (Meta Vector Graphics), an extension layered over SVG, treats drawing objects more like XML users expect, while still producing plain SVG that is compatible with existing tools. In addition to giving structure to SVG, MVG offers a repertoire of familiar geometric primitives. MVG's additions should be highly intuitive to users of XML, XSLT, and SVG, because they reuse notions that are well known and provide a declarative, user-oriented way to construct drawings.

Tuesday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

From XML to Holons (Reflection)

Kurt Cagle

Thirty years of working with markup languages, semantic web standards, and knowledge representation systems has revealed a single recurring problem wearing many different masks: the problem of bounded, scoped, composable meaning. I describe my personal and technical journey from the HTML Document Object Model of 1996 through XML, XSLT, XSD, XQuery, RDF, SHACL, and the neural systems of the present day, arriving at the holonic graph as the resolution of a tension that has persisted, largely unacknowledged, throughout the life of the markup community. The holon — Arthur Koestler's term for a unit that is simultaneously a whole and a part of a larger whole — turns out to have been implicit in every major design decision underlying XML and the semantic web. RDF 1.2, SHACL 1.2, and named graphs now provide the formal apparatus to make this reliance on holons explicit. The implications extend from knowledge governance to the grounding of large language models.

Tuesday 17:15

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Wednesday, August 5, 2026

Wednesday 8:00 9:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Wednesday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

Schemas in EPUB and OOXML: Sanity Checking versus Documentation

Makoto MURATA, Higashi Nippon International University & Information Accessibility Institute, LLC

XML schemas perform two significantly different functions: sanity checking and documentation. By comparing the roles of the schema in EPUB and Office Open XML (OOXML) we show how these differ by ecosystem. In the EPUB ecosystem, schemas serve primarily as validation artifacts to ensure content quality throughout the distribution pipeline. In contrast, the OOXML ecosystem lacks mandatory validation during distribution. However, a validation experiment suggests that third-party office suites produce nearly valid XML once Markup Compatibility Elements (MCE) are preprocessed. This demonstrates that OOXML schemas function successfully as documentation artifacts, guiding developers in software quality assurance to achieve interoperability.

Wednesday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

Distributed Text Services 1.0: An API Standard for Publishing and Extending TEI Document Collections

Thibault Clérice, Institut national de recherche en sciences & technologies du numérique,
Hugh Cayless, Duke University,
Jonathan Robie, &
Ian Scott, Knowledge Commons, Michigan State University

Wednesday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Beyond valid and invalid: implementing complex validation in XProc

Andrew Sales

Bloomsbury Publishing Group plc began as a trade publisher nearly forty years ago and now also has an academic division. Our digital products include Bloomsbury Collections (c. 50,000 academic monographs), Drama Online (home of the Arden Shakespeare) and the Churchill Archive (the complete digitized papers of Sir Winston Churchill, 800,000+). We publish around 3,500 titles per year on the academic list.

Our team’s main responsibility is to “spec and check”. We specify documentation requirements, maintain content conversion documentation, and support validation assets. This documentation is used by our supplier base to prepare the XML used to publish online content and to apply quality assurance prior to publication. This paper describes the (re-)implementation of a piece of enterprise middleware that acts as quality gatekeeper for our XML-first publishing workflow. We describe how we refactor equivalent legacy software as a single XProc pipeline with MorganaXProc-III; how we test this pipeline; and how we use the postprocessing of the validation reports it produces to both refine and redefine our validity criteria, based on the “size and shape” of the documents under scrutiny and the extent to which they violate our business rules.

Wednesday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

Abstraction, Paradox, and the End of History

Allen H. Renear, School of Information Sciences, University of Illinois Urbana-Champaign

In computing and information science, abstraction plays a leading role in how we tell the story of the evolution of programming languages, software engineering, data management systems, and document text encoding. For example, we can represent data as columns and tables in a relational database, “abstracting away” physical storage concerns; or we can mark up a document according to its semantics rather than how it will be processed. With ontologies and conceptual models at yet higher levels of abstraction, it may seem that we have now reached the zenith of abstraction and are ready to enjoy the benefits of a perfect match between human understanding and our digital information systems. However, scientific and philosophic traditions have shown that greater abstraction leads to contradiction, ambiguity, and resulting paradoxes. In an automated digital environment, a failure in abstract reasoning can occur without human oversight, with serious consequences.

Wednesday 13:30 14:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Wednesday 14:00 14:30 (+ Q&A 14:30 - 14:45)

Late-breaking News

Reserved for a late-breaking presentation

Wednesday 14:45 15:15 (+ Q&A 15:15 - 15:30)

Late-breaking News

Reserved for a late-breaking presentation

Wednesday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Generators – Deferred Evaluation in XPath 4

Dimitre Novatchev

When processing sequences of a few items, memory and performance concerns are negligible. But when we need to engage with real-world data such as news feeds, astronomical databases, or the Fibonacci sequence, we face sequences of items that are humongous or infinite (potentially or actually). Conventional approaches do not work. Generators are designed to address this problem by allowing developers to get only the data that is needed, without attempting to load the entire sequence into memory. Through a real-world use case of news feed aggregation, we will learn how an XPath-first implementation of generators promises to revolutionize our handling of big data.
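
For readers unfamiliar with deferred evaluation, the general idea can be sketched in a few lines. The example below is written in Python rather than XPath 4 and is not taken from the talk; the function names `fibonacci` and `take` are hypothetical, chosen only for illustration. It shows an unbounded sequence from which a consumer pulls just the items it needs.

```python
from itertools import islice
from typing import Iterator


def fibonacci() -> Iterator[int]:
    """Yield Fibonacci numbers one at a time, without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def take(n: int, items: Iterator[int]) -> list[int]:
    """Materialize only the first n items of a possibly infinite sequence."""
    return list(islice(items, n))


# Only ten values are ever computed, although the sequence itself is unbounded.
first_ten = take(10, fibonacci())
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Nothing is computed until the consumer asks for it, so memory use depends on how much is taken, not on the length of the sequence.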

Wednesday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

My Whole Career Was a Lie? Or: How I Learned to Stop Worrying and Love WordPress (Reflection)

Jeffrey Beck

We in the declarative markup community have long asserted that declarative markup plays a foundational role in modern document processing by separating content from process and presentation. Declarative markup makes content maintainable, reusable, and adaptable across platforms and technologies. But there is another world out there that has never heard of declarative markup, much less seen a need for it. Building an online archive in that environment does not depend on any of the tools we've grown used to in our world. What we've learned about organization, curation, and other good document-management practices can usefully be carried over into that strange new world.

Wednesday 17:15

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Thursday, August 6, 2026

Thursday 8:00 9:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Thursday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

XML, MCP, and Language Models: A Separation of Concerns

Elisa E. Beshero-Bondar, &
Michael Roy Simons, both of Penn State Erie, The Behrend College

This paper reflects on the “DigitAI” project’s experiments to integrate Small Language Models (SLMs) with XML technologies in academic digital humanities work. The project is motivated by two goals: making “explainable AI” systems that help us understand how language models process, retrieve, and generate text; and making local, customized AI systems as an alternative to economically and environmentally costly Large Language Models. We first attempted to provide an SLM with a Retrieval Augmented Generation (RAG) system built from the TEI P5 Guidelines. This approach proved both bloated and disappointing. We came to realize that XML should be kept as XML, held apart from the language model’s internal machinery, and made accessible instead through a Model Context Protocol (MCP) server that allows the SLM to query the TEI document tree directly using XPath and related technologies. We propose a principled, declarative approach to AI-assisted XML work: a “separation of concerns” between the generative model and the structured data it consults.

Thursday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

Don't Touch My SGML — Nothing New Under the Sun

Ari Nordström, Creative Words

SGML may be old and cranky, but there are many users whose libraries of SGML documents are too vast to be given up and whose use of SGML may be mandated. However, SGML tools are rare, especially when compared to XML tools. So what can a holder of SGML documents do? One solution is to convert documents to XML for processing and then convert them back to SGML. A pipeline that starts with James Clark’s SGML tools and passes through multiple XSLT transformations seems to be the best approach and has the advantage of being reversible at the end. The result is that the users get to use modern processing tools while maintaining their valuable libraries.

Thursday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Extending the TEI Processing Model to Describe Digital Edition Interactions

Peter Boot, Huygens Institute for the History and Culture of the Netherlands

The TEI Processing Model (TEI PM) provides a declarative specification for publishing digital editions, describing a number of behaviours that can be attached to TEI elements. Projects can describe desired rendering of their encoding using these behaviours, which are almost all (currently) static — for example, general textual (structural) behaviours such as paragraph, block, and figure. TEI PM provides no way for a project to describe how the user can interact with a generated site. Digital editions, however, are by nature interactive tools. I discuss a number of issues that arise when defining interactive behaviours and propose a number of new behaviours (experimental, but implemented). Most of the new behaviours are inspired by the interactivity currently available in TEI Publisher, to date the only workable implementation of the TEI PM.

Thursday 11:30 12:00 (+ Q&A 12:00 - 12:15)

Late-breaking News

Reserved for a late-breaking presentation

Thursday 13:30 14:45 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Thursday 14:45 15:15 (+ Q&A 15:15 - 15:30)

Late-breaking News

Reserved for a late-breaking presentation

Thursday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Scholia 2026: DIY study aids with TEI, XProc, browser and printer

Wendell Piez

Using electronic tools to learn an ancient language might seem obvious. Surprisingly, the best approach may be to build simple tools that imitate classical approaches to language learning. With the availability of online resources (such as digitized ancient texts), we can build tools like graded readers that enable learning by encouraging the student to create new translations and personal annotations rather than providing existing translations and annotations. The author has built a number of tools. One of these, the Laminator, is an XProc-based, iXML-enabled processing stack. It manages and merges a markup notation designed to represent a text, not as a structure of (XML) elements, but as a set of (not-XML) ranges, including overlapping ranges. More tools are evolving as the author extends his studies.

Thursday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

The Zettelkasten: Topic Maps, Back to the Future

Thomas B. Passin

The Zettelkasten (“card case”), introduced by the German social scientist Niklas Luhmann in 1981, is an external memory system composed of notes linked through explicit cross-references. A mesh of cross-references allows larger conceptual structures to emerge as the collection grows. Luhmann's Zettelkasten was entirely paper-based. Few efforts to computerize the model seem to have achieved or even understood the levels of engagement and serendipity that Luhmann claimed. I present a framework for understanding how such a system can achieve Luhmann-level support and speculate about why and how so many implementations have fallen short. I synthesize aspects of information theory, cognitive science, human interface, movie production practice, data modeling, and library science to illuminate key concepts underlying highly effective thinking support. Remember Topic Maps and back-of-the-book indexes?

Thursday 17:15

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday, August 7, 2026

Friday 8:00 9:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

On beyond Invisible XML: parsing with implicit (and explicit) next-generation markup

Ronald Haentjens Dekker, DHLab and Huygens Institute, Royal Netherlands Academy of Sciences &
David J. Birnbaum, University of Pittsburgh

We explore what Invisible-XML-like (ixml-like) processing might look like when applied to what we refer to as “next-generation features” of markup: structural properties of textual documents that XML was not designed to represent, such as overlap, discontinuity, and others. This ixml-like approach is intended to interpret implicit, untagged markup and express it with explicit markup. These features may be implicit within a document, or explicitly tagged using markup languages such as LMNL, TexMECS, and TagML. We identify two approaches to ixml-like processing of next-generation features using mildly context-sensitive grammars to overcome the limitations of context-free grammars.

Friday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

Music Notation Markup Languages with a MusicXML Case Study

Joshua Lubell

How can a retired markup geek who sings bass in a choir create practice tracks from electronic sources? Music notation software enables musicians and scholars to edit and analyze musical scores, convert them into audio formats, and share them in searchable databases. Music notation markup languages attempt to encode the multitudinous information present in a piece of sheet music. Efforts to standardize music in descriptive markup started with the SGML-based SMDL (Standard Music Description Language), which was incorporated in HyTime (Hypermedia Time-based Document Structuring Language). Modern XML offerings include MusicXML and MEI (Music Encoding Initiative). ABC and ChordPro are popular text-based music annotation approaches. Do any of these meet my need? Let's find out!

Friday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

A Centralised Index of the Sisterhood of Markup Events

Sheila E. Thomson

When I have an idea for a paper, I like to check if someone's already presented it or something similar. If it's not a new topic, then maybe I can build on what's gone before - but I need to know what that was. There are also occasions when the research is not driven by a prospective project but simply curiosity or to support learning. Each time I go through this process, I think to myself "Wouldn't it be nice if there was a centralised index of all the markup papers?" This paper is a case study on the creation of such an index, in particular its scope, processes, challenges, and solutions.

Friday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

Scripture Pipelines: Declarative AI Pipelines for Biblical and Linguistic Scholarship

Jonathan Robie, Biblica, Nida Institute

Biblical studies and linguistics have produced an abundance of open, freely licensed data, but there is a bottleneck in the human capacity to use that data. Scripture Pipelines is an open-source declarative pipeline system for LLM-assisted workflows, built to scale access to these materials without compromising rigor, reproducibility, or accountability. Each workflow is a YAML pipeline composed of a sequence of named steps, each with explicit inputs, outputs, and prompt contracts, all persistent and auditable. This paper describes the design decisions behind Scripture Pipelines and reports on its application in several Nida Institute projects which produce biblical reference materials. We examine how Scripture Pipelines chooses the best markup for each purpose, how shared identifiers make format translation mechanical, and how its iteration mechanisms support scholarly work.

Friday 13:30 14:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Friday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Making Hierarchy out of Nothing at All: Federal Register Data Modernization

Betty Harvey, Electronic Commerce Connection, Inc.

The raw material for the Federal Register, published daily, comes to GPO from agencies as Microsoft Word files that often include legacy SGML tags as text. The challenge is turning this stream of paragraphs into a deep tree. Getting these into the United States Legislative Markup (USLM) XML format that GPO uses is a complex process, broken into many small incremental steps that are managed individually. Because Word inserts many redundant codes, a large part of the conversion process is filtering these out and merging similar formatting before developing the first levels of hierarchy. Recognizing section headings often depends on multiple rules for recognizing patterns of both formatting and text (such as numbering styles) in the content. Word tables are converted to HTML tables; these are currently converted to an old GPO table model but in the future will be CALS tables. Because Word styles are inconsistently used, interpretation is necessary. In the future, perhaps we will be able to capture Word styles in the initial conversion to XML. Many tagging decisions now depend on subject-matter experts. The service is a work in progress.
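
As a rough illustration of turning a stream of paragraphs into a tree, the sketch below assigns each paragraph a depth from its numbering style and nests it using a stack. It is a minimal Python sketch under stated assumptions, not the GPO conversion and not USLM: the numbering patterns in `LEVEL_PATTERNS`, the `Node` class, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int
    children: list["Node"] = field(default_factory=list)


# Hypothetical numbering patterns, ordered from outermost to innermost level.
LEVEL_PATTERNS = [
    re.compile(r"^§\s*\d+"),    # e.g. "§ 1 Scope"    -> level 1
    re.compile(r"^\([a-z]\)"),  # e.g. "(a) General"  -> level 2
    re.compile(r"^\(\d+\)"),    # e.g. "(1) Item"     -> level 3
]
BODY_LEVEL = len(LEVEL_PATTERNS) + 1  # unnumbered paragraphs sit deepest


def level_of(paragraph: str) -> int:
    """Return the depth implied by a paragraph's numbering style."""
    for level, pattern in enumerate(LEVEL_PATTERNS, start=1):
        if pattern.match(paragraph):
            return level
    return BODY_LEVEL


def build_tree(paragraphs: list[str]) -> Node:
    """Nest a flat list of paragraphs under a synthetic root node."""
    root = Node(text="", level=0)
    stack = [root]
    for paragraph in paragraphs:
        level = level_of(paragraph)
        # Close every open node at the same depth or deeper.
        while stack[-1].level >= level:
            stack.pop()
        node = Node(text=paragraph, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root


tree = build_tree(
    ["§ 1 Scope", "(a) General", "Plain text.", "(b) Limits", "§ 2 Terms"]
)
```

A real conversion would combine formatting cues with text patterns, as the abstract notes; this sketch uses text patterns alone.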

Friday 14:45 15:15 (+ Q&A 15:15 - 15:30)

Late-breaking News

Reserved for a late-breaking presentation

Friday 15:45 16:00 EDT

Closing Administrivia

Debbie Lapeyre, Mulberry Technologies

Announcements, thanks, and other administrative tasks are the necessary plumbing associated with events such as Balisage. In this session we will try to keep the administrivia as short as possible while also asking, telling, thanking, and recognizing as appropriate.

Friday 16:00 16:30 EDT

Conference Closing: Invitations to Future Markup-Related Events

Representatives of a variety of markup-related events.

There are several markup-related conferences and similar events scheduled throughout the year and across the planet. Representatives of those events have been invited to describe the focus, expected attendees, and unique characteristics of those events.

Friday 16:30

It's Been Fun!
post-conference reminiscing

Balisage is over, not just for this year but forever. This is the time to share favorite memories of Balisage through the years. Was there a presentation you keep thinking about? Did a “Balisage Bard” entry make you laugh so hard you remember it? Did you meet a friend, a mentor, or a colleague at Balisage? Was there something memorable about one of the Balisage venues? Come reminisce on what has, on the whole, been a good 30 years.