Balisage 2026 Program

Monday, August 3, 2026

Monday 9:00 9:15 EDT

Welcome to Balisage 2026

Conference logistics, tips for attendees, and other getting started messages.

Monday 9:15 9:45 EDT

There is Nothing So Practical as a Good Theory (Reflection)

B. Tommie Usdin, Mulberry Technologies

“There is nothing so practical as a Good Theory” has been the tag line for “Balisage: The Markup Conference” since it started in 2008. With this as the guiding principle behind a conference about markup, Balisage has been a mix of case studies, updates on specifications, tutorials, discussions of best (and sometimes worst) practice, conversations about how things should and/or do work. We have explored markup; tools used to create, manipulate and store marked-up content; markup-related information management principles; and formal and mathematical models of markup. Some of our content is solidly grounded in reality and the practices of daily production of documents, some can best be described as imaginative and fantastical. Each of these realms informs the other, often in unpredictable but rewarding ways.

Monday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

Designing a Notation Using ixml

Steven Pemberton, CWI, Amsterdam

The ixml language was originally designed to allow un-marked-up textual documents to be treated as if they were XML documents with markup. Although the original intent of ixml was to allow text documents to be treated as XML and then further processed using XML tools, it is possible to work in the other direction: if you have an XML document type, you can use ixml to design a textual representation for it. Take, for example, XForms embedded in an HTML document. Each of the components is well defined, so corresponding text can be developed, with appropriate ixml rules to generate the corresponding output. With a gradual approach to components, even complex documents can be produced.

Monday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Track Changes Support for Automated Regex-Based Text Replacement in ContentXML

Gayanthika Udeshani and
Caleb Clauset, both of Typefi Systems

Preserving the editorial history of a document (recording who changed what, and when) is a staple requirement in many environments. Under the hood, tracked deletions and insertions require one to slice XML trees and insert anchors. What happens when you need to apply global changes to such a file, across and inside slices and anchors? When the replacements have been applied, the new round of changes needs to be integrated into the editorial history, and there must be a fair accounting of whether the changes were made to the original text or to one of the insertions. We will take a look at the track changes architecture of Orion Smart Replace, a plugin for the Typefi publishing platform, and in its three-phase operation (unwrap, replace, rewrap) we will consider the technical challenges involved in managing character offsets and nesting tracked changes.
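As a rough illustration of the three phases named in this abstract, here is a minimal Python sketch. It is not the Orion Smart Replace implementation and does not touch XML trees: it assumes a toy model in which a document is a flat list of (text, status) runs, and the function names and status labels ("original", "inserted", "added", "deleted-from-...") are all hypothetical.

```python
import re
from itertools import groupby

# Toy model (an assumption, not ContentXML): a run is (text, status).
Run = tuple[str, str]


def unwrap(runs: list[Run]) -> tuple[str, list[str]]:
    """Flatten runs to plain text plus one status label per character."""
    text = "".join(t for t, _ in runs)
    status = [s for t, s in runs for _ in t]
    return text, status


def replace(text: str, status: list[str], pattern: str, repl: str) -> list[Run]:
    """Apply a regex replacement, labelling each character of the result."""
    out: list[Run] = []
    pos = 0
    for m in re.finditer(pattern, text):
        # Characters before the match keep their earlier status.
        out.extend(zip(text[pos:m.start()], status[pos:m.start()]))
        # Matched characters become deletions that remember their origin.
        for ch, st in zip(m.group(), status[m.start():m.end()]):
            out.append((ch, "deleted-from-" + st))
        # Replacement characters are additions from this new round.
        out.extend((ch, "added") for ch in m.expand(repl))
        pos = m.end()
    out.extend(zip(text[pos:], status[pos:]))
    return out


def rewrap(chars: list[Run]) -> list[Run]:
    """Merge adjacent characters that share a status back into runs."""
    return [
        ("".join(ch for ch, _ in group), st)
        for st, group in groupby(chars, key=lambda pair: pair[1])
    ]


runs = [("colour", "original"), (" and flavour", "inserted")]
text, status = unwrap(runs)
result = rewrap(replace(text, status, r"our\b", "or"))
accepted = "".join(t for t, s in result if not s.startswith("deleted"))
```

Keeping a status label per character is one simple way to keep offsets aligned across the replacement; a real system working on nested XML would need a richer structure.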

Monday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

E-Book Backlists in, Accessibility out?

Martin Kraetke, le-tex publishing services GmbH

Since the ratification of the European Accessibility Act (EAA) in 2024, many publishers are sitting on a ticking time bomb: their ebook backlists. There are real issues in the gap between past production standards and the new legal requirements. Can XML transformation help? Not as much as you would think. Even when HTML files of the ebook are available, converting them into XML is not straightforward.

My breakthrough thought was that transforming an EPUB 2.0 directly into EPUB 3.0 would be far easier than attempting to use XML as a transitional format. While there are some real limitations, XProc and the XML framework transpect seem to be good tools to orchestrate the unpacking and repacking of EPUBs and manage the mix of XML and binary files, file references, links, and all the other components involved. The software library repub is designed to address issues in outdated ebook formats. While repub cannot eliminate every barrier, it helps improve compliance and increases the likelihood that an ebook will pass accessibility validation. Over 10,000 ebooks have been converted with repub in the past few months. Success!

Monday 13:30 14:00 EDT

Taming Antenna House Formatter ‘CSS Debug XML’ (Sponsor Presentation)

Tony Graham, Antenna House

When you use CSS styles with Antenna House Formatter, its ‘CSS Debug XML’ output option makes explicit the value of every CSS property on every element, which CSS rule specified the property, why that rule had precedence, and which line in which file the rule is on. For better or worse, that’s a lot of information, especially for a large document that has many large CSS stylesheets. The CSS Debug XML can be more than 50x the size of the source HTML or XML.

This talk covers the variety of ways that XSLT and SaxonJS are used to generate reports suitable to different sizes of documents that tame the CSS Debug XML and make its information understandable.

Monday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

XSpec for XProc, and XProc for XSLT

Amanda Galtman

You set out to build a software pipeline to do important, complex tasks. You want your pipeline to be capable and sturdy over the long haul, and you don't want to chase bugs constantly or have them chase you. In cases like this, proactive testing is not a luxury but a requirement. XML's premier pipeline language, XProc, has finally converged with XML's premier testing language, XSpec. We will see how to write test suites for XProc using the XSpec vocabulary. As the pipeline steps are checked, we'll also peek into the way processors behave and discover how to leverage their features.

Monday 14:45 15:15 EDT (+ Q&A 15:15 - 15:30)

Synthesis and Sustainability: The Evolution of Markup and Toolsets in the Women Writers Project

Julia Flanders &
Ash Clark, both of Northeastern University

In 1988, the Women Writers Project (WWP) started working with markup systems. In 1999, the WWP began working with XML publication tools. In both cases, markup and tools, the WWP has treated the work as active research: not only seeking to build a stable, sustainable working system, but also keeping pace with new developments and exploring their implications for research on early women’s writing. The intertwined history and co-evolution of these two sets of practices within the WWP’s nearly 40 years contributes a valuable perspective on scholarly markup in the humanities. We explore the evolution of the WWP’s markup and publication systems. We consider the reciprocal pressure that tools and markup exert on each other, and the ways in which markup responds to changing tools and technologies.

Monday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Validation-Driven Automation in a Federated XML Ingest Pipeline: The NCBI Bookshelf Case

Kin Ng,
Lisandro Gonzalez &
Stacy Lathrop, all of the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

The NCBI Bookshelf of the National Library of Medicine ingests and makes publicly accessible biomedical books and reports from a variety of sources in many formats and of varying quality. Ensuring data integrity across this content has required substantial manual intervention. We have designed and implemented a validation-driven automation framework for Bookshelf that supports scalable conversion, ingestion, processing, and release of content within a federated architecture. The system integrates rule-based validation, workflow orchestration, and identifier-based reconciliation across multiple systems, including content management, XML processing pipelines, PubMed indexing, and Open Access dataset services. To fix dirty and inconsistent data, we use layered validation (applied at multiple stages in the lifecycle), including Schematron and business-rule enforcement. The pipeline supports multiple conversion and ingestion pathways, including direct XML submission, PDF-to-XML conversion via tagging vendors, and Word-based authoring workflows. These workflows converge on a common processing model governed by validation rules and state transitions tracked in an external workflow system. Validation serves not only as a quality assurance mechanism but as a central organizing principle for workflow automation.

Monday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

Unlocking an Archive of Scholarly Journal Articles with Interactive XQuery (LB)

Vincent M. Lizzi, Taylor & Francis

A BaseX interactive XQuery environment is used to access and query a content archive of more than 5.5 million scholarly journal articles stored in a variety of XML formats. The BaseX database records metadata for each document (dual metadata if needed for different use cases), supporting fast index-based retrieval and synchronization with the content repository. The database offers two complementary interactive user interfaces: web-based and Jupyter Notebook. The web-based interface provides everything from simple document retrieval (by identifier such as DOI or filename) through complex queries across the corpus, including the ability to extract specific portions of selected documents. Available web features include predefined reports, on-screen help, and status monitoring, while also enforcing security restrictions on user-provided XQuery. The Jupyter Notebook interface uses a custom Jupyter Kernel which parallels the functionality of the web interface and enables iterative workflows for data exploration that can provide text, code, and output integrated in a single document. I describe successful design decisions and lessons learned that may benefit others implementing similar XML database solutions.

Monday 17:15 …

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BoF page to request a BoF and the conference portal for the BoF schedule.

Tuesday, August 4, 2026

Tuesday 8:00 9:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BoF page to request a BoF and the conference portal for the BoF schedule.

Tuesday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

The XSLT/XPath/XQuery 4.0 Standards: a High-Level Perspective (Reflection)

Michael Kay, Saxonica

Versions 4.0 of XSLT, XPath, and XQuery are well advanced, and they promise clear advantages for developers. What is the big picture? Not simply new concepts such as JNodes and records, and dozens of new functions (although these are significant), but the place of 4.0 in a long journey that began with 1.0. Even that journey needs to be seen in a wider view, against the backdrop of what standards are, who backs them, who changes them, who implements them, who uses them, and to what end. Reflect, not only on the major changes that 4.0 promises to bring to the XML terrain, but on the place of that XML terrain within the shifting plate tectonics of our day.

Tuesday 9:45 10:30 EDT

Open Mic: Anything Goes

Conference Participants

Short presentations on a variety of topics.

Tuesday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Transcriptional implicature

C. M. Sperberg-McQueen, Black Mesa Technologies LLC,
Claus Huitfeldt, University of Bergen, &
Yves Marcoux, École de bibliothéconomie et des sciences de l'information, Université de Montréal

Many people transcribe many materials in many ways; universal transcription practice is elusive: for every generalization we find exceptions. Is everything in the exemplar transcribed? Not necessarily. Does everything in the transcript reproduce some word or character in the exemplar? Not necessarily. Many scholarly editions account for transcription variations in an explicit statement of transcription practice. Such statements typically describe deviations from the usual practice, but rarely the ways in which the edition exemplifies that practice. By transcriptional implicature for a given community, we mean the things, suggested or entailed by the rules of transcription, that members of that community may find unnecessary to mention explicitly. We propose a way of accounting for common practices while also making sense of variations in practice and sketch a formal approach to transcription in hopes that it will inspire future work.

Tuesday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

On beyond Invisible XML: parsing with implicit (and explicit) next-generation markup

Ronald Haentjens Dekker, DHLab and Huygens Institute, Royal Netherlands Academy of Sciences,
David J. Birnbaum, University of Pittsburgh, &
Bram Buitendijk, KNAW Humanities Cluster

We explore what Invisible-XML-like (ixml-like) processing might look like when applied to what we refer to as “next-generation features” of markup: structural properties of textual documents that XML was not designed to represent, such as overlap, discontinuity, and others. This ixml-like approach is intended to interpret implicit, untagged markup and express it with explicit markup. These features may be implicit within a document, or explicitly tagged using markup languages such as LMNL, TexMECS, and TagML. We identify two approaches to ixml-like processing of next-generation features using mildly context-sensitive grammars to overcome the limitations of context-free grammars.

Tuesday 13:30 14:00 EDT

Semantic XML for the Age of Document AI (Sponsor Presentation)

Jean Paoli, Docugami

DGML — Document Graph Markup Language — is a semantic XML representation of business documents. Docugami, with Inveniam, is opening the format itself to the industry as a potential standard, together with a reference open-source implementation. Where raw extraction gives you fields, and structural markup gives you shape, DGML gives you meaning: tags that describe what each element is in its domain — a liability cap, an effective date, a payment obligation — not how it appeared on the page. DGML's headline property is cross-document tag consistency: a stable vocabulary across every document of a given type, making a corpus queryable without per-document prompt engineering. A four-layer architecture — semantic tagging, spatial grounding via pixel bounding boxes, cryptographic fragment-level attestation, and browser-native readability — makes DGML suitable for enterprise document AI: economical enough for an agent to process, and provable enough for its answer to be trusted. This paper introduces the format, its schema mechanism, and its packaging.

Tuesday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Achieving Near-Native Performance in XSLT, XQuery, and XPath Using Host-Language User Extension Functions (LB)

O'Neil Delpratt, Saxonica

When might you reach for an extension function written in C++ over its equivalent XPath function? This paper investigates the performance and practical benefits of using host-language extension functions where necessary as an alternative to implementing equivalent logic directly in XSLT, XQuery, or XPath. As a case study, we implement and benchmark equivalent computational workloads using extension functions written in Java, C++, and Python. We examine the trade-offs between declarative purity, execution performance, and integration flexibility, including the overhead of crossing language boundaries and the circumstances under which host-language implementations can outperform equivalent XML-language expressions.

Tuesday 14:45 15:15 EDT (+ Q&A 15:15 - 15:30)

It’s Complicated: Holistic Approaches for Considering Complexity in XML Documents

Sarah Connell and Syd Bauman, both of Northeastern University / Library / DSG / WWP

"Complex" : adjective. (1) Composed of two or more parts. (2) Hard to separate, analyze, or solve.

What does "complex" mean, in practice? If you and I agree that one XML file is more complex than another, upon what basis do we agree, and can that be quantified? Are the parts I see as significant the same ones you see as significant? How much weight should be given to hierarchical depth versus the number of siblings versus the type and content of attributes? And how do we even start to fit namespaces into that equation? But mere counting oversimplifies a problem that is rooted in operational complexity, a quality that can be understood only from the perspective of XML users.
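To make the "mere counting" in this abstract concrete, here is a small Python sketch of the raw measures it mentions: hierarchical depth, sibling counts, and attribute counts. It is not the authors' method, and the function names and sample document are hypothetical. Namespaces and operational complexity, the harder part of the question, are left out.

```python
import xml.etree.ElementTree as ET


def max_depth(elem: ET.Element) -> int:
    """Depth of the deepest element, counting the given element as 1."""
    return 1 + max((max_depth(child) for child in elem), default=0)


def structural_counts(xml_text: str) -> dict[str, int]:
    """Collect a few naive structural measures for one XML document."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())  # the root and every descendant element
    return {
        "depth": max_depth(root),
        "max_children": max(len(e) for e in elements),
        "attributes": sum(len(e.attrib) for e in elements),
    }


# Hypothetical sample document, for illustration only.
SAMPLE = '<doc id="d1"><sec n="1"><p>One</p><p>Two</p></sec><sec n="2"/></doc>'
counts = structural_counts(SAMPLE)
```

Two documents with identical counts can still differ greatly in how hard they are to work with, which is the point the abstract raises.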

Tuesday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Graceful Diagramming with SVG plus AVTs

Steven J. DeRose

SVG is a mature, reliable XML vocabulary for vector graphics, but it is very unlike other XML applications because it lacks semantic structure and the ability to bind components into logical units. MVG (Meta Vector Graphics), an extension layered over SVG, treats drawing objects more like XML users expect, while still producing plain SVG that is compatible with existing tools. In addition to giving structure to SVG, MVG offers a repertoire of familiar geometric primitives. MVG's additions should be highly intuitive to users of XML, XSLT, and SVG, because they reuse notions that are well known and provide a declarative, user-oriented way to construct drawings.

Tuesday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

From XML to Holons (Reflection)

Kurt Cagle

Thirty years of working with markup languages, semantic web standards, and knowledge representation systems has revealed a single recurring problem wearing many different masks: the problem of bounded, scoped, composable meaning. I describe my personal and technical journey from the HTML Document Object Model of 1996 through XML, XSLT, XSD, XQuery, RDF, SHACL, and the neural systems of the present day, arriving at the holonic graph as the resolution of a tension that has persisted, largely unacknowledged, throughout the life of the markup community. The holon — Arthur Koestler's term for a unit that is simultaneously a whole and a part of a larger whole — turns out to have been implicit in every major design decision underlying XML and the semantic web. RDF 1.2, SHACL 1.2, and named graphs now provide the formal apparatus to make this reliance on holons explicit. The implications extend from knowledge governance to the grounding of large language models.

Tuesday 17:15 …

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BoF page to request a BoF and the conference portal for the BoF schedule.

Wednesday, August 5, 2026

Wednesday 8:00 9:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BoF page to request a BoF and the conference portal for the BoF schedule.

Wednesday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

Schemas in EPUB and OOXML: Sanity Checking versus Documentation

Makoto MURATA, Higashi Nippon International University & Information Accessibility Institute, LLC

XML schemas perform two significantly different functions: sanity checking and documentation. By comparing the roles of the schema in EPUB and Office Open XML (OOXML) we show how these differ by ecosystem. In the EPUB ecosystem, schemas serve primarily as validation artifacts to ensure content quality throughout the distribution pipeline. In contrast, the OOXML ecosystem lacks mandatory validation during distribution. However, a validation experiment suggests that third-party office suites produce nearly valid XML once Markup Compatibility Elements (MCE) are preprocessed. This demonstrates that OOXML schemas function successfully as documentation artifacts, guiding developers in software quality assurance to achieve interoperability.

Wednesday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

Distributed Text Services 1.0: An API Standard for Publishing and Extending TEI Document Collections

Thibault Clérice, Institut national de recherche en sciences & technologies du numérique,
Hugh Cayless, Duke University,
Jonathan Robie, &
Ian Scott, Knowledge Commons, Michigan State University

After pouring thousands of hours into marking up collections of documents, many of us face a whole new set of challenges when we are finally ready to share our work. How do we go beyond simply publishing the text for human readers? The first stable release of the Distributed Text Services (DTS) specification offers a blueprint: expose collections of documents through a flexible set of API endpoints, to human and machine clients alike. Though our corpora may differ substantially, we can make our documents findable, accessible, interoperable, and reusable (FAIR) by implementing the DTS 1.0 specification in our own APIs. Beyond making collections FAIR, DTS 1.0 implementers can make complex document structures more readily accessible.

Wednesday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Beyond valid and invalid: implementing complex validation in XProc

Andrew Sales

Bloomsbury Publishing Group plc began as a trade publisher nearly forty years ago and now also has an academic division. Our digital products include Bloomsbury Collections (c. 50,000 academic monographs), Drama Online (home of the Arden Shakespeare) and the Churchill Archive (the complete digitized papers of Sir Winston Churchill, 800,000+). We publish around 3,500 titles per year on the academic list.

Our team’s main responsibility is to “spec and check”. We specify documentation requirements, maintain content conversion documentation, and support validation assets. This documentation is used by our supplier base to prepare the XML used to publish online content and to apply quality assurance prior to publication. This paper describes the (re-)implementation of a piece of enterprise middleware that acts as quality gatekeeper for our XML-first publishing workflow. We describe how we refactor equivalent legacy software as a single XProc pipeline with MorganaXProc-III; how we test this pipeline; and how we use the postprocessing of the validation reports it produces to both refine and redefine our validity criteria, based on the “size and shape” of the documents under scrutiny and the extent to which they violate our business rules.

Wednesday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

Abstraction, Paradox, and the End of History (Reflection)

Allen H. Renear, School of Information Sciences, University of Illinois Urbana-Champaign

In computing and information science, abstraction plays a leading role in how we tell the story of the evolution of programming languages, software engineering, data management systems, and document text encoding. For example, we can represent data as columns and tables in a relational database, “abstracting away” physical storage concerns; or we can mark up a document according to its semantics rather than how it will be processed. With ontologies and conceptual models at yet higher levels of abstraction, it may seem that we have now reached the zenith of abstraction and are ready to enjoy the benefits of a perfect match between human understanding and our digital information systems. However, scientific and philosophic traditions have shown that greater abstraction leads to contradiction, ambiguity, and resulting paradoxes. In an automated digital environment, a failure in abstract reasoning can occur without human oversight, with serious consequences.

Wednesday 13:30 14:00 EDT

Typefi Orion: The future of JATS, BITS, and STS editorial tools (Sponsor Presentation)

Guy van der Kolk, &
Lukas Kaefer, both of Typefi

In the world of scholarly publishing, Inera eXtyles has long been the XML editorial standard. This legendary software suite is a Microsoft Word Add-in that automates the most tedious parts of editorial work (cleanup, styling, reference checking, etc.) and enables exporting JATS, BITS, and STS XML directly from Microsoft Word—but support for it ends this August.

When Typefi heard about the impending end of eXtyles, the team immediately set to work building a replacement called Typefi Orion. Over 16 months of intense development, the team iterated and tested almost constantly, met monthly with eXtyles users to get input and feedback, enlisted editorial experts like Robin Dunford, and travelled to events around the world to introduce Orion and get more input from the community.

Typefi Orion matches all the core features and functions of eXtyles, along with significant enhancements based on feedback from real editors. It’s the most robust XML authoring and editorial software for scholarly publishing on the market today.

Join us to see:

An inside look at Orion’s rigorous development process
Technical details about how Orion works
What the toughest problem was during development
How we collaborated with real users to get feedback
A demo of Typefi Orion in action

Wednesday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Schematron Batch Fixes (LB)

Gerrit Imsieke, le-tex publishing services GmbH

Many international standards are coded in the tag set NISO STS (NISO Tag Suite) for interchange and processing by standards partners and clients worldwide. By design, NISO STS is an enabling, not a prescriptive, standard, so even when the Standards Development Organizations (SDOs) all encode using NISO STS, or can convert their preferred tag set to NISO STS, different SDOs employ different conventions on how to mark up and package their standards. Normalizing away all this variability allows DIN to process heterogeneous input from many SDOs into formats and structures that DIN can process.

This case study documents an as-yet-unfinished rewrite of the batch normalization process (Schematron Batch Fixes or SBF) used by DIN for this normalization. SBF relies on the ideas of Schematron Quick Fix but applied to batches of documents, not interactively one document at a time. The process is orchestrated by an XProc pipeline that can re-run all the checks after all fixes have been applied and report which of the fixes have been successful at which input locations. SBF uses a layered Schematron schema extension mechanism that adds an extending Schematron schema to build on a base schema. The extending schema can add, replace or remove fixes, and specify dependencies among fixes. Thus the correction and checking profile is no longer a single Schematron schema but a collection of Schematron files, with an NVDL (Namespace-based Validation Dispatching Language) bracket around them. The re-engineered SBF will provide validations beyond XML files, such as directory listings, Zip archive contents, and (most excitingly) XProc files.

Wednesday 14:45 15:15 EDT (+ Q&A 15:15 - 15:30)

XML as a Technological Pinnacle (LB) (Reflection)

Simon St.Laurent

Even in times of rapid technological change and advancement, occasionally a technology appears that becomes so foundational that it endures, even while all else around it changes. XML seems to be such a pinnacle technology, having survived some three decades after it emerged from the even older SGML. Come and see what makes XML so resilient, and how its place in the general history of technological development grounds it for future survival.

Wednesday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Generators – Deferred Evaluation in XPath 4

Dimitre Novatchev

When processing sequences of a few items, memory and performance costs are negligible. But when we need to engage with real-world data such as news feeds, astronomical databases, or the Fibonacci sequence, we face sequences of items that are humongous or infinite (potentially or actually). Conventional approaches do not work. Generators are designed to address this problem by allowing developers to get only the data that is needed, without attempting to load the entire sequence into memory. Through a real-world use case of news feed aggregation, we will learn how an XPath-first implementation of generators promises to revolutionize our handling of big data.
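The general idea of deferred evaluation can be sketched with Python's built-in generators. This is an analogy only, not the XPath 4 design the talk presents; the function names are hypothetical.

```python
from itertools import islice
from typing import Iterator


def fibonacci() -> Iterator[int]:
    """Yield the Fibonacci sequence lazily, one number per request."""
    a, b = 0, 1
    while True:  # conceptually infinite; nothing runs until a value is requested
        yield a
        a, b = b, a + b


def take(n: int, items: Iterator[int]) -> list[int]:
    """Pull only the first n items; later items are never computed."""
    return list(islice(items, n))


first_ten = take(10, fibonacci())
```

The consumer decides how much of the sequence to materialize, so an unbounded source such as a news feed can be handled with bounded memory.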

Wednesday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

My Whole Career Was a Lie? Or: How I Learned to Stop Worrying and Love WordPress (Reflection)

Jeffrey Beck

We in the declarative markup community have long asserted that declarative markup plays a foundational role in modern document processing by separating content from process and presentation. Declarative markup makes content maintainable, reusable, and adaptable across platforms and technologies. But there is another world out there that has never heard of declarative markup, much less seen a need for it. Building an online archive in that environment does not depend on any of the tools we've grown used to in our world. What we've learned about organization, curation, and other good document-management practices can usefully be carried over into that strange new world.

Wednesday 17:15 …

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BoF page to request a BoF and the conference portal for the BoF schedule.

Thursday, August 6, 2026

Thursday 8:00 9:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BoF page to request a BoF and the conference portal for the BoF schedule.

Thursday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

Bunkankun: Digital Text as distributed layers (LB)

Christian Wittern, Institute for Research in Humanities, Kyoto University, Japan

Markup in XML can be impractical for certain users, and for certain texts. In the case of premodern Chinese texts, TEI has produced mixed results, and alternatives are needed. The notion of the ancient scroll, juan in Chinese, provides a catalyst for a more suitable approach to the markup task, where an archival format, which like the scroll contains the normalized text without space or formatting and little metadata, is complemented by recipe formats that coordinate sets of archives. This approach to markup, conceptually patterned after YAML and Docker recipes, is exactly what has been needed for the markup of premodern Chinese texts.

Thursday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

XML, MCP, and Language Models: A Separation of Concerns

Elisa E. Beshero-Bondar,
Molly Scott Wright, &
Michael Roy Simons, all of Penn State Erie, The Behrend College

This paper reflects on the “DigitAI” project’s experiments to integrate Small Language Models (SLMs) with XML technologies in academic digital humanities work. The project is motivated by two goals: making “explainable AI” systems that help us understand how language models process, retrieve, and generate text; and making local, customized AI systems as an alternative to economically and environmentally costly Large Language Models. We first attempted to provide an SLM with a Retrieval Augmented Generation (RAG) system built from the TEI P5 Guidelines. This approach proved both bloated and disappointing. We came to realize that XML should be kept as XML, held apart from the language model’s internal machinery, and made accessible instead through a Model Context Protocol (MCP) server that allows the SLM to query the TEI document tree directly using XPath and related technologies. We propose a principled, declarative approach to AI-assisted XML work: a “separation of concerns” between the generative model and the structured data it consults.

Thursday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

Extending the TEI Processing Model to Describe Digital Edition Interactions

Peter Boot, Huygens Institute for the History and Culture of the Netherlands

The TEI Processing Model (TEI PM) provides a declarative specification for publishing digital editions, describing a number of behaviours that can be attached to TEI elements. Projects can describe desired rendering of their encoding using these behaviours, which are almost all (currently) static — for example, general textual (structural) behaviours such as paragraph, block, and figure. TEI PM provides no way for a project to describe how the user can interact with a generated site. Digital editions, however, are by nature interactive tools. I discuss a number of issues that arise when defining interactive behaviours and propose a number of new behaviours (experimental, but implemented). Most of the new behaviours are inspired by the interactivity currently available in TEI Publisher, to date the only workable implementation of the TEI PM.

Thursday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

Don't Touch My SGML

Ari Nordström, Creative Words

SGML may be old and cranky, but there are many users whose libraries of SGML documents are too vast to be given up and whose use of SGML may be mandated. However, SGML tools are rare, especially when compared to XML tools. So what can a holder of SGML documents do? One solution is to convert documents to XML for processing and then convert them back to SGML. A pipeline that starts with James Clark’s SGML tools and passes through multiple XSLT transformations seems to be the best approach and has the advantage of being reversible at the end. The result is that the users get to use modern processing tools while maintaining their valuable libraries.

Thursday 13:30 14:45 EDT

Open Mic: Anything Goes

Conference Participants

Short presentations on a variety of topics.

Thursday 14:45 15:15 EDT (+ Q&A 15:15 - 15:30)

Heart of Dorkness - A Journey in Vibe Coding an XML-Aware Program (LB)

Benjamin Wolfe, Wolfshafen Press

Ars Magica is a complex tabletop roleplaying game whose rules and source texts were released under an open license in 2024. My software “Hermetic Foundry” is a free, open-source character-creation and saga-management application for Ars Magica. The project began as a modest AI-assisted user-interface prototype, but it became an experiment: could an experienced programmer build a serious XML-aware desktop application by “vibe coding” almost everything through ChatGPT and Codex? Surprisingly, the resulting XML was not chaotic. The schema set validated cleanly, used stable identifiers and cross-references, and developed into a package architecture with fixtures and regression validation. Having now tried LLM-assisted XML software development, I argue that a two-LLM workflow – one refining prose, one implementing code – plus expert review can produce robust, schema-disciplined software.

Thursday 15:45 16:15 EDT (+ Q&A 16:15 - 16:30)

Scholia 2026: DIY study aids with TEI, XProc, browser and printer

Wendell Piez

Using electronic tools to learn an ancient language might seem obvious. Surprisingly, the best approach may be to build simple tools that imitate classical approaches to language learning. With the availability of online resources (such as digitized ancient texts), we can build tools like graded readers that enable learning by encouraging the student to create new translations and personal annotations rather than providing existing translations and annotations. The author has built a number of tools. One of these, the Laminator, is an XProc-based, iXML-enabled processing stack. It manages and merges a markup notation designed to represent a text, not as a structure of (XML) elements, but as a set of (not-XML) ranges, including overlapping ranges. More tools are evolving as the author extends his studies.

Thursday 16:30 17:00 EDT (+ Q&A 17:00 - 17:15)

The Zettelkasten: Topic Maps, Back to the Future

Thomas B. Passin

The Zettelkasten (“card case”), introduced by the German social scientist Niklas Luhmann in 1981, is an external memory system composed of notes linked through explicit cross-references. A mesh of cross-references allows larger conceptual structures to emerge as the collection grows. Luhmann's Zettelkasten was entirely paper-based. Few efforts to computerize the model seem to have achieved or even understood the levels of engagement and serendipity that Luhmann claimed. I present a framework for understanding how such a system can achieve Luhmann-level support and speculate about why and how so many implementations have fallen short. I synthesize aspects of information theory, cognitive science, human interface, movie production practice, data modeling, and library science to illuminate key concepts underlying highly effective thinking support. Remember Topic Maps and back-of-the-book indexes?

Thursday 17:15 …

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday, August 7, 2026

Friday 8:00 9:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday 9:00 9:30 EDT (+ Q&A 9:30 - 9:45)

Efficient Development of TEI Rendering Web Applications Using Vue.js (LB)

Takatomo Inoue, Waseda University

Critical editions of manuscript sources are important tools for researchers, but paper-based editions cannot be updated. This case study introduces a TEI-encoded Arabic scholarly edition, which was created as part of the author’s dissertation. What did it take to create an interface that is readable, easy to maintain, and light on hosting requirements? This presentation will describe how XSLT, SaxonJS, and the Vue.js front-end framework can be leveraged to publish TEI digital editions.

Friday 9:45 10:15 EDT (+ Q&A 10:15 - 10:30)

Music Notation Markup Languages with a MusicXML Case Study

Joshua Lubell

How can a retired markup geek who sings bass in a choir create practice tracks from electronic sources? Music notation software enables musicians and scholars to edit and analyze musical scores, convert them into audio formats, and share them in searchable databases. Music notation markup languages attempt to encode the multitudinous information present in a piece of sheet music. Efforts to standardize music in descriptive markup started with the SGML-based SMDL (Standard Music Description Language), which was incorporated in HyTime (Hypermedia Time-based Document Structuring Language). Modern XML offerings include MusicXML and MEI (Music Encoding Initiative). ABC and ChordPro are popular text-based music annotation approaches. Do any of these meet my need? Let's find out!

Friday 10:45 11:15 EDT (+ Q&A 11:15 - 11:30)

A Centralised Index of the Sisterhood of Markup Events

Sheila E. Thomson

When I have an idea for a paper, I like to check if someone's already presented it or something similar. If it's not a new topic, then maybe I can build on what's gone before - but I need to know what that was. There are also occasions when the research is not driven by a prospective project but simply curiosity or to support learning. Each time I go through this process, I think to myself "Wouldn't it be nice if there was a centralised index of all the markup papers?" This paper is a case study on the creation of such an index, in particular its scope, processes, challenges, and solutions.

Friday 11:30 12:00 EDT (+ Q&A 12:00 - 12:15)

What does AI mean for Open Source?

Adam Retter, Evolved Binary

Quite a few XML infrastructure projects are now accepting AI/LLM generated code. Some of these projects are quite impressive and some I characterize as "slop".

The sudden meteoric rise of Large Language Models (LLMs) and associated tools has caught many software developers unawares, leaving them suddenly ignorant, and arguably under-skilled. LLMs, whilst still advancing, have recently demonstrated impressive capabilities in assisting software developers with their day-to-day tasks (e.g. coding new features, and locating and fixing issues). However, the use and adoption of LLMs presents many challenges for society as a whole, many of which are not in themselves technical concerns.

In this examination of the current and perceived impact of this technology upon Open Source we identify several social, economic, environmental, political, legal, and technical concerns regarding the use of LLMs in Open Source projects.

We contribute guidance around defining an AI Policy for Open Source projects and offer an AI Policy Score Card to assist projects in defining and declaring how they wish to work with AI or not.

Friday 13:30 14:00 EDT

Open Mic: Anything Goes

Conference Participants

Short presentations on a variety of topics.

Friday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Making Hierarchy out of Nothing at All: Federal Register Data Modernization

Betty Harvey, Electronic Commerce Connection, Inc.

The raw material for the Federal Register, published daily, comes to GPO from agencies as Microsoft Word files that often include legacy SGML tags as text. The challenge is turning this stream of paragraphs into a deep tree. Converting these files into the United States Legislative Markup (USLM) XML format that GPO uses is a complex process, broken into many small incremental steps that are managed individually. Because Word inserts many redundant codes, a large part of the conversion process is filtering these out and merging similar formatting before developing the first levels of hierarchy. Recognizing section headings often depends on multiple rules for recognizing patterns of both formatting and text (such as numbering styles) in the content. Word tables are first converted to HTML tables, which are currently converted to an old GPO table model but in the future will become CALS tables. Because Word styles are inconsistently used, interpretation is necessary. In the future, perhaps we will be able to capture Word styles in the initial conversion to XML. Many tagging decisions now depend on subject-matter experts. The service is a work in progress.

Friday 14:45 15:15 EDT (+ Q&A 15:15 - 15:30)

We Just Can't Have Nice Things (Reflection)

Alex Miłowski

When given a new toy, we may play with it too hard or in a way not intended, and it just breaks, and is irreplaceable. But sometimes, breaking things leads to insights and innovations. We're tempted to get rid of the things that break often because they are “bad” or “poorly designed.” Often, they're just nice things we've failed to curate into better things. We have had so many nice new things . . . HTML, XML, SVG, MathML, XSL-FO . . . that have been broken somewhere, some time. What technologies will sustain value for society? We just can't just have nice things — we have to make them.

Friday 15:45 16:00 EDT

Closing Administrivia

Debbie Lapeyre, Mulberry Technologies

Announcements, thanks, and other administrative tasks are the necessary plumbing associated with events such as Balisage. In this session we will try to keep the administrivia as short as possible while also asking, telling, thanking, and recognizing as appropriate.

Friday 16:00 16:30 EDT

Conference Closing: Invitations to Future Markup-Related Events

Representatives of a variety of markup-related events.

There are several markup-related conferences and similar events scheduled throughout the year and across the planet. Representatives of those events have been invited to describe the focus, expected attendees, and unique characteristics of those events.

Friday 16:30 …

It's Been Fun!
post-conference reminiscing

Balisage is over, not just for this year but forever. This is the time to share favorite memories of Balisage through the years. Was there a presentation you keep thinking about? Did a “Balisage Bard” entry make you laugh so hard you remember it? Did you meet a friend, a mentor, or a colleague at Balisage? Was there something memorable about one of the Balisage venues? Come reminisce on what has, on the whole, been a good run.

Time	Recurring session
8:00am	Birds of a Feather discussion(s)
10:15am	technology break
12:15pm	mid-day break & social time
3:30pm	technology break
5:15pm	Birds of a Feather discussion(s)

Balisage 2026 Program

Monday, August 3, 2026

Welcome to Balisage 2026

There is Nothing So Practical as a Good Theory (Reflection)

Designing a Notation Using ixml

Track Changes Support for Automated Regex-Based Text Replacement in ContentXML

E-Book Backlists in, Accessibility out?

Taming Antenna House Formatter ‘CSS Debug XML’ (Sponsor Presentation)

XSpec for XProc, and XProc for XSLT

Synthesis and Sustainability: The Evolution of Markup and Toolsets in the Women Writers Project

Validation-Driven Automation in a Federated XML Ingest Pipeline: The NCBI Bookshelf Case

Unlocking an Archive of Scholarly Journal Articles with Interactive XQuery (LB)

Birds of a Feather Discussion(s)

Tuesday, August 4, 2026

Birds of a Feather Discussion(s)

The XSLT/XPath/XQuery 4.0 Standards: a High-Level Perspective (Reflection)

Open Mic: Anything Goes

Transcriptional implicature

On beyond Invisible XML: parsing with implicit (and explicit) next-generation markup

Semantic XML for the Age of Document AI (Sponsor Presentation)

Achieving Near-Native Performance in XSLT, XQuery, and XPath Using Host-Language User Extension Functions (LB)

It’s Complicated: Holistic Approaches for Considering Complexity in XML Documents

Graceful Diagramming with SVG plus AVTs

From XML to Holons (Reflection)

Birds of a Feather Discussion(s)

Wednesday, August 5, 2026

Birds of a Feather Discussion(s)

Schemas in EPUB and OOXML: Sanity Checking versus Documentation

Distributed Text Services 1.0: An API Standard for Publishing and Extending TEI Document Collections

Beyond valid and invalid: implementing complex validation in XProc

Abstraction, Paradox, and the End of History (Reflection)

Typefi Orion: The future of JATS, BITS, and STS editorial tools (Sponsor Presentation)

Schematron Batch Fixes (LB)

XML as a Technological Pinnacle (LB) (Reflection)

Generators – Deferred Evaluation in XPath 4

My Whole Career Was a Lie? Or: How I Learned to Stop Worrying and Love WordPress (Reflection)

Birds of a Feather Discussion(s)

Thursday, August 6, 2026

Birds of a Feather Discussion(s)

Bunkankun: Digital Text as distributed layers (LB)

XML, MCP, and Language Models: A Separation of Concerns

Extending the TEI Processing Model to Describe Digital Edition Interactions

Don't Touch My SGML

Open Mic: Anything Goes

Heart of Dorkness - A Journey in Vibe Coding an XML-Aware Program (LB)

Scholia 2026: DIY study aids with TEI, XProc, browser and printer

The Zettelkasten: Topic Maps, Back to the Future

Birds of a Feather Discussion(s)

Friday, August 7, 2026

Birds of a Feather Discussion(s)

Efficient Development of TEI Rendering Web Applications Using Vue.js (LB)

Music Notation Markup Languages with a MusicXML Case Study

A Centralised Index of the Sisterhood of Markup Events

What does AI mean for Open Source?

Open Mic: Anything Goes

Making Hierarchy out of Nothing at All: Federal Register Data Modernization

We Just Can't Have Nice Things (Reflection)

Closing Administrivia

Conference Closing: Invitations to Future Markup-Related Events

It's Been Fun! post-conference reminiscing
