Balisage 2025 Program

Monday, August 4, 2025

Monday 10:00 10:15 EDT

Welcome to Balisage 2025

Conference logistics, tips for attendees, and other getting started messages.

Monday 10:15 10:45 EDT

Discussions Fuel a Conference; Questions Fuel Discussion

B. Tommie Usdin, Mulberry Technologies

Conferences are events at which people converse. That is, people talk with each other, learn from each other, enjoy interacting with each other. Performances are events at which the audience watches the performers. Balisage is a conference, not a series of performances. At Balisage, like at most conferences, speakers give presentations. Those presentations are interesting and valuable in and of themselves. But the active discussion after the presentations is the real point. That discussion, based on the content of the presentation, is fueled and shaped by questions and comments. This is why it is important that at Balisage we think carefully about the questions we ask. Good questions prompt the speaker(s) to expand on interesting points, allow the speaker to clarify, to extend, and to explain. Good comments support the speaker. It is important that we refrain from asking questions that demean the speaker, minimize the content of the presentation, or that are designed to show off the questioner’s knowledge at the expense of the speaker. At Balisage we allow substantial time for questions and discussion after each talk and at the beginning and end of each day. Please help us make Balisage lively, interesting, and interactive by crafting questions that lead to lively and interesting interactions.

Monday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Preprocessing XQuery Using Custom Module URI Resolvers

Mary Holstege

Code is complicated. Function libraries that implement the same operations over a variety of data types exhibit a particular kind of complexity: large amounts of duplication except for the data types. XML and XSLT offer some affordances to address these problems, but they aren’t present in XQuery. What we need in XQuery is A Mad Idea.

Monday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Narrative Recipes

Peter Flynn

When is a narrative not a narrative? When it’s a recipe! Publishing recipes with distinct sections enumerating the ingredients and preparation steps is a 19th century invention. Historically, recipes were just narrative prose. Can we leverage markup and style to present narrative recipes in a modern style?

Monday 14:00 14:30 EDT

Sponsor Presentation: Docugami, A Document Foundation Model Generating the Core XML Data Model as an input for AI Agents, Chat and other Workflow processing.

Jean Paoli, Docugami

Unlike traditional LLM + Chat approaches, Docugami’s architecture is built around a troika of LLM ↔ XML Data ↔ Chat (or other processing such as AI Agents, Workflow, Spreadsheets, Databases). Docugami groups documents in “docsets” of semantically similar documents (that will be sharing the same tag set) and generates for each document a semantically rich hierarchical XML tree representing the entire document. Applications are then built on top of this semantically rich set of trees, with for example AI Agents, Chat, Spreadsheets, Databases accessing this data representation to provide high-quality results for users.

Monday 14:30 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2 to 10 minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
Click for the Open Mike Call including details on how to sign up.

Monday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

Ragnarok: An Experimental Extended XML Environment

Steven J. DeRose

Python looks like a natural fit for processing XML, but current Python tools are either not written in native Python or are out of date, slow, and do not support debugging or validation. Ragnarok offers to change that: it is a native Python implementation, fast, and itself extensible, potentially supporting even non-hierarchical structures. It includes a parser and serializer, supports multiple character sets, and provides more complete support for DOM than current tools. Come see the latest updates!

Monday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Tuesday, August 5, 2025

Tuesday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Tuesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

A Schema Language and Parser for Next-generation Markup Languages

Ronald Haentjens Dekker, DHLab, Huygens Institute for the History of the Netherlands
David J. Birnbaum, Department of Slavic Languages and Literatures, University of Pittsburgh
Bram Buitendijk, KNAW Humanities Cluster
Joris J. van Zundert, Huygens Institute for the History of the Netherlands

Imagine a next-generation method to model, parse, and validate complex text features such as overlap (including self-overlap), discontinuity, and partially ordered and unordered content. From the early days of declarative markup languages, users have recognized that simple trees could not describe all the structures in documents. Balisage has a long history of attempts to understand the nature of complex structures and how they can be represented in markup and processed. Most of these approaches have assumed that the markup techniques would result in context-free grammars. Would a system that loosened the constraint of being context free, a mildly context-sensitive grammar, offer increased representational capabilities? A prototype parser has been developed to support this approach.

Tuesday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Designing for fidelity and usability in transforming legacy journal article XML to JATS (LB)

Vincent Lizzi, Taylor & Francis

Taylor & Francis is working to develop a transformation process to convert a large archive of journal article files from an obsolete XML DTD to the current version of the JATS. Two essential design principles guide the project: fidelity and usability. Fidelity guarantees preservation of all content from the original files, ensuring nothing is unknowingly lost. Usability ensures the transformation is easy to use in a variety of scenarios—from staff processing individual files to automated batch transformations of numerous files. Implementation required resolving multiple technical challenges inherent in the obsolete format while addressing complexities arising from strict design principles. To validate accuracy of the transformation, a specialized comparison tool was created that identifies content missing in the output XML compared to the input XML. The approach is analysis-driven, involving comparison of the two DTDs, and utilizing standard technologies including XQuery 3.1, XSLT 3.0, BaseX, Saxon, XSpec, and DTDAnalyzer.
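As a rough illustration of the fidelity check described above, the following Python sketch reports words that appear in the text of an input file but not in the transformed output. It is not the project's comparison tool (which the abstract says is built on XQuery, XSLT, and related technologies). The function names, the legacy element names, and the heavily simplified JATS-like output are all hypothetical, and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def text_tokens(xml_source: str) -> Counter:
    """Count whitespace-separated words found in the element text of a document.

    Assumption: text nodes are joined with spaces, so a word split across
    inline elements is counted as separate pieces. Attribute values are ignored.
    """
    root = ET.fromstring(xml_source)
    return Counter(" ".join(root.itertext()).split())


def missing_content(input_xml: str, output_xml: str) -> Counter:
    """Return the words (with counts) present in the input but absent from the output."""
    # Counter subtraction keeps only positive counts, i.e. words that were lost.
    return text_tokens(input_xml) - text_tokens(output_xml)


# Hypothetical legacy markup and a simplified JATS-like result that dropped the author.
legacy = (
    "<art><ttl>Deep Sea Vents</ttl><au>A. Author</au>"
    "<para>Vents are hot.</para></art>"
)
jats = (
    "<article><front><article-title>Deep Sea Vents</article-title></front>"
    "<body><p>Vents are hot.</p></body></article>"
)

lost = missing_content(legacy, jats)
```

Here `lost` holds the two words of the dropped author name. A word-level comparison like this ignores order and structure, so it can only flag candidates for review and cannot prove fidelity.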

Tuesday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

JNodes: a new model for navigating JSON trees (LB)

Michael Kay, Saxonica

W3C Community Group work is defining 4.0 versions of XPath, XQuery, and XSLT. A key requirement is improving navigation and transformation of data derived from JSON files. A study determined that significant problems remain with how the 4.0 specifications handle these requirements. The key issues relate to the fact that the lookup operator ("?", introduced in 3.0/3.1) loses information because the result is a flattened sequence of values: empty values are lost, only the values (not the keys) are retained, and there is no information about the parents or ancestors of the located values. To address these limitations, we propose adding a new type of item to the data model: a "JNode", as a peer of the traditional XML-based nodes (XNodes) in the type hierarchy. A JNode acts as a dynamically-constructed wrapper around a map or array, or around the values contained in a map or array, allowing navigation around the tree of maps and arrays using the axis mechanism used for XML navigation since XPath 1.0.
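The information loss described above, and the idea of a wrapper that keeps keys and parents, can be pictured with a loose Python analogy. This is not the JNode proposal or any XPath 4.0 API: the `Wrapped` class, `flat_lookup`, and the sample data are hypothetical, and Python's `None` merely stands in for an empty value.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Union

Key = Union[str, int]


def flat_lookup(value: Any) -> list:
    """Loose analogue of a wildcard lookup: member values only, as a flat list."""
    if isinstance(value, dict):
        members = list(value.values())
    elif isinstance(value, list):
        members = list(value)
    else:
        return []
    # Empty values (None here) vanish, and no key or parent information survives.
    return [m for m in members if m is not None]


@dataclass
class Wrapped:
    """Hypothetical wrapper that remembers the key and parent of a value."""
    value: Any
    key: Optional[Key] = None
    parent: Optional["Wrapped"] = None

    def children(self) -> Iterator["Wrapped"]:
        if isinstance(self.value, dict):
            for k, v in self.value.items():
                yield Wrapped(v, k, self)
        elif isinstance(self.value, list):
            for i, v in enumerate(self.value):
                yield Wrapped(v, i, self)

    def descendants(self) -> Iterator["Wrapped"]:
        for child in self.children():
            yield child
            yield from child.descendants()

    def ancestors(self) -> Iterator["Wrapped"]:
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def path(self) -> list:
        """Keys and array positions leading from the root to this value."""
        chain = [self, *self.ancestors()]
        return [n.key for n in reversed(chain) if n.parent is not None]


doc = {"order": {"id": 7, "note": None, "items": [{"sku": "A1"}, {"sku": "B2"}]}}
root = Wrapped(doc)
sku_paths = [n.path() for n in root.descendants() if n.key == "sku"]
notes = [n for n in root.descendants() if n.key == "note"]
```

In this sketch `flat_lookup(doc["order"])` returns only the `id` and `items` values, with the empty `note` gone and no keys attached. The wrapped traversal still finds the `note` entry and can report where each `sku` value sits in the tree.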

Tuesday 14:00 14:30 EDT

Sponsor Presentation: Antenna House - Antenna House Overview and Formatter 7.5 Feature Preview

Alex Critchfield, Antenna House

An overview of Antenna House and its products along with a preview of the new features in our upcoming Formatter v7.5 release.

Tuesday 14:30 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2 to 10 minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
Click for the Open Mike Call including details on how to sign up.

Tuesday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

I Know Why The Semantic Web Failed (FP)

Patrick Durusau

We use identifiers to encode the meaning of subject concepts for comparison and processing. Dictionaries (both historical and recent LLMs) rely on existing data as a starting point for these identifiers. There has always been the problem that one identifier can denote many possible subjects and, conversely, any particular subject can be denoted by multiple identifiers. The Semantic Web made this conundrum worse by inviting users to create new identifiers for subjects that already possessed identifiers. What if, instead of adding to the sea of identifiers for subjects, we take inspiration from probabilistic databases and large language models to develop a data-driven approach for subject identity? Instead of a universal exactness of subject identity, the degree of certainty, or rather uncertainty, could be acknowledged as a matter of design.

Tuesday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Wednesday, August 6, 2025

Wednesday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Wednesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Implementing Version Handling in Yet Another CMS

Ari Nordström

The culture of markup-land has long encouraged its residents to develop their own tools. Why stop with developing document applications: why not build a content management system to hold the documents? Among the largest challenges in designing a CMS is handling versions of components as documents are developed and revised. Fortunately, the XML application stack provides all the tools needed to construct a metadata language for tracking resources as their versions evolve and to generate an XForms user interface for displaying and managing resource information.

Wednesday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

The infospace, Foxpath, and iXML

Hans-Jürgen Rennau & Hauke Brandes

Foxpath, short for folder XPath, is an expression language that enables XPath-like addressing of the files and folders in a file system. Both file systems and REST resources addressable through URIs can be thought of as a tree of folders, and thus navigated by path expressions in Foxpath. Foxpath is a superset of XPath 3.0 with node tree navigation retained but file system navigation added and a free combination of both functionalities allowed within a single path expression. Foxpath is a language with a strong focus on interactive use and the power of succinct expressions. Invisible XML allows us to extend Foxpath navigation to more resources. A new configuration mechanism associates grammars with file name patterns, enhancing the experience of a pervasive tree structure which we call an infospace.

Wednesday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Language Identification for Program Code in Documents (LB)

Steven J. DeRose

Programming code (and data formats) could be identified easily for code embedded in structured documents using standard language attributes with new values. Both HTML and XML provide built-in language identification through the HTML @lang and @xml:lang attributes respectively, which follow BCP 47 language tag conventions. I propose using a single reserved language code ("qpr") as a base language code, conforming to ISO 639, to cover the space of programming languages, then creating a subspace for programming language identifiers (or data format identifiers) which would appear as region-like suffixes in the attribute values. The XML NOTATION Declaration can be leveraged to map names to URIs and thus formalize the identification of the programming languages. This mechanism requires no extensions to existing XML tools and processing, and applications that wish to leverage the information can do so in a clear and reasonably familiar way. An agreed-upon language identification mechanism would facilitate language-specific processing including syntax highlighting, spell checking, and validation while maintaining backward compatibility with existing document processing systems.

Wednesday 14:00 14:30 EDT

Sponsor Presentation: Saxonica - SaxonJS 3 coding improvements

Debbie Lockett, Saxonica

SaxonJS 3 includes features that address issues raised by users of previous versions of SaxonJS. User inquiries are often of the form “Here’s a problem I’m trying to solve. This is what I can do. But what about X? How can I do it with SaxonJS?” Sometimes the SaxonJS 2 solutions are unsatisfactory—yes we can code that; but the code isn’t “pretty”, intuitive, or easy to write… or perhaps there remain limitations… In some cases there are limitations with the SaxonJS 2 processor; some problems require integrating JavaScript as well. In other cases, the SaxonJS 2 solution is complicated or requires putting features together in an unfamiliar way. In many situations, new SaxonJS 3 features can be used to write much cleaner solutions. This presentation discusses several new features and shows how they might be used.

Wednesday 14:30 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2 to 10 minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
Click for the Open Mike Call including details on how to sign up.

Wednesday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

How I Stopped Worrying & Learned to Love AI (FP)

Dale Waldt

This is a chronicle of the author's personal and professional journey from skepticism to appreciation of artificial intelligence (AI), particularly in the context of structured content and XML. I describe a hands-on exploration of integrating AI tools, primarily ChatGPT, into the process of writing, editing, and converting a technical document. I illustrate some of the practical methods and challenges, from initial prompt-learning and iterative content drafting in Microsoft Word to XHTML table conversion and attempts at DocBook XML tagging. Emphasizing AI as a contextual assistant, rather than an autonomous writer, highlights the tool’s strengths in formatting and summarization. But there need to be clearer publishing guidelines around AI-generated content, and there are also ethical concerns such as copyright. I am not the only person with ethical and moral concerns, and the session ends with a discussion of some of these issues with other attendees.

Wednesday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Thursday, August 7, 2025

Thursday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Thursday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Schematron as XSLT Interface and Advanced Validation Models

Joel Kalvesmaki

Schematron is perhaps the most expressive and powerful of all XML schema languages. It is so expressive, it can be conceived of as a kind of API for a constrained flavor of XSLT. Through this lens, we can see all of Schematron and all of XSLT beyond it. Working together, they can conquer the world.

Thursday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Pipe cleaner: ensuring the correctness of XProc pipelines (LB)

Norm Tovey-Walsh, Saxonica

As users begin to explore using XProc 3.x pipelines, and migrate existing 1.0 pipelines to 3.x, they naturally have questions about how to tell if a pipeline will work and will produce the correct result. This breaks down, broadly, into four categories:

  • Is the pipeline written correctly: is it syntactically valid?
  • Is the pipeline written correctly: is it logically valid?
  • Does it do what the user intended: does it produce the correct results?
  • If it doesn’t, how can the user figure out why?

A demonstration will show a running example and work through the various features. Full disclosure: some of the avenues I plan to explore are specific to my own implementation.

Thursday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

The Book of Doublends Jined: Parsing Finnegans Wake with ixml (LB)

Steven Pemberton, CWI, Amsterdam

Finnegans Wake by James Joyce is probably the hardest book to read in the English language. A principal hurdle is the length and convolutedness of the sentences. This paper reports an attempt to handle this complexity by parsing the sentences (at a structural, not a semantic level) to reveal their top-level structure. It takes the reader step-by-step through the construction of an ixml grammar for dealing with one chapter of the book.

Thursday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Writing Maintainable XSLT Conversions: From EEBO TEI to Web HTML Seven Ways

Liam Quin, Delightful Computing

XSLT conversions are often reused. It is easy to say "write for maintainability" but far harder to actually write maintainable XSLT. There is no consensus on what it is that makes some XSLT more maintainable than other XSLT. Here, the steps of a complex transformation, most of them written in XSLT, are connected using several approaches: XSLT modes and variables, the XPath 3 transform() function, XProc, the Unix make program, and even a batch script. These methods are compared in terms of maintainability: skills and knowledge needed, managing interdependencies, difficulty of revision, and ease of reuse. Recommendations are made for structuring multi-step XSLT transformations depending on context, people, and data. Guidelines are included to help project designers make the choice.

Thursday 14:45 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2 to 10 minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
Click for the Open Mike Call including details on how to sign up.

Thursday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

To be announced

Thursday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday, August 8, 2025

Friday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Is Invisible XML Ready for Undergraduates? Trying iXML and XProc on a Music Analysis Project in an Undergraduate Text Analysis Course

Michael Simons & Elisa E. Beshero-Bondar, both of Penn State Erie, the Behrend College

University students in the Digital Media, Arts, and Technology program at Penn State are offered a course in “Large-Scale Text Analysis”. Going into this course, students have experience in encoding text with XML, transforming XML with XSLT, and web development with HTML and CSS. In the past, the Text Analysis course has been a procedural “Python-and-Regex course”: preparing text corpora by generating simple XML from regularly-patterned files using regular expression search-and-replace operations, using XQuery to extract the portions of the texts to analyze, and producing plain-text inputs to provide to Python. Python has dominated the experience of the pipeline. This year’s course tried a different approach. Students were taught iXML grammars as a way to prepare XML for analysis and XProc for pipelining. Regular expression matching was accomplished via XSLT, and the entire XML stack was used before approaching Python. Did the students learn a lot? Yes, both new concepts and very different ways of thinking about text. But I, the instructor, learned a lot too, and will share my experience.

Friday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

(Re)building the TEI website: a bit of history and new directions

Hugh Cayless, Duke Collaboratory for Classics Computing (DC3)

The Text Encoding Initiative has had a web presence for almost thirty years. It’s instructive to consider how a large, robust, and widely-used XML vocabulary defines its presence on the web. How it has weathered the storms of change (management, institutional, technological) to be where it is today. And how it imagines its future.

Friday 12:00 12:15 EDT

Closing Administrivia

Debbie Lapeyre, Mulberry Technologies

Announcements, thanks, and other administrative tasks are the necessary plumbing associated with events such as Balisage. In this session we will try to keep the administrivia as short as possible while also asking, telling, thanking, and recognizing as appropriate.

Friday 12:15 12:30 EDT

Markup and Community

Video: All of us: thoughts on technology and community in the words of Michael Sperberg-McQueen, by Bethan Tovey-Walsh
Introduced by Tommie Usdin, Mulberry Technologies

Balisage is a gathering of people who are interested in, and users of, markup in many forms. We have diverse backgrounds, work in many environments, and use a mixture of tools including polished commercial products, mass-market software, bespoke one-use wonders, and rusty antiques. We generally value declarative markup, reusability, longevity, and separation of content from form and tools.

The markup community is generally supportive of each other, respectful of the requirements of others, and willing to step up to provide for “edge cases”. It is these attitudes, and this community, that have made declarative markup, and SGML/XML in particular, so ubiquitous that many see it as a given and see no need for conferences to discuss it.

For many years the closing talk at Balisage (and Extreme Markup Languages, Markup Technologies, the XML Conference, and before that the SGML Conference) was one of the highlights of the event. In a tour de force of last minute writing, Michael Sperberg-McQueen wove patterns of the various talks and conversations of the week. He consolidated the conference into a coherent narrative. In his view all speakers were important, all presentations significant, and the investment of time and energy participating in Balisage worthwhile. An ongoing theme in those conference closings is the health, strength, and importance of the markup community.

Michael is no longer with us, and it seems not only impossible but inappropriate to try to replicate this rhetorical feat. So we are not going to try. This year we will close Balisage with a short review of some of what Michael said about us, the markup community. Then we will open the floor for discussion.

Friday 12:30

Markup and Community: Open Discussion

Let's talk about the markup community. Let's talk about communities linked by technology and approach to documents/data. Let’s talk about how we are going to build a future we will be proud of. Let’s talk.
