Balisage 2025 Preliminary Program

Monday, August 4, 2025

Monday 10:00 10:15 EDT

Welcome to Balisage 2025

Conference logistics, tips for attendees, and other getting-started messages.

Monday 10:15 10:45 EDT

Discussions Fuel a Conference; Questions Fuel Discussion

B. Tommie Usdin, Mulberry Technologies

Conferences are events at which people converse. That is, people talk with each other, learn from each other, enjoy interacting with each other. Performances are events at which the audience watches the performers. Balisage is a conference, not a series of performances. At Balisage, like at most conferences, speakers give presentations. Those presentations are interesting and valuable in and of themselves. But the active discussion after the presentations is the real point. That discussion, based on the content of the presentation, is fueled and shaped by questions and comments. This is why it is important that at Balisage we think carefully about the questions we ask. Good questions prompt the speaker(s) to expand on interesting points, allow the speaker to clarify, to extend, and to explain. Good comments support the speaker. It is important that we refrain from asking questions that demean the speaker, minimize the content of the presentation, or that are designed to show off the questioner’s knowledge at the expense of the speaker. At Balisage we allow substantial time for questions and discussion after each talk and at the beginning and end of each day. Please help us make Balisage lively, interesting, and interactive by crafting questions that lead to lively and interesting interactions.

Monday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Preprocessing XQuery Using Custom Module URI Resolvers

Mary Holstege

Code is complicated. Function libraries that implement the same operations over a variety of data types exhibit a particular kind of complexity: large amounts of duplication except for the data types. XML and XSLT offer some affordances to address these problems, but they aren’t present in XQuery. What we need in XQuery is A Mad Idea.

Monday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Narrative Recipes

Peter Flynn

When is a narrative not a narrative? When it’s a recipe! Publishing recipes with distinct sections enumerating the ingredients and preparation steps is a 19th century invention. Historically, recipes were just narrative prose. Can we leverage markup and style to present narrative recipes in a modern style?

Monday 14:00 14:30 EDT

Sponsor Presentation

A Balisage Sponsor will make a presentation.

Monday 14:30 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Monday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

Ragnarok: An Experimental Extended XML Environment

Steven J. DeRose

Python looks like a natural fit for processing XML, but current Python tools are either not written in native Python or are out of date, slow, and do not support debugging or validation. Ragnarok offers to change that: it is a native Python implementation, fast, and itself extensible, potentially supporting even non-hierarchical structures. It includes a parser and serializer, supports multiple character sets, and provides more complete support for DOM than current tools. Come see the latest updates!

Monday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Tuesday, August 5, 2025

Tuesday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Tuesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

A Schema Language and Parser for Next-generation Markup Languages

Ronald Haentjens Dekker, DHLab, Huygens Institute for the History of the Netherlands
David J. Birnbaum, Department of Slavic Languages and Literatures, University of Pittsburgh
Bram Buitendijk, KNAW Humanities Cluster
Joris J. van Zundert, Huygens Institute for the History of the Netherlands

Imagine a next-generation method to model, parse, and validate complex text features such as overlap (including self-overlap), discontinuity, and partially ordered and unordered content. From the early days of declarative markup languages, users have recognized that simple trees could not describe all the structures in documents. Balisage has a long history of attempts to understand the nature of complex structures and how they can be represented in markup and processed. Most of these approaches have assumed that the markup techniques would result in context-free grammars. Would a system that loosened the constraint of being context free, a mildly context-sensitive grammar, offer increased representational capabilities? A prototype parser has been developed to support this approach.

Tuesday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Reserved for Late-breaking News (LB)

Tuesday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Reserved for Late-breaking News (LB)

Tuesday 14:00 14:30 EDT

Sponsor Presentation: Docugami

Jean Paoli, Docugami

Tuesday 14:30 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Tuesday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

I Know Why The Semantic Web Failed (FP)

Patrick Durusau

We use identifiers to encode the meaning of subject concepts for comparison and processing. Dictionaries (both historical and recent LLMs) rely on existing data as a starting point for these identifiers. There has always been the problem that one identifier can denote many possible subjects and, conversely, any particular subject can be denoted by multiple identifiers. The Semantic Web made this conundrum worse by inviting users to create new identifiers for subjects that already possessed identifiers. What if, instead of adding to the sea of identifiers for subjects, we take inspiration from probabilistic databases and large language models to develop a data-driven approach for subject identity? Instead of a universal exactness of subject identity, the degree of certainty, or rather uncertainty, could be acknowledged as a matter of design.

Tuesday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Wednesday, August 6, 2025

Wednesday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Wednesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Implementing Version Handling in Yet Another CMS

Ari Nordström

The culture of markup-land has long encouraged its residents to develop their own tools. Why stop with developing document applications: why not build a content management system to hold the documents? Among the largest challenges in designing a CMS is handling versions of components as documents are developed and revised. Fortunately, the XML application stack provides all the tools needed to construct a metadata language for tracking resources as their versions evolve and to generate an XForms user interface for displaying and managing resource information.

Wednesday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

The infospace, Foxpath, and iXML

Hans-Jürgen Rennau & Hauke Brandes

Foxpath, short for folder XPath, is an expression language that enables XPath-like addressing of the files and folders in a file system. Both file systems and resources addressable through URIs can be thought of as a tree of folders, and thus addressed as node trees in Foxpath. Foxpath is a superset of XPath 3.0 with node navigation retained but file system navigation added and a free combination of both functionalities allowed within a single path expression. Foxpath is a language with a strong focus on interactive use and the power of single expressions, an extended version of XPath. Invisible XML allows us to extend Foxpath navigation to more resources. A new configuration mechanism associates grammars with sets of resources and maps names to resources.

Wednesday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Reserved for Late-breaking News

Wednesday 14:00 14:30 EDT

Sponsor Presentation: Saxonica - SaxonJS 3 coding improvements

Debbie Lockett, Saxonica

SaxonJS 3 includes features that address issues raised by users of previous versions of SaxonJS. User inquiries are often of the form “Here’s a problem I’m trying to solve. This is what I can do. But what about X? How can I do it with SaxonJS?” Sometimes the SaxonJS 2 solutions are unsatisfactory—yes we can code that; but the code isn’t “pretty”, intuitive, or easy to write… or perhaps there remain limitations… In some cases there are limitations with the SaxonJS 2 processor; some problems require integrating JavaScript as well. In other cases, the SaxonJS 2 solution is complicated or requires putting features together in an unfamiliar way. In many situations, new SaxonJS 3 features can be used to write much cleaner solutions. This presentation discusses several new features and shows how they might be used.

Wednesday 14:30 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Wednesday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

How I Stopped Worrying & Learned to Love AI (FP)

Dale Waldt

This is a chronicle of the author's personal and professional journey from skepticism to appreciation of artificial intelligence (AI), particularly in the context of structured content and XML. I describe a hands-on exploration of integrating AI tools, primarily ChatGPT, into the process of writing, editing, and converting a technical document. I illustrate some of the practical methods and challenges, from initial prompt-learning and iterative content drafting in Microsoft Word to XHTML table conversion and attempted DocBook XML tagging. Emphasizing AI as a contextual assistant, rather than an autonomous writer, highlights the tool’s strengths in formatting and summarization. But there need to be clearer publishing guidelines around AI-generated content, and there are also ethical concerns such as copyright. I am not the only person with ethical and moral concerns, and the session ends with a discussion of some of these issues with other attendees.

Wednesday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Thursday, August 7, 2025

Thursday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Thursday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Schematron as XSLT Interface and Advanced Validation Models

Joel Kalvesmaki

Schematron is perhaps the most expressive and powerful of all XML schema languages. It is so expressive, it can be conceived of as a kind of API for a constrained flavor of XSLT. Through this lens, we can see all of Schematron and all of XSLT beyond it. Working together, they can conquer the world.

Thursday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Reserved for Late-breaking News

Thursday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Writing Maintainable XSLT Conversions: From EEBO TEI to Web HTML Seven Ways

Liam Quin, Delightful Computing

XSLT conversions are often reused. It is easy to say "write for maintainability" but far harder to actually write maintainable XSLT. There is no consensus on what it is that makes some XSLT more maintainable than other XSLT. Here, the steps of a complex transformation, most of them written in XSLT, are connected using several approaches: XSLT modes and variables, the XPath 3 transform() function, XProc, the Unix make program, and even a batch script. These methods are compared in terms of maintainability: skills and knowledge needed, managing interdependencies, difficulty of revision, and ease of reuse. Recommendations are made for structuring multi-step XSLT transformations depending on context, people, and data. Guidelines are included to help project designers make the choice.

Thursday 14:00 14:30 EDT

Sponsor Presentation

A Balisage Sponsor will make a presentation.

Thursday 14:30 15:00 EDT

Open Mike: Anything Goes

Conference Participants

All conference attendees are invited to give a 2- to 10-minute presentation on ANY topic! This is an open microphone for any short subject (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just be a talking head. Anything goes!
See the Open Mike Call for details on how to sign up.

Thursday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

A Matter of Context: XML, RDF and Latent Spaces (FP)

Kurt Cagle

Tired of hallucinations from large language models? A solution may be to give them not just better data or more data, but prompts with better structured data. Reducing hallucination has heretofore been approached by supplying better contextual narratives in prompts. However, recognizing that knowledge is best represented by graphs leads to the suggestion that XML may be a better fit for input to LLMs than either RDF or JSON.

Thursday 16:00

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday, August 8, 2025

Friday 9:00 10:00 EDT

Birds of a Feather Discussion(s)

BoFs are meetings, or mini-meetings, devoted to a topic of interest. See the BOF page to request a BoF and the conference portal for the BoF schedule.

Friday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Is Invisible XML Ready for Undergraduates?

Elisa E. Beshero-Bondar & Michael Simons, both of Penn State Erie, the Behrend College

University students in the Digital Media, Arts, and Technology program at Penn State are offered a course in “Large-Scale Text Analysis”. Going into this course, students have experience in encoding text with XML, transforming XML with XSLT, and web development with HTML and CSS. In the past, the Text Analysis course has been a procedural “Python-and-Regex course”: preparing text corpora by generating simple XML from regularly-patterned files using regular expression search-and-replace operations, using XQuery to extract the portions of the texts to analyze, and producing plain-text inputs to provide to Python. Python has dominated the experience of the pipeline. This year’s course tried a different approach. Students were taught iXML grammars as a way to prepare XML for analysis and XProc for pipelining. Regular expression matching was accomplished via XSLT, and the entire XML stack was used before approaching Python. Did the students learn a lot? Yes, both new concepts and very different ways of thinking about text. But I, the instructor, learned a lot too, and will share my experience.

Friday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Reserved for Late-breaking News

Friday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

(Re)building the TEI website: a bit of history and new directions

Hugh Cayless, Duke Collaboratory for Classics Computing (DC3)

The Text Encoding Initiative has had a web presence for almost thirty years. It’s instructive to consider how a large, robust, and widely-used XML vocabulary defines its presence on the web. How it has weathered the storms of change (management, institutional, technological) to be where it is today. And how it imagines its future.

Friday 12:45 13:00 EDT

Closing Administrivia

Debbie Lapeyre, Mulberry Technologies

Announcements, thanks, and other administrative tasks are the necessary plumbing associated with events such as Balisage. In this session we will try to keep the administrivia as short as possible while also asking, telling, thanking, and recognizing as appropriate.

Friday 13:00 13:15 EDT

Markup and Community

Video: All of us: thoughts on technology and community in the words of Michael Sperberg-McQueen, by Bethan Tovey-Walsh
Introduced by Tommie Usdin, Mulberry Technologies

Balisage is a gathering of people who are interested in, and users of, markup in many forms. We have diverse backgrounds, work in many environments, and use a mixture of tools including polished commercial products, mass-market software, bespoke one-use wonders, and rusty antiques. We generally value declarative markup, reusability, longevity, and separation of content from form and tools.

The markup community is generally supportive of each other, respectful of the requirements of others, and willing to step up to provide for “edge cases”. It is these attitudes, and this community, that have made declarative markup, and SGML/XML in particular, so ubiquitous that many see it as a given and see no need for conferences to discuss it.

For many years the closing talk at Balisage (and Extreme Markup Languages, Markup Technologies, the XML Conference, and before that the SGML Conference) was one of the highlights of the event. In a tour de force of last minute writing, Michael Sperberg-McQueen wove patterns of the various talks and conversations of the week. He consolidated the conference into a coherent narrative. In his view all speakers were important, all presentations significant, and the investment of time and energy participating in Balisage worthwhile. An ongoing theme in those conference closings is the health, strength, and importance of the markup community.

Michael is no longer with us, and it seems not only impossible but inappropriate to try to replicate this rhetorical feat. So we are not going to try. This year we will close Balisage with a short review of some of what Michael said about us, the markup community. Then we will open the floor for discussion.

Friday 13:15

Markup and Community: Open Discussion

Let's talk about the markup community. Let's talk about communities linked by technology and approach to documents/data. Let’s talk about how we are going to build a future we will be proud of. Let’s talk.
