Balisage 2024 Program

Pre-conference Event: Sunday, July 28, 2024

Sunday 12:00 12:30 EDT

Dress Rehearsal & Social Time

Conference Attendees

Balisage is using the Whova Conference Portal, which is unfamiliar to some attendees and has changed since some of us used it last year at Balisage. In order to provide an opportunity for us all to figure out how the portal works, we will do a “Dress Rehearsal” on the Sunday before the conference. The Dress Rehearsal will start with some social time including coaching to help people get logged in to Whova, a bit of conference-lite content, a Q&A session, and some small group social time.

Sunday 12:30 13:00 EDT (+ Q&A 13:00 - 13:15)

"Why are some technologies more successful than others? And why are my predictions usually wrong?"

Michael Kay, Saxonica

If asked “what made XML successful”, I could come up with many answers, ranging from ease of implementation through availability of support tools to dumb luck and good timing. But if, back in the day, I had been asked “will the World Wide Web take off”, I would have given the wrong answer: I could see the positive things it shared with then-current successful technologies, like SGML, but I couldn’t have predicted the rise of TCP/IP that made the Web feasible. It’s not too hard to recognize the importance of low cost or the potential benefits of promising technologies, but it is much more difficult to predict the effects of timing or the endorsement of trusted influencers. And it is most difficult for us to see our own strong inbuilt biases that make prediction risky.

Sunday 13:30 14:00 EDT

Small Group Social Time

There will be several social spaces available throughout Balisage. This is a good time to take a look at them and chat with other conference attendees.

Monday, July 29, 2024

Monday 10:00 10:15 EDT

Welcome to Balisage 2024

Conference logistics, tips for attendees, and other getting started messages.

Monday 10:15 10:45 EDT

Break up the Bundle; Sell the Components

B. Tommie Usdin, Mulberry Technologies

The XML market has been moderately successful selling a package that includes: some philosophies (declarative markup and generic markup), a syntax, some programming languages (XSLT), some associated specifications, and some tools. We tell would-be users that there are significant advantages to creating, managing, and deploying their content our way, and if they cannot do that they should up-convert their content as soon as possible. This way they will be able to do multiple things with it, use many tools, be vendor independent, and they may find that their documents can suddenly play the piano and tap dance. They should use explicit, generic markup. AND they must, we tell them, use pointy brackets. AND they must leave Perl and Python in the dust and commit to XSLT (or iXML and XSLT). AND all of their code must be declarative and side-effect free. We tell them that their documents are trash, the programs they have worked for years to master are useless, and they are once-again beginners. We push this as an all or nothing proposition.

Not only is this approach arrogant and off-putting, it is wrong. And we are beginning to see this. There is no essential link between generic markup and pointy brackets. The power of declarative markup is distinct from that of descriptive markup. XSLT and much of the rest of the XML tool stack CAN be used in other environments. It is time we stopped insulting our would-be users, customers, colleagues, and their (often highly successful) documents and environments. It is time we unbundled this package and helped people use the parts that work for them in their contexts.

Monday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Visualizing textual collation—Why: Imagining an effective alignment visualization

Ronald Haentjens Dekker, DH Lab, Huygens Institute for the History of the Netherlands and David J. Birnbaum, University of Pittsburgh

Textual scholars often align moments of agreement and moments of variation across manuscript witnesses in order to explore how those relationships contribute to a theory of the text, that is, to understanding the history of the transmission of a work. How to identify (an analytic task), model (an interpretative task), and represent (a rhetorical task) the structural relationships among witnesses are distinct but related aspects of the study of textual transmission. Our presentation explains and demonstrates how effective visualizations of collation results must integrate both textual and graphic methods in order to communicate a theory of the text.

Monday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Visualizing textual collation—How: Expressing a dynamic alignment ribbon in SVG

David J. Birnbaum, University of Pittsburgh and Ronald Haentjens Dekker, DH Lab, Huygens Institute for the History of the Netherlands

Creating an alignment ribbon, a new model of textual collation, raises several specific challenges for a dynamic SVG representation, including the fact that SVG text does not know its own length, the absence of support for a z-index property in SVG, and the need to adjust the positioning of large numbers of SVG elements simultaneously and in a scalable manner in response to user-generated events. Our presentation explains and demonstrates how we identified and implemented ways of overcoming these challenges using common and less common SVG features along with other structured-document standards.

Monday 14:00 14:30 EDT

Sponsor Presentation: Evolved Binary

A sponsor presentation from Evolved Binary

Monday 14:30 15:00 EDT

Open Mic: Anything Goes

Conference Participants

Balisage short subject open microphone. All conference participants are invited to give a 2 to 10 minute presentation on ANY topic (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just yourself as a talking head. Anything goes!

Monday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

Using iXML to produce XML to produce iXML to produce ...

John Lumley

Invisible XML offers a concise syntax for writing grammars to map structured textual data into XML. Conveniently, iXML can even parse itself. And then it’s visible XML! We have a rich family of tools for processing XML. What happens if we leverage this synergy? Parse iXML into XML, transform the XML to produce new grammars. Use those grammars to produce new XML. Repeat. How deep is this rabbit hole? And what clever solutions to challenging problems will we find as we explore it?

Monday 16:00

Birds of a Feather Discussion(s)

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Specific topics and discussion leaders will be announced in the conference portal.

Topics will include:

  • Manuscript collation, led by R. Haentjens Dekker & D. Birnbaum
  • additional topics to be announced in the conference portal

Tuesday, July 30, 2024

Tuesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Stretching XPath: three testing tales

Amanda Galtman

Writing good tests is challenging. Tests need to be correct: neither passing when they should fail, nor failing when they should pass. They need to be concise, not lost in a sea of boilerplate and repetition. Also, they need to be easy to maintain. If maintaining the test artifacts is tedious, then they are less likely to be up to date. XPath and other XML languages offer versatile functionality: sometimes, if you look just a little off the beaten track, you’ll find new tools for writing correct, concise, and maintainable tests. And maybe your other code will benefit too.

Tuesday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Clean SOAP: Evaluating AI-based structured document generation in a medical context (LB)

Paul Prescod & Phill Tornroth, Elation Health

Healthcare systems worldwide are in crisis: there are far too few Primary Care Physicians and they are badly overworked. Modern Electronic Health Records (EHR) systems, while solving critical issues with paper-based records, have greatly increased the already monumental burden of medical case documentation. An “AI Scribe” is an artificial intelligence system designed to streamline the process of clinical documentation by automatically generating “doctor visit notes”, which all physicians must produce following a patient visit. AI Scribe systems leverage natural language processing (NLP) and machine learning to interpret and document patient encounters in a format called SOAP (Subjective, Objective, Assessment, Plan). However, the implementation of AI in medical documentation raises multiple risks, including hallucinations, critical detail omissions, miscategorizations, narrative quality and organization issues, security and privacy concerns, bias and discrimination, as well as legal and ethical issues. Automated testing of AI-generated SOAP Notes is essential, but poses its own existential challenges. The way to automate and scale semantic and contextual SOAP testing is to use Large Language Models to test the output of other Large Language Models. What can we test successfully and how do we avoid an infinite regress of validators?

Tuesday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

From Word to XML via iXML: a Word-first XML workflow

C. M. Sperberg-McQueen, Black Mesa Technologies

Many readers will be familiar with the challenge of taking the output from Microsoft Word, an editor that many authors choose, and producing from it robust, structured, usefully tagged XML. Fully general solutions are a daunting effort, but what if full generality isn’t a requirement? The TLRR project (Trials in the Late Roman Republic, second edition) is preparing a small corpus of structured texts. Can the TLRR ignore (most of) the markup and leverage parsing the text itself? Will that be simpler? Easier? Successful?

Tuesday 14:00 14:30 EDT

Sponsor Presentation: Docugami KG-RAG: A Document Foundation Model Generating the Core XML Data Model and Enabling Higher-Quality RAG

Jean Paoli, Zubin Rustom Wadia & Gregory Renard, Docugami

Docugami groups documents into "docsets" of semantically similar documents (that will share the same tag set), and generates for each document a semantically rich hierarchical XML tree representing the entire document. Docugami is a proprietary Business Document Foundation Model, a vertically integrated GenAI Stack that trains, composes, and fine-tunes multiple Open-Source Large Language Models trained on millions of Business Documents. Retrieval Augmented Generation (RAG) has gained traction as a popular use case that allows Large Language Models (LLMs) to reason over business-critical data that is often private to enterprises. RAG over simple text is a start, but RAG over semantic XML Knowledge Graphs (KG-RAG) is a game changer. In this demo you will see how Docugami's Foundation Model can automatically generate XML structures and semantic tags, and how to feed this information into open-source LLMs like Llama3 while addressing noise issues to avoid hallucinations, leading to dramatically higher-quality outputs.

Tuesday 14:30 15:00 EDT

Open Mic: Anything Goes

Conference Participants

Balisage short subject open microphone. All conference participants are invited to give a 2 to 10 minute presentation on ANY topic (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just yourself as a talking head. Anything goes!

Tuesday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

Invisible Fish: API experimentation with invisible XML

Mary Holstege

Fishing for a more concise and user-friendly way to specify the verbose input to an advanced program for generating artistic graphics? How about constructing your own descriptive language? But then a language needs to be interpreted and turned into code. Invisible XML to the rescue! An iXML grammar can consume the input, convert it to XML, and pass it on to an XQuery driver that creates the verbose input to the graphics program.

Tuesday 16:00

Birds of a Feather Discussion(s)

Discussion Leader(s) to be Announced

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Specific topics and discussion leaders will be announced during Balisage.

Topics will include:

  • Discussion of Docugami’s presentation earlier today
  • additional topics to be announced in the conference portal

Wednesday, July 31, 2024

Wednesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

XML and LLMs: this might work!

Steven J. DeRose

Unlike most attendees at this conference, the artificial-intelligence programs called “Large Language Models” do not think in pointy brackets. They have usually been trained on really dumb raw text, and sometimes don’t even recognize punctuation. But what if you trained an LLM to do simple markup? Some of the latest LLMs are even capable of recognizing some markup and imitating it. The results of further training of such an LLM could be interesting and perhaps even useful.

Wednesday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Pulse, parse, and ponder: dissecting a domain-specific language

Joseph Michael Courtney & Michael Robert Gryk, UCONN Health (US), Department of Molecular Biology and Biophysics

Nuclear Magnetic Resonance (NMR) involves using complex sequences of radio-frequency (RF) pulses to excite material, which has been placed inside a strong magnetic field, and then detecting the resulting signals. The Bruker Pulse Programming Language is designed to control Bruker NMR spectrometers by specifying pulse sequences. Our goal is to process pulse code into structured data in order to enable automatic comparison of experiments, run simulations, help determine reasonable data parameters, and simplify many other tasks currently done by human experts. We used iXML to parse Bruker pulse code, and we report on some successes, but also complexities. We found iXML to be an effective tool for parsing pulse code, particularly because it does not require a new parser to be built for each grammar modification. But our work would be easier if iXML supported more complex rules (regular expressions, negative matches, and priority, for example).

Wednesday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

DTD (document type definition) declarations exposed in XSLT (LB)

Liam Quin

The tools frequently used to process XML documents, such as XSLT stylesheets, XPath expressions, and XQuery expressions, have no access to any Document Type Definition (DTD) that governs a document. But what if an application needs to know about the document grammar defined by a DTD, perhaps to create a visual representation of structure? The unique language of DTDs cannot itself be processed directly by tools such as XSLT, but ingenious use of string matching may make it possible to scan and process DTDs.

Wednesday 14:00 14:30 EDT

Sponsor Presentation: Antenna House - Printing should be invisible not be irritating

Tony Graham, Antenna House

In “Printing Should Be Invisible”, Beatrice Warde likened good typography to a crystal, rather than gold, wine goblet. The crystal goblet, she argues, is better because “everything about it is calculated to reveal rather than hide the beautiful thing that it was meant to contain.” As the manuals many of us produce tend more to the prosaic than the beautiful, this talk aims for the lower bar that printing (formatting) should not be irritating. A whirlwind tour of aspects of formatting from styling tables of contents to formatting indexes, this talk covers ways that the styling can aid rather than impede the comprehension of the text.

Wednesday 14:30 15:00 EDT

Open Mic: Anything Goes

Conference Participants

Balisage short subject open microphone. All conference participants are invited to give a 2 to 10 minute presentation on ANY topic (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just yourself as a talking head. Anything goes!

Wednesday 15:15 15:45 EDT (+ Q&A 15:45 - 16:00)

Deviant Causal Chains: A Problem for the Conceptual Modeling of Influence (ENCORE)

Jingzhu Wei, School of Information Management, Sun Yat-Sen University and Allen Renear, School of Information Sciences, University of Illinois at Urbana-Champaign

An Encore Presentation, originally given at the SIG-CM Workshop on Conceptual Modeling, JCDL 2020, Wuhan, China, August 1, 2020.

Wednesday 16:00

Birds of a Feather Discussion(s)

Discussion Leader(s) to be Announced

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Specific topics and discussion leaders will be announced during Balisage.

Thursday, August 1, 2024

Thursday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Sponsor Presentation: SaxonJS 3.0: Major new functionality!

Norm Tovey-Walsh & Debbie Lockett, Saxonica

SaxonJS is an XSLT 3.0 processor, written mainly in JavaScript but partly in XSLT, that is available for two JavaScript environments: the browser and Node.js. Saxonica is planning a major release (SaxonJS 3.0) this summer, which introduces a new API for calling JavaScript functions from XPath, and for calling XSLT functions from JavaScript. This API means, for example, that a SaxonJS application can directly access JavaScript APIs from XPath, and that JavaScript code can call XSLT functions.

In addition, SaxonJS 3.0 expands mechanisms for handling asynchrony, a critical feature of well-behaved JavaScript applications. The JavaScript engine has a single thread and demands that applications manage access to that thread in a responsible way by releasing control of the thread whenever they do something that might block processing (for example, disk I/O or web service requests). SaxonJS has long supported asynchronous evaluation of templates through ixsl:schedule-action. But in 2024, “promises” have become the standard mechanism for managing JavaScript asynchrony. A promise associates a request that the JavaScript environment can perform in the background with the code (the function) that should eventually run when the request completes. SaxonJS 3.0 adds support for asynchrony through a set of functions that allow XSLT programmers to create, manage, and respond to promises.

Thursday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Adventures in mainframes, text-based messaging, and iXML

Ari Nordström

Valuable old data, an older data format, and a computing environment older still and about to be shut down. When pointy brackets meant SGML, an aerospace working group produced a text-based messaging specification, S2000M Issue 2.1, intended to run on already-old hardware. The current version of that specification prescribes XML. But content written to the old one (Issue 2.1, a mystifying mix of compact syntax and semantics) is not going to go away anytime soon. However, the hardware that hosts it, and the 30+ year old software used to access it, are going away. Soon.

The data is still in use, it cannot be converted to a newer format in the available time (if at all), and the current host will vanish soon. iXML to the rescue! By using iXML to view the documents as XML, the entire XML stack can be used to manage the content while leaving it in the historical format.

Thursday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Using a testbed to assess XML database performance (LB)

Alan Paxton & Adam Retter, Evolved Binary

It seems that the benchmarking of XML Databases has stagnated and new directions are needed. So we propose adopting the testbed approach, already in use for NoSQL and relational databases, for XML databases. Our goal is to run flexible, repeatable, and potentially large benchmarks against any standard-compliant XML database implementation in order to perform both application-level benchmarking (broad characterization of performance against a particular workload) and microbenchmarking (tailored queries and workloads run to characterize detailed performance of particular code or data paths). We have already implemented two key components necessary for adopting the NoSQLBench testbed for XML Database benchmarking: an adapter for NoSQLBench that implements the ‘XML:DB’ standard and drives XML databases, and a generating tool (‘xmlgen2’) to produce synthetic XML data to populate a database prior to benchmarking. (Note: Our ‘xmlgen2’ is modeled on Albrecht Schmidt’s XMark ‘xmlgen’, but can generate diverse workloads (rather than a single XML corpus) using the Virtual Data Set features of NoSQLBench.) We will demonstrate automated benchmarking of XMark queries, but this is a late-breaking paper, and our exact conclusions will depend on our findings between paper submission (June) and August.

Thursday 14:00 14:45 EDT

Open Mic: Anything Goes

Conference Participants

Balisage short subject open microphone. All conference participants are invited to give a 2 to 10 minute presentation on ANY topic (within the limits of the conference Code of Conduct). Use video, sound, bullet point slides, cartoons, visualizations, SW demonstrations, or just yourself as a talking head. Anything goes!

Thursday 15:00 15:30 EDT (+ Q&A 15:30 - 15:45)

Graph Query Language — a new kid on the block!

Alex Miłowski

There is a new kid on the block and they may have stolen your lunch! Or maybe you’ll make a new friend? In April of this year, ISO/IEC JTC 1/SC 32, the same group that publishes and maintains SQL, published a new query language: ISO/IEC 39075:2024 GQL. This new standard defines the property graph data model and a syntax for querying and manipulating graphs. What is a property graph? What does a GQL query look like? Where might you find and use GQL? We’ll explore a short history and the basic aspects of this new standard, show some examples, and consider future applications.

Thursday 16:00

Birds of a Feather Discussion(s)

Discussion Leader(s) to be Announced

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Specific topics and discussion leaders will be announced during Balisage.

Friday, August 2, 2024

Friday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Two XPaths are better than one

Syd Bauman, Northeastern University/Library/DSG/Women Writers Project

XPath (by which I mean what XPath 1.0 calls “location paths” and XPath 2.0 and 3.1 call “path expressions”) is the Swiss army knife that leverages the power of XML data: it slices, it dices, it shreds, it chops, but wait, there’s more! Even better, XPaths are synergistic: the power of two XPaths is greater than twice the power of one. Let me show you some examples. I will describe the role of Schematron, and you will understand how layering XPaths reduces complexity and increases readability.

Friday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

When women do algorithms: a semi-generative approach to overlay crochet with iXML and XSLT (LB)

Bethan Tovey-Walsh

"Crochet" is what we call it when women do algorithms. A crochet pattern can be quite a complicated algorithm. For example, overlay crochet involves using two (usually contrasting color) yarns to create interwoven grids. Overlay patterns can be expressed as grids of squares, so they are very easy to represent using a coordinate system, which can then be translated into a visual representation.

Enter iXML! To translate textual material into an overlay crochet design, I first wrote a set of pattern specifications for the letter shapes of the Latin alphabet. Using iXML, I then processed a text to extract a list of capital letters, and inserted the patterns in place of their matching letters. An XSLT stylesheet extracted these pattern pieces as a text file, and a second iXML grammar joined the individual pattern specifications into a single scarf pattern. Transforming this into SVG with an XSLT stylesheet allowed manual tweaking to ensure that the letter-shapes sat well in combination with one another. The result is a scarf pattern with a hidden message, generated semi-randomly by patterns in the input text, with a little aesthetic tweaking to make it harmonious. And the result of *that* is a very nice, warm scarf!

Friday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Leveraging markup to process narrative recipes

Peter Flynn

When is a narrative not a story? When it’s a recipe! Publishing recipes with distinct sections enumerating the ingredients and preparation steps is a 19th-century invention. Historically, recipes were just narrative prose. Can we leverage markup and style to present narrative recipes in a modern style?

Friday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Ensuring XML quality and compatibility in large collections that span decades of content (LB)

Mark Gross, Data Conversion Laboratory

One of the promises of XML applications and related technologies is consistency and temporal durability of data created using them. In practice, however, for large applications such as JATS, there are many users of varying skill levels, with differing interests in the process. When these diverse users may also be using different versions of the standard tag set, errors and non-error inconsistencies tend to accumulate across large document collections. Although issues such as missing DOIs, bad xrefs, duplicate IDs, invalid assets, and other structural problems in the XML may not always result in invalid documents, they are nonetheless problems for the usability and sustainability of a collection. The presentation will highlight automated solutions for fixing massive amounts of JATS and BITS XML, ensuring that content is current with the latest DTDs and more internally consistent.

Friday 15:00 15:45 EDT

Visible / invisible

C. M. Sperberg-McQueen, Black Mesa Technologies

Invisible XML seems to be attracting attention. What can we learn from it?

Friday 16:00 16:30 EDT

Feedback

What did you like at Balisage 2024? What could have been better? What changes would you suggest for future Balisage conferences? Tell us what you think.
