Balisage 2022 Program

Pre-conference Event: Sunday, July 31, 2022

Sunday 11:00 12:00 EDT

Dress Rehearsal & Social Time

Conference Attendees

Balisage is using the Whova Conference Portal, which is unfamiliar to some attendees and has changed since some of us used it last year at Balisage. In order to provide an opportunity for us all to figure out how the portal works, we will do a “Dress Rehearsal” on the Sunday before the conference. The Dress Rehearsal will start with some social time including coaching to help people get logged in to Whova, a bit of conference-lite content including Q&As, and some small group social time.

Sunday 12:00 12:15 EDT (+ Q&A 12:15 - 12:25)

Electronic Verse Engineer

Wendell Piez

Suppose you want to publish an anthology, with TEI or BITS, for example. Part of the workflow is going to be capturing verse. Electronic Verse Engineer is a Markdown-like interface for capturing verse (as opposed to unstructured paragraphs), with some annotation capability. It can capture prose, if that’s what you have. You type and it makes XML for you. It is meant to be a building block in an Anthology Builder application, to publish your anthology. (I hear that Anthology Building is going to be the next craze—when it happens, you heard it from me.)

Sunday 12:30 12:45 EDT (+ Q&A 12:45 - 12:55)

Customisation of Akoma Ntoso using Schematron

Geert Bormans, C-Moria BV

Akoma Ntoso, an OASIS Open standard for legal documents, is gaining popularity. Since the standard aims to cover a wide variety of legislative and parliamentary traditions, it is fairly broad and generic. Projects usually start by customising the standard to a local tradition. Such a customisation is usually achieved by generating a so-called subschema through publicly available tooling. This presentation shows an alternative approach for some common use cases using Schematron instead.

Sunday 13:00 13:30 EDT

Small Group Social Time

There will be several social spaces available throughout Balisage. This is a good time to take a look at them and chat with other conference attendees.

Monday, August 1, 2022

Monday 10:00 10:15 EDT

Welcome to Balisage 2022

Conference logistics, tips for attendees, and other getting started messages.

Monday 10:15 10:45 EDT Session in Conference Portal

Destructive Consistency

B. Tommie Usdin, Mulberry Technologies

It seems to be in the nature of people drawn to discussions of markup, that is, people likely to be at Balisage, that we value consistency. We want both our physical and virtual worlds to make sense and we expect consistency. We chafe when forced to drive on the “wrong” side of the road. We want the names used in our markup vocabularies to be formed in consistent ways—either use CamelCase or don’t, for example. We want things that are parallel to be handled in parallel ways. This feels comfortable to us, and we think we are doing good by pushing the world toward our comfort zone. This, it seems to me, is one of the reasons that explicit markup has not taken the world by storm. We are not meeting people where they are; we are trying to change where they are. Worse, we are (deliberately? frequently?) blind to their styles and desires. This just plain doesn’t work! Our obsession with consistency is destructive.

Monday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Correcting Collation Problems with XSLT

Elisa E. Beshero-Bondar, Penn State Erie, The Behrend College

The word by word, comma by comma, and sometimes tag by tag comparison of manuscripts and editions (called “collation”) is notoriously tedious and error-prone. But computer-aided collation is like a power loom that inevitably tangles up threads caught in the machinery. We need new tooling to help us unsnarl the threads.

Until now, we have aligned variant passages in the Frankenstein Variorum project using a Python script to feed CollateX. Now we are experimenting with the Text Alignment Network’s tandiff XSLT to handle the string comparison completely with XPath and XSLT. How far can we take XSLT and Schematron in automating the preparation, collation, and correction of electronic editions?

Monday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Invisible XML Coming into Focus

Tomos Hillman, Evolved Binary; John Lumley; Steven Pemberton, CWI; C. M. Sperberg-McQueen, Black Mesa Technologies LLC; Bethan Tovey-Walsh, Swansea University; & Norm Tovey-Walsh, Saxonica

Invisible XML has had a long incubation process, but in the last year things have heated up. A W3C Community Group has been formed; the specification has been improved and is on the verge of publication, with implementations on the way. Infrastructure in preparation includes a test suite to improve interoperability of implementations, tutorial materials, and collections of sample Invisible XML grammars.

Monday 13:30 14:00 EDT

Sponsor Presentation: Oxygen

George Bina, Syncro Soft

The Oxygen XML set of tools evolved over time and they now include visual authoring, publishing, review and collaboration, support for automation, and a powerful SDK in addition to the core XML editing and development. During this presentation, we will provide an overview of these tools, as well as highlight the most important recent additions to each tool, such as concurrent XML editing and JSON support.

Monday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Serverless Searching with XSLT and JavaScript

Joey Takeda, Simon Fraser University; & Martin Holmes, University of Victoria

Paper books may go out of print, but they do not cease to function after a few years. Web-based digital editions, by contrast, have often been tied to particular software systems and vanished entirely when that software became obsolete. How can this be avoided?

For many digital humanities projects, “serverless” is the answer: sites that are purely static on the server side. But digital editions also require sophisticated search capabilities with stemming, wildcards, and filters tuned to the contents of the edition itself. To meet this need, staticSearch builds search indexes offline with XSLT and queries them using JavaScript in the browser. This straightforward, serverless approach is not without interesting and challenging problems of its own.
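
The index-offline, query-later idea can be illustrated with a minimal inverted index. This is only a sketch under stated assumptions, not how staticSearch itself is implemented: staticSearch builds its indexes with XSLT and queries them with JavaScript, whereas both steps are shown here in Python, and the function names and sample documents are hypothetical. Stemming, wildcards, and filters are left out.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Offline step: map each token to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}

def search(index, query):
    """Query step: return ids of documents that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], []))
    for token in tokens[1:]:
        result &= set(index.get(token, []))
    return sorted(result)

# Hypothetical sample pages standing in for a digital edition.
documents = {
    "letter1.html": "The creature spoke of the mountains.",
    "letter2.html": "Mountains and glaciers surrounded the traveller.",
}
index = build_index(documents)
```

In a static site, the dictionary returned by build_index would typically be written out as files at build time so that the query step needs no server-side code.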

Monday 15:00 15:30 EDT (+ Q&A 15:30 - 15:45)

JATS Superhighway: Onramp to a Backward-incompatible Version

Jeff Beck, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health

Tag sets change over time. Tag set designers manage a complex system where everything is connected to everything else and new user requirements continue to surface. Tag set users manage complex systems to create, manage, and archive documents. Users strongly resist backward-incompatible change, so as JATS has grown we have made compromises in the design to meet new requirements while maintaining backward compatibility. We think it is time to consolidate redundant models, remove deprecated items, and generally reduce confusion. Can we guide users towards a new, backward-incompatible version in a way that they'll find palatable?

We have a plan. We're going to extend the JATS 1.x schema so that it contains the new, 2.0 models in addition to the old models. Then we'll make an "Onramp" subset of 1.x that has the deprecated items removed. Documents valid against the onramp subset of 1.x will also be valid against 2.0.
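
The relationship between the three schemas can be sketched with plain sets. This is a deliberately simplified model that treats a schema as a set of permitted element names, whereas real validity also depends on content models and attributes. Every name below is a hypothetical placeholder, not an actual JATS element.

```python
# Hypothetical placeholder names; none of these are real JATS elements.
OLD_ONLY = {"old-a", "old-b"}        # deprecated models, absent from 2.0
SHARED = {"common-x", "common-y"}    # models unchanged between versions
NEW_ONLY = {"new-a"}                 # models introduced for 2.0

extended_1x = OLD_ONLY | SHARED | NEW_ONLY   # 1.x extended with the 2.0 models
v2 = SHARED | NEW_ONLY                       # the 2.0 schema
onramp = extended_1x - OLD_ONLY              # 1.x subset without deprecated items

def is_valid(elements_used, schema):
    """A document is 'valid' here if it uses only names the schema permits."""
    return set(elements_used) <= schema

onramp_doc = ["common-x", "new-a"]   # uses only onramp names
legacy_doc = ["common-x", "old-a"]   # still uses a deprecated name
```

Under this model, any document that passes the onramp check also passes the 2.0 check, while a document that still uses a deprecated name passes only the extended 1.x check.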

Monday 16:00 17:00 EDT

Birds of a Feather Discussion(s)

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Topic(s) include:

  • Getting data into and out of Microsoft Office and Open Office

Additional topics will be announced during Balisage.

Tuesday, August 2, 2022

Tuesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Getting Useful XML out of Microsoft Excel

Gayanthika Udeshani, Typefi

Many users create and maintain useful data in Excel spreadsheets. Besides the tables themselves, there are also charts and graphs derived from them. Powerful Java tools exist to extract this data, but they can be difficult to configure. An XSLT function library to extract CALS tables and SVG diagrams makes Excel data more easily accessible.

Tuesday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

BITS for Government Information?

Ravit H. David, University of Toronto

Scholars Portal, a service of the Ontario Council of University Libraries, provides multiple levels of access and preservation to scholarly packages of ebooks. We have created an in-house custom modification of the Book Interchange Tag Suite (BITS, a sister vocabulary to the Journal Article Tag Suite, JATS) to describe ebooks. A recent strategic decision to host government information has posed several challenges with our BITS modification and required a new in-house schema to accommodate specific metadata requirements posed by the somewhat different nature of govinfo content. We’ll look at some of these challenges, then examine ways BITS can accommodate metadata that isn’t necessarily standard ebook metadata.

Tuesday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Designing for Change: Pragmas as an Extensibility Mechanism for Invisible XML

Tomos Hillman, Evolved Binary; C. M. Sperberg-McQueen, Black Mesa Technologies LLC; Bethan Tovey-Walsh, Swansea University; & Norm Tovey-Walsh, Saxonica

Invisible XML (ixml) is a method for treating non-XML documents as if they were XML. A specification for Invisible XML is under committee development. But no technology foresees all of its use cases, especially in 1.0. How can ixml allow experimentation, and channel experimentation in useful ways, to allow ideas to be expressed in ixml grammars that go beyond what is foreseen, without compromising interoperability or the value of strict conformance to the specification?

Many programming languages (C, JavaScript, Pascal, XQuery, etc.) address this question with pragmas. A pragma is a semi-formal way to instruct a processor/compiler/interpreter how it should operate. Typical pragmas extend a specification but are not a part of it. We propose pragmas as an optional add-on to ixml to allow implementation of non-standardized functionality in a way that does not interfere with standard ixml processing. We describe our general framework for pragmas, some specific pragmas (to illustrate how pragmas can be used), and a few pragmatic implementations.

Tuesday 13:30 14:00 EDT

Sponsor Presentation: Docugami

Jean Paoli, Docugami
Zubin Rustom Wadia, Docugami
Owen Ambur
Steve DeRose, Docugami

We will demonstrate the use of Docugami's XML AI to transform U.S. Government Annual Performance Reports (PDF) to StratML (XML), to help accelerate GPRAMA Sec 10 compliance for U.S. Federal Agencies. Docugami implements the Core XML Document Tenets (doctypes, hierarchical model with links, very variable depths, mixed content, node order critical, descriptive/declarative names for content, namespaces) and brings their advantages to business users and business documents without lengthy setup, programming, document analysis, creating stylesheets or other friction.

Tuesday 14:00 14:30 EDT

Updates on Active Markup-Related Specifications

Patrick Durusau
David Maus
Steven Pemberton
Norm Tovey-Walsh
Robert Wheeler, ASME

We rely on specifications to make our markup tools and projects sing. Those specifications continue to evolve to meet changing needs, even if (thankfully?) they are not changing as quickly now as they were a decade or two ago.

  • The Schematron story is complicated. As of the 2020 version, ISO Schematron is behind a paywall and inactive. Schematron use is growing, as is the list of wished-for improvements to the Schematron specification. This update will discuss the current state of Schematron.

  • NISO Z39.102/NISO STS (Standards Tag Suite) provides a common XML format that developers, publishers, and distributors of standards, including national standards bodies, regional and international standards bodies, and standards development organizations, can use to publish and exchange full-text content and metadata of standards. STS is based on ANSI/NISO Z39.96 JATS (the Journal Article Tag Suite). NISO STS 1.1 is ready for balloting.

  • OpenDocument Format (ODF), an XML format for office suite software, was formally approved by ISO/IEC in 2005. But like sharks, ODF had to continue to move or die. Much has happened since 2005 and even more changes are planned for the future! ODF 1.4 is now in preparation.

  • XSLT 4 is a revised version of the XSLT 3.0 Recommendation. The changes are relatively minor usability enhancements. There are no changes to the data model or processing model. Instead, the specification attempts to fill a number of gaps in functionality resulting from feedback from XSLT 3.0 users.

  • XForms is an XML-based declarative programming language used in projects and apps, large and small, around the world. We report on recent changes and developments.

Tuesday 15:00 15:30 EDT (+ Q&A 15:30 - 15:45)

XML in an AsciiDoc World: SaxonJS to the Rescue (LB)

Evan Lenz, Lenz Consulting Group, Inc. and the C++ Alliance

Static website generation has long been an effective use case for XML and XSLT. Today, static site generators remain popular, but they rarely use XML. Antora is a static site generator for software documentation. It runs on Node.js and uses AsciiDoc for its source content. It has desirable features including git integration, site versioning, and pluggable modern UI bundles. However, Antora doesn't natively handle complex content generation needs. Now, thanks to SaxonJS and Antora's new extension mechanism, we can weave in the power of XML and XSLT. The docca project generates reference documentation for Boost C++ libraries via an Antora extension that invokes SaxonJS, seamlessly integrating auto-generated and manually-authored content into the result. This presentation introduces key project components (Doxygen, Antora, AsciiDoc, and SaxonJS running on Node.js) and includes sample code, a demo, and a brief discussion of other ways XML and SaxonJS might complement AsciiDoc and Antora.

Tuesday 16:00 17:00 EDT

Birds of a Feather Discussion(s)

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Topic(s) include:

  • iXML (Invisible XML)

Additional topics will be announced during Balisage.

Wednesday, August 3, 2022

Wednesday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

Migrating DocBook to Uncharted Waters

Ari Nordström, Creative Words & Jean Kaplansky

Converting DocBook to HTML sounds like it should be straightforward. But 50 GB of DocBook data, some dating back to the SGML days and replete with interpolations specific to a proprietary publishing and content-management system, provides plenty of, um, opportunities for creative conversion approaches and has taught us a number of valuable lessons. Using pipelines of many XSLT transformations matched to individual problems is a basic strategy for conversion, but many other tools, some not even designed for DocBook, have become parts of the pipeline as well. We are lucky to have such a vibrant and supportive XML community to help when code alone isn’t enough.

Wednesday 11:00 12:15 EDT

Open Mic: Anything Goes

Conference Participants

Open Microphone. Balisage participants are invited to give 2- to 10-minute presentations on any topic (within the limits of the conference Code of Conduct). Sign up to speak by sending email to info@balisage.net, including:

  • your name and affiliation if you want an affiliation on the conference site
  • the title of your mini-presentation
  • how long you want (10 minutes maximum)

(In the unlikely event that more people ask to present than we have time for, the conference chair will select speakers.)

Wednesday 12:15 12:45 EDT

Sponsor Presentation: Typefi

Damien Gibbs, Typefi

Typefi will demonstrate how to produce print, mobile, and online content faster using a single-source publishing platform in a seamless, end-to-end automated workflow.

Wednesday 13:30 14:00 EDT

Sponsor Presentation: Antenna House

Mike Miller, Antenna House

Antenna House offers way more than XSL-FO and CSS Formatters. This presentation will introduce you to Antenna House’s newest products, HTML on Word and Word API. HTML on Word converts an MS Word file to clean HTML that somebody can style with CSS to produce web pages or print. Word API is a server-side library that provides for data-merge, unlink, document comparison, and inspection of MS Word documents. Both products rely on the MS Word files being DOCX. Neither requires MS Word. DOCX is the tie-in to XML.

Wednesday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

Human Readability of Data Files

Michael Robert Gryk, Department of Molecular Biology and Biophysics, UCONN Health

“It’s human readable” has not only been an assertion but also a stated requirement for SGML and XML since those projects began. But what constitutes readability, and what does readability imply about the use of any markup technique? Some kinds of data can be marked up in more than one syntax—XML, JSON, and STAR for example. We can compare the effect of different markup techniques on the readability of the resulting data files. There’s plenty to discuss, from the concept of readability itself to the design choices required when converting from one markup technology to another.

Wednesday 15:00 15:30 EDT (+ Q&A 15:30 - 15:45)

Markup Category Terminology over the Years: a First Look (LB)

Allen H. Renear, School of Information Sciences, University of Illinois at Urbana-Champaign & Steven J. DeRose, Independent Consultant

We’ve been doing the “markup” thing for more than half a century, since the beginnings of computerized text processing. In that time we’ve put a lot of adjectives in front of “markup” that reflect how we think of and apply the markup. These qualifiers have tended to fall into two categories: those that suggest what will happen, particularly in presentation, as a result of the application of markup to a string of data, and those that reflect what we think about the data itself. Beginning with broad terms like “generic”, we have made many attempts to elucidate what our markup is intended to accomplish: “conceptual”, “declarative”, “logical”, “structural”, and “semantic” have all had their times in the spotlight. What do the changing fashions in terminology say about our data and about what we, the practitioners, think about our work?

Wednesday 16:00 17:00 EDT

Birds of a Feather Discussion(s)

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Topic(s) include:

  • Teaching XML and related topics

Additional topics will be announced during Balisage.

Thursday, August 4, 2022

Thursday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

XSLT Extensions for JSON Processing

Michael Kay, Saxonica

XSLT 3.0 contains basic facilities for transforming JSON as well as XML. But looking at actual use cases, it’s clear that some things are a lot harder than they need to be. How could we extend XSLT to make JSON transformations as easy as XML transformations, using the same rule-based tree-walking paradigm? Some of these extensions are already implemented in current Saxon releases, so we are starting to get user feedback.

Thursday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Project Mirabel: XQuery-Based Documentation Reporting and Exploration System

Eliot Kimber, ServiceNow

Project Mirabel is a multi-function XQuery-based system for capturing and reporting on the results of applying Schematron validation to DITA documents. Mirabel has experienced a very 'organic' development, growing (all within the span of three weeks!) from a simple reporting utility into a multi-function platform for general reporting and exploration over large and complex sets of DITA content that represent non-trivial hyperdocuments. Whew!

Thursday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Biblical Linguistics in the GitHub Jungle (LB)

Jonathan Robie

A great deal of Biblical analysis is available on GitHub under open licenses, including well-established reference systems for the verses in a Bible and the words used in the original languages. But data integration is still problematic since there are different traditions, with different sets of books and different ways of dividing up individual books. Even a concept as simple as "what is a word" becomes complicated, since linguists employ a range of different criteria, which are not uniformly applicable across contexts and languages.

Clear Bible Inc. develops software for Bible translators based on machine learning and natural language processing. We create and curate linguistic datasets for biblical Hebrew and Greek and align translations to the original Hebrew and Greek words. This alignment allows images, maps, articles, and other resources to be associated with the text. To make all of this possible, we have had to discover ways to integrate across datasets that were not designed to be used together. Challenges? Yes. Solutions? Some.

Thursday 13:00 13:45 EDT

Balisage Bard

Lynne Price, Gamemaster

Once again, Balisage Bard gives you the opportunity to exercise your literary creativity with original poems, short stories, jokes, songs, photos, recipes, trivia questions, and other masterpieces. Subject matter must be related to Balisage—possibilities include markup, papers presented this or previous years, virtual conferences, attendees’ interests (whether or not pertinent to markup), and so forth. Read your effort, play it on video, or show photos or text during the game session. Translations of works in languages other than English are not required but will be appreciated. There is a two-minute time limit per presentation. Sign up by entering your name in the Bard chat room. Presentation sequence at the gamemaster’s discretion. One submission per person/team unless there is time for more at the end. And listen closely. Vote for your favorite three works after the last presentation. Who will be the 2022 Balisage Poet Laureate?

Thursday 14:00 14:30 EDT (+ Q&A 14:30 - 14:45)

On Translating the TEI (LB)

Hugh Cayless, Duke University

Few tagsets are more robustly committed to supporting international and multilingual texts than the TEI. It seems only fitting that the TEI Guidelines, the principal documentation of the TEI, written in the TEI, should be available in a wide variety of languages. Several attempts have been made, but the practical difficulties of keeping translations up-to-date have caused them to languish. In 2020, the TEI Consortium and Duke University received a grant to begin reviving the TEI’s internationalization project. We hope the addition of a browser-based GUI for translating the TEI Specification pages into target languages will help make this ambitious project a success.

Thursday 15:00 15:30 EDT (+ Q&A 15:30 - 15:45)

Metadata for Creators

Mary Holstege

Metadata can be more than an external description, especially in markup systems where the line between data and metadata is especially blurry. Signature metadata allows a creator to assert authorship of a work and communicate with the audience. Process metadata captures key details, serves as an integral part of project record-keeping, and can even play a crucial role in the creation of resources. When metadata is used creatively and managed actively, new opportunities become visible. Important lessons can be learned about the creation and management of metadata in a creative process. With beautiful pictures.

Thursday 16:00 17:00 EDT

Birds of a Feather Discussion(s)

Balisage participants will choose topics we want to discuss and discussion leaders to keep the conversation on topic. These topics may be inspired by conference presentations or may be other subjects of interest to the markup community. Topic(s) include:

  • Living with wiki markup and AsciiDoc and their like -- better than nothing, or a nail in the coffin of descriptive markup?

Additional topics will be announced during Balisage.

Friday, August 5, 2022

Friday 10:00 10:30 EDT (+ Q&A 10:30 - 10:45)

The Impossible Task of Comparing CALS Tables

Robin La Fontaine & John Francis, DeltaXML

Finding out what has changed in a CALS table is remarkably complicated. Some variant of the CALS standard is often used to represent tabular data in XML, but it permits considerable flexibility in the form of headers, footers, and spans. Additional complexity arises when authors use empty columns for layout or use column or row spans specified in unusual ways, or when applications simply do not follow the standard. In practice, comparing CALS tables directly is impossible. But maybe that’s OK if all we need is a clear representation of the changes. And if we can represent them in a CALS table!

Friday 11:00 11:30 EDT (+ Q&A 11:30 - 11:45)

Multiple String Comparison in XSLT

Joel Kalvesmaki

Introducing tan:collate(), an XSLT function for comparing three or more strings. The tan:collate() function was written primarily for use with textual criticism, and therefore operates under the assumption that the input strings are descended from a common archetype, which the user wishes the output to respect (prefer) as much as possible. The function considers pairwise approaches that build the guide tree incrementally, as well as a staggered-sample approach that leverages the binary-comparison function tan:diff() to detect the ‘superskeleton’ of the strings. The tan:collate() function has been released as part of an open-source function library enabling developers to incorporate text comparison directly into XML applications.
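
The pairwise step that a guide tree starts from can be sketched as follows. This is not the tan:collate() algorithm and does not use the TAN library. It uses SequenceMatcher from Python's standard difflib module as a stand-in binary comparison, and the function names and sample witnesses are hypothetical. It scores every pair of input strings and picks the most similar pair, which is where an incremental approach would plausibly begin merging.

```python
from difflib import SequenceMatcher
from itertools import combinations

def pairwise_similarity(strings):
    """Score every pair of strings; keys are (i, j) index pairs with i < j."""
    scores = {}
    for (i, a), (j, b) in combinations(enumerate(strings), 2):
        scores[(i, j)] = SequenceMatcher(None, a, b).ratio()
    return scores

def closest_pair(strings):
    """Return the index pair with the highest similarity score."""
    scores = pairwise_similarity(strings)
    return max(scores, key=scores.get)

# Hypothetical witnesses: the first two differ by a single character.
witnesses = [
    "the quick brown fox",
    "the quick brown fax",
    "a slow green turtle",
]
first_merge = closest_pair(witnesses)
```

With three inputs there are three pairs to score. The number of pairs grows quadratically with the number of strings, which is one reason a sampling strategy can be attractive for larger sets.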

Friday 12:00 12:30 EDT (+ Q&A 12:30 - 12:45)

Excel to Excel using XML, really? (LB)

Geert Bormans, C-Moria BV

Merging several Excel documents, from different sources with different forms, into a single XML document that has to satisfy externally specified reporting requirements is a somewhat daunting prospect. You might imagine that a tool like Python (and Pandas) would be the appropriate place to start. But we were able to develop a solution using the XML stack in less time than was forecast for the Python-based solution. We use a single XProc 3.0 pipeline to merge the Excel files and produce both the required XML format and Excel files for internal use. We believe our success depended largely on the fact that the project design enabled precise, filtered, early feedback which reduced the number of iterations required to create the critical deliverable. In addition to ingesting and providing final data in the user’s preferred format (Excel) we were able to provide intermediate files, change tracking, and process documents in Excel.

Friday 13:30 14:15 EDT

Constructive Inconsistency

C. M. Sperberg-McQueen, Black Mesa Technologies

A foolish consistency, said Emerson, is the hobgoblin of little minds. But how can we know when consistency is foolish and when it is wise?

Friday 14:30 15:00 EDT

Feedback

What did you like at Balisage 2022? What could have been better? What changes would you suggest for future Balisage conferences? Tell us what you think.