Balisage 2019 Program
Monday, July 29, 2019
Symposium: Markup Vocabulary Customization
Tuesday, July 30, 2019
Conference Registration & Breakfast
Pick up your conference badge in the Gleason Boardroom and join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
Welcome and Introductions
Explicit markup: a fool’s errand or the next big thing?
B. Tommie Usdin, Mulberry Technologies
In 1998, at a Balisage predecessor conference, Brian Reid told us we couldn’t have the world we wanted. XML wouldn’t deliver. He used twenty-year-old slides, slides that he had originally presented at a conference in 1981, to make his point. I still want the world that Brian Reid told us we could not have; I still want Brian Reid to have been wrong. I still believe that separating meaning from format will enable our documents to be displayed in many forms and media, that a markup format that makes hierarchy explicit makes complex documents tractable, that when content creators author in systems that make declarative markup visible and use the author’s knowledge to add value to their content, we will be able to make documents sing! And I have the twenty-year-old slides to prove it.
Implementing TEI standoff annotation in the browser
Hugh Cayless, Duke Collaboratory for Classics Computing (DC3)
Standoff markup allows you to add information to a text without modifying the source. Often this can be achieved by linking between different documents. Various mechanisms exist for handling the connections involved. But some cases, such as named entity recognition, appear to require inline markup. Could we do this with standoff markup too? The answer is yes, using the TEI Critical Apparatus model, but it isn’t completely straightforward.
Break
Eating your own dog food
Ari Nordström
Declarative solutions generally, and XML specifically, invite experimentation, iterative development, and play. In this way they encourage the self-described “non-programmer” to build rich models, extensive workflows, and robust systems. But can you build the whole application this way? And if the application is critical to getting paid, do you have the courage to do so? We Swedes are a courageous lot.
Rules for the Rulemakers: JATS4R's Self Guidance on Attributes (LB)
Jeff Beck, NCBI/NLM/NIH, JATS4R Steering Committee
Maximal flexibility of rules or ease of reuse: choose one. The tighter the rules, the more consistent documents will be and the easier it will be to reuse them, but only if the rules are reasonable enough to be adopted. (If all the data creators ignore the rules, reuse doesn't get easier.) JATS4R (JATS for Reuse) is a NISO working group devoted to optimizing the reusability of scholarly content by developing best-practice recommendations for tagging content in JATS XML. The group has devoted particular attention to the flexibility/reuse tradeoff for rules on attribute use and controlled values, and we eventually decided that we needed some rules for ourselves on how to write rules for attributes in our recommendations. In developing that guidance document, we learned (or at least articulated) some things along the way.
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
Encore Presentation:
Merging The Swedish Code of Statutes (SFS)
Ari Nordström, Karnov Group
Paper available in proceedings of XML Prague
Application of Brzozowski derivatives to JSON Schema
Mary Holstege, MarkLogic Corporation
In 1964, Janusz Brzozowski defined a new technique for computing whether a string of symbols is in the language defined by an extended regular expression. Brzozowski derivatives have been used for content model validation in several XML schema processors; they can also be applied to the task of model validation for JSON Schema. As it turns out, applying them to JSON Schema requires several extensions to cover “type-tagged” expressions, which sheds light on certain interesting matching problems outside the original problem scope of JSON Schema validation.
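For readers unfamiliar with the technique, the following is a minimal sketch of matching by Brzozowski derivatives over plain strings. It is not taken from the paper: the class names (`Empty`, `Eps`, `Sym`, `Seq`, `Alt`, `Star`) and the functions `nullable`, `derive`, and `matches` are hypothetical, only the basic operators are covered (no intersection or complement, and none of the “type-tagged” extensions for JSON Schema), and no simplification of intermediate expressions is attempted.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular-expression nodes (illustrative only)."""


@dataclass(frozen=True)
class Empty(Re):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Re):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Sym(Re):
    ch: str  # a single symbol


@dataclass(frozen=True)
class Seq(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def nullable(r: Re) -> bool:
    """Return True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Sym)):
        return False
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    raise TypeError(f"unknown node: {r!r}")


def derive(r: Re, c: str) -> Re:
    """Return the derivative of r with respect to symbol c:
    an expression for what may follow c in strings accepted by r."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Sym):
        return Eps() if r.ch == c else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, c), derive(r.right, c))
    if isinstance(r, Star):
        return Seq(derive(r.inner, c), r)
    if isinstance(r, Seq):
        first = Seq(derive(r.left, c), r.right)
        # If the left part can be empty, c may also start the right part.
        if nullable(r.left):
            return Alt(first, derive(r.right, c))
        return first
    raise TypeError(f"unknown node: {r!r}")


def matches(r: Re, s: str) -> bool:
    """Derive by each symbol in turn, then test for the empty string."""
    for c in s:
        r = derive(r, c)
    return nullable(r)


# The expression (ab)*c, built by hand.
pattern = Seq(Star(Seq(Sym("a"), Sym("b"))), Sym("c"))
```

Calling `matches(pattern, "ababc")` walks the string one symbol at a time; a practical implementation would also simplify each derivative so the expression does not grow with the input.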
Break
XProc 3.0 (LB)
Achim Berndzen, <xml-project />
Geert Bormans
Gerrit Imsieke, le-tex
Norm Walsh
XProc is an XML pipeline language designed for XML-centric workflows. XProc 3 is currently under development. The editorial team believes that the core language specification is in “last call”. XProc 3.0 is designed to improve the usability of XProc. Features include: handling XML, text, binary, and JSON documents, text value and attribute value templates, typed variables and options using XDM 3.1, and a lot of shortcuts. XProc’s language design is still about encapsulated data processing steps with defined inputs, outputs, and options. What sets it apart from other scripting languages, Make, and Ant is this: it is a truly functional language with immutable inputs and state. This allows composition of arbitrarily complex steps without risking unexpected side effects and without jeopardizing manageability. In contrast to other functional languages, it offers multiple return “values” (on the named output ports) that don’t have to be consumed at once, or at all. Apart from becoming less verbose, XProc 3.0’s major strength is that JSON, text, HTML, and binary data are now first-class citizens, making it suitable for data processing in the Web age. In addition to describing the new XProc 3.0, we will show code (including both XProc 1.0 and 3.0) and demonstrate XProc tools.
Balisage Hospitality
Stop in to the Balisage Coffee and Conversation room. We'll have coffee, a comfortable place to talk, and possibly a toy or two worth a look.
Wednesday, July 31, 2019
Conference Registration & Breakfast
Pick up your conference badge in the Gleason Boardroom and join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
Text and markup processing languages, past, present, and future
Sam Wilmott
Programming language design is in continual flux, with significant new languages coming along every few years. In the field of text and markup programming languages, things seem stable at the moment, with XSLT in a dominant position and a few other languages filling in the gaps. But text and markup processing is no more exempt from change than any other field. What should the next language for this application domain look like? Can we make text and markup processing easier than it is now? What direction should we take? For the last ten years or so, I have been working on this problem. I have a plan.
“With one voice”: streamlining character data for tokenization
Ashley M. Clark, Northeastern University Women Writers Project and the Digital Scholarship Group
Some full-text search and textual analysis tools operate exclusively on sequences of tokens. Deriving input for these tools from XML documents can be challenging and depends heavily on the encoding practices and assumptions which produced the XML. Does metadata information, for example, carry the same weight as the text? If a document includes annotations about nuances of the transcription, including those annotations may aid researchers attempting to find relevant documents, but may hinder a process that is performing textual analysis of the work authored. Rather than attempting to make all tools powerful enough to deal with these issues, a modular approach to tokenization has been developed.
Break
Graphical user interfaces in the X stack
Zahra Al-Awadai, Anne Brüggemann-Klein, Christina Grubmüller, & Philipp Ulrich, Technical University of Munich (TUM)
“XML Everywhere” isn’t just a slogan: it actually works, up and down the XML application stack. Recent developments, such as the inclusion of custom elements in HTML5, allow the declarative approach of XML to come into the browser/server interaction. XForms, supported by SVG and CSS, can serve as the basis for a graphical user interface. A custom WebSocket element can support client-to-client and server-push communication of XML data. Applications of State Chart XML (SCXML) mean that the “XML Everywhere” approach can be extended all the way to models of operations in an application. Interactive games offer living proof of the stack.
Multitasking algorithms in XForms
John M. Boyer, IBM Canada
Via declarative expressions, XForms simplifies interactive XML data processing, but XForms isn’t just a declarative language. When it’s needed, XForms authors can also rely on event-driven procedural scripting. Best of all, scripted data changes can automatically trigger additional updates from declarative expressions, so authors are free to use the best method for solving each interactive data processing need. With live demonstrations and markup discussions, this presentation will focus on advanced procedural techniques in XForms, event-driven methods for non-blocking procedures and non-preemptive multitasking, and the hybrid combination of procedural and declarative computations. Come to this presentation to see the full power of interactive XML data processing that you can access directly within current web browsers. There’s a lot more to XForms than you might have expected!
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
Encore Presentation:
Using BITS for conference paper conversion
Alexander B. Schwarzman and Jennifer Mayfield, Optical Society of America
Paper available in proceedings of JATS-Con 2019
We created document dysfunction. It is time to fix it. (LB)
Jean Paoli, Docugami Inc.
Some of us building software need to take a hard look in the mirror. For years, we have promised that technology would solve the world’s information management problems, but 85% of business information is still “dark data,” with potentially useful insights lost in a rising tide of disconnected documents, emails, Slack conversations, voice-to-text messages, etc. We need an effective approach to documents and want to start a public conversation about these issues. We believe that effective solutions should be based on: Declarative Markup; AI sympathetic to “Small Data”; focus on company-specific documents; applying AI to documents as a whole; and solutions that do not disrupt existing workflows or require massive investment. The future isn’t about AI making human beings obsolete; the future is about AI making human beings and companies more productive, effective, and creative.
Break
Do we really want to see markup?
James David Mason
Markup fanatics have long cried, “We need to see the markup!” Yet since the earliest stages of developing the SGML standard, there has been an urge even among standards developers to avoid having to write tags everywhere. The recent urge to create “Invisible XML” is but the latest symptom of a smoldering disease, from which I, too, suffer.
Aparecium: an XQuery/XSLT library for invisible XML
C. M. Sperberg-McQueen, Black Mesa Technologies LLC
This paper introduces Aparecium, a library intended to make the use of “invisible XML” convenient for users of XSLT and XQuery. Invisible XML, a method for treating non-XML documents as if they were XML, holds great promise for immediately and easily bringing our array of XML technologies to bear on the non-XML data that we encounter (CSS, wiki markup, domain-specific notations, JSON, LaTeX, etc.). Aparecium uses an Earley parser to ensure that any context-free grammar can be used.
Balisage Hospitality
Stop in to the Balisage Coffee and Conversation room. Will someone bring out a card game this evening?
Thursday, August 1, 2019
Conference Registration & Breakfast
Pick up your conference badge in the Gleason Boardroom and join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
Encore Presentation:
XSLT 3, fn:Transform() and XProc: which, when, why
Liam Quin, Delightful Computing
Based on a talk delivered at XML Prague
Encore Presentation:
A TEI Customization for using TEI Customizations
Syd Bauman, Northeastern University DSG
Previously given at TEI 2017
Break
SCAP composer: a DITA Open Toolkit plug-in for packaging security content
Joshua Lubell, National Institute of Standards and Technology
The Security Content Automation Protocol (SCAP) schema for source data stream collections standardizes the requirements for packaging XML security content into bundles for easy deployment. SCAP bundles must be self-contained such that each bundle contains all necessary information without external references, and reversible such that XML components are unmodified when unbundled and re-bundled into new collections. These requirements (along with the need for very long, globally unique identifiers) make authoring the content and bundling a challenge. SCAP Composer, an authoring product that uses a DITA specialized element type for source data stream collections, makes the authoring process easier. SCAP Composer takes an incremental approach to aiding SCAP content authors: it helps only with creating source data stream collections; it does not offer any help with creating the XML resources encapsulated in a data stream collection. SCAP Composer is implemented using the DITA Open Toolkit and can be used with any DITA authoring software that includes the Toolkit, or with a standalone Toolkit.
Accessibility: Not just a good idea (LB)
Chandi Perera, Typefi
Around 15% of the global population has a permanent disability, including approximately 285 million people with a visual impairment and an estimated 700 million people with dyslexia, the most common form of learning disability. The World Blind Union estimates that less than 10% of published works are made into accessible formats in developed countries, a figure that drops to less than 1% in developing countries. As markup professionals and content-model experts, there is a lot we can do to make a positive impact by making more content accessible. This session will look at accessibility; our social, ethical, and legal responsibilities around content accessibility; and what we can do to make content more accessible.
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
Balisage Bard
Lynne Price, Gamemaster
Once again, Balisage Bard gives you the opportunity to exercise your literary creativity with poems, short stories, jokes, and songs. Subject matter must be related to Balisage (markup, venue, papers, and so forth). Read your effort during the game session. Translations of works in languages other than English are not required but will be appreciated. There is a two-minute time limit for each presentation. As many submissions as time permits will be taken; authors will be called in the order they sign up (there will be a sign-up sheet at conference registration). If time permits, additional volunteers will be accepted during the game.
Encore Presentation:
Bespoke, Bewildered, and Bebothered
Debbie Lapeyre, Mulberry Technologies, Inc.
Originally delivered at XML London 2017
Extending vocabularies: the rack and the weeds
Liam Quin, Delightful Computing
Markup languages such as XML, JSON, and SGML divide documents into two parts: markup and content. While in theory markup could be created ad hoc for every document, this would mean that markup had no meaning (and thus no value) to anyone but the creator of the document. In order to realize the value of marked-up documents for interchange and longevity, we create, write documentation for, and share markup vocabularies. Vocabularies are created in specific contexts and for specific purposes. Like all human constructs, they are flawed and need to be repaired and changed over time. As people bump up against the limitations of their markup vocabularies, they often want to extend those vocabularies. Understanding these processes requires sensitivity to the human needs involved and the social contexts in which people interact with and around the vocabularies. This paper characterizes some of these contexts and their properties, and in the light of this characterization describes changes to vocabularies both successful and unsuccessful.
Break
You're not the POS of me: the centrality of markup for part-of-speech tagging (LB)
Bethan Tovey, Prifysgol Abertawe (Swansea University)
Part-of-speech tagging, labeling every token in a text with its grammatical category, is a complicated business. Natural language is messy, especially when that language consists of social-media conversations between bilinguals. The process can be done with or without human intervention, in a supervised or unsupervised manner, on a statistical basis or by the application of rules. Often, it involves a combination of these methods. It is, on the one hand, an obvious markup problem: mark up the tokens with appropriate grammatical categories. But it is also much richer than that. Theoretical problems that have been identified in the domain of markup can throw light on the problem of grammatical category disambiguation. Topics considered include subjectivity and objectivity, the semantics of tag sets, licensing of inference, proleptic and metaleptic markup, and the interesting characteristics of the Welsh “verbnoun”.
Encoding
Allen H. Renear, University of Illinois at Urbana-Champaign
In their model of digital objects, David Dubin and others postulate three entity types (propositions, symbols, and documents) with three relationships: “expresses”, “encodes”, and “inscribes”. We can “express” an assertion with a sentence. We can also “inscribe” symbols in physical media. I’d like to investigate the cascade of “encodings” that we find in every digital computing system, and the articulation of those encodings that is bound up in everything we do. Encoding can be recursive, but do we really understand it? What is happening when we encode a sentence as a character string? A character as an integer? An integer as an octet? Is encoding a well-understood linguistic or mathematical relationship? Is encoding just a mapping (function)? Is it the same as the relationship between a name and its referent? Is it the same as the relationship between a sentence and the proposition it expresses? I don’t think so. So let’s explore some possibilities.
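The cascade the abstract describes (string to characters, character to integer, integer to octets) can be made concrete with a small sketch. The sample text, the variable names, and the choice of Unicode code points and UTF-8 are assumptions made here for illustration; the abstract names no particular character set or encoding form.

```python
# A four-character string; the last character is U+00E9 (e with acute accent).
text = "Caf\u00e9"

# Step 1: each character is identified with an integer (its Unicode code point).
code_points = [ord(ch) for ch in text]

# Step 2: each integer is encoded as one or more octets (UTF-8 assumed here).
octets = text.encode("utf-8")

# The cascade is not one-to-one at every level: four characters, five octets.
print(code_points)
print(octets.hex(" "))
```

Decoding reverses the cascade only when reader and writer agree on both mappings, which is one reason the abstract's question of whether encoding is "just a mapping" is not trivial.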
Balisage Hospitality
Stop in to the Balisage Coffee and Conversation room. We might be talking about markup or the organization of electronic materials, but we might just as easily be talking about astronomy, butterflies, scuba diving, antique cars, or ... something else entirely.
Friday, August 2, 2019
Breakfast
Join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
The Open Security Controls Assessment Language (OSCAL): schema and metaschema
Wendell Piez, National Institute of Standards and Technology / Information Technology Laboratory
The Information Technology Lab at NIST is developing technical standards for documentation related to systems security. The Open Security Controls Assessment Language (OSCAL) defines lightweight schemas, along with related infrastructure, for tagging system security information to support routine tasks like crosschecking, validating against arbitrary constraints, and producing punchlists. OSCAL is not conceived as “another big XML application” but as a metaschema. This approach allows us to simplify the design and maintenance of schemas and related tooling; support generation of documentation; produce multiple parallel schemas for XML, JSON, and YAML; and construct conversion tools more easily. Documents and tools leverage basic HTML, or even Markdown, for simplicity even though it limits the complexity of what can be directly imported. Conversion is simplified by the metaschema approach, even when multiple schemas apply to a single data collection. We hope that these simplifications will lead not only to more documents but also to more useful documents.
Loose-leaf publishing using Antenna House and CSS
Eliot Kimber, Contrext, LLC
Loose-leaf publishing is the ability to typeset and print only the pages in a document that have changed since its last publication. This presents many interesting challenges. We developed a loose-leaf publication system using Antenna House Formatter, CSS for pagination, and XSLT for post-processing the area tree into “change packages” which include only the changed pages. Both the CSS markup and the publication workflow warrant a closer look.
Break
Reese’s Peanut Butter Cups and eXist-db: integration of XML databases and content management systems in digital editions
David J. Birnbaum, University of Pittsburgh
Hugh Cayless, Duke University
Emmanuelle Morlock, French National Center for Scientific Research (CNRS)
Leif-Jöran Olsson, University of Gothenburg (Sweden)
Joseph Wicentowski, Office of the Historian, US Department of State
We have identified four models for integrating digital edition content into eXist-db: TEI Publisher; the eXist-db app framework using HTML templating; the eXist-db app framework without HTML templating; and Apache and PHP mediating between the user and eXist-db, so that eXist-db provides only XML database services. We examine and compare these ways of conceptualizing and implementing the infrastructure for a digital edition. Each of them has advantages and disadvantages, primarily from the perspective of sustainability. Our considerations apply to edition frameworks generally and are therefore not specific to eXist-db.
Thinking, wishing, saying
C. M. Sperberg-McQueen, Black Mesa Technologies LLC
Can we have rules for our documents we cannot write down in a schema language? If a conformance requirement is not mechanically checkable, is it a conformance requirement? If a rule is not testable, is it a rule?
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
Relax at the Cambria and enjoy talking about markup over lunch. For participants who must rush off, wrapping materials and bags are supplied so you can take your sandwich with you to enjoy in the cab or at the airport (but do not eat on Metro!).