Balisage 2018 Program
Monday, July 30, 2018
Symposium: Markup Vocabulary Ecosystems
Tuesday, July 31, 2018
Conference Registration & Breakfast
Pick up your conference badge in the Gleason Boardroom and join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
Welcome and Introductions
YAMC? Why are we here? Why are we here again?
B. Tommie Usdin, Mulberry Technologies
There is nothing new about markup, or even generic markup. (I have been working with generic markup for 40 years!) So what is there to talk about after all this time? What are we accomplishing by gathering at Balisage: The Markup Conference? Why do some of us find events like this one valuable? What can you do to make it valuable to you and to the others here? Not only is markup old hat, XML is 20 years old, and some people in the outside world keep trying to tell us that its time has passed.
Groups are still gathering to create shared markup vocabularies in order to enable high quality information sharing. Scholars are using bespoke markup vocabularies to enable them to focus on the works they are reading, interpreting, and writing. Trendy end user displays are being populated by solid maintainable XML content. An ever improving tool set is available to users of marked up documents. We learn from each other’s projects, tools, techniques, and experiences, and enjoy the process!
In praise of XML
Steven Pemberton, CWI
It’s about sixty years since the start of public computing; fifty years since the term “software crisis” was coined; Europe is celebrating thirty years of the internet this year; and we’re celebrating twenty years of XML this year too. It is a milestone year for XML, and an important juncture as well. The last W3C XML WG has finished its work. What next? Where should we be heading, what do we want to achieve next, and how should we do it?
Break
Copy-fitting for fun and profit (LB)
Tony Graham, Antenna House
Copy-fitting is the fitting of words into the space available for them or, sometimes, adjusting the space available to fit the words. Copy-fitting is included in “Extensible Stylesheet Language (XSL) Requirements Version 2.0” and is a common feature of making real-world documents. This talk describes an ongoing internal project for automatically finding and fixing problems that can be fixed by copy-fitting in XSL-FO.
Documentation of XSLT stylesheets with code intelligence
Vasu Chakkera, Sapient Corporation
We benefit more from documenting why certain functionality was implemented, or coded in a particular way in an XSLT stylesheet, than from the typical “what the code does” comment. K7:XSLTDocuMentor is a personal project (non-commercial) to create XSLT stylesheet documentation from both inline stylesheet comments and documentation living outside the stylesheet. The external documentation lives in XML files, written in a variant of DocBook, that are generated by script and populated by XSLT analysts. These files are then used to generate configurable HTML documentation that provides the text as well as 1) hyperlinks to named templates, global variables and functions, imported/included templates and 2) reports of code violations such as potentially overridden functions, single-expression <xsl:choose>s, unused variables, and the like. Code violation criteria are defined in user-configurable rule sets.
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
The Markup Declaration (LB)
Bethan Tovey, Prifysgol Abertawe / Swansea University
Norman Walsh
The Markup Declaration is an idea that grew from the 2018 Markup UK conference in London. Markup is at an interesting point of development. Reports of its death have been greatly exaggerated, but the slow drift away from declarative markup towards procedural or imperative approaches makes it necessary, in our view, to make a statement about what markup practitioners can do, what they should do, and what they should try to avoid. This talk will place generic markup into a broader context, showing that the mindset behind declarative markup has in fact characterised human approaches to text since ancient times. We suggest some ways that the community could build on this inbuilt tendency to mark up our data, using non-technical forms of markup as metaphors to encourage greater engagement with markup languages outside their traditional user groups. We also discuss how a potential expansion in the user base for markup languages must be predicated on a more principled understanding of what markup is - hence the Markup Declaration. Finally, we present some initial attempts at writing parts of the Declaration, and ask for audience feedback and input.
Implementing and using concurrent document structures
C. M. Sperberg-McQueen, Black Mesa Technologies
Markup necessarily expresses a view of the text that it marks up; it codifies the boundaries of structures and their interrelationships in a precise way. It is incontrovertibly the case that multiple structures exist in many documents; lines of verse and sentences within them, for example. In cases where these different structures are applied to the same text, when the collected sequences of characters in the leaf nodes are the same, it is possible to document the structures concurrently. The syntax, and the possibilities of data models that support both dominant and recessive hierarchies, open up interesting avenues for exploration.
Break
Using Excel spreadsheets to communicate XML analysis to subject matter experts
Betty Harvey, Electronic Commerce Connection
What is the best approach for analyzing large XML datasets? Reading thousands (or possibly millions) of pages of raw XML to fully understand the markup constructs is not feasible. CSS stylesheets are useful for displaying XML, but that’s not practical at large scales either. Creating Excel spreadsheets to hold analysis information is a very useful tool for understanding the full range of XML data constructs. This approach is also understandable to the stakeholders who control the datasets. This paper will describe an approach for creating document analysis Excel spreadsheets using XSLT and XML datafiles bundled using XLink master documents.
In defense of style guides
Ari Nordström, Karnov Group
The markup world has for more than four decades obsessed over document structures and schema languages for representing and validating those structures. From time to time, markup enthusiasts have managed to pry themselves away from schema languages long enough to create languages for manipulating structures and rendering those structures on presentation surfaces. But they’re missing the point! To be sure, all this language development has enabled mechanical processing of data, but there’s no assurance that it will make the data comprehensible. That’s where real style guides and the editors who apply and enforce them come in. It’s all about supporting semantics, the real information of which all that markup should be the servant. Come pay your respects to Strunk and White!
Reception
Please join us for cheese, wine, and conversation!
Balisage Hospitality
Stop in to the Balisage Coffee and Conversation room. We'll have desserts, coffee, a comfortable place to talk, and possibly a toy or two worth a look.
Wednesday, August 1, 2018
Conference Registration & Breakfast
Pick up your conference badge in the Gleason Boardroom and join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
TAGML: A markup language of many dimensions
Ronald Haentjens Dekker, Elli Bleeker, Bram Buitendijk, & Astrid Kulsdom, KNAW Humanities Cluster
David J. Birnbaum, University of Pittsburgh
The virtues and limitations of the XML tree paradigm have been discussed and criticized ad infinitum, but a more general question is how any model and markup language (need to) align with the functional requirements of an intuitive and effective workflow. How should we make decisions about document modeling? The TAG document structure, the TAGML markup language, and the Alexandria TAG reference implementation point toward a combination of model, syntax, repository, and workflow that begins to offer users an integrated framework for expressing their interpretation of the structural properties of text and document.
Metaphors we code by: Taking things a little too seriously
Mary Holstege, MarkLogic Corporation
Computer information and software are abstractions. We comprehend them through the use of metaphors. Different metaphors lead us to understand our information and our processing of it in different ways. They lead us to focus on certain aspects of the experience over other aspects. We use metaphors in talking about markup, but rarely think about them. Being mindful of what our metaphors are telling us implicitly allows us to see what we are missing. By taking the metaphor a little too seriously, we can look to the non-metaphorical domain as a source of inspiration for good practices.
Break
A lite DITA+ model for technical manuals (LB)
Pradeep Jain, Ictect
Joe Gollner, Independent Advisor, Gnostyx Research Inc
Recent work with two major healthcare organizations on DITA XML revealed two somewhat contradictory requirements: (1) The DITA must be lightweight so that it is easy for non-technical users to understand, and (2) Additional elements and attributes were needed to manage organization-specific information. The content architecture team developed a method for creating derivatives of the DITA XML schema that accommodate these two requirements. The final XML content is compliant with DITA, and therefore allows open source and commercial software systems to be used. This presentation will describe the process and present some examples. We call it “Lite DITA Plus”!
Encore Presentation
The originally scheduled speaker is unable to present at Balisage. Several participants have offered to repeat presentations that had been given at previous events and the attendees have selected one.
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
The journey of “The History of the Accademia di San Luca, c. 1590–1635” into and out of XML
Peter M. Lukehart with support from Benjamin Zweig, National Gallery of Art
When you have spent years building a rich site based on the guidelines established by the Text Encoding Initiative, why would you reduce it to HTML? While our project on documentation of the Accademia di San Luca in Rome, one of the first artists’ academies in Europe, was performing well for the research community, we found ourselves in crisis mode when our Web architect suddenly died. Our difficulties were magnified when our host, the National Gallery of Art, changed its platform and required that all projects be based in HTML. Although we may lament the loss of some functionality with our abandoning TEI, we realize we have gained through greater interoperability with other NGA sites and moreover with simplification of input from contributors who are not TEI specialists.
Panel discussion: Why successful XML/SGML projects are reimplemented or decommissioned
James Mason
Bob Yencha
We’ve seen it happen again and again. The responsible parties survey the available technologies, choose XML/SGML, and complete the project successfully. The solution meets the (original) requirements and solves the (initial) problems. And then at some point, for whatever reasons, the project is reimplemented with other technologies or decommissioned entirely. Perhaps it’s XRX (XML, REST, XQuery) applications rewritten in Ruby. Perhaps it’s an XML application using XQuery and XSLT reworked in JavaScript and HTML. There are plenty of examples of projects first built in XML/SGML and then rebuilt using non-XML technology or retired. Is this a problem? Is it an opportunity? Is it BOTH a problem and an opportunity? How can we ensure that our projects can survive not just changes in technology but changes in organizational techno-culture?
Break
Stand-off bridges in the Frankenstein Variorum Project
Elisa E. Beshero-Bondar, University of Pittsburgh at Greensburg
Raffaele Viglianti, Maryland Institute for Technology in the Humanities at the University of Maryland
The Frankenstein Variorum Project works with multiple editions of a single novel, originating in several divergent markup systems. To reconcile these editions, we have had to flatten the original hierarchical structures and identify low-level units of lateral intersection, points shared in common across editions, in order to construct “bridge” or intermediary formats that can be compared automatically. We transform the output of the comparison into a TEI format we call stand-off parallel segmentation, in which stand-off pointing mechanisms operate like a switchboard: they connect the individual editions, which for the most part can remain undisturbed by the comparison process. The TEI “stand-off bridge” can help overcome the silo effect of specially encoded editions. Far from being an ephemeral support structure, the stand-off bridge provides a “backbone” for the variorum project because it improves the interoperability and interchangeability of all the markup ecosystems involved. The stand-off bridge allows us to reconstitute the hierarchies in a way that expresses intersections essentially as a graph structure of nodes with edge pointers to comparable nodes.
Markup ethics: Trolley problems for text encoders
Allen H. Renear, University of Illinois – Urbana-Champaign
We are engineering; we are solving problems, improving reliability, effectiveness, efficiency. But more generally encoding decisions determine whether, how, how much, when, and for whom the information in our documents will be useful. This seems to be important not just instrumentally, but with respect to larger human interests as well, or even to the very largest human interests. Just how else to explain the earnestness, anger, fear, and tears one sees at Balisage? But problems abound, and some tradeoffs appear not just incalculable, but incommensurable. Left track or right?
Balisage Hospitality
Stop in to the Balisage Coffee and Conversation room. Will someone bring out a card game this evening?
Thursday, August 2, 2018
Conference Registration & Breakfast
Pick up your conference badge in the Gleason Boardroom and join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
Flattening and unflattening XML markup (LB)
David J. Birnbaum, University of Pittsburgh
Elisa E. Beshero-Bondar, University of Pittsburgh at Greensburg
C. M. Sperberg-McQueen, Black Mesa Technologies
From time to time, it may be necessary or expedient to flatten our XML documents by replacing the start- and end-tags of conventional XML content elements with empty place-marker elements (variously known as milestone elements or as Trojan horse markup). When we do, we will often wish, later, to restore the content elements we flattened. The purpose of this late-breaking presentation is to present a survey of ways to perform the task of unflattening or of raising: restoring a conventional XML element structure of content elements from a flattened XML document instance (or part of one), and comparing different solutions to see what we can learn from them.
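As a rough illustration of what raising involves, here is a minimal Python sketch. It is not one of the solutions surveyed in the presentation. It assumes a hypothetical marker convention in which each flattened element is an empty element carrying an `sID` attribute (start) or an `eID` attribute (end) with a matching value, and it only handles markers that nest properly; the function names are invented for this example.

```python
import copy
import xml.etree.ElementTree as ET


def _append_text(parent, text):
    """Add text at the current end of parent's content."""
    if not text:
        return
    if len(parent):                      # text follows the last child
        parent[-1].tail = (parent[-1].tail or "") + text
    else:                                # text comes before any child
        parent.text = (parent.text or "") + text


def raise_milestones(flat):
    """Return a copy of `flat` with sID/eID marker pairs raised to elements.

    The sID/eID attribute names are an assumption made for this sketch.
    """
    root = ET.Element(flat.tag, dict(flat.attrib))
    stack = [(root, None)]               # (open element, its marker id)
    _append_text(root, flat.text)
    for child in flat:
        if "sID" in child.attrib:        # start marker: open a new element
            attrs = {k: v for k, v in child.attrib.items() if k != "sID"}
            elem = ET.SubElement(stack[-1][0], child.tag, attrs)
            stack.append((elem, child.get("sID")))
        elif "eID" in child.attrib:      # end marker: close the innermost element
            if len(stack) == 1 or stack[-1][1] != child.get("eID"):
                raise ValueError("unmatched or overlapping marker: " + child.get("eID"))
            stack.pop()
        else:                            # ordinary element: copy it unchanged
            kept = copy.deepcopy(child)
            kept.tail = None
            stack[-1][0].append(kept)
        # Text after this node belongs to whatever element is now innermost.
        _append_text(stack[-1][0], child.tail)
    if len(stack) != 1:
        raise ValueError("start marker without end marker: " + stack[-1][1])
    return root


flat = ET.fromstring(
    '<lg><l sID="a"/>Roses are <hi sID="b"/>red<hi eID="b"/>,<l eID="a"/>'
    '<l sID="c"/>Violets are blue<l eID="c"/></lg>'
)
print(ET.tostring(raise_milestones(flat), encoding="unicode"))
```

The stack mirrors the element nesting being rebuilt. Markers that overlap rather than nest raise an error here, because a single XML tree cannot express them without further conventions.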
White-hat web crawling: Industrial strength web crawling for serious content acquisition
Mark Gross, Tammy Bilitzky, Rich Dominelli, & Allan Lieberman, Data Conversion Laboratory (DCL)
Much original source material today appears only on the web or with the web version as the copy of record. We have been developing methods and bots to facilitate high-volume data retrieval from hundreds of websites, in a variety of source formats (HTML, RTF, DOCX, TXT, XML, etc.), in both European and Asian languages. We produce a unified data stream which we then convert into XML for ingestion into derivative databases, data analytics platforms, and other downstream systems. We will examine the thought behind our approaches, the analysis techniques we used to detect and deal with website and content anomalies, our methods to detect meaningful content changes, and our approaches to verification.
Break
Dynamic style
Steven J. DeRose, Consultant
Many capabilities of hypertext can be realized by declarative markup and styling technologies. This has been made evident on the web through the widespread adoption of HTML markup styled with CSS properties. But sometimes you need to reach beyond declarative limitations to enable truly interactive hypertext. This, also, can be seen on the web through the near ubiquity of JavaScript. But today’s JavaScript solutions are stand-off. Using them requires knowing how to navigate the structure of the document and find the relevant elements. What if JavaScript could be employed from within CSS? Could this simplify things? How would it make long-neglected hypertext capabilities easier to achieve?
CETEIcean: TEI in the browser
Hugh Cayless, Duke Collaboratory for Classics Computing (DC3)
Raffaele Viglianti, Maryland Institute for Technology in the Humanities (MITH)
The typical method for displaying a TEI document on the web is to use XSLT or XQuery to pre-transform it into static HTML or dynamically transform it when requested. CETEIcean is a JavaScript library designed to render TEI XML directly in a modern web browser using custom elements. CETEIcean was developed to support a lightweight TEI presentation workflow that requires neither pre-display document transformation nor a complex server-side architecture. This makes possible a distributed web-based document preparation workflow. Method explained; examples shown; limitations discussed.
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
Balisage Bard
Lynne Price, Gamemaster
Once again, Balisage Bard gives you the opportunity to exercise your literary creativity with poems, short stories, jokes, and songs. Subject matter must be related to Balisage (markup, venue, papers, and so forth). Read your effort during the game session. Translations of works in languages other than English are not required but will be appreciated. There is a two-minute time limit for each presentation. As many submissions as time permits will be taken; authors will be called in the order they sign up (there will be a sign-up sheet at conference registration). If time permits, additional volunteers will be accepted during the game.
Easing the road to declarative programming in XSLT for imperative programmers
Abel Braaksma, Abrasoft
Programmers who learned their trade in mainstream languages like C, C#, Java, Python, PHP, Objective-C or Ruby sometimes find it challenging to switch their mindset from such imperative languages to the declarative nature of XSLT. In imperative languages you tell the computer what to do, step by step. In declarative and functional languages, you tell the computer what result you wish for, and how that depends on your input. You guide the processor with a soft hand and give it suggestions, instead of imperatively making finite decisions for the compiler one by one. There’s no need to become a fully fledged functional programmer and understand all its paradigms before you can be relatively versatile with writing effective XSLT stylesheets. After mastering the basics of the declarative mindset and learning to think not in opening and closing tags, but instead in trees and traversals, we can work with XSLT stylesheets without resorting to frustratingly deeply nested <xsl:if> and <xsl:choose> elements. I hope to help both seasoned XSLT programmers and interested imperative programmers to not be afraid of the wolf.
XForms 2.0: What's new (LB)
Steven Pemberton, CWI
XForms was originally designed as a new XML-based markup language for forms on the web, and version 1.0 was just that. However, after initial experience, it was realised that the design had followed HTML too slavishly. With some generalisation XForms became more powerful in the form of XForms 1.1, a Turing-complete language that could still do forms, but very much more as well. Rather than procedural or functional, XForms's programming model is declarative, and this has proven to be very successful in reducing the time and costs of producing applications, typically by a factor of ten. XForms 2.0 is a new version of XForms in preparation by the working group, and continues the generalisation. This talk will present what has changed since version 1.1.
Break
An adventure with client-side XSLT to an architecture for building bridges with JavaScript
Katherine Ford & Will Thompson, O’Connor’s
XML technologies launched with aspirations to bring the Web and XML closer together. Twenty years later, their communities operate primarily in technology silos. It's JavaScript or XML. However, through the long process of building a new feature, we discovered that this is not necessarily the end of the story.
Our desire to utilize client-side XSLT led us on an adventure, putting existing browser-based XSLT engines to our task and uniting them with a modern, functional JavaScript architecture. Opportunity exists for a future that breaks down silos, and unrealized dividends await developers who reject the one-size-fits-all Web and embrace the right technology for the job. It's JavaScript and XML and more.
Fractal information is
Wendell Piez
We wrestle often with the granularity of data formats, object models, interfaces, and APIs: their strengths, their weaknesses, and the supports they provide to creators and consumers. Opinion is often muddled or extrapolated from limited experience: “X is lightweight”, “Y is ‘self-describing’”, “everyone prefers Z”. This is a fractal experience; there is self-similarity across scales. Issues that arise at one level of the system have weird echoes elsewhere. Indeed, one way of discriminating among options (XML, HTML, Markdown, JSON, YAML, SAX, DOM, etc.) is to consider their different approaches to the problem of managing the chaos and representing (ir)regularity. This examination leads to a better understanding of how to exploit their differences to make them work better together.
Balisage Hospitality
Stop in to the Balisage Coffee and Conversation room. We might be talking about markup or the organization of electronic materials, but we might just as easily be talking about astronomy, butterflies, scuba diving, antique cars, or ... something else entirely.
Friday, August 3, 2018
Breakfast
Join us for breakfast in Baker before taking your seat in Sinequa, the conference room.
PreTeXt: An XML vocabulary for scholarly documents
Robert A. Beezer, University of Puget Sound
Need to write a textbook or scholarly article for mathematics and the physical sciences? Have too many special challenges for existing vocabularies like DocBook and TEI? Try PreTeXt instead. Researchers in the sciences are often comfortable with LaTeX, but it lacks the flexibility of XML for repurposing text for multiple outputs, particularly those based on HTML. PreTeXt combines an easy-to-learn XML vocabulary with simple escapes to TeX notation and graphics formats common in the sciences. While PreTeXt borrows common elements like paragraphs and lists from HTML, it adds elements appropriate to the subject matter, like “theorem” and “proof”. Supported by an online community, PreTeXt has already been used for dozens of books.
How are dependent works realized?
Jacob Jett & David Dubin, University of Illinois
When a work of authorship is published in a new edition, what exactly is the relationship between the edition and the contribution of the author or authors? Specifications in the FRBR family offer contrasting accounts of how we should understand the relationships among the edition, its text, and the work of authorship realized by both of them. The intellectual contribution of markup in a digital edition adds a further wrinkle.
Break
Scaling XML using a Beowulf cluster
John J. Chelsom, Seven Informatics
Jay H. Chelsom, Abingdon School
Scale up or scale out? A large, XML-centric application such as the cityEHR health records system lends itself naturally to implementation with XML technologies: XForms, REST, and XQuery. One thing is certain about records systems: there will always be more records. Eventually, the system will outgrow its initial environment — and, in the long run, the bigger one that replaces it. Scaling out becomes the only answer. Can you scale a mission-critical medical records application on a cluster of Raspberry Pi computers in a Beowulf cluster? The possibly surprising answer is “yes”.
Why are we here?
C. M. Sperberg-McQueen, Black Mesa Technologies
Sometimes our technological specifications give rise to unexpected ecological niches which in turn give rise to unexpected communities.
Lunch
Please check computer bags, backpacks, brief cases, suitcases, and other bags and bundles with conference staff in the Gleason Boardroom. Lunch is a serve-yourself buffet with limited space.
Relax at the Cambria and enjoy talking about markup over lunch. For participants who must rush off, wrapping materials and bags are supplied so you can take your sandwich with you to enjoy in the cab or at the airport (but do not eat on Metro!).