|
Balisage 2009 Program
Tuesday, August 11, 2009
|
Tuesday 10:00 am - 10:30 am
(FP) Standards considered harmful
Tommie Usdin, Mulberry Technologies
Standards and shared specifications allow us to share data, build
general purpose tools, and significantly reduce training and
customization costs and startup time. That is, the use of
appropriate specifications can help us reduce costs, reduce startup
time, and increase quality, usability, and reusability of content.
Some vigorous standards proponents insist that the more standards
used the better. To them I say “mind your own business and
let me mind my own store”. They argue that using standards is
always the right thing to do, because it enables re-use and
interchange. Maybe so. But adoption of a standard that supports an
activity that is not central to your mission is a distraction, an
unwarranted expense, a bad idea.
|
Tuesday 11:00 am - 11:45 am
XML in the browser: the next decade
Alex Milowski, Appolux
XML in the browser was first demonstrated by Netscape in 1999.
Since then, XML has become ubiquitous and browser technology has
matured into a platform for delivery of complicated services and
applications, based largely on a combination of HTML and Javascript
— which does not quite match the original vision of
ubiquitous delivery of information over the web via specialized XML
documents. All major browsers have the needed core technologies,
yet XML applications can’t use them the way HTML applications
can. We can deliver XML content augmented with application
semantics within some of today’s browsers, although there are
some limitations. These limitations and what can be done to
overcome them in the near term are discussed, as are future
directions. And, of course, there will be a cool demo or two.
|
Tuesday 11:45 am - 12:30 pm
(FP) Automatic XML namespaces
Liam Quin, W3C
The XML community has lived with XML namespaces for a decade.
They are useful to the point of seeming indispensable, they are
ubiquitous, and yet they are at the same time unwieldy and flawed.
Namespace declarations can be inconvenient to remember, and errors
in them are frequently the source of subtle and hard-to-diagnose
errors. From a programming perspective, namespaces provide scope
and disambiguation; from a document authoring perspective,
namespaces provide headaches. By introducing a single new feature
namespace declarations could be simplified and namespace
functionality enhanced without losing the existing benefits of
namespaces. Let’s talk about making namespace lemonade from
namespace lemons.
|
Tuesday 2:00 pm - 2:45 pm
Engineering document applications — from UML models to XML schemas
Dennis Pagano &
Anne Brüggemann-Klein, Technische Universität München
UML models for documents need to be exchanged like other models.
XML Metadata Interchange (XMI) satisfies this interchange
requirement by representing UML models as XML documents. But it
seems to us that UML class diagrams, which model persistent data,
are more closely aligned with XML schemas, which model the XML
representation of persistent data as documents. The Object
Management Group (OMG) has defined a method to turn models written
in its MOF modeling language into equivalent XSD schemas using XMI.
Since MOF models can be considered to be a specific case of UML
models, we patterned our method (named uml2xsd) after the OMG
translation of MOF models into XSD, extending it to concepts
present in UML class diagrams but not in MOF models. Our tool
uml2xsd transforms an XMI representation of a UML class diagram
into an XSD schema that constrains XML instances of the UML
model.
|
Tuesday 2:00 pm - 2:45 pm
(LB) An XML user steps into, and escapes from, XPath quicksand
David J. Birnbaum, University of Pittsburgh
The otherwise admirable and impressive eXist XML database sometimes fails to optimize queries with numerical predicates. For example, a search for $i/following::word[1] retrieves all following <word> elements and only then applies the predicate as a filter to return only the first of them, which can be enormously inefficient when $i points to a node near the beginning of a very large document, with many thousands of following <word> elements. As an end-user without the Java programming skills to write optimization code for eXist, the author describes two types of optimization in the more familiar XML, XPath, and XQuery, which reduce the number of nodes that need to be accessed and thus improve response time substantially.
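The pattern behind the problem is general: materializing an entire axis and only then filtering defeats a positional predicate that ought to short-circuit. A minimal Python sketch of the idea (illustrative only — it models flat sequences, not eXist's internals):

```python
from itertools import islice

# Stand-ins for the many thousands of following <word> elements.
words = [f"w{i}" for i in range(100_000)]

def naive_first_following(nodes, start):
    # Mirrors the unoptimized plan: build the entire "following" sequence,
    # then apply the [1] predicate as a filter afterwards.
    following = nodes[start + 1:]          # touches every following node
    return following[0] if following else None

def lazy_first_following(nodes, start):
    # Mirrors the optimized plan: stop as soon as the first match is seen.
    return next(islice(iter(nodes), start + 1, None), None)

print(naive_first_following(words, 5))  # w6, after copying ~100,000 nodes
print(lazy_first_following(words, 5))   # w6, after examining one node
```

Both calls return the same node; only the amount of work differs, which is exactly the gap the author's query rewrites close.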
|
Tuesday 2:45 pm - 3:30 pm
Prying apart semantics and implementation: Generating XML
schemata directly from ontologically sound conceptual models
Bruce Todd Bauman, U.S. Department of Defense
Models produced with mainstream modeling languages (e.g. UML, ERD) or
directly in implementation languages (XSD, RDFS, OWL) reflect technology
specific design decisions. These modeling languages both obscure the
expression of domain semantics, and inherently limit the potential for model
reuse in other designs. Frustrated by this, we have started to use an
ontological profile of UML defined by Giancarlo Guizzardi in “Ontological
Foundations for Structural Conceptual Models” (2005) to create conceptual models that
document a shared, implementation-neutral understanding of a domain targeted
for human understanding. Physical models are generated from these conceptual
models and annotated with encoding directives that custom software uses to
compile an XSD. By way of example, this talk introduces the constructs of
the conceptual modeling language, the physical XSD annotations, and the
XSD compiler.
|
Tuesday 2:45 pm - 3:30 pm
Formal and informal meaning from documents through
skeleton sentences: Complementing formal tag-set descriptions with intertextual semantics and vice-versa
Yves Marcoux, Université de Montréal,
C. M. Sperberg-McQueen, Black Mesa Technologies,
&
Claus Huitfeldt, University of Bergen
What do we mean when we add markup to a document? Proponents of
two approaches to markup semantics (formal tag-set
description and intertextual semantics) show
how these two approaches can be combined to generate analytical
tools for documents. With examples of increasing complexity, they
demonstrate how an intertextual semantics approach can generate the
materials for building a formal tag-set description.
|
Tuesday 4:00 pm - 4:45 pm
(LB) Akara - Spicy Bean Fritters and an XML Data services Platform
Uche Ogbuji, Zepheira
Akara is an open-source XML/Web mashup platform supporting XML
processing in an environment of RESTful data services. It includes “Web
triggers”, which build on REST architecture to support orchestration of
Web events. This is a powerful system for declaratively integrating
services and components across the Web: a single Web request might, for
example, pull information from a service running on Amazon EC2, analyze
data gathered from social networks, and run the results through a remote
spam-detection service. Akara is designed from the ground up to support
such rich interactions, using the latest conventions and standards of
the Web 2.0 era. It’s also designed for performance, for modern
processor conventions and architectures, and for ready integration with
other tools and components.
|
Tuesday 4:00 pm - 4:45 pm
Documents cannot be edited
Allen H. Renear &
Karen M. Wickett, University of Illinois at Urbana-Champaign
What is a document? We often say that they are strings of
characters (perhaps among other things). But strings or
sequences of any kind are extensional objects: timeless, eternal,
unchanging. How can an immutable object be edited?
|
Tuesday 4:45 pm - 5:30 pm
(LB) Visual Designers: Those XML tools with no angle bracket at all!
Jean Michel Cau &
Mohamed Zergaoui, Innovimax
Is the future of XML a future without XML? Visual tools are everywhere, and XProc may be the first XML dialect to be available with a visual editor from the start. After some erratic evolution, visual tools have become more and more precise (even HTML+CSS tools are now very powerful) and more and more mainstream. Could we imagine dealing with XML Schema without decent visual tools? This presentation gives an overview of the places where we already work with XML without seeing a single angle bracket, and the places where we expect equivalent tools soon.
|
Tuesday 4:45 pm - 5:30 pm
(FP) How to play XML: Markup technologies as nomic game
Wendell Piez, Mulberry Technologies
Projects involving markup technologies are game-like: they have
players (teams and individuals), equipment, rules, victories, and defeats. In
many of the markup games we play, the making of the game’s
rules is part of the game itself. When the playing of a game
involves the modification of the game’s own rules, it is said
to be a “nomic game”. The process of legislation, for
example — including the collaborative development of markup
vocabularies and other markup standards — is a nomic game.
This meditation considers how the experiences of earlier nomic games
are influencing today’s contests, the far-reaching influence
today’s nomic games will exert on those to be played later,
and things to consider as we engage each other in the nomic games
of markup theory and practice.
|
Wednesday, August 12, 2009
|
Wednesday 9:00 am - 9:45 am
TEI feature structures as a representation format for multiple annotation and generic XML documents
Jens Stegmann, Bielefeld University &
Andreas Witt, Institute for the German Language (IDS), Mannheim
Annotated texts can usefully be represented in terms of
feature structures — rooted labeled directed
acyclic graphs. The ISO-standard tag set for the representation of
the structural features of texts based on the feature-structure
markup of TEI P5 can be used to integrate sets of annotation
documents, such as different linguistic analyses of a common core;
such an approach is known to facilitate the representation and
processing of overlapping structures. Less frequently discussed is
the possibility that any XML documents and document
sets might be usefully represented in terms of feature structures,
thus making the tools of computational linguists, and specifically
the operations of unification and
generalization, available in XML processing
contexts.
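The unification and generalization operations mentioned above can be sketched over nested dictionaries. This is a deliberately simplified model (real TEI feature structures also support re-entrancy, i.e. shared substructures, which this toy ignores):

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).
    Returns the merged structure, or None if the structures conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            if key in out:
                merged = unify(out[key], value)
                if merged is None:
                    return None        # conflicting values: unification fails
                out[key] = merged
            else:
                out[key] = value
        return out
    return a if a == b else None       # atoms unify only if equal

def generalize(a, b):
    """Keep only the information shared by both structures."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = {}
        for key in a.keys() & b.keys():
            shared = generalize(a[key], b[key])
            if shared is not None:
                out[key] = shared
        return out
    return a if a == b else None

np1 = {"cat": "NP", "agr": {"num": "sg"}}
np2 = {"cat": "NP", "agr": {"pers": "3"}}
print(unify(np1, np2))       # {'cat': 'NP', 'agr': {'num': 'sg', 'pers': '3'}}
print(unify(np1, {"agr": {"num": "pl"}}))  # None (sg vs. pl conflict)
print(generalize(np1, np2))  # {'cat': 'NP', 'agr': {}} -- only shared info
```

Unification accumulates compatible information and fails on conflict; generalization is its dual, retaining only what both analyses agree on.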
|
Wednesday 9:45 am - 10:30 am
Towards markup support for full Goddags and beyond: the
EARMARK approach
Silvio Peroni, Fabio Vitali, &
Angelo Di Iorio, University of Bologna
For representing overlapping structures, why not use something
designed for graphs? Why not use ... RDF?
EARMARK (Extreme Annotational RDF Markup) uses RDF to encode
non-hierarchical structures (overlap, repetitions, transpositions) which
have been previously addressed by TEI, LMNL, TexMecs, and XConcur,
among others. OWL provides a standardized, well supported
notation for declaring the document model for such complex structures.
EARMARK documents can be translated, of course, into other notations,
using well known techniques for working around restrictions of
existing syntaxes. EARMARK thus provides a unifying model for a wide
variety of phenomena of interest both in markup theory and in
practice. And since it exploits RDF and OWL, Earmark can be processed
conveniently using existing RDF and OWL tools and technologies.
|
Wednesday 11:00am - 11:45am
Markup, meaning, and mereology
Claus Huitfeldt, University of Bergen,
C. M. Sperberg-McQueen, Black Mesa Technologies, &
Yves Marcoux, Université de Montréal
XML markup divides a whole (a document) into parts (elements),
which may themselves be further subdivided into parts (also
elements). Thus the part-whole relationship is central to our
understanding of XML markup. Mereology is a branch of logic that
deals with theories of part-whole relationships; instead of appealing to sets and their members, it treats part-whole and sum relationships between individuals. But documents are more complex than mere
part-wholes, and the propagation of properties between the various
parts of a document can follow diverse patterns. Some of these
patterns are difficult to specify along the more commonly used
containment/dominance dimensions of XPath (for example). We
investigate whether some of these patterns can be more conveniently
or usefully described with the formalism of the “Calculus of
Individuals”, a mereological system worked out by Nelson Goodman and
Henry S. Leonard that may have application to marked-up text.
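For readers new to mereology, the core notions are easily stated. Writing P(x,y) for “x is part of y”, the standard definitions of overlap and mereological sum in the Leonard–Goodman tradition (textbook material, not the authors' own formalization) are:

```latex
\begin{align*}
O(x,y) &\;\equiv\; \exists z\,\bigl(P(z,x) \land P(z,y)\bigr)
  && \text{(overlap: $x$ and $y$ share a common part)}\\
s = x + y &\;\equiv\; \forall w\,\bigl(O(w,s) \leftrightarrow O(w,x) \lor O(w,y)\bigr)
  && \text{(sum: $s$ overlaps exactly what overlaps $x$ or $y$)}
\end{align*}
```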
|
Wednesday 11:45 am - 12:30 pm
TNTBase: Versioned storage for XML
Vyacheslav Zholudev &
Michael Kohlhase, Jacobs University Bremen
Version Control systems like CVS and Subversion have transformed
collaboration workflows in software engineering and made possible
globally distributed project teams. Even though XML, as a
text-based format, is amenable to version control, the fact that
most version control systems work on files makes the integration of
fragment access techniques like XPath and XQuery difficult. The
TNTBase system is an open-source versioned XML database created by
integrating Berkeley DB XML into the Subversion Server. The system
is intended as a basis for collaborative editing and sharing
XML-based documents that integrates the versioning and fragment
access needed for fine-grained document content management. Our aim
is to make possible the kinds of workflows and globally distributed
project teams familiar from open source projects.
|
Wednesday 2:00 pm - 2:45 pm
Investigating the streamability of XProc pipelines
Norman Walsh, Marklogic
High-performance XML processing, particularly on very large
documents, requires that processing components be usable in
streamed pipelines. XProc is a W3C specification for describing a
sequence of XML operations to be performed over a set of documents.
The spec imposes no streamability constraints, leaving it up to the
implementation whether or not to stream. A streaming
implementation could be expected to outperform a similar
non-streaming implementation, but not all steps in such a pipeline
may be streamable. So the question arises: Would a majority of
real world pipelines benefit from streaming? As you read this,
comparison data is being collected from thousands of pipeline runs
(where the pipelines were not constructed by the author).
Conclusions will be drawn.
|
Wednesday 2:45 pm - 3:30 pm
You pull, I’ll push: On the polarity of pipelines
Michael Kay, Saxonica
What's the most effective way to move XML data through a processing pipeline? The answer isn't always simple. Control flow in the pipeline can run either with the data flow ("push") or against it ("pull"), reflecting the "push" and "pull" styles familiar to XSLT authors; each is useful in some situations. Mixing them, however, presents challenges: buffering the data leads to latency and memory problems, while using multiple threads leads to coordination overheads. The concept of program inversion, originally developed to eliminate bottlenecks in magnetic-tape-based processes, offers help. In particular, ideas derived from Jackson Structured Programming allow processes written in a convenient pull style to be compiled into push-style code; this can reduce both coordination overhead and latency.
|
Thursday, August 13, 2009
|
Thursday 9:00 am - 9:45 am
A toolkit for multi-dimensional markup: The development of SGF to XStandoff
Maik Stührenberg &
Daniel Jettka, University of Bielefeld
The Sekimo Generic Format (SGF) and its successor, XStandoff, use
stand-off annotation to handle overlapping structures in richly
annotated XML documents, while retaining XML compatibility and allowing
the use of standard XML tools like XSLT. This paper describes the
changes introduced by XStandoff and its suite of XSLT stylesheets. These
can be used to create XStandoff instances from documents with inline
annotations, to merge two XStandoff documents over the same primary data
into a single instance, to delete one or more levels of annotation, and
to serialize an XStandoff document in XML with inline markup and
milestone elements.
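The core idea of stand-off annotation is easy to see in miniature. In this Python toy (names and tuple format invented for illustration; this is not the XStandoff schema), annotation levels point into shared primary data by character offsets, so merging levels and tolerating overlap are trivial:

```python
primary = "Peter likes Mary"

# Two independent annotation levels over the same primary data.
syntax = [("np", 0, 5), ("vp", 6, 16)]
prosody = [("phrase", 0, 11)]   # overlaps the vp -- unproblematic in stand-off

def merge(*levels):
    # Levels over the same primary data merge by simple concatenation;
    # offsets cannot clash the way inline tags can. Sort by start
    # position, longest span first.
    return sorted((ann for level in levels for ann in level),
                  key=lambda a: (a[1], -a[2]))

def spans(text, level):
    # Resolve offsets back to the annotated substrings.
    return [(name, text[start:end]) for name, start, end in level]

print(merge(syntax, prosody))
# [('phrase', 0, 11), ('np', 0, 5), ('vp', 6, 16)]
print(spans(primary, syntax))
# [('np', 'Peter'), ('vp', 'likes Mary')]
```

Serializing such overlapping levels back to inline XML is where milestone elements come in, as the abstract notes.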
|
Thursday 9:00 am - 9:45 am
Managing electronic records business objects using XForms and Genericode at the National Archives and Records Administration
Quyen L. Nguyen, National Archives and Records Administration &
Betty Harvey, Electronic Commerce Connection
The Electronic Records Archives (ERA) system at the U.S. National
Archives and Records Administration (NARA) is intended to handle large
volumes of electronic records on widely varying topics in many
different formats. It must be extensible, evolvable, and scalable.
Naturally, XML is used where possible. XForms and Genericode are used
within ERA to manage transfer requests, records schedules, and other
archival business objects; they make it convenient to verify
controlled fields against authority lists and to check inter-field
dependencies. This case study outlines the design and construction of
the Electronic Records Archives system and describes how it permits
agile responses to the ongoing evolution of requirements at NARA.
|
Thursday 9:45 am - 10:30 am
Methods for the construction of multi-structured
documents
Pierre-Edouard Portier &
Sylvie Calabretto, Université de Lyon
In recent years, numerous methods have been proposed for
representing complex overlapping structures. But how are
multi-structured documents to be created? This paper
presents methods for creating and interacting with multi-structured
documents using the MultiX2 model. The basic operations have been
implemented using the functional language Haskell; this prototype
implementation will be described.
|
Thursday 9:45 am - 10:30 am
Gracefully handling a level of change in a complex
specification: Configuration management for community-scale implementation of an HL7v3 messaging
specification
Charlie McCay, Ramsey Systems,
Michael Odling-Smee, XML Solutions,
Joseph Waller, XML Solutions, &
Ann Wrightson, Informing Healthcare (NHS Wales)
Change management for a complex specification is always difficult.
When the specification involves the life-and-death issues of healthcare messaging, a full strategy for handling both changes to the specification and the variations in data over time becomes essential. The technical requirement is to make interfaces both flexible and breakable, to accommodate change and enforce necessary compliance. The authors describe an in-depth analytical method and the resulting maintenance process for a key interoperability specification for the English National Health Service.
|
Thursday 11:00 am - 11:45 am
Merging multi-version texts: A generic solution to the overlap problem
Desmond Schmidt, Queensland University of Technology
In XML processing contexts, “multi-version documents” (MVDs, as
proposed by Schmidt and Colomb in a 2009 paper) can
represent overlapping (separate, partial, conditional) hierarchies
and variations (insertions, deletions, alternatives, and
transpositions) in texts. The MVD data structure allows most
desired operations on texts to be simple and fast. However,
creating and editing MVDs is a much harder and more complex
operation with no approaches known to be both optimal and
practical. The problem is similar to the multiple-sequence
alignment problem in molecular biology. A heuristic algorithm
partly derived from recent biological alignment programs offers
satisfactory speed and quality for creation and editing
operations, which means that MVDs can be considered as a practical
and editable format suitable for overlapping structures in digital
texts.
|
Thursday 11:00 am - 11:45 am
(LB) hData - A Simplified Approach to Health Data Exchange
Gerald Beuchelt, Robert Dingwell, Andy Gregorowicz, Harry Sleeper,
MITRE Corporation
Interoperability issues have limited the expected benefits of Electronic Health Record (EHR) systems. Ideally, the medical history of a patient is recorded in a set of digital continuity of care documents which are securely available to the patient and their care providers on demand. The history of continuity of care standards includes multiple standards organizations, differing goals, and ongoing efforts to reconcile the various specifications. Existing standards define a format that is too complex for exchanging continuity of care information effectively. We propose hData, a simplified XML framework to describe health information. hData addresses the challenges of the current HL7 Continuity of Care Document format and is explicitly designed for extensibility to address health information exchange needs, in general. hData applies established best practices for XML document architectures to the vertical health domain, which has experienced significant XML-based interoperability issues.
|
Thursday 11:45 am - 12:30 pm
XSAQCT: An XML queryable compressor
Tomasz Müldner, Acadia University,
Christopher Fry, Acadia University,
Jan Krzysztof Miziołek, University of Warsaw, &
Scott Durno, Acadia University
An XML-aware compressor reduces the size of a document by taking advantage of the redundancy in the XML syntax. A queryable XML compressor furthermore stores the compressed data in a form that can be queried without first decompressing the full document. XSAQCT (pronounced "exact") is a queryable, grammar-free compressor (meaning it is informed only by the document instance and not by a schema) that separates the document structure from the text and attribute values, storing the structure as an annotated tree and the data values in containers. Both are compressed; a decompressor can restore the original document, but a query processor operates on the compressed document, lazily decompressing as little as possible. Preliminary results look good: XSAQCT compresses documents in our corpus to 12% of their original size and outperforms the other XML compressors we have tested.
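The structure/data separation at the heart of such compressors is easy to demonstrate (a toy sketch only — it borrows the separation idea, not XSAQCT's actual annotated-tree encoding):

```python
import xml.etree.ElementTree as ET
import zlib

doc = "<books>" + "".join(
    f"<book><title>T{i}</title><price>9.99</price></book>" for i in range(200)
) + "</books>"

def split(xml_text):
    # Walk the tree once, sending element names to one stream and text
    # values to another, so each stream is highly self-similar.
    structure, texts = [], []
    def walk(el):
        structure.append(el.tag)
        if el.text and el.text.strip():
            texts.append(el.text)
        for child in el:
            walk(child)
        structure.append("/")   # close marker; the tree shape is recoverable
    walk(ET.fromstring(xml_text))
    return "\n".join(structure), "\n".join(texts)

structure, texts = split(doc)
packed = zlib.compress(structure.encode()) + zlib.compress(texts.encode())
print(f"{len(doc)} bytes -> {len(packed)} bytes")
```

Because the structure stream repeats the same few tag names, a general-purpose compressor does far better on the separated streams than on the interleaved original; the structure stream can also be walked for queries without touching the value containers.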
|
Thursday 11:45 am - 12:30 pm
(LB) The Graphic Visualization of XML Documents
Zoe Borovsky, University of California, Los Angeles,
David J. Birnbaum, University of Pittsburgh,
Lewis R. Lancaster, University of California, Berkeley, &
James A. Danowski, University of Illinois at Chicago
We propose to show how graphic visualizations of deeply encoded XML documents allow Humanities scholars to reap the rewards of their work. These visualizations become, in turn, objects that scholars can analyze and interpret. Beginning with a short overview outlining the history of development in visualization strategies of Humanities computing technologies, we present Birnbaum’s Repertorium Workstation as an early attempt at graphic visualization of a large collection of XML encoded texts. Borovsky’s work shows how graphs of encoded data can themselves become objects of analysis; she will present examples of visual queries and results. Lancaster’s work envisions a visual query system using large graphs—a framework designed for exploring structurally complex Humanities data sets. Our work leads us to conclude that graphic visualization isn’t just something one can do with XML data; it is often crucial to making the data usable in research.
|
Thursday 2:00 pm - 3:30 pm
XML best practices: panel discussion
Peter F Brown, Pensive,
David Chesnutt,
Chet Ensign, Bloomberg,
Betty Harvey, Electronic Commerce Connection,
Laura Kelly, National Library of Medicine, &
Mary McRae, OASIS
Who doesn't want to do things well? Who doesn't want to stand on
the shoulders of giants? Who doesn't want to share hard earned
wisdom with others? So why is it that "best practices" are so
elusive? In this panel discussion we consider how "best practices"
(and practices that, for whatever reasons, masquerade as "best")
can be discovered, recognized, verified, modified, replaced,
debunked, enforced, promulgated, etc.
|
Thursday 4:00 pm - 4:45 pm
EXPath: A practical introduction: Collaboratively defining open standards for portable XPath extensions
Florent Georges
The EXPath project was established in April 2009 to define
libraries of extension functions for XPath. These functions will
exist outside of any existing processor, allowing processors to
implement them natively or to install them as external packages.
Ideally, these functions will become portable across every
processor and will be usefully employed in XQuery, XSLT, and
other XPath-based languages. Come find out about the project and
enjoy some dynamite examples of submitted functions.
|
Thursday 4:45 pm - 5:30 pm
Managing XML references through the XRM vocabulary
Jean-Yves Vion-Dury, Xerox Research Centre Europe
XML References Management (XRM) is a method and
vocabulary for formalizing knowledge about the types of links
found in a given family of XML documents (i.e., in an XML document
type). With such formalized knowledge in hand, instances of the
document type are amenable to automated link verification, and to
transformation and derivation with predictable and desirable
results. The XRM approach allows link description
(the definition of link types in terms of the contexts in which
they appear), link validation description (the
properties of valid instances of each type of link), and
link translation description (the rules that should
govern transformations and derivations).
|
Friday, August 14, 2009
|
Friday 9:00 am - 9:45 am
Why writers don’t use XML: The usability of editing software for structured documents
Peter Flynn, University College Cork
Why can’t people stand XML editors? The details are legion, but
the root cause is simple: XML editors routinely put XML and its
structure at the center of their concerns, whereas the central
concerns of most writers of prose lie elsewhere. Even when they
agree on the importance of document structure, XML thinkers and
actual writers often mean different things by it. What would it take
to make the editor’s interface support the user’s mostly bottom-up
model of text, instead of insisting on the top-down XML model most
easily and commonly implemented? A
user survey can help reveal whether a more task- and user-centered
interface would help make XML editors more usable.
|
Friday 9:45 am - 10:30 am
(FP) Open data and the XML community
Kurt Cagle
The world of XML is changing. Large “super schemas”
like OOXML, XBRL, NIEM, HL7, and so on, push the limits of
existing XML software, while also encouraging the creation of
ecosystems built around them, in order to exploit the large
quantities of important data now or soon to be available in these
formats. Standardization around these formats is driven less by
existing proprietary formats and less by industry consortia than by
government adoption. The super schemas are often formulated less
as definitions of single concrete vocabularies than as
meta-definitions of families of vocabularies. The confluence of
emerging Open Data standards, the government-as-database
conjecture, and a shift towards RESTful services will serve to
turbocharge the XML community.
|
Friday 11:00 am - 11:45 am
Documenting and implementing guidelines with Schematron
Joshua Lubell, National Institute of Standards and Technology
Data exchange specifications must be broad and general to achieve
acceptance; for actual interoperability, the data need to be more
tightly constrained by specific business rules. Naming and Design
Rules (NDR) are guidelines for constraining the development of new
schemas or extending existing schemas to ensure interoperability.
Both the business rules and the NDR can be implemented in
Schematron, a rule-checking and reporting language for XML
documents. Schematron is also a particularly useful literate
programming tool for documenting and implementing guidelines. The Schematron literate-programming approach is compared to a previously implemented NDR document model approach with embedded Schematron for enforcing guidelines.
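The flavour of Schematron's rule-and-report model can be conveyed with a small stdlib-only Python analogue (illustrative only; real Schematron expresses both contexts and tests in XPath, whereas ElementTree's limited XPath subset forces the tests into Python callables here):

```python
import xml.etree.ElementTree as ET

# Each rule pairs a context with an assertion and a message; failures
# are collected into a report rather than aborting validation, much as
# Schematron reports rather than rejects.
rules = [
    (".//order", lambda el: "id" in el.attrib, "order is missing an id"),
    (".//order/total",
     lambda el: (el.text or "").strip().replace(".", "", 1).isdigit(),
     "total is not a number"),
]

def check(xml_text, rules):
    root = ET.fromstring(xml_text)
    report = []
    for context, test, message in rules:
        for el in root.findall(context):
            if not test(el):
                report.append(f"{el.tag}: {message}")
    return report

doc = ("<orders><order id='1'><total>10.5</total></order>"
       "<order><total>oops</total></order></orders>")
print(check(doc, rules))
# ['order: order is missing an id', 'total: total is not a number']
```

The same rule list can double as documentation of the business rules, which is the literate-programming angle the abstract describes.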
|
Friday 11:45 am - 12:30 pm
Test assertions on steroids for XML artifacts
Jacques Durand, Fujitsu America,
Stephen Greene, Document Engineering Services,
Serm Kulvatunyou, Oracle, &
Tom Rutt, Fujitsu America
Testing of XML material — either XML-native business documents
or XML-formatted inputs of various sources — combines diverse
validation requirements that are in general not well supported by
any single validation tool. Schema-based validation must be
complemented with additional syntactic and semantic rules.
Known tools in this space are limited in their expressive power,
and/or they can’t make reports that are sufficiently nuanced,
and/or they can’t take additional information, such as metadata
and operational artifacts, into account. This paper describes a
more integrated XML testing paradigm that supports chaining and
parameterization of test cases, and the modularization of reusable
tests. It combines the familiar notion of test assertions
(as described in the OASIS TAG model) with XPath 2.0 and
XSLT 2.0.
|
Friday 12:30 pm - 1:15 pm
(FP) Sometimes a question of scale
C. M. Sperberg-McQueen, Black Mesa Technologies
Reflections on size, scale, scaleability, and value.
|