|
Balisage 2008 Detailed Program
|
Tuesday
9:45—10:30
Cool versus useful
B. Tommie Usdin,
Mulberry Technologies
True versus Useful, or True versus Likely-to-be-useful, are
tradeoffs we find ourselves making in document modeling and many
other markup-related situations all the time. But Cool versus
Useful is a far more difficult tradeoff, especially since our world
now includes a number of very cool techniques, tools, and
specifications. Cool toys can have a lot of gravitational pull,
attracting attention, users, projects, and funding. Unfortunately,
there is sometimes a disconnect between the appeal of a particular
tool/technology and its applicability in a particular
circumstance.
|
|
Tuesday
11:00—11:45
REST Oriented Architectures (ROA): Taking a resourceful approach to web data
Kurt Cagle,
O’Reilly Networks
The paradigm shift away from Service Oriented Architectures
(SOAs) toward Resource Oriented Architectures (ROAs) can be
expected to continue. The cost of developing “editors”
for the service-oriented silos of data now piling up on the web
often exceeds the value of those silos to the services that need to
provide such “editors”. Within organizations, the
complexity and heterogeneity of data increasingly resists
management via name/value approaches. It is simpler and more
efficient to view the web as a giant database, refocusing
development efforts on query-oriented substrates, rather than on
verb-oriented ones. In the general case, it is easier to get data
from users and to provide it to them via ROAs.
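As a minimal sketch of the contrast (the URL and element names are hypothetical): in a resource-oriented design, each piece of data is a resource with its own URI, retrieved and updated through the uniform HTTP verbs rather than through service-specific operations.

    GET /orders/1047 HTTP/1.1
    Host: example.org
    Accept: application/xml

    HTTP/1.1 200 OK
    Content-Type: application/xml

    <order id="1047">
      <item sku="A-12" quantity="2"/>
      <status>open</status>
    </order>

The same representation that GET returns can be edited and sent back with PUT to /orders/1047; no service-specific “editor” interface has to be designed for each silo.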
|
|
Tuesday
11:45—12:30
Informal ontology design:
A wiki-based assertion framework
Murray Altheim,
Ceryle.org
Wiki software has historically provided little support for even
simple organizational structure. But when the wiki grows large, a
flat organization no longer works well. The addition of
‘category links’ to each page helps, but these are
often ad hoc or undefined and themselves need structuring. Our wiki
architecture permits expression of an underlying structure using a
wiki-like syntax: user-authored assertions are dynamically
harvested to create a Topic Map graph, which mirrors the explicit
structure of the wiki and provides it with an underlying
classification system. The resulting implementation is a
combination of wiki plugins, event handlers to capture and process
their output, a harvester to scan the assertions on the wiki, and a
manager to maintain the set of assertions and to respond to
queries.
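The abstract does not reproduce the wiki syntax itself, so the following sketch is purely hypothetical: a user-authored assertion as it might appear in page markup, and the kind of XTM association a harvester could derive from it.

    [assert: Gardening is-a Hobby]            (hypothetical wiki markup)

    <!-- hypothetical harvested output, in XTM 1.0 syntax -->
    <association>
      <instanceOf><topicRef xlink:href="#is-a"/></instanceOf>
      <member>
        <roleSpec><topicRef xlink:href="#instance"/></roleSpec>
        <topicRef xlink:href="#gardening"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#class"/></roleSpec>
        <topicRef xlink:href="#hobby"/>
      </member>
    </association>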
|
|
Tuesday
2:00—2:45
(FP) XML: It was not televised after all ...
Eduardo Gutentag,
Sun
Few may remember that XML was launched with an explicit
technical and social agenda centered on the revolutionary idea that
you own the content you produce. This has now come full circle
and has become an almost trivial assertion (albeit far from being
universally true). Yet in the meantime, while no TV cameras were
watching, it facilitated another revolution, which in turn has
had a global transformative effect on the way we define the words
“content”, “ownership” and even “freedom”. Will this now unleash
another deep and almost antithetical change, centered on the equally
revolutionary concept that ownership of an idea is not necessarily
vested in the person who comes up with it?
|
|
Tuesday
2:45—3:30
Optimized Cartesian product:
A hybrid approach to derivation-chain checking in XSD 1.1
Maurizio Casimirri,
Paolo Marinelli, &
Fabio Vitali,
University of Bologna
Conditional type assignment in XSD 1.1 allows an element’s
type to be selected at validation time by evaluating XPath
expressions that determine which of several possible types to assign.
Conditional type assignment on child elements makes it challenging
to verify statically whether one type is a legal restriction of
another. The current draft of XSD 1.1 adopts a dynamic approach to
the problem, which means that some schema errors may remain
undetected if not exposed by the document instance. We propose a
hybrid solution, partly dynamic and partly based on static
analysis, for verifying that a restriction actually restricts its
base type.
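For readers who have not seen the mechanism, conditional type assignment in the XSD 1.1 drafts looks roughly like this (type names hypothetical; later drafts restrict the test expression to the element’s own attributes):

    <xs:element name="message">
      <xs:alternative test="@kind = 'string'" type="stringMessageType"/>
      <xs:alternative test="@kind = 'base64'" type="base64MessageType"/>
      <xs:alternative type="defaultMessageType"/>
    </xs:element>

The static question the paper addresses is whether a type derived by restriction, carrying its own alternatives, can be shown to accept only what its base accepts, without waiting for a document instance to expose a violation.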
|
|
Tuesday
4:00—4:45
Office Suite Markup: Is It Worthwhile?
Patrick Durusau
There is a lot of smoke, but is there any fire? Microsoft, Sun, IBM, Oracle, Google, Red Hat, and others are currently contending over two XML vocabularies for office documents (word processor, spreadsheet, presentation tool, etc.). Fans of OOXML or ODF are hard to find among people who have spent the last twenty years researching and developing markup and markup systems. In fact, many in the markup community dismiss these efforts as unimportant and/or uninteresting. Patrick Durusau, the editor of ODF, who has called for the co-evolution of OOXML and ODF in an ISO context, thinks they are important, possibly very important. Come find out what has captured the interest of at least one topic map and overlapping markup theorist.
|
|
Wednesday
9:00—9:45
Topic maps in near-real time
Sam Hunting,
Universal Pantograph
A topic map is an editorial product in which everything known to
the topic map about each member of some set of subjects of
conversation is co-located. When a community allows its members to
contribute content about existing and new members of the set of
subjects, the resulting publishable topic map may take time to
produce, even if it is produced without human editorial
intervention, because the co-location entailment may require
significant changes to the graph of nodes that must ultimately bear
a one-to-one correspondence to the subjects under discussion. When
a topic map will be published as a corresponding set of
interconnected web pages, it may be economically vital to use
existing web publishing tools, such as Drupal, even if they were
not originally designed especially for topic map publishing. A new
topic map module for the Drupal open-source content management
system now supports the publication of collaboratively written
topic maps in near-real time, using a plug-in architecture that can
be extended to support specific information sets.
|
|
Wednesday
9:00—9:45
SGF: An integrated model for multiple
annotations and its application in a linguistic domain
Maik Stührenberg &
Daniela Goecke,
University of Bielefeld
Linguists must often merge multiple annotation layers, in
heterogeneous formats, that describe the same primary data. In
recent years several approaches have been proposed for storing such
multiple annotations: Prolog fact-based architectures, XML-related
approaches, and graph-based models using XML syntax. In real-world
applications, however, these architectures have serious practical
disadvantages. The XML-based Sekimo Generic Format (SGF) is based
on graph-based design principles but uses the tree structures
inherent in XML to reduce processing complexity and costs. SGF
data can be analyzed using standard XML tools such as XPath or
XQuery, as illustrated by our own project on the detection of
anaphoric antecedents.
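The abstract does not spell out SGF’s concrete vocabulary, so the following standoff sketch is only in the spirit of the format (element names are illustrative): several annotation layers point into one shared stretch of primary data.

    <corpus>
      <primaryData>Peter feeds his dog. It barks.</primaryData>
      <layer name="syntax">
        <seg id="np1" start="12" end="19"/>  <!-- "his dog" -->
        <seg id="np2" start="21" end="23"/>  <!-- "It" -->
      </layer>
      <layer name="anaphora">
        <link type="anaphoric" from="np2" to="np1"/>
      </layer>
    </corpus>

Because every layer is an ordinary XML tree over shared character positions, a query for a pronoun’s antecedent can be written directly in XPath or XQuery.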
|
|
Wednesday
9:45—10:30
Using Atom categorization to build dynamic applications
Alex Milowski
Atom feeds provide the ability to categorize both the feed and its
entries. This categorization gives feed authors a simple way
to associate terms and semantics with their feed contents. This talk
will demonstrate how such author-generated categorization can be used
to build web applications dynamically from feeds and how Atom categories
map into the world of RDF and the “Semantic Web”.
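Categories in Atom (RFC 4287) are carried by the category element, which may appear on the feed or on individual entries; the scheme IRI is what gives the terms a natural bridge to RDF vocabularies. A trimmed example (required elements such as id and updated are omitted, and the scheme URI is hypothetical):

    <feed xmlns="http://www.w3.org/2005/Atom">
      <title>Markup news</title>
      <category term="xml" scheme="http://example.org/topics/" label="XML"/>
      <entry>
        <title>Streaming XProc</title>
        <category term="xproc" scheme="http://example.org/topics/"/>
      </entry>
    </feed>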
|
|
Wednesday
9:45—10:30
(LB) An event-centric API for processing concurrent markup
Oliver Schonefeld,
University of Tübingen
A programmer can choose between two principal APIs when working with
XML documents: one provides an event-centric view of the document (SAX),
while the other offers an object-centric view (DOM). This presentation
introduces an event-centric programming interface for working with
XCONCUR documents, inspired by XML’s SAX API. It provides a very
easy-to-use API for parsing XCONCUR documents.
|
|
Wednesday
11:00—11:45
(LB) xmlsh - a command language (shell) based on the philosophy of the Unix Shells, designed for XML
David A. Lee,
Epocrates
xmlsh, an Open Source project, is a command language (shell) modeled after the philosophy of traditional Unix shells but designed to support XML natively. Largely backwards compatible with the Unix shells, xmlsh is designed for both interactive and script use. It has built-in support for XML data (documents and sequences) as expressions, variables, files, and pipelines. Support for fully multithreaded XML and text pipelines, as well as direct execution of OS processes and commands in a portable and familiar syntax, allows developers to construct complex jobs composed of XML tasks and traditional text and file operations easily and portably. Written in pure Java and integrated closely with the Saxon XQuery and XSLT library, xmlsh is portable to any platform that runs the Java 1.6 JDK.
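To give a flavor of what this looks like in practice, here is a rough sketch of an xmlsh session (the syntax is reconstructed from memory and should be checked against the xmlsh documentation):

    # assign an XML literal to a variable, then query it in a pipeline
    A=<[ <doc><item>1</item><item>2</item></doc> ]>
    xecho $A | xquery '//item'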
|
|
Wednesday
11:00—11:45
Discontinuity in TexMecs, Goddag structures, and rabbit/duck grammars
C. M. Sperberg-McQueen,
World Wide Web Consortium / MIT
Claus Huitfeldt, University of Bergen
Our Montréal conferences have long had a fascination with
the problems caused by overlapping structures in markup. One
special category of markup problems not fully examined in past
conferences comprises those caused by discontinuous structures:
document components that may be logical units but which cannot be
easily marked as such because of interruptions. It may be possible
to construct a graph structure which more nearly reflects our
intuitive notions about how documents are constructed, if we retain
the principle that parent/child and ancestor/descendant relations
imply that the ancestor contain the descendant, but jettison the
converse principle that any element properly contained by another
element is necessarily a descendant of (dominated by) that other
element.
|
|
Wednesday
11:45—12:30
(LB) State of the art of streaming: Why are the W3C XProc and XSLT WGs and ISO
SC34 WG 1 looking closely at streaming?
Mohamed Zergaoui,
member of XProc and XSLT 2.0 WG
XML has been out for 10 years and is now mainstream.
XML is now recognized for its value (Unicode, structure, extensibility) and
not seen only as something heavy and difficult to use. XML still needs to
improve its capacity to be processed more naturally as a stream of
information. After a short presentation about where we come from, we will look
very closely at the ongoing work related to streaming processing,
especially in the XSLT WG, the XProc WG, and ISO DSDL. We will also discuss some interesting approaches to finding workarounds for streaming and propose some new areas of use for XML.
|
|
Wednesday
11:45—12:30
Graph characterization of overlap-only TexMecs and other overlapping markup formalisms
Yves Marcoux,
Université de Montréal
A criterion for determining whether any given graph can be serialized as a document with overlapping markup is described. This provides an exact and complete characterization of the element-containment relationships expressible in markup formalisms allowing overlap, such as TexMECS (without interrupted or virtual elements). Such a characterization will allow DOM-based applications to determine, for example, whether a given modification to a document would preserve its ability to be serialized using overlapping markup.
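In TexMECS, start-tags are written <name| and end-tags |name>, so overlap can be serialized directly. In this contrived fragment, element a contains “one two” and element b contains “two three”:

    <a|one <b|two|a> three|b>

The characterization in the paper tells us, for an arbitrary graph of such containment relationships, whether a linear serialization of this kind exists.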
|
|
Wednesday
2:00—3:30
(FP) Dirty laundry: Committee disasters, what happened, what we learned
Jon Bosak, Sun
Mavis Cournane,
Cognitran
Patrick Durusau
James David Mason,
Y-12 National Security Complex
David Orchard,
BEA Systems
Lauren Wood,
Sun
Markup standards and projects are created, managed, and sometimes
destroyed through group process. While this process is often a bit
bumpy, there are some occasions when it goes spectacularly badly.
Tales of these committee disasters can be not only entertaining,
but also (and more importantly) informative. Panelists will spend a
maximum of 10 minutes each, describing a committee/working group
disaster of some sort, including: what went wrong, how it could
have been prevented, how it could have been (or how it was)
resolved. Participants may anonymize their tales of woe, provided
they assure us that the events they describe actually occurred and
that they were actually involved.
|
|
Wednesday
4:00—4:45
(LB) Beyond the Semantic Web: the Semantic Space
Pierre Lévy, University of Ottawa
Today, the sharing of semantics remains a conundrum. Semantics
can be shared within a universe of discourse, but individuals and
communities cannot be relieved of the need to define their own
universes of discourse. The emergence of collective intelligence
is increasingly seen as necessary for human survival, but it is
difficult for people who live in diverse universes of discourse to
know when they are talking about the same things. Collective
intelligence — the ability of a community to exhibit
self-sustaining, rational behaviors — is related to its
participants’ ability to understand each other.
Diverse minds can create, recognize, and think in terms of
diverse sets of distinct concepts and relationships between them.
A conceptual addressing system can map such sets into a shared
abstract “semantic space” that is structured by an algebraically
definable group of transformations. Information Economy Meta
Language (IEML) is such a “semantic space addressing system”; it
defines a very large space of semantic addresses. A small number
of the points in that space — more than 2,500 of them — are now
listed in an “IEML Dictionary”, along with interpretations of each
of them in several natural languages. A language for compactly
specifying sets of locations in the space exists, and a parser
that translates expressions in this language into XML is
available. A programming language for discovering and asserting
relationships between sets of semantics is being developed, along
with a variety of related software tools.
The semantic space research program could provide a scientific
(measurable, principled, experimentally repeatable) foundation on
which technologies and professional disciplines can be created,
including distributed collaborative semantic search engines,
models and simulations of collective intelligences, tools and
editorial practices for the automated production of multimedia
documents, and many more.
|
|
Thursday
9:00—9:45
(LB) Reconsidering Conventional Markup for Knowledge Representation
David Dubin, University of Illinois at Urbana-Champaign
David J. Birnbaum, University of Pittsburgh
The main attraction of semantic web technologies such as RDF and OWL
over conventional markup is the support those tools provide for
expressing precise semantics. Formal grounding for RDF-based languages
(in, for example, description logics) and their integration with logic
programming tools are guided and constrained by issues of decidability
and the tractability of computations. Users of these technologies are
invited to use less expressive representations, and thereby work
within those constraints. Such compromises seem reasonable when
considering the roles automated reasoning agents are expected to play
by the semantic web community. But where expectations differ, it may
be useful to reconsider using conventional markup and inferencing
methods that have been applied with success despite their theoretical
weaknesses. We illustrate these issues with a case study from
manuscript studies and textual transmission.
|
|
Thursday
9:00—9:45
Hybrid parallel processing for XML parsing and schema validation
Yu Wu,
Qi Zhang,
Zhiqiang Yu, &
Jianhui Li,
Intel Corporation
XML parsing and validation is widely regarded as a performance
bottleneck in the processing of very large XML documents. We
propose a novel chunk-based algorithm for parallel processing of
XML using the multi-core architectures now more and more widely
deployed both on desktops and in servers. We partition the XML
document into chunks and process the chunks speculatively in
parallel, both for parsing and for schema validation, before
reintegrating them into a single result. Experimental results show
that this approach provides a great overall performance advantage
by exploiting the parallelism of multi-core platforms.
|
|
Thursday
9:45—10:30
(LB) Linking Page Images to Transcriptions with SVG
Hugh A. Cayless,
Carolina Digital Library and Archives, University of North Carolina at Chapel Hill
This paper will present the results of ongoing experimentation with the linking of manuscript images to TEI transcriptions. The method being tested involves the automated conversion of images containing text to SVG, using Open Source tools. Once the text has been converted to SVG paths, these can be grouped in the document to mark the words therein, and these groups can then be linked using standard methods to tokenized versions of the transcriptions. The goal of these experiments is to achieve a much more fine-grained linking and annotation mechanism than is so far possible with available tools, e.g. the Image Markup Tool and TEI P5 facsimile markup, both of which annotate only rectangular sections of an image. The method envisioned here would produce a legible tracing of the word, expressed in XML, to which transcripts and annotations might be attached and which can be superimposed upon the original image.
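A hedged sketch of the intended linkage (ids, filenames, path data, and attribute choices are illustrative): each traced word becomes an SVG group, and the TEI transcription points at that group, for example through the P5 facs attribute.

    <!-- page1.svg: tracing derived from the page image -->
    <svg xmlns="http://www.w3.org/2000/svg"
         xmlns:xlink="http://www.w3.org/1999/xlink">
      <image xlink:href="page1.jpg" width="2000" height="3000"/>
      <g id="w1"><path d="M120,340 L128,338 L131,345 Z"/></g>  <!-- one word -->
    </svg>

    <!-- in the TEI transcription -->
    <w facs="page1.svg#w1">obscurum</w>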
|
|
Thursday
9:45—10:30
The Apache Qpid XML Exchange: High-speed reliable enterprise
messaging using open standards and open source
Jonathan Robie,
Red Hat
The Advanced Message Queuing Protocol (AMQP) is an open,
language-independent, platform-independent standard for enterprise
messaging; it provides a simple, coherent architecture for
sophisticated messaging applications. Qpid is a multi-language
implementation of AMQP being developed at the Apache Software
Foundation. The Qpid XML Exchange provides XQuery-based routing for
XML content, allowing AMQP and Qpid to be used for mission-critical
XML messaging applications. Together, these tools vastly simplify
the task of writing XML messaging software.
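In the XML Exchange, a queue is bound to the exchange with an XQuery expression that is evaluated against each message; messages for which the query yields true are routed to that queue. A hypothetical binding query (element names invented for illustration):

    (: route large US-dollar orders to this queue :)
    ./order[currency = 'USD' and total > 500]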
|
|
Thursday
11:00—11:45
An onion of documents and metadata
D. Matthew Kelleher,
Albert J. Klein, &
James David Mason,
Y-12 National Security Complex
This XML stuff is not just theory; it is proving eminently
practical in some large organizations now coping with the problem
of translating fifty-year-old paper-based workflow into online
embedded metadata. The United States DOE Y-12 Security Complex in
Oak Ridge builds products that cannot be tested as finished units,
so each component and assembly must be thoroughly inspected and
tested. Because such products have a potentially long shelf life,
extraordinary measures are necessary to document not only the
products but also the computing environment in which the
documentation has been prepared, as well as the output data from
test equipment. XML applications for the most important paper-based
components exist, and by the time of the Balisage conference we
hope to have a live pilot and early results to report.
|
|
Thursday
11:45—12:30
Structural metadata and the social limitation of interoperability: A sociotechnical
view of XML and digital library standards development
Jerome McDonough,
University of Illinois at Urbana-Champaign
XML is like a rope: it is extraordinarily flexible;
unfortunately, just as with rope, that flexibility makes it all too
easy to hang yourself. The apparent simplicity of XML, combined
with its flexibility, make it an all too obvious choice for
encoding metadata for the catalogs of digital libraries. However,
it may provide too much flexibility. Not only are there competing
metadata schemes, such as METS and MPEG-21 DIDL, but it is also
possible to create variant interpretations of generic high-level
structures within a single metadata scheme. The result? Catalogs
lose interoperability. Establishing a standard for a metadata
scheme is not enough: libraries must also build community consensus
about how to apply the standards.
|
|
Thursday
2:00—2:45
(FP) Parser possibilities: Why write a markup parser?
Norman E. Smith,
Science Applications International Corporation
Since high-quality validating XML parsers for multiple schema
languages are widely available, our community seems to have lost
interest in writing new parsers. But there are still many good
reasons to roll your own: for the learning experience, because none
of the existing ones quite meets your needs, so you can parse
multiple markup languages, to write to your own API, and more. My
mlParser started modestly and has evolved into a
primary tool in my markup toolkit. Let me tell you about the
choices I made along the way.
|
|
Thursday
2:45—3:30
Properties of schema mashups: Dynamicity, semantics, mixins, hyperschemas
Philippe Poulard,
INRIA
The Active Schema Language (ASL), based on the Active Tags
engine, allows us to experiment with features that might be useful
in the next generation of XML schema languages. ASL allows us to
build content models on the fly: think of a purchase order that
can have a free-item element only if the
item elements total $500 or more. ASL can specify a
datatype that knows that 68°F is warmer than 19°C and
cooler than 22°C. ASL can mix constraints written in different
schema languages. And, of course, we can write a schema in ASL for
ASL itself. ASL illustrates an important point: Active Tags can
help significantly in the design of runnable XML languages.
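The abstract does not show ASL syntax itself; for a sense of how such a co-occurrence constraint differs from what grammar-based schema languages can express, here is the purchase-order rule written in Schematron instead (element and attribute names hypothetical):

    <sch:pattern xmlns:sch="http://purl.oclc.org/dsdl/schematron">
      <sch:rule context="purchase-order">
        <sch:assert test="not(free-item) or sum(item/@price) >= 500">
          A free-item is allowed only when the items total $500 or more.
        </sch:assert>
      </sch:rule>
    </sch:pattern>

ASL’s claim is stronger: it can make such a condition part of the content model itself, constructed on the fly, rather than a rule checked after the fact.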
|
|
Thursday
4:00—4:45
(LB) Hypertext Links and Relationships in XML Databases
Anne Brüggemann-Klein &
Lorenz Singer,
Technische Universität München
Hypertext links are,
for semistructured data and narrative documents in XML databases,
a fitting analogue to foreign-key references
for structured data in relational databases. We encode hypertext links with XLink. For processing the links,
we use the XLink processor HyQuery, an XQuery module which turns a native, XQuery-enabled XML database
into a hyperdata system. This system is used in a lab course “XML Technology” and in the case study XTunes, a Web application that manages metadata and recordings of classical music.
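In this setting a link in the data is simply an element carrying the standard XLink attributes, which HyQuery can then resolve inside the database. A sketch using hypothetical element names from the XTunes domain:

    <recording xmlns:xlink="http://www.w3.org/1999/xlink">
      <work xlink:type="simple"
            xlink:href="works/beethoven-op27-2.xml"
            xlink:title="Piano Sonata No. 14"/>
    </recording>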
|
|
Thursday
4:00—4:45
Secure publishing using schema-level role-based access control policies for
fragments of XML documents
Tomasz Müldner &
Robin McNeill,
Acadia University
Jan Krzysztof Miziołek,
University of Warsaw
Increasing use of XML to store large data collections such as
medical records has created a need to generate specialized views of
those collections and to control access to the views according to
the roles of the viewers. A medical practitioner, for example,
might need to view many patients’ records, while a patient
should have access only to an individual chart. Granting secure,
encrypted, and role-based access to XML fragments drawn from a
collection, without requiring a massive encryption overhead, can be
done with creative use of path processing that is aware of the
schemas behind the collection. Minimal keyrings for
encrypting/decrypting can then be generated based on structures
allowed by document schemas, and keyrings can be distributed
according to roles and rights of different users of the
collection.
|
|
Friday
9:00—9:45
Putting it all in context: Context and large-scale information sharing with Topic Maps
Peter F. Brown,
Pensive
Two major concerns in creating large-scale Topic Maps
applications are: first, the role and capture of context within a
specific topic map, and second, how the Topic Maps paradigm —
specifically the XTM specification — allows the linking and
efficient use of Topic Maps deployed on a massive scale, such as
the Internet, to support extremely large-scale information sharing.
These two subjects initially seem to be separate concerns. However,
upon reflection and further discussion, it is clear that they are
intertwined and closely related to the issue of scalability.
|
|
Friday
9:45—10:30
Translation between RDF and Topic Maps: Divide and translate
Christo Dichev,
Darina Dicheva,
Boriana Ditcheva, &
Mike Moran,
Winston-Salem State University
Is translation between RDF and Topic Maps really feasible in the
general case? Any affirmative answer to this question depends on
achieving the right balance between the competing objectives of
semantic fidelity, completeness, and usability of the resulting
translation. Such a balance can only be achieved when the
ontological correspondences between RDF and Topic Maps are
exploited. An analysis of the translation task reveals relevant
requirements. A balanced method has been implemented as a plug-in
for the TM4L topic maps editor.
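One root of the difficulty is visible side by side: an RDF statement carries a bare predicate, while the corresponding Topic Maps association names a role for each participant, so a translator must either invent or discard information (the identifiers below are illustrative):

    <!-- RDF/XML: subject, predicate, object -->
    <rdf:Description rdf:about="http://example.org/puccini">
      <ex:composed rdf:resource="http://example.org/tosca"/>
    </rdf:Description>

    <!-- XTM: the same fact, with explicit roles -->
    <association>
      <instanceOf><topicRef xlink:href="#composed"/></instanceOf>
      <member>
        <roleSpec><topicRef xlink:href="#composer"/></roleSpec>
        <topicRef xlink:href="#puccini"/>
      </member>
      <member>
        <roleSpec><topicRef xlink:href="#work"/></roleSpec>
        <topicRef xlink:href="#tosca"/>
      </member>
    </association>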
|
|
Friday
11:00—11:45
Freedom to constrain: Where does attribute constraint come
from, Mommy?
Syd Bauman,
Brown University
There are many ways to express constraints on classes of XML
documents. Some can and should be controlled by document creators,
some at the project or local level, and some by remote schema
developers. Each type is illustrated by explaining ways to
constrain an attribute value to one of an enumerated list of
values. The constraint could be expressed formally in a schema
language such as RELAX NG, in a rules-based process such as
Schematron, in a metadata element such as a TEI header, in a
metaschema file such as a literate program, or independently in a
separate file. Pros and cons will be given for each, but the best
answer may depend on access requirements: will encoders as well as
designers need access and change rights to the constraint?
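As a concrete instance of the running example (attribute and value names hypothetical), the same enumeration can live in the grammar or in a separable rule, and where it lives determines who can conveniently change it:

    # RELAX NG compact syntax: the constraint is part of the grammar
    attribute rend { "bold" | "italic" | "smallcaps" }

    <!-- Schematron: the same constraint as a separable rule -->
    <sch:rule context="hi" xmlns:sch="http://purl.oclc.org/dsdl/schematron">
      <sch:assert test="@rend = 'bold' or @rend = 'italic' or @rend = 'smallcaps'">
        rend must be bold, italic, or smallcaps.
      </sch:assert>
    </sch:rule>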
|
|
Friday
11:45—12:30
Text retrieval of XML-encoded corpora: A lexical
approach
Liam R.E. Quin,
World Wide Web Consortium
The latest XQuery 1.0 and XPath 2.0 Full-Text 1.0
specification extends XPath 2.0 to support full-text searching. The
software lq-text is an open source text retrieval package
dating from 1989. Whereas XPath 2.0 Full-Text is node-based, working over
XML document trees and fully XML-aware, lq-text
operates over text files by indexing the location of each
natural-language word in all the files. Since lq-text
has high precision, good performance over large datasets, and
flexible concordance generation, enhancing it to add XML support
allows interesting comparisons between the two approaches.
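The two styles of query look quite different: XPath 2.0 Full-Text addresses nodes, while lq-text addresses word occurrences in files. A brief sketch (the lq-text command names are given from memory and are worth checking against its documentation):

    (: XQuery/XPath Full-Text: search within a node tree :)
    //p[. contains text "markup" ftand "overlap"]

    # lq-text: index a file, then pull concordance (KWIC) lines
    lqaddfile chapter1.xml
    lqkwic markup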
|
|
Friday
12:30—1:15
But wait, there’s more!
C. M. Sperberg-McQueen,
World Wide Web Consortium / MIT
XML has been widely adopted and forms part of the infrastructure
of most modern information technology. We have a satisfyingly
large collection of XML vocabularies and XML tools. Is it time to
declare victory and go home yet? Or is there more to do?
|
|
|