Miłowski, R. Alexander, and Norman Walsh. “How to survive the coming namespace winter.” Presented at Balisage: The Markup Conference 2014, Washington, DC, August 5 - 8, 2014. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). https://doi.org/10.4242/BalisageVol13.Milowski01.
Balisage: The Markup Conference 2014 August 5 - 8, 2014
Balisage Paper: How to survive the coming namespace winter
Is XML condemned to be an orphaned syntax with a dimly lit future within the Web
browser? What can information providers with rich sources of XML do, other than
down-translate to HTML? The evolving Web Components environment may provide a solution!
With some simple translations, stylesheets and scripts, it will be possible to wrap
custom XML in a minimum amount of HTML and serve it over the Web. The browsers will
never know they’re being tricked into delivering XML.
It was a late night, again, at XML Prague, and Norm Walsh,
John Snelson, Charles Greer, and I were walking along attempting
to find dinner. We had been discussing the Web Components
session that had occurred earlier in the day. We expressed our
dismay and depression that we couldn't just have XML. Then it
occurred to us, like a light being turned on (or being
whacked on the back of the head with a ruler), Web Components
are just markup and pretty close to XML. All we needed to do was
use a hyphen rather than a colon, and all was well. It is a
compromise and likely the best we will get anytime soon. We get
to put our own pointy brackets into the browser and give it
semantics—accept it and move on.
— Alex Miłowski recounting XML Prague 2014
Forward from Failure
A publisher with a large amount of information in XML documents has little recourse in today's world but to transform this information into HTML for delivery on the Web or within EPUB ebooks. The common Web browser cannot load and process XML information with processing semantics similar to those of HTML: links will not be identified, styles and local transformations are fraught with problems, media will not be loaded or rendered, and scripts will not execute to provide extensible behaviors.
At the 2009 Balisage Conference, in “XML in the Browser: the Next Decade” [balisage-2009], Miłowski enumerated the issues with delivering XML to the browser, and many, if not all, of those issues remain unsolved in 2014. Browser vendors have since all but abandoned XML processing except as a legacy format. In many ways, XML remains only as a serialization format for HTML5 [html5] and as a mechanism for receiving data within a Web application.
That paper argued that there are intrinsic and non-intrinsic formats for the Web. In terms of markup languages, HTML, SVG, and MathML were identified as the triad of intrinsic markup languages. This assessment is somewhat validated by the integration of SVG and MathML into the HTML5 specification.
This leaves generic XML as an orphaned syntax with a dimly lit future within the Web browser. If the writing on the walls of various mailing lists is any indication, there is a strong desire to reduce, or remove entirely, the native XML processing that remains within the browser. While existing applications and backlash have so far prevented such removal, the days of XML in the browser feel numbered.
Meanwhile, XML has served a purpose for many information publishers. Tag sets, both custom and standardized, have been developed to encode enormous amounts of data. Within enterprises, processing pipelines that produce, validate, manipulate, and otherwise consume this data have proven their benefits. It has become routine to transform these documents into the appropriate HTML markup for delivery to whatever consumer is on the other end of the HTTP connection.
Yet, even as Web developers and browser vendors move away from custom markup, they seem to realize that they are missing something. Making the Open Web Platform extensible means that the behaviors that accompany information need to be packaged as reusable components. That is, information needs markup that identifies it as a specific kind of information whose scripts, templates, and styling are identifiable and loadable over the Web.
Hyphens to the Rescue
Once the desire for extensible markup, outside the direct control of either the W3C or browser vendors, was recognized, the concept of custom elements was introduced and eventually formalized [custom-elements]. For HTML parsing purposes, the essential distinction is that a custom element's name contains a hyphen, not a colon. This allows custom element names to be distinguished from those within HTML itself; the only notable exceptions are the handful of element names in SVG and MathML that contain a hyphen.
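The naming rule can be made concrete with a minimal JavaScript sketch of a name check. It is a simplification that assumes ASCII-only names, because the full production in today's HTML Standard also admits many non-ASCII characters. The reserved set contains the hyphenated SVG and MathML element names that the specification excludes. The function name isValidCustomElementName is hypothetical and is not a browser API.

```javascript
// Simplified, ASCII-only sketch of the custom element naming rule.
// The HTML Standard's full rule also admits many non-ASCII characters.

// Hyphenated names that already belong to SVG or MathML and are therefore reserved.
const RESERVED_NAMES = new Set([
  "annotation-xml",
  "color-profile",
  "font-face",
  "font-face-src",
  "font-face-uri",
  "font-face-format",
  "font-face-name",
  "missing-glyph",
]);

// A lowercase ASCII letter first, at least one hyphen somewhere, and only
// lowercase letters, digits, '.', '_' and '-' in the rest of the name.
const NAME_PATTERN = /^[a-z][-.0-9_a-z]*-[-.0-9_a-z]*$/;

// Hypothetical helper: true if `name` could be used as a custom element name.
function isValidCustomElementName(name) {
  return NAME_PATTERN.test(name) && !RESERVED_NAMES.has(name);
}

console.log(isValidCustomElementName("db-para"));   // → true
console.log(isValidCustomElementName("db:para"));   // → false (colon, no hyphen)
console.log(isValidCustomElementName("font-face")); // → false (reserved SVG name)
```

The colon case shows why a namespace-prefixed name such as db:para cannot simply be reused, while db-para can.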
In common usage, custom elements of the same origin share a common prefix followed by a hyphen (see Figure 1). That prefix currently has no registration or association with any URI. In this respect, it is unlike an XML namespace prefix, which must be declared before it is used.
Custom elements go beyond syntax: they also provide an API for registering behaviors for the markup with the browser. During parsing, the DOM construction process assigns specific classes to recognized markup (e.g., HTMLParagraphElement is used for the p element). When an unrecognized element (i.e., a custom element) is encountered, it is initially constructed as HTMLUnknownElement.
A script can register with the document a prototype that defines a new behavior for a custom element or assigns an existing HTML behavior to it. For example, the db-para element could simply be registered as an HTML paragraph, as shown in Figure 2. The DOM object for the element is subsequently replaced with a new instance of the appropriate type, and the behaviors of that element become accessible.
In simple cases, an element registered as a custom element with one of the available HTML prototypes inherits some of that HTML element's behaviors. In our testing, however, default styling is unlikely to be applied automatically (e.g., using HTMLPreElement.prototype doesn't guarantee pre element styling). Yet in some cases styling does occur, so the behavior is inconsistent and seems to be implementation-defined. Presumably a consistent, reliable behavior is the goal, and this will sort itself out with time.
Moreover, registration can go far beyond such simple associations of a name with a predefined prototype. A script can register a custom prototype to provide specific behaviors. The prototype supplies, via its createdCallback property, a function that performs any additional initialization of the element. Similar callbacks (attachedCallback, detachedCallback, and attributeChangedCallback) are available for maintaining the element throughout its life cycle.
For example, in Figure 3, the callback applies a JavaScript-based syntax highlighter (highlight.js [highlightjs]) to the contents of the element. Once the element is re-created within the DOM with this prototype, the callback function executes with this bound to the element. In this particular example, the db-programlisting element is constructed with the prototype, and the callback adds the syntax highlighting.
Often, the structured information of an element doesn't directly match the desired rendering. HTML Templates (part of the HTML5 specification) provide the ability to package and use structured layouts for the display of custom elements. A template is a portion of markup, wrapped by a template element, that can be used to construct new content programmatically. One main use of templates is to avoid constructing elements manually, either by parsing or by direct DOM method calls.
For example, Figure 4 lists the template for a figure. The content element specifies where contained content should be placed. In this example, the select attribute specifies which child elements are placed at each content element. The result is that the children of db-figure are reordered so that the title is last.
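The reordering can be modeled without a browser. The following sketch is a simplified, hypothetical model of how content elements with select attributes distributed a host element's children in the 2014 Shadow DOM drafts: each child goes to the first insertion point, in template order, that matches it. Selectors are reduced to exact tag names, and the two-point figure template is an assumption, since the markup of Figure 4 is not reproduced here.

```javascript
// Simplified model of Shadow DOM (2014 drafts) <content select="..."> distribution.
// Assumptions: selectors are exact tag-name matches, and { select: null } stands
// for a <content> element without a select attribute, which accepts any remaining child.
function distribute(children, insertionPoints) {
  const distributed = new Set(); // children already placed at an earlier point
  const rendered = [];           // children in the order they will be rendered
  for (const point of insertionPoints) {   // insertion points in template order
    for (const child of children) {        // host children in document order
      if (distributed.has(child)) continue;
      if (point.select === null || point.select === child.name) {
        distributed.add(child);
        rendered.push(child);
      }
    }
  }
  // Children that match no insertion point are simply not rendered.
  return rendered;
}

// Hypothetical figure template: the media object first, then the title.
const figureChildren = [{ name: "db-title" }, { name: "db-mediaobject" }];
const figureTemplate = [{ select: "db-mediaobject" }, { select: "db-title" }];
console.log(distribute(figureChildren, figureTemplate).map((c) => c.name).join(", "));
// → db-mediaobject, db-title
```

The author's source order is untouched. Only the rendering follows the template's order, which is why the title can appear last.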
The registered prototype must use the template and the Shadow DOM [shadowdom] to control the rendering of the element. The Shadow DOM makes it possible to create a rendering from elements that are not shown to the user. When the user inspects the displayed element (or its source), they see only the custom element. Inside the browser, a "shadow element" structures and renders the same information, and that shadow element is accessible only via scripting or via styling embedded within the template.
An example of using a template for the db-figure element is shown in Figure 5. The callback constructs a Shadow DOM for the current element and appends content to it. The content is structured via the template shown in Figure 4. As a result, the sub-tree of db-figure is rendered using the newly constructed Shadow DOM.
Finally, we can package our scripts, templates, and any styling via HTML Imports [html-imports]. The imported document is simply another HTML document whose scripts, styles, and templates become available to the current document. The import is invoked by a simple link element with a rel attribute value of import in the importing document (see Figure 6).
The imported document packages the Web Component by linking to the necessary scripts and stylesheets while containing any templates that are used by those scripts. The example in Figure 7 shows the structure used to package the previous examples.
The scripts and stylesheets for the highlighter are included using the same mechanism
already known to Web developers.
One subtlety is that the script registering the custom elements and the templates depend on each other within this imported document. A script running inside an import sees the importing document as its global document, so the templates cannot be found there. For this reason, at the very start of the example in Figure 5, the expression document.currentScript.ownerDocument is used to obtain the imported document from which the templates are retrieved. If the component is packaged differently, retrieving the template might be more difficult or impossible.
In summary, Web Components relies on four essential features:
Custom Elements: a specification that is in Last Call and may enter Candidate Recommendation (CR) in 2014.
HTML Templates: part of the HTML5 specification [html5].
Shadow DOM: a specification that is a Working Draft.
HTML Imports: a specification that is a Working Draft and volatile.
Pandora's Box?
As the features of Web Components coalesce and become part of the commonly deployed browser, there is little anyone can do to prevent their use. An author can simply import a Web Component of their choice, custom or shared, and the browser can do little more than execute the associated semantics within the bounds of the Open Web Platform. That allows anyone to develop custom markup to encapsulate their information in much the same way as was hoped for with XML.
There are two notable differences between now (2014) and 1998:
The browser, as a component of the Open Web Platform, is much more stable, technologically advanced, and well understood.
Web Components use the Open Web Platform to package semantics far more extensively, in a way that is compatible with how browsers actually work.
An unscientific survey of current opinion on Web Components suggests that they may become hugely popular. While only time will determine the outcome, the Shadow DOM and HTML Templates are clearly useful. Using them within Custom Elements provides much-needed encapsulation for Web applications, so their intended use in that context makes a lot of sense.
Yet, our use of Web Components need not be limited to packaging the semantics of specialized custom markup. With relative ease, we can transliterate whole XML documents into custom elements and wrap them in a few lines of HTML markup, and the browser will load and process the custom elements as specified. Is this abuse, a practice that isn't recommended, or should a thousand custom elements bloom?
Let's open Pandora's box and see whether what is inside is truly evil. We will take DocBook, a well-known vocabulary for documents (books, articles, etc.), and turn the markup into a set of Web Components. We will demonstrate how easy the transliteration is to perform and show a few interesting results.
The DocBook Web Component
Turning an arbitrary XML document into an HTML document that uses Web Components requires three essential steps:
Prefix every element name with a constant prefix and hyphen that can be associated with the element's namespace.
Develop stylesheets, templates, and scripts that encapsulate the desired behavior.
Wrap the document in the minimum amount of HTML bootstrapping necessary to deliver the Web Component to the browser.
For example, in the specific case of DocBook, we would do the following:
Transform the document by changing every DocBook element name to a name with a db- prefix and no namespace. Also, copy any MathML or SVG to the output and pay particular attention to the serialization (HTML without a namespace or XHTML with a namespace).
Implement Web Components for common constructions like xref, mediaobject/imageobject/imagedata, link, etc., and develop CSS stylesheets for the rest. Package this component as a single document (see Figure 7).
Wrap the document in the minimum markup (see Figure 6).
In addition, we'd like to retain some aspect of the identity of the namespace from the original XML. To do so, we add an RDFa [rdfa] typeof attribute, whose value is the namespace URI, to the root element. This allows a consuming application to identify the custom elements by type rather than by a fixed prefix. Hence, on the root custom element for DocBook (e.g., db-article), the typeof attribute contains the value http://docbook.org/ns/docbook.
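On the consuming side, the typeof value lets a script find DocBook content without knowing which prefix was chosen. This is a sketch under stated assumptions: elements are plain objects rather than DOM nodes, and findRootsByType is a hypothetical helper. In a browser, an attribute selector such as [typeof~="http://docbook.org/ns/docbook"] could serve the same purpose.

```javascript
// Hypothetical consumer-side sketch: locate vocabulary roots by their RDFa
// typeof value instead of by a hard-coded element-name prefix.
// Elements are plain objects { name, attrs, children }; text nodes are strings.
const DOCBOOK_NS = "http://docbook.org/ns/docbook";

function findRootsByType(node, typeUri, found = []) {
  if (typeof node === "string") return found;
  // An RDFa typeof value may list several whitespace-separated types.
  const types = (node.attrs.typeof ?? "").split(/\s+/).filter(Boolean);
  if (types.includes(typeUri)) found.push(node);
  for (const child of node.children) findRootsByType(child, typeUri, found);
  return found;
}

// Usage: both articles are found, whichever prefix each one happens to use.
const page = {
  name: "body", attrs: {}, children: [
    { name: "db-article", attrs: { typeof: DOCBOOK_NS }, children: ["..."] },
    { name: "dbk-article", attrs: { typeof: DOCBOOK_NS }, children: ["..."] },
    { name: "x-widget", attrs: {}, children: [] },
  ],
};
console.log(findRootsByType(page, DOCBOOK_NS).map((n) => n.name).join(", "));
// → db-article, dbk-article
```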
This process was implemented using the simple XProc [xproc] pipeline shown in Figure 8, where the transformed document is inserted into the wrapper (see Figure 9) as a replacement for its content element. The transformation is simply a set of renaming rules, the main two of which are shown in Figure 10.
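The renaming rules are simple enough to express in a few lines of almost any language. As an illustration only, here is a hypothetical JavaScript counterpart of the transformation. The paper's version is an XSLT step inside the XProc pipeline. In this sketch, DocBook elements get the db- prefix and lose their namespace, MathML and SVG pass through untouched, and the root receives the RDFa typeof attribute. The sketch works on a plain object tree, and its serializer is minimal rather than a conforming HTML serializer.

```javascript
// Hypothetical counterpart of the renaming rules, operating on a plain object
// tree ({ ns, name, attrs, children }, with strings for text) instead of a DOM.
const DOCBOOK_NS = "http://docbook.org/ns/docbook";
const PREFIX = "db-";

function transliterate(node, isRoot = true) {
  if (typeof node === "string") return node;   // text is copied unchanged
  if (node.ns !== DOCBOOK_NS) return node;     // MathML, SVG, etc. pass through
  const attrs = { ...node.attrs };
  if (isRoot) attrs.typeof = DOCBOOK_NS;       // keep the namespace identity (RDFa)
  return {
    ns: null,                                  // renamed elements carry no namespace
    name: PREFIX + node.name,
    attrs,
    children: node.children.map((child) => transliterate(child, false)),
  };
}

// Minimal serializer for inspecting the result (escapes &, < and ").
const escapeMarkup = (s) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/"/g, "&quot;");

function serialize(node) {
  if (typeof node === "string") return escapeMarkup(node);
  const attrs = Object.entries(node.attrs)
    .map(([k, v]) => ` ${k}="${escapeMarkup(v)}"`)
    .join("");
  return `<${node.name}${attrs}>${node.children.map(serialize).join("")}</${node.name}>`;
}

// Usage: a tiny DocBook article containing inline MathML.
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";
const article = {
  ns: DOCBOOK_NS, name: "article", attrs: {}, children: [
    { ns: DOCBOOK_NS, name: "title", attrs: {}, children: ["Hello"] },
    { ns: DOCBOOK_NS, name: "para", attrs: {}, children: [
      "x = ",
      { ns: MATHML_NS, name: "math", attrs: {}, children: [
        { ns: MATHML_NS, name: "mi", attrs: {}, children: ["x"] },
      ] },
    ] },
  ],
};
console.log(serialize(transliterate(article)));
// → <db-article typeof="http://docbook.org/ns/docbook"><db-title>Hello</db-title><db-para>x = <math><mi>x</mi></math></db-para></db-article>
```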
In terms of what these custom elements might provide to a user, some behaviors for
DocBook that require scripting are:
Links (e.g. link or xref).
Auto-numbering of sections, figures, etc.
Display of media objects (e.g. imageobject/imagedata).
Generated text for cross references (e.g. turn xref into "Figure 2.1 ...").
Auto-generation of a table of contents and other navigation.
Syntax highlighting in programlistings and other code.
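Several of these behaviors reduce to a single walk over the document. The sketch below is a hypothetical illustration, not the paper's implementation. It numbers top-level sections and numbers figures within their section, a chapter-style scheme assumed for the example. It also builds table-of-contents lines and generates cross-reference text such as "Figure 2.1". It runs on plain objects, whereas a Web Component would perform the same walk over the live db- elements.

```javascript
// Hypothetical sketch: auto-numbering, a table of contents, and xref text.
// Elements are plain objects { name, attrs, children }; text nodes are strings.

// Concatenate the text content of a node.
function textOf(node) {
  return typeof node === "string" ? node : node.children.map(textOf).join("");
}

function numberDocument(article) {
  const labels = new Map(); // id -> generated text, e.g. "Figure 2.1"
  const toc = [];           // table-of-contents lines, e.g. "2 Design"
  let sectionNumber = 0;
  for (const section of article.children) {
    if (typeof section === "string" || section.name !== "db-section") continue;
    sectionNumber += 1;
    const title = section.children.find((c) => typeof c !== "string" && c.name === "db-title");
    toc.push(`${sectionNumber} ${title ? textOf(title) : "(untitled)"}`);
    if (section.attrs.id) labels.set(section.attrs.id, `Section ${sectionNumber}`);
    // Depth-first walk: figures anywhere inside the section share one counter.
    let figureNumber = 0;
    const visit = (node) => {
      if (typeof node === "string") return;
      if (node.name === "db-figure") {
        figureNumber += 1;
        if (node.attrs.id) labels.set(node.attrs.id, `Figure ${sectionNumber}.${figureNumber}`);
      }
      node.children.forEach(visit);
    };
    section.children.forEach(visit);
  }
  return { labels, toc };
}

// Generated text for an xref pointing at `linkend`.
function xrefText(labels, linkend) {
  return labels.get(linkend) ?? `[unresolved: ${linkend}]`;
}

// Usage
const el = (name, attrs, ...children) => ({ name, attrs, children });
const doc = el("db-article", {},
  el("db-section", { id: "intro" }, el("db-title", {}, "Introduction"), el("db-para", {}, "...")),
  el("db-section", { id: "design" }, el("db-title", {}, "Design"),
    el("db-figure", { id: "fig-pipeline" }, el("db-title", {}, "Pipeline")),
    el("db-figure", { id: "fig-rules" }, el("db-title", {}, "Rules"))));
const { labels, toc } = numberDocument(doc);
console.log(toc.join(" | "));                // → 1 Introduction | 2 Design
console.log(xrefText(labels, "fig-rules"));  // → Figure 2.2
```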
These features were implemented[1] and tested in Chrome (the only browser currently implementing Web Components[2]). In total, the implementation consists of 235 lines of JavaScript, 76 lines of CSS, and a 67-line HTML document, none of which has been compressed or otherwise optimized. The implementation also includes highlight.js via the HTML import and programmatically adds MathJax [mathjax] for rendering MathML.
At present, there are some notable issues with implementing a set of Web Components and using HTML Imports:
MathJax could not be included via the import: the method it uses to determine its base URI cannot find the script reference in the imported document.
MathJax isn't aware of HTML Imports at this point in time. As a result, the scripts and stylesheets that MathJax adds aren't kept within the imported document but are instead programmatically added to the importing document.
Implementing links was harder than expected. Just associating the HTMLAnchorElement prototype with the element does not induce even minimal linking behavior. Using a template that wraps the content with an HTML anchor in the Shadow DOM is more complicated, as there is no way to copy attributes automatically (e.g., the URI in the href attribute), and some default behaviors (e.g., a pointer cursor) aren't automatic. Further, clicking had no effect, and a custom event handler had to be added.
The division between the stylesheet within each template and the overall stylesheet is a bit tricky.
There is a lot more to be done to handle the full life cycle of the elements. That is, if other scripts manipulate the custom elements in situ, the components (e.g., the auto-generated navigation) may need to update themselves.
Web Components can also be used within other browsers via the Polymer Platform polyfill [platform]. This JavaScript library provides implementations of various Web Components specifications for the Firefox, Safari, and IE browsers. Unfortunately, at this time (July 2014), the library fails to work with the DocBook example:
Firefox crashes almost immediately. This seems to have something to do with the generation of the table of contents navigation.
Safari fails with a JavaScript error.
The Evolving Web
Web Components is a promising technology for delivering packaged semantics for general markup. It succeeds in many places where previous attempts at XML in the browser have failed. That it is already partly a reality today is all the more exciting.
Yet the mechanisms by which a browser or resource consumer can recognize the use of a particular set of custom elements are fraught with problems. The inability to identify the prefix used in constructing the element names, to associate that prefix with some URI, or to protect content from collisions with other custom elements is going to be immediately painful. Authors and publishers will want to mix content from different sources outside of their control, and custom elements will make that increasingly difficult.
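The collision problem is easy to demonstrate. In the hypothetical sketch below, the prefix is taken to be the text before the first hyphen, which is a convention rather than something the platform defines. Each source is also assumed to declare a namespace URI, for example via an RDFa typeof value. Nothing in custom elements themselves supplies that URI, which is exactly the gap described above.

```javascript
// Hypothetical sketch: detect two vocabularies claiming the same element prefix.

// Text before the first hyphen, or null if the name has no usable prefix.
function prefixOf(name) {
  const hyphen = name.indexOf("-");
  return hyphen > 0 ? name.slice(0, hyphen) : null;
}

// `sources` is a list of { namespace, elementNames }; the namespace URI is an
// assumption supplied by the caller, not something the browser provides.
function findPrefixCollisions(sources) {
  const owners = new Map(); // prefix -> namespace URI that used it first
  const collisions = [];
  for (const { namespace, elementNames } of sources) {
    for (const name of elementNames) {
      const prefix = prefixOf(name);
      if (prefix === null) continue;
      const owner = owners.get(prefix);
      if (owner === undefined) owners.set(prefix, namespace);
      else if (owner !== namespace) collisions.push({ name, prefix, owner, namespace });
    }
  }
  return collisions;
}

// Usage: DocBook content mixed with a hypothetical vocabulary that also chose "db-".
const collisions = findPrefixCollisions([
  { namespace: "http://docbook.org/ns/docbook", elementNames: ["db-article", "db-para"] },
  { namespace: "http://example.com/ns/database", elementNames: ["db-table", "db-row"] },
]);
console.log(collisions.map((c) => c.name).join(", ")); // → db-table, db-row
```

Detecting the clash is the easy part. Resolving it still requires renaming one vocabulary, which is the job XML namespace prefixes were designed to do.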
XML has a partial solution for identifying and uniquely naming elements to avoid collisions. Yet that solution allows arbitrary complexity without sufficient gains in functionality and was rejected by many in the various Web developer communities. Still, one can't help but feel that a colon was swapped for a hyphen and we lost something in the exchange.
In the end, Web Components lets us deliver XML documents, transliterated and packaged with their semantics. The Shadow DOM and scripting give the markup used for rendering an interactive, integrated mechanism for live manipulation within the browser. HTML Imports and templates enable packaging these semantics into a single resource.
Even though Web Components, HTML5, and scripting aren't necessarily how we all imagined XML on the Web in 1998, their combination is sufficient to accomplish real work with markup within the Open Web Platform. The Web has evolved, and XML may be evolving along with it. It is a reality that we affectionately call the Prague Compromise.
He put on his skis, straightened himself up, and remained standing there for some time; as he pulled on his mittens he took one glance homeward. He could just make out the house in the dim distance. Then the whiteness all around it thickened—rose up in a cloud—seemed to be piling in. ... Perhaps it wasn't so dangerous, after all. The wind had been steady all day, had held in the same quarter, and would probably keep on ...
Oh, well—here goes!
...
On one of the hillsides stood an old haystack which a settler had left there when he found out that the coarse bottom hay wasn't much good for fodder. One day during the spring after Hans Olsa had died, a troop of young boys were ranging the prairies, in search of some yearling cattle that had gone astray. They came upon the haystack, and stood transfixed. On the west side of the stack sat a man, with his back to the mouldering hay. This was in the middle of a warm day in May, yet the man had two pairs of skis along with him; one pair lay beside him on the ground, the other was tied to his back. He had a heavy stocking cap pulled well down over his forehead, and large mittens on his hands; in each hand he clutched a staff ... To the boys, it looked as though the man were sitting there resting while he waited for better skiing ... His face was ashen and drawn. His eyes were set toward the west.
— Giants in the Earth: A Saga of the Prairie, O. E. Rölvaag (1924)
[html5]
HTML5, W3C, 2013-09-06, Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara,
Edward O'Connor, Silvia Pfeiffer, and Ian Hickson; see also http://www.w3.org/TR/html/