How to cite this paper
Quin, Liam R. E. “Automatic XML Namespaces.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). https://doi.org/10.4242/BalisageVol3.Quin01.
Balisage: The Markup Conference 2009
August 11 - 14, 2009
Balisage Paper: Automatic XML Namespaces
Liam R. E. Quin
Liam Quin has been involved with declarative, descriptive markup since the early 1980s.
He wrote his open-source text retrieval system and first distributed it in the late
1980s.
He has worked at the World Wide Web Consortium since 2001, where he is XML Activity
Lead, or, informally, Mrs XML.
Copyright © 2009 Liam R E Quin. Used by permission.
Abstract
XML, originally called Web SGML, was designed as a subset of SGML
suitable for use on the World Wide Web. Shortly after the publication of
XML, the Names in XML specification, better known as Namespaces, was
published. One of its primary purposes was to facilitate the exchange of
RDF metadata in XML. XML Namespaces are somewhat clumsy and unwieldy,
under-specified, and incomplete. They have gained considerable adoption
within XML vocabularies, often with reluctance, but have met with
resistance from the Web community.
This paper explores some of the difficulties (whether real or
perceived) with Namespaces, suggests ways to address some of those
difficulties, and also suggests ways to attempt some reconciliation with
the Web developer community.
Table of Contents
- Introduction
- XHTML and Namespaces
- HTML 5, XML and Namespaces
- Other XML Environments where Namespaces may be Suboptimal
- Requirements for a Solution
- Existing and Proposed Technologies
- Default (#FIXED) attributes in a DTD
- Information Technology — Document Schema Definition Languages
(DSDL) — Part 8: Document Semantics Renaming Language (DSRL)
- JavaScript-based solutions
- Disruptive Approaches
- Proposal: Automatic Namespaces
- The Automatic Namespace Definition
- Attributes
- Namespace Prefixes
- Changing DSRL
- Conclusions and Ongoing and Future Work
Introduction
The XML community has lived with XML namespaces for a decade. They
are useful to the point of seeming indispensable, they are ubiquitous, and
yet they are at the same time unwieldy and flawed. Namespace declarations
can be inconvenient to remember, and mistakes in them are frequently the
source of subtle and hard-to-diagnose errors. From a programming
perspective, namespaces provide scope and disambiguation; from a document
authoring perspective, namespaces provide headaches. For an HTML author,
working in a world in which the browsers tend to suppress or auto-correct
errors, and in which MathML, XHTML, SVG, XForms, Dublin Core and more each
have their own namespace URI, the need to pre-declare large sets of
namespaces quickly becomes onerous.
In this paper the author proposes a simple system to simplify
namespace declaration, and to enhance namespace functionality considerably
by introducing a single new feature, without losing the existing benefits.
The paper first describes in more detail some of the issues, then
summarises the issues with requirements for change, then discusses other
proposals, and finally makes a concrete proposal.
XHTML and Namespaces
Consider a typical XHTML document that also uses XForms, SVG,
MathML, and has some metadata using the Dublin Core and FOAF. Already we
are up to six namespace declarations (including the one for XHTML) and we
have hardly begun. SVG uses XLink, adding another, and so it continues.
Documents with twenty or more declarations are sometimes seen.
Of course, many of these documents are generated automatically
rather than hand-authored. Even there, the burden of maintaining the
declarations should not be underestimated. To XML people familiar with the
mechanisms and overhead of the namespace syntax there may seem (at least
at first sight) no problem, but to HTML authors the difference is
startling. Is it necessary?
Recall that namespaces are serving two primary functions: they are
associating names with the specifications that define them, and they are
disambiguating in the case that two specifications define the same name.
In practice, conflicts where the same name is defined by multiple
specifications are rare (although still important enough to need
addressing). For XHTML, the DOCTYPE declaration already is sufficient to
bestow HTML-ness, and, within the context of HTML, an svg
element only has one plausible interpretation. A significant motivation
driving the use of XHTML is that XML tools can be used with the document,
and for these tools, SVG-ness is not associated with any particular
element name. One could use #FIXED attributes in a document
type definition, but we will see later why this is not a satisfactory
approach. The HTTP MIME Content Type can also be used to indicate
combinations (application/xml+svg+mathml) but since every combination must
be registered with IANA, this does not scale; it also doesn't work on a
local file system.
A major goal of the work described in this paper, then, must be to
eliminate as much syntax as possible without losing the benefit of being
able to combine namespaces at will and have both XML and Web tools operate
on the result.
The XML community will not be motivated to support a new
specification merely to satisfy the needs of some other community. We
would need to do more. Happily, there is more to be done. Extensive use of
namespaces has demonstrated a need for an easy way for users to define
their own namespaces that are a mixture of existing namespaces and their
own elements, and to check that
HTML 5, XML and Namespaces
In the last couple of years, a number of individuals have gathered
support for renewed work on non-XML versions of HTML. These are also not
based on SGML, but instead are an SGML-inspired format. Avowed dislike of
XML appears to have stemmed at least in part from misunderstandings and in
part from the stricter and more verbose syntax. For these people,
robustness, accuracy, error detection and correctness are relatively
unimportant: all that matters is that the Web browser render an acceptable
result. At the time of writing, the HTML Working Group is considering
hard-wiring MathML and SVG namespaces into the HTML specification, so that
an svg element would automatically be placed into the SVG
namespace. This would make it harder to process the documents with other
tools: for example, it is tricky to match SVG elements with XPath or with
XSLT match expressions if you don't know in advance whether there will be
a namespace declaration, and, if there is, whether it will be correct. In
fairness it should be noted that, since HTML 5 processors are expected to
auto-correct certain classes of syntax errors which XML processors are
required not to attempt to correct, one cannot in general expect to
process HTML 5 documents with XML tools. None the less it is reasonable to
be able to expect to generate HTML documents from XML, and also to use
JavaScript, XPath and other tools on the HTML DOM.
Other XML Environments where Namespaces may be Suboptimal
Anywhere that users have to declare a large number of mostly
orthogonal (non-overlapping) namespaces is a candidate for improvement: it
is particularly unfortunate that users cannot themselves combine
namespaces to make new amalgamated ones, such as XSLT plus SVG plus
HTML.
Some difficulties when using multiple namespaces today
include:
- The need to remember long URIs: people often use copy and paste,
  which can result in extra declarations being pasted in and later
  causing problems; or, they re-type the URI by hand and make errors,
  with the result that software later doesn't recognise the namespace
  correctly.
- The need for humans to remember which namespace defines which
  element or attribute, even where there's no clear functional gain. For
  example, remembering that href comes from XLink in SVG,
  and from XHTML in some other vocabulary.
- Matching mixed-namespace documents with XPath, whether for XSLT
  or for XQuery or (hardest of all) stand-alone XPath, is distressingly
  exciting. The most commonly asked questions on XML support channels
  are about processing namespaces.
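The XPath difficulty in the last item can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module, which supports only a limited XPath subset, and two made-up sample documents; it is meant as an illustration of the matching problem rather than of any particular XPath processor.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Two sample documents (invented for this sketch) with the same element
# names; only the second one declares the SVG namespace.
undeclared = ET.fromstring("<html><body><svg><circle/></svg></body></html>")
declared = ET.fromstring(
    '<html><body><svg xmlns="%s"><circle/></svg></body></html>' % SVG_NS
)

def count(doc, path, namespaces=None):
    """Return how many elements the ElementTree path selects in doc."""
    return len(doc.findall(path, namespaces))

plain_path = ".//svg"      # unprefixed: selects only no-namespace elements
svg_path = ".//s:svg"      # prefixed: selects only SVG-namespace elements
prefixes = {"s": SVG_NS}   # the prefix "s" is local to the query

results = {
    "undeclared/plain": count(undeclared, plain_path),
    "undeclared/svg": count(undeclared, svg_path, prefixes),
    "declared/plain": count(declared, plain_path),
    "declared/svg": count(declared, svg_path, prefixes),
}
print(results)
```

Neither path selects the svg element in both documents, so a script that cannot know in advance whether the declaration is present has to try both forms.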
Requirements for a Solution
- Freely Available: If you need to pay for the spec, the Web developer
  community is not interested.
- Freely Implementable: There should be no patent encumbrances; since
  this is in practice not determinable, at the very least the people
  developing the specification, and the organisation publishing it, must
  make an effort to ensure that people using or implementing the
  specification won't suddenly be asked to pay royalties.
- Makes life simpler: Although part of the goal of Automatic Namespaces
  is to enable HTML 5 documents to be namespace-well-formed in memory,
  it's important to remember that this is a motivation for XML people but
  not HTML people. Therefore, to gain adoption, Automatic Namespaces
  must not require the user to understand new concepts. For example,
  the user should not have to declare or use namespace bindings, since
  those are the things that are objected to the most.
- Easy to Implement: The solution must be easy to implement in
  JavaScript (for example, an experienced JavaScript programmer should be
  able to write a complete implementation in under an hour, including the
  time taken to read the specification). This is partly because otherwise
  no-one will do it, and partly because a JavaScript implementation
  that is more than a few hundred bytes will not be appealing to
  people trying to make Web sites that load quickly. The specification
  must also be easy to implement in XSLT and other environments.
- Compatible with Today's Web: The solution must work in Web browsers
  that are in use today, at least in the vast majority of cases. People
  won't upgrade their Web browsers to view Web pages using namespaces.
- Gives clear benefits to XML people: I'm not out just to make the
  HTML 5 people feel vaguely more karmic. They can do that all by
  themselves. I aim to write a spec that will mean XML users can benefit
  too.
Existing and Proposed Technologies
Others have identified a need in this area. A recent discussion on
the xml-dev list [quin2009a] elicited an incomplete
proposal that will be discussed below, together with two other methods,
#FIXED attributes and ISO DSRL. Private communication has
also provided a JavaScript-based partial solution, likewise
described below.
Default (#FIXED) attributes in a DTD
The idea here is to have a document type definition that supplies
xmlns:svg and so forth as #FIXED attribute values. This
would be interesting if Web browsers fetched DTDs. One could consider
JavaScript to fetch the DTD from the W3C servers, but W3C can't afford
the bandwidth, and there are file access restrictions on JavaScript that
may make a persistent cache or XML Catalogue approach hard to implement;
in addition, both the SYSTEM and the PUBLIC identifiers are fixed, so
one cannot serve HTML documents with server-specific values. This
DTD-based approach might work fine outside the HTML world, but it turns
out that today's Web browsers reject documents containing qnames if the
prefix has not been explicitly bound to a URI. This means that a
document would not be considered as well-formed without the DTD:
progressive rendering as the document loads would have to wait for the
DTD, and an unavailable DTD would prevent the document from loading.
Worse, since the browsers don't fetch the DTD themselves, the approach
of defining default prefixes in a DTD can only lead to documents that
the browsers can't load. Since our goal is to reach out and build
bridges between the HTML and XML worlds, we must (regretfully) dismiss
this approach.
Information Technology — Document Schema Definition Languages
(DSDL) — Part 8: Document Semantics Renaming Language (DSRL)
ISO Joint Technical Committee ISO/IEC JTC 1, Information
Technology, Subcommittee SC 34, Document Description and Processing
Languages has recently produced a draft of their Document Semantics
Renaming Language (DSRL) [ISO]. This document does not appear to make clear
how a DSRL mapping is located, given an instance document, although
separate evidence (private communication) suggests a
plan to use a processing instruction. Possibly the ISO committee would
be amenable to an alternative suggestion that is more likely to work in
HTML-based Web browsers, since processing instructions interfere with
PHP processing, and also cannot portably appear before the end of the
document's head element, which may be too late in a world of progressive
rendering.
The DSRL specification describes a powerful mechanism to map
elements, attributes, processing instructions and entities (!) in the
XML document to alternates. You can map any element to a new (namespace,
element) combination, where the replacement is part of a validating
schema. This specification is almost certainly too complex to implement
in the few hundred bytes of JavaScript we can allow ourselves, although
the possibility of defining HTML entities using an XML syntax is very
alluring. DSRL uses XPath to specify the context in which remapping is
to occur. One could thus map every third svg element in the
document, if desired.
Overall, DSRL seems very promising. It appears to do what is
needed. But, like a US Congress bill, it comes with a lot of additional
baggage, some of which is problematic for us, and it also lacks some
functionality we need:
- DSRL files themselves have at least three namespace
  declarations in them. We want something that doesn't need to have
  any additional declarations, if possible.
- DSRL appears to lack an inclusion facility. One could use
  XInclude, perhaps, but at the cost of added syntactic complexity
  that we are trying to avoid. An XML-savvy user could create DSRL
  files with XSLT or XQuery, but again, that's a level beyond our
  expectations. We want to be able to combine namespaces to make new
  ones, and DSRL isn't designed to do that.
- DSRL requires an explicit reference from the document to the
  DSRL file, made using a processing instruction. Processing
  instructions can cause problems in Web browser environments: they
  generally work in application/xml documents, but in HTML and XHTML
  documents they can be confused with PHP on the server, and can also
  be (incorrectly) displayed by a Web browser. Any unrecognised markup
  terminates the HTML HEAD element, so a processing instruction can
  also break stylesheet and script links in older browsers, and may
  even be rendered as text content [ISO; XHTML-C1, Appendix C1].
JavaScript-based solutions
The trouble with a purely JavaScript approach is that it works in
a Web browser but not in a pure XML environment. It is a bridge that
reaches only one shore of the river. But JavaScript can indeed be part
of a solution, as we shall see.
First, we should note that, as things stand (April, 2009), HTML 5
says that certain elements, such as svg and math, are to be placed
automatically in the namespaces one might expect.
Unfortunately, existing Web browsers do not behave this
way. Once HTML 5 becomes a W3C Recommendation one might reasonably
expect to see implementations, but a great many people will still be
using older browsers. This also presents an incompatibility with XPath
used from JavaScript, because older browsers put everything in the
default (non) namespace.
One can use JavaScript to place elements in the right namespace to
make older browsers work the same way as newer ones, and this leads us
towards a possible solution.
We should note, however, that any JavaScript that changes the DOM
tree representation of the document might break other scripts, so our
route to a short-term solution is not without problems.
Another compatibility issue is that XPath name test expressions
with no prefix match only elements that are in no namespace; a
request has been made to bless the idea that they also can match
elements in the HTML namespace. There is ongoing discussion in this area
at the time of writing.
Disruptive Approaches
In the interest of completeness, it is worth mentioning an
approach to changing namespaces that is incompatible with current
practice. Ian Hickson has suggested [Hixie2006]
adding micro-syntax within the xmlns pseudo-attribute, after the
namespace URI, to allow a search path of namespaces. This would cause
interoperability problems with existing XML software, but does show that
others have considered this problem space in the past. Private
communication tells the author that others have also given thought to
this problem.
Proposal: Automatic Namespaces
The goals of the Automatic Namespace mechanism are to allow document
authors to define their own namespace mix-ins in terms of other namespaces
and to refer to them, and also to minimise the amount of syntax needed for
declarations—in the case of HTML, ideally, to zero.
The first goal, defining mix-ins, is so that one can have a
combination such as HTML, MathML and GeoML, using a single URI to refer to
the definition, and share that definition between many documents. A Web
browser would act as if a default mix-in had been read; for older Web
browsers this could be simulated with JavaScript. This means that if you
want the default HTML 5 behaviour, in almost all cases you don't need to
do anything at all; but if you want some other combination, or you need to
extend the HTML 5 set of namespaces, you can do so easily.
The second goal, minimising syntax, is necessary in order to have
any hope of adoption by HTML and XHTML authors.
People reading a draft of this paper commented that a greater
barrier to XML adoption in the HTML world was the draconian
error-handling, which they believed meant that a Web browser must reject
any document that claims to be XML but is not well-formed. This is an
unfortunate mis-perception: in fact, the restriction is that the browser
must not claim such a resource to be a well-formed XML document, but, once
it is not XML it is outside the scope of the XML specification, and error
recovery is perfectly acceptable, as long as no claim is made that the
original document is itself XML. So it seems to this author that the
barrier is not draconian error handling, but browser writers. So, rather
than address a problem that appears not to exist, the approach here is to
address a real difficulty that might be pointed out as a barrier if the
draconian error-handling straw-man were to be removed. There is no
possibility of making the unfamiliar familiar without acquaintance, but
first impressions count for a lot.
The Automatic Namespace Definition
An automatic namespace definition file is a simple XML document.
It does not itself use a namespace, and does not need a DTD (although
there is one) or schema (although you can have those too if you like).
Let's start with a simple example:
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
</ns>
The example says that whenever an element called svg is encountered,
it introduces a new default namespace, with the given
URI, which will apply both to it and to all its children, unless of
course they are themselves listed in a namespace file, or unless they
have explicit namespace bindings in the document.
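The rule just described can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only one possible reading of the rule, not a reference implementation: the function names load_mapping and apply_namespaces are invented for the sketch, the sample document is made up, and ElementTree cannot distinguish an explicit xmlns="" from an inherited absence of namespace, so that case is ignored here.

```python
import xml.etree.ElementTree as ET

NS_FILE = """<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
</ns>"""

def load_mapping(ns_xml):
    """Read an ns file into a dict of element name -> namespace URI."""
    mapping = {}
    for entry in ET.fromstring(ns_xml).findall("element"):
        mapping[entry.findtext("name")] = entry.findtext("uri")
    return mapping

def apply_namespaces(elem, mapping, inherited=None):
    """Rewrite tags in place so that listed elements start a new default
    namespace, which their descendants inherit."""
    if elem.tag.startswith("{"):
        # The document bound a namespace explicitly; keep it and let the
        # children inherit it.
        current = elem.tag[1:].split("}", 1)[0]
    else:
        # A listed name switches namespace; anything else inherits.
        current = mapping.get(elem.tag, inherited)
        if current:
            elem.tag = "{%s}%s" % (current, elem.tag)
    for child in elem:
        apply_namespaces(child, mapping, current)

doc = ET.fromstring("<html><body><p/><svg><circle/></svg></body></html>")
apply_namespaces(doc, load_mapping(NS_FILE))
tags = [e.tag for e in doc.iter()]
print(tags)
```

After the call, the svg and circle elements carry the SVG namespace URI, while html, body and p are left untouched.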
This much one could do with DSRL, except that we have not used a
namespace declaration for our namespace definition file itself. We also
need a way to connect the namespace definition with the document; if we
are in an HTML document, we could use a link:
<link rel="ns" href="ns.xml" />
This markup would go in the HTML head, although it is only needed
if your namespace differs from the HTML 5 default, or if you are using
XHTML. For older (i.e. all existing) browsers we also need to add a link
to some JavaScript to make it work:
<script type="text/javascript" src="ns.js"></script>
This is more complexity than declaring the SVG namespace, so it's
hard to imagine people doing it. But what if we define several
namespaces in one ns file?
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
  <element>
    <name>math</name>
    <uri>http://www.w3.org/1998/Math/MathML</uri>
  </element>
</ns>
Now we are starting to make some headway, because although we
added a lot of garbage, we don't need to change the declarations in the
document. We could still do this with DSRL, or by putting the namespaces
right into that JavaScript code. Now suppose we want to add the
Enterprise Access Control Markup Language (invented for the purpose of
example), EACML, and that we have a separate eacml.ns file that defines
its element and namespace. We'll go back to our SVG-only example to keep
the example listing shorter.
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
  </element>
  <element src="eacml.ns" />
</ns>
Perhaps this markup is a little odd, but the idea is to have an
analogy with HTML script elements. In addition, a non-empty element can
supply a namespace URI, so that if the software recognises the URI, it
does not need to fetch the ns file.
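A minimal sketch of how such inclusion might be resolved follows, again in Python with xml.etree.ElementTree. Several things here are assumptions rather than part of the proposal: the contents of eacml.ns (the element name policy and the example.org URI are invented), the rule that later entries override earlier ones, and the use of an in-memory dictionary in place of real fetching and relative URI resolution. The shortcut of skipping the fetch when a URI is already recognised is not modelled.

```python
import xml.etree.ElementTree as ET

# Stand-in for the network or file system. The EACML element name and
# URI are hypothetical, invented for this sketch.
FILES = {
    "ns.xml": """<ns>
      <element>
        <name>svg</name>
        <uri>http://www.w3.org/2000/svg</uri>
      </element>
      <element src="eacml.ns" />
    </ns>""",
    "eacml.ns": """<ns>
      <element>
        <name>policy</name>
        <uri>http://example.org/eacml</uri>
      </element>
    </ns>""",
}

def load_mapping(name, fetch=FILES.__getitem__, seen=None):
    """Return element name -> URI for an ns file and everything it includes."""
    seen = set() if seen is None else seen
    if name in seen:
        return {}          # already loaded; also guards against include loops
    seen.add(name)
    mapping = {}
    for entry in ET.fromstring(fetch(name)).findall("element"):
        src = entry.get("src")
        if src is not None:
            # Assumption: entries are merged in document order, so a later
            # entry overrides an earlier one with the same element name.
            mapping.update(load_mapping(src, fetch, seen))
        else:
            mapping[entry.findtext("name")] = entry.findtext("uri")
    return mapping

combined = load_mapping("ns.xml")
print(combined)
```

The result is a single flat table, so the rest of a processor does not need to know whether a definition came from the main file or from an included one.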
For XML documents in general, we could use an attribute on the
top-level element (or any beneath it):
<mydocument xml:ns="ns.x" />
Such a declaration would need the blessing of the W3C XML Core
Working Group, and at the time of writing has neither been proposed to
them nor discussed by them.
Attributes
It turns out that some markup languages need to have some of
their attributes in a different namespace from their elements. This,
of course, is because the languages predate the invention of automatic
namespaces!
<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
    <attribute>
      <name>href</name>
      <uri>http://www.w3.org/1999/xlink</uri>
    </attribute>
  </element>
</ns>
Now any href attribute anywhere inside an SVG element (or, more
precisely, affixed to an element in the SVG namespace) will be put in
the XLink namespace.
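The attribute rule might be sketched as follows in Python with xml.etree.ElementTree. The names load_rules and apply_rules and the sample document are invented for the illustration, and the sketch assumes that only unqualified attributes are moved and that the rule is keyed on the namespace of the element the attribute is attached to, which is one reading of the description above.

```python
import xml.etree.ElementTree as ET

NS_FILE = """<ns>
  <element>
    <name>svg</name>
    <uri>http://www.w3.org/2000/svg</uri>
    <attribute>
      <name>href</name>
      <uri>http://www.w3.org/1999/xlink</uri>
    </attribute>
  </element>
</ns>"""

def load_rules(ns_xml):
    """Return (element name -> URI, element URI -> {attribute name -> URI})."""
    elements, attributes = {}, {}
    for entry in ET.fromstring(ns_xml).findall("element"):
        uri = entry.findtext("uri")
        elements[entry.findtext("name")] = uri
        for attr in entry.findall("attribute"):
            attributes.setdefault(uri, {})[attr.findtext("name")] = (
                attr.findtext("uri")
            )
    return elements, attributes

def apply_rules(elem, elements, attributes, inherited=None):
    """Namespace elements as before, then move the listed attributes."""
    if elem.tag.startswith("{"):
        current = elem.tag[1:].split("}", 1)[0]
    else:
        current = elements.get(elem.tag, inherited)
        if current:
            elem.tag = "{%s}%s" % (current, elem.tag)
    # Attribute rules apply to any element in the namespace concerned.
    for name, attr_uri in attributes.get(current, {}).items():
        if name in elem.attrib:  # only unqualified attributes are moved
            elem.attrib["{%s}%s" % (attr_uri, name)] = elem.attrib.pop(name)
    for child in elem:
        apply_rules(child, elements, attributes, current)

doc = ET.fromstring(
    '<html><a href="x.html"/><svg><a href="y.svg"/></svg></html>'
)
apply_rules(doc, *load_rules(NS_FILE))
html_a, svg_a = doc[0], doc[1][0]
print(html_a.attrib, svg_a.attrib)
```

The href on the a element outside svg is left alone, while the one inside svg ends up in the XLink namespace.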
Namespace Prefixes
What if you need to disambiguate an element? Or if you need to
put a prefixed QName into an attribute?
The first answer is that you really don't want to put prefixed
names into attribute values. You might think that you do, but you are
deluded. If you should happen to persist, we will honour your
delusion. But we will not make it too easy. The answer is that you can
bind prefixes in just the way you always did, even in the presence of
namespace files.
The original design of automatic namespaces let you name a
prefix in the ns file, and use it in the instance, but it turns out
you can't do that: your document would not be considered well-formed
by Web browsers, which defeats the purpose. The second attempt was to
use a prefix character other than the colon, but at that point it
seems just as easy to declare the namespace. This is an area of
experimentation at the time of writing.
Note that a DSRL-based approach to disambiguation might be to
define a rewrite, so that the names of the elements are changed.
Automatic Namespace Files do not support renaming elements or
attributes beyond associating namespaces with them, partly because of
the goal of having documents that work in Web browsers as much as
possible, and partly because that seems a lot more than just defining
namespace mix-ins.
Changing DSRL
Since DSRL already exists, it seems reasonable to ask how it could
be changed to support automatic namespaces.
- Support an implicit link to a DSRL definition, supplied by an
  application (such as a Web browser), rather than requiring a
  processing instruction. We want to allow XHTML documents to be legal
  with a minimum of extra work for their authors.
- Allow a DSRL processor to recognise a default namespace, so
  that DSRL documents do not themselves need to use namespace bindings
  in the most common cases.
- Add an inclusion facility, so that one DSRL document can
  reference another, preferably using a terminology that suggests
  namespace bindings rather than the renaming of elements.
- Ensure that there are royalty-free patent commitments from all
  authors of the specification.
- Ensure that the text of the specification will be freely
  available, and can be freely reproduced in books, tutorials and
  elsewhere.
Although DSRL is not exactly aligned with automatic namespaces, it
seems worth exploring further.
Conclusions and Ongoing and Future Work
Automatic Namespaces can considerably reduce the amount of syntax
at the start of XHTML documents. They can also legitimize HTML 5
parsing, by having a default namespace file that specifies the
behaviour, a sort of in-browser thought experiment. Automatic Namespaces
can also help with other XML applications, because although currently
you'd need to use (say) XSLT to process the namespace file and the
instance, this is a straightforward thing to do in many pipeline-based
workflows.
Links to other resources, such as schemas and style sheets,
human-readable documentation and more could arguably live in the same
namespace file in the future. The mechanism proposed is easily
extensible by adding new elements.
A simple mechanism that can be implemented in JavaScript for XHTML
documents can also be useful for more general XML, and could help to
give a way to consider HTML 5 documents to have mixed namespaces
without, in most cases, any need to declare them.
References
[Hixie2006] Hickson, Ian, How to make namespaces in XML easier,
blog entry from 2006-06-29, online at http://ln.hixie.ch/?start=1151612438.
[ISO] ISO, ISO/IEC 19757-8:2008 Information technology -- Document
Schema Definition Languages (DSDL) -- Part 8: Document Semantics Renaming
Language (DSRL), online at www.iso.org; €98 fee
applies for downloading the PDF.
[XHTML-C1] XHTML™ 1.0 The Extensible HyperText Markup
Language (Second Edition) A Reformulation of HTML 4 in XML 1.0, W3C
Recommendation 26 January 2000, online at www.w3.org/TR/xhtml1/.