van der Vlist, Eric. “One Href is not Enough: We need n hrefs!” Presented at Balisage: The Markup Conference 2011, Montréal, Canada, August 2 - 5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). https://doi.org/10.4242/BalisageVol7.Vlist01.
Balisage: The Markup Conference 2011 August 2 - 5, 2011
Balisage Paper: One Href is not Enough
We need n hrefs!
Eric van der Vlist
Dyomedea
Eric is an independent consultant and trainer. His domains of expertise include Web
development and XML technologies.
He is the creator and main editor of XMLfr.org, the main site dedicated to XML
technologies in French, the author of the O'Reilly animal books XML Schema and RELAX
NG, and a member of the ISO DSDL (http://dsdl.org) working group focused on XML schema languages.
He is based in Paris and you can reach him by mail (vdv@dyomedea.com) or meet
him at one of the many conferences where he presents his projects.
Hyperlinks are an old concept that was invented before the web, and to achieve
its remarkable success the web had to come out
with a very simplified version of hyperlinks.
In the process a lot of features were lost, among them the possibility
to link to multiple targets.
This talk proposes to use modern techniques to regain this ability while remaining
conformant to existing standards and running in
existing browsers.
I am a webaholic: the web has changed my life and it has
changed the way I write: links are disruptive and my writing is no longer the same
since I can use them.
Before the web (and before the links) you had to be very careful to be understood
and introduce all the words that were not commonly
known or disambiguate those that could be ambiguous.
Now that we have links, we can use them for these two purposes and concentrate on
the message we express. This leads to a new
conciseness that I love.
Unfortunately when you use links a lot you run rapidly into trouble...
The other day, I was writing a blog post to announce that my paper had been accepted
at XML Prague:
Just got the confirmation that I'll be presenting a paper on XQuery injection at XML
Prague March 26th or 27th.
While typing the obvious question arose: where should I link "XQuery" to?
To Wikipedia which is usually a good choice because it provides cool URIs (that don't
change) and pages that introduce a
subject?
To the W3C recommendation, which is another cool URI (that doesn't change) and is the
normative reference but isn't
introductory material?
Elsewhere (to the XQuery tag on my blog, to the W3C XML Query Working Group,...)?
And, for versioned resources such as Wikipedia pages or W3C recommendations should
I link to the current version at the date when I
wrote the blog entry or to an updated, latest version?
All these choices make sense, but (X)HTML forces us to choose one and only one target
for a link!
The problem got worse when I was typing "XML Prague" because I had to choose between:
Linking "XML" and "Prague" separately (and again, to which target? Wikipedia, the
W3C recommendation, the XML category for
"XML"; Wikipedia, tourist office, ... for "Prague")
Linking "XML Prague" as a whole to the conference web site or the tag on my blog.
This issue of embedded links seems really tough and I think I could live with it but
wanted to mention it for completeness.
The problem can also get worse when I write in French because I often want to give
the choice between targets in French and targets in
English when they are higher quality...
In other words: one href is not enough, we need n hrefs!
Am I asking too much?
I don't think so. My requirements are legitimate and generic: I want to be able to
write simple sentences using the words that are
relevant in my domain(s) while using links to give my readers the ability to discover
the meaning of the words that they don't know,
browse authoritative resources to deepen or extend their knowledge, or find related
pages that I have written.
Furthermore this is an old issue already addressed in SGML world by HyTime and acknowledged
by the W3C back in 1997!
What happened then?
The topic has always been considered touchy and the first working draft published
in April 97 as "Extensible Markup Language (XML): Part
2. Linking" notes:
Please be advised that the draft you are now reading is unusually volatile. The debating and balloting process
which determines the material contents is far from complete, and is nonetheless substantially
ahead of the editing process that turns
the material contents into usable specification language.
The content was indeed so volatile that the specification was taken out of the XML
recommendation and eventually became a recommendation
no less than four years later, in June 2001. This recommendation, known as XLink, does
address what I need:
This specification defines the XML Linking Language (XLink), which allows elements
to be inserted into XML documents in order to
create and describe links between resources. It uses XML syntax to create structures
that can describe links similar to the simple
unidirectional hyperlinks of today's HTML, as well as more sophisticated links.
Unfortunately, without wanting to start a flame war or blame anyone, I think it is
fair to say that the syntax of the sophisticated links mentioned in this introduction,
known as
"extended links", is so complex that they are considered unusable by most of us XML
geeks and have no chance of being embedded in real world (X)HTML pages. If you're not
convinced by this bold
statement, please hold on: I'll come back to extended XLinks in a while...
Is this topic doomed then? How can we move forward when previous attempts seem to have
all failed?
Ten years have passed since 2001 and one of the things we've learned is to hijack
existing technologies to do what we need! Some
hijacking technologies have even become de facto standards... Why not call them to
the rescue?
In other words, why not use microformats, RDFa or HTML5's microdata to specify these
"sophisticated links" that are missing from
XML?
Requirements
Please take the remainder of this paper as a demonstration of how this problem could
be handled rather than a final proposal...
The requirements that are chosen here are arbitrary: they meet what I find important
as I write these lines and are subject to discussion but I am confident that the same
method can be
used with different requirements sets as long as they remain "reasonably" simple!
The requirements for this exercise can be summarized as defining an (X)HTML jargon
(microformat, RDFa, microdata, ...) that:
1. Expresses inline links with multiple arcs between (X)HTML fragments and several link
ends.
2. Can be processed by a simple JavaScript library to be displayed in a fancy way.
3. Degrades nicely and remains readable when not processed by such a library.
4. Plays well with search engines.
5. Does not require server storage.
6. If possible, provides a way to annotate the arcs (to provide arc roles, the language
of link ends or other information).
7. If possible, supports embedded links.
The general idea is to keep the thing as simple as possible while maintaining good
practices!
Requirement 3 excludes solutions such as pluralink, which packages multiple links into a single href attribute and is not "degradable"
since the link doesn't work if it
isn't processed by a script.
Requirements 3 and 4 can be contradictory. Taken alone, requirement 3 would lead to defining
a jargon that would replace "XQuery" by
"XQuery [Wikipedia, W3C]" with links between the words "Wikipedia" and "W3C" and (respectively)
the article about XQuery on Wikipedia and
the XQuery W3C recommendation, but this practice may be considered almost as poor
as the infamous "Click here" practice!
Requirement 4 will thus lead to more verbose alternatives such as "XQuery [XQuery
on Wikipedia, XQuery W3C Recommendation]" with links
on "XQuery on Wikipedia" and "XQuery W3C Recommendation".
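The trade-off between requirements 3 and 4 can be made concrete with a minimal sketch. The function name `fallbackText` and the `shortLabel`/`label` fields are hypothetical and only serve the illustration; the two output strings are the ones quoted above.

```javascript
// Render the text a reader sees when no script processes the link.
// `descriptive` selects self-describing labels (requirement 4)
// instead of short ones (requirement 3 taken alone).
function fallbackText(link, descriptive) {
  var labels = link.arcs.map(function (arc) {
    return descriptive ? arc.label : arc.shortLabel;
  });
  return link.source + ' [' + labels.join(', ') + ']';
}

// Hypothetical description of the "XQuery" link discussed in the text.
var xquery = {
  source: 'XQuery',
  arcs: [
    { shortLabel: 'Wikipedia', label: 'XQuery on Wikipedia' },
    { shortLabel: 'W3C', label: 'XQuery W3C Recommendation' }
  ]
};

console.log(fallbackText(xquery, false)); // XQuery [Wikipedia, W3C]
console.log(fallbackText(xquery, true));  // XQuery [XQuery on Wikipedia, XQuery W3C Recommendation]
```

The second form is longer, but each link text still makes sense when read out of context, which is what search engines and screen readers tend to work from.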
Requirement 5 excludes services such as http://www.multiurl.com/ that are similar to URL
shorteners with the additional possibility to define multiple targets.
Note
This is a simplified set of requirements that does not take into account chained
links such as the relation between a page and its archive or translation. In this
first version the
arcs are between a document fragment and multiple resources that are all at the same
level. In a next iteration, we'll have to see how this can be extended to introduce
relations
between linked resources.
First Step: Without Embedded Links
Let's first keep things simple and explore simple implementations for microformats,
RDFa and microdata.
In each case, we will present the markup to express an nhrefs link and the corresponding
JavaScript implementation.
This implementation will loop over nhrefs links and for each link it will hide the
original markup but keep it intact so that other
scripts could access the information for other purposes if that was necessary. For
each link, a dialog will be created and a simple link
will be added to open this dialog.
Kissing with Microformat
The good thing with microformats is that their "balisage" is flexible and they often
can be kept as simple as possible...
In our case, the following seems to be good enough (indentation has been added to
make the code more readable):
.source
Is the source of the link (the link start if you prefer). This source is always local
to the document.
a.arc
Is an arc.
a.arc/@rel
Is the arc role (using curies and/or a set of well known common roles).
a.arc/@href
Is the URL of the arc destination.
a.arc/node()
Is the label of the arc end.
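To make this structure concrete, here is a minimal sketch of a function that writes such markup. The class names (nhrefs, source, arc) are the ones the jQuery script in this section selects on; the use of span elements, the bracketed layout, the input shape and the rel values are assumptions made for the illustration, and `buildNhrefsMicroformat` is a hypothetical name.

```javascript
// Escape text for safe use in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the microformat markup for one nhrefs link.
// Assumed input shape: { source: string, arcs: [{ rel, href, label }] }.
function buildNhrefsMicroformat(link) {
  var arcs = link.arcs.map(function (arc) {
    return '<a class="arc" rel="' + escapeHtml(arc.rel) +
      '" href="' + escapeHtml(arc.href) + '">' +
      escapeHtml(arc.label) + '</a>';
  });
  return '<span class="nhrefs">' +
    '<span class="source">' + escapeHtml(link.source) + '</span> [' +
    arcs.join(', ') + ']</span>';
}

// The rel values are placeholders, not a defined set of roles.
var example = buildNhrefsMicroformat({
  source: 'XQuery',
  arcs: [
    { rel: 'wikipedia', href: 'http://en.wikipedia.org/wiki/XQuery', label: 'XQuery on Wikipedia' },
    { rel: 'authoritative', href: 'http://www.w3.org/TR/xquery/', label: 'XQuery W3C Recommendation' }
  ]
});
console.log(example);
```

Because the arcs are ordinary a elements with their own text, the result stays readable and clickable even when no script runs.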
This format degrades reasonably well when it is not processed by any kind of script:
With a simple JavaScript function, this text can be streamlined into:
This script opens a dialog when you click on the link that has been generated around the
word "XQuery":
If you wonder about the level of complexity of such a script, here is a version that uses
jQuery (the code could probably be further
simplified: I am not a jQuery expert):
jQuery(document).ready(function() {
  jQuery('.nhrefs')
    .each(function() {
      // Hide the original markup but keep it in the DOM for other scripts.
      var span = jQuery(this);
      span.hide();
      var source = jQuery('.source', this).text();
      // Insert the replacement link and the dialog container before the hidden span.
      var link = jQuery(span.before('<a href="">' + source + '</a>')[0].previousSibling);
      var dialog = jQuery(span.before('<div title="Links for &quot;' + source + '&quot;"><ul></ul></div>')[0].previousSibling);
      var list = jQuery('ul', dialog);
      // Add one list item per arc.
      jQuery('a.arc', this)
        .each(function() {
          list.append('<li><a href="' + this.href + '">' + this.text + '</a></li>');
        });
      // The jQuery UI dialog only opens when the replacement link is clicked.
      dialog.dialog({ autoOpen: false });
      link.click(function() {
        dialog.dialog("open");
        return false;
      });
    });
});
Tripling with RDFa
The good thing with RDFa is that assertions can be extracted using any tool of a generic
toolbox.
The price to pay is that your markup needs to follow a set of rules that are much
more rigid than those of microformats...
In our case, here is the simplest markup I have been able to produce (enhancements
welcome, especially if they simplify the
source!):
This code gets displayed exactly like its microformat counterpart when it is not processed
by a script.
Although this snippet is more verbose than its microformat equivalent, it is arguably
more "self-documenting" and any reader (human or
not) familiar with RDFa can understand that we have here an "nhrefs:link" with a source
and a couple of arcs...
Here is how Raptor RDF sees it (with some help from Graphviz):
More concisely, it can be represented in turtle as:
To be honest, there is a flaw in this model: the arcs are embedded in a blank node
without using any container and in that case RDF
specifies that the triples are unordered. In other words, there is no guarantee that
the relative order of the arcs will be
kept.
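The shape of this model can be sketched with a small Turtle serializer. The property names (nhrefs:source, nhrefs:hasarc, nhrefs:title, nhrefs:dest) and the namespace are the ones used by the queries in this section; the input shape, the function name `toTurtle`, and the choice of a literal for the source and an IRI for the destination are assumptions.

```javascript
// Serialize one nhrefs link as Turtle, with blank nodes for the link and its arcs.
// Assumed input shape: { source: string, arcs: [{ title, dest }] }.
// JSON.stringify is used as a simple stand-in for Turtle string quoting.
function toTurtle(link) {
  var properties = [
    'a nhrefs:link',
    'nhrefs:source ' + JSON.stringify(link.source)
  ];
  link.arcs.forEach(function (arc) {
    properties.push('nhrefs:hasarc [ nhrefs:title ' + JSON.stringify(arc.title) +
      ' ; nhrefs:dest <' + arc.dest + '> ]');
  });
  return '@prefix nhrefs: <http://nhrefs.org/> .\n\n[] ' +
    properties.join(' ;\n   ') + ' .';
}

console.log(toTurtle({
  source: 'XQuery',
  arcs: [
    { title: 'XQuery on Wikipedia', dest: 'http://en.wikipedia.org/wiki/XQuery' },
    { title: 'XQuery W3C Recommendation', dest: 'http://www.w3.org/TR/xquery/' }
  ]
}));
```

The two nhrefs:hasarc statements come out in array order here, but as RDF they are just two triples with the same subject and predicate: nothing in the serialized model records which one came first.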
Neither the current recommendation (RDFa 1.0) nor the latest RDFa 1.1 Working Draft
supports containers, but a proposal has been made on the RDFa wiki and I do hope that this much-needed feature will be
added to RDFa at some
point.
This is only a problem as far as authors expect this order to be preserved (which
is probably the case) and if we use an RDF library
that may change this order (which is not the case with the library that we'll be using),
but this is still a flaw.
An RDF library... Yep, let's see how you parse that kind of thing in JavaScript!
It could be tempting to use a library such as jQuery and just adapt what we've done
for microformats to query the RDFa attributes
instead of the class attributes that drive microformats...
This would work on this example but, unless you are ready to reimplement an RDFa parser,
it wouldn't work with models that express
the same set of triples using different RDFa syntaxes: even supporting a
namespace prefix other than "nhrefs" would require extra
work.
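The prefix problem alone is easy to demonstrate. The sketch below is not an RDFa parser: it only expands CURIEs against a prefix map, and the `expandCurie` helper and the two document objects are hypothetical.

```javascript
// Expand a CURIE such as "nhrefs:link" into a full URI using a prefix map.
// Returns null when the value has no prefix or the prefix is not declared.
function expandCurie(curie, prefixes) {
  var colon = curie.indexOf(':');
  if (colon < 0) return null;
  var prefix = curie.slice(0, colon);
  var reference = curie.slice(colon + 1);
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) return null;
  return prefixes[prefix] + reference;
}

// Two documents declaring the same vocabulary under different prefixes.
var docA = { prefixes: { nhrefs: 'http://nhrefs.org/' }, typeOf: 'nhrefs:link' };
var docB = { prefixes: { n: 'http://nhrefs.org/' }, typeOf: 'n:link' };

// A string match on the attribute value only finds the first document...
console.log(docA.typeOf === 'nhrefs:link', docB.typeOf === 'nhrefs:link'); // true false
// ...whereas comparing expanded URIs treats both as the same type.
console.log(expandCurie(docA.typeOf, docA.prefixes) === expandCurie(docB.typeOf, docB.prefixes)); // true
```

A selector on the literal attribute value misses the second document even though both state the same triple, and prefixes are only one of the ways in which equivalent RDFa can differ.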
The best way to avoid these issues is to use an RDFa parser and, if you enjoy jQuery,
Jeni Tennison's rdfQuery is definitely for you since it comes as a kind of
jQuery add-on and shares its syntax.
RdfQuery also borrows a lot from SPARQL and to get the nhrefs links with their sources,
you can write:
var rdf = jQuery(document)
  .rdf()
  .prefix('nhrefs', 'http://nhrefs.org/')
  .where('?link a nhrefs:link')
  .where('?link nhrefs:source ?source');
In RdfQuery like in SPARQL, query results are sets of resources and literals rather
than triples. These resources and literals cannot be
mapped back to DOM nodes in the (X)HTML document and you need to go back to the triples
for that.
In our case, the outer span element for the link is the element that carries the type
information:
<span typeof="nhrefs:link">
...
</span>
A triple directly generated by this element is:
?link a nhrefs:link
And to get the span element (to hide it and prepend the dialog and replacement link),
you can query this triple and use its source
attribute:
rdf
  .each(function(){
    var span = jQuery(rdf.reset().where(this.link.value + ' a nhrefs:link').sources()[0][0].source);
    span.hide();
After that, you can perform a sub query to find the arcs and create the dialog with
the query results. The remainder of the function is
straightforward and the complete code is:
jQuery(document).ready(function() {
  // Find every nhrefs link together with its source text.
  var rdf = jQuery(document)
    .rdf()
    .prefix('nhrefs', 'http://nhrefs.org/')
    .where('?link a nhrefs:link')
    .where('?link nhrefs:source ?source');
  rdf
    .each(function(){
      // Go back to the typing triple to find the element that generated it.
      var span = jQuery(rdf.reset().where(this.link.value + ' a nhrefs:link').sources()[0][0].source);
      span.hide();
      var link = jQuery(span.before('<a href="">' + this.source.value + '</a>')[0].previousSibling);
      var dialog = jQuery(span.before('<div title="Links for &quot;' + this.source.value + '&quot;"><ul></ul></div>')[0].previousSibling);
      var list = jQuery('ul', dialog);
      // Sub query: one list item per arc of this link.
      rdf
        .reset()
        .where(this.link.value + ' nhrefs:hasarc ?arc')
        .where('?arc nhrefs:title ?title')
        .where('?arc nhrefs:dest ?dest')
        .each(function(){
          list.append('<li><a href="' + this.dest.value + '">' + this.title.value + '</a></li>');
        });
      dialog.dialog({ autoOpen: false });
      link.click(function() {
        dialog.dialog("open");
        return false;
      });
    });
});
Again, this code is more verbose than its microformat counterpart, but the link properties
are accessed using proper queries over
formal properties and that seems more robust than just relying on (X)HTML classes.
Bleeding with microdata
HTML5's microdata is arguably the most bleeding edge of these somewhat competing technologies.
Although HTML5 isn't there yet, microdata
can be used with libraries such as HTML5 Microdata JavaScript.
Some HTML5 specific features such as using meta elements within page bodies can't
be used (because these elements are considered bogus
and are stripped out by browsers) and need to be worked around. However, the result
is still reasonably simple:
This code gets displayed exactly like its microformat and RDFa counterparts when it
is not processed by a script.
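For readers who want to experiment, here is a minimal sketch of a function that writes microdata of this kind. The item type (http://nhrefs.org/link) and the property names (source, arc, dest, title) are the ones the script in this section reads; the element choices, the nesting of the title inside the a element and the name `buildNhrefsMicrodata` are assumptions, so the output is not necessarily identical to the markup used for this paper.

```javascript
// Escape text for safe use in HTML content and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build microdata markup for one nhrefs link.
// Assumed input shape: { source: string, arcs: [{ dest, title }] }.
function buildNhrefsMicrodata(link) {
  var arcs = link.arcs.map(function (arc) {
    // Each arc is a nested item; the visible link text doubles as its title,
    // so no meta element is needed inside the page body.
    return '<span itemprop="arc" itemscope>' +
      '<a itemprop="dest" href="' + escapeHtml(arc.dest) + '">' +
      '<span itemprop="title">' + escapeHtml(arc.title) + '</span></a></span>';
  });
  return '<span itemscope itemtype="http://nhrefs.org/link">' +
    '<span itemprop="source">' + escapeHtml(link.source) + '</span> [' +
    arcs.join(', ') + ']</span>';
}

console.log(buildNhrefsMicrodata({
  source: 'XQuery',
  arcs: [
    { dest: 'http://en.wikipedia.org/wiki/XQuery', title: 'XQuery on Wikipedia' },
    { dest: 'http://www.w3.org/TR/xquery/', title: 'XQuery W3C Recommendation' }
  ]
}));
```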
The microdata jQuery library is fairly simple to use and the code to process these
links is very similar to what we've seen so
far:
jQuery(document).ready(function() {
  jQuery(document)
    .items('http://nhrefs.org/link')
    .each(function(){
      // Hide the original markup but keep it in the DOM for other scripts.
      var span = jQuery(this);
      span.hide();
      var source = span.properties('source').itemValue();
      var link = jQuery(span.before('<a href="">' + source + '</a>')[0].previousSibling);
      var dialog = jQuery(span.before('<div title="Links for &quot;' + source + '&quot;"><ul></ul></div>')[0].previousSibling);
      var list = jQuery('ul', dialog);
      // Add one list item per arc item.
      span
        .properties('arc')
        .each(function(){
          var arc = jQuery(this);
          list.append('<li><a href="' + arc.properties('dest').itemValue() + '">' + arc.properties('title').itemValue() + '</a></li>');
        });
      dialog.dialog({ autoOpen: false });
      link.click(function() {
        dialog.dialog("open");
        return false;
      });
    });
});
Why not extended XLinks after all?
Now that we've seen the level of simplicity (or complexity) of three different approaches,
let's go back and revisit extended XLinks.
To express an extended link, you need to define:
The extended link itself that will serve as a container.
Link ends that can be either local to the link or external. In our case, the source
(i.e. the span containing the text "XQuery") can be defined as a local resource and
the targets will necessarily be defined as external resources (aka XLink "locators").
The arcs between the link ends.
As far as XLink is concerned, a simple way to define these links in an XHTML document
could be:
As far as I understand the XLink recommendation, this is enough to express what we
want. That's not so bad and we could argue that the level of complexity is similar
to what we've
seen so far.
Unfortunately, I am not aware of any existing implementation that can process this
markup and display what we want to display. Browsers just ignore extended links and
won't display
anything more than the word "XQuery" from this markup.
To get a degraded display similar to what we had with microformats, RDFa or microdata,
we need to repeat the target titles and href attributes:
<!-- An extended link -->
<span xlink:type="extended"
      xlink:role="http://nhrefs.org/link/">
  <!-- The source -->
  <span xlink:type="resource"
        xlink:role="http://nhrefs.org/source/"
        xlink:label="source">XQuery</span> [
  <!-- The targets -->
  <a href="http://en.wikipedia.org/wiki/XQuery"
     title="XQuery on Wikipedia"
     xlink:href="http://en.wikipedia.org/wiki/XQuery"
     xlink:type="locator"
     xlink:role="http://nhrefs.org/target/wikipedia/"
     xlink:label="target"
     xlink:title="XQuery on Wikipedia">XQuery on Wikipedia</a>,
  <a href="http://www.w3.org/TR/xquery/"
     title="XQuery W3C Recommendation"
     xlink:href="http://www.w3.org/TR/xquery/"
     xlink:type="locator"
     xlink:role="http://nhrefs.org/target/authoritative/"
     xlink:label="target"
     xlink:title="XQuery W3C Recommendation">XQuery W3C Recommendation</a>]
  <!-- The arcs -->
  <span xlink:type="arc"
        xlink:from="source"
        xlink:to="target"> </span>
</span>
Here we have an XHTML fragment that gets the degraded display that
we asked for in our requirements and has the meaning that we want to convey for
XLink
implementations.
The price to pay in terms of complexity is clearly visible when we compare this fragment
to what we've seen before.
In addition to the markup complexity, I am not aware of any JavaScript implementation
of extended XLinks on which we can rely to process this fragment like we did for the
other
technologies, and we might have to develop our own JavaScript implementation.
If the downsides are clearly visible, the benefit is not that obvious!
Except for being proud to conform to a W3C recommendation and hoping to convince
more people to use it, what's the benefit of using a recommendation that has almost
no traction?
Next Step: Embedding
A simple way to represent embedded links is to embed nhrefs links within the source
property of another nhrefs link.
OK, but how should we present such embedded links to the user?
Going back to the example of "XML Prague", we could differentiate the link on "XML",
which would present resources about XML and resources
about XML Prague, and the link on "Prague", which would present resources about Prague
and resources about XML Prague. However, this would be
displayed by the browser as one link (or at best two links separated by a space) and
users would very likely miss the difference between
these two links.
To avoid this issue, I suggest that we display the same dialog on all the terms of
embedded links. That dialog will display all the
links for all the terms but can group the links per term.
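The grouping step can be sketched independently of any markup. In this sketch a link is a plain object whose source is either a string or a list of parts (strings or nested links); this shape, the function names and the example titles are assumptions made for the illustration, not the structure used by the actual script.

```javascript
// Text of a source, which may be a plain string or a list of parts
// (strings or nested links), joined in list order.
function sourceText(source) {
  if (typeof source === 'string') return source;
  return source.map(function (part) {
    return typeof part === 'string' ? part : sourceText(part.source);
  }).join(' ');
}

// Collect one group of arcs per term: the outer link first, then nested links.
function collectGroups(link, groups) {
  groups = groups || [];
  groups.push({ term: sourceText(link.source), arcs: link.arcs });
  if (typeof link.source !== 'string') {
    link.source.forEach(function (part) {
      if (typeof part !== 'string') collectGroups(part, groups);
    });
  }
  return groups;
}

// Illustrative data for the "XML Prague" example.
var xmlPrague = {
  source: [
    { source: 'XML', arcs: [{ title: 'XML on Wikipedia', dest: 'http://en.wikipedia.org/wiki/XML' }] },
    { source: 'Prague', arcs: [{ title: 'Prague on Wikipedia', dest: 'http://en.wikipedia.org/wiki/Prague' }] }
  ],
  arcs: [{ title: 'XML Prague conference', dest: 'http://www.xmlprague.cz/' }]
};

collectGroups(xmlPrague).forEach(function (group) {
  console.log(group.term + ': ' + group.arcs.map(function (arc) { return arc.title; }).join(', '));
});
```

The same list of groups would then be rendered in a single dialog attached to both "XML" and "Prague". The sketch keeps the parts in array order, which is exactly the guarantee that a purely triple-based extraction does not give.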
Of course, we are bitten again by the same limitation: the links that compose sources
are unordered and in theory there is no
guarantee that when we generate the title for composed links we won't generate "Prague
XML" instead of "XML Prague"!
The JavaScript is now 77 lines long (compared to 24).
Next Steps
All three techniques provide a lightweight solution to express links with multiple
arcs that are easy to parse in JavaScript. Now, what
can we do with all these angle brackets?
The first conclusion is that for this application there is no clear winner between
microformats, RDFa and microdata:
Microformats are less verbose and more "free style". The price to pay is that you
need to read the spec to understand the
structure of each of them and need to use DOM level methods to get your information.
Microdata and RDFa have roughly the same level of verbosity.
RDFa and microdata are more rigid and more verbose. The benefit is that if you use
the right library you can parse their
structure with higher level methods.
In theory, RDFa doesn't preserve the relative order between arcs and multi part sources.
Microdata isn't at recommendation stage yet and may change.
With RDFa, it is straightforward to extract link information as triples and use semantic
web tools to do all kinds of funky
things with them.
In the future, microdata will probably be natively supported by browsers.
The most sensible choice is probably to make no choice and support all three technologies!
OK, but what can we do with all these angle brackets?
The markup should be further documented and it can be seen as an open API between:
Consumers (such as the scripts that have been presented here) that parse the markup
to do all kinds of interesting things.
Producers that write this markup which isn't really fun to write by hand.
The consumers that we've seen should be documented and tested before they can be considered
really usable.
Producers need to be implemented. Producers for popular web publishing platforms would
be especially useful. For these platforms, two
kinds of producers could be developed:
Transformers that transform other markup into one of these three formats. In WordPress,
for instance, nhrefs links could be
expressed using shortcodes in the posts.
GUIs that let users create nhrefs links in a friendly way.
Producers and consumers could also be packaged as plug-ins for web publishing platforms.
Such a plug-in would contain:
A producer to facilitate the production of nhrefs markup by the platform.
The JavaScript to display the links on the browser.
This is more or less my roadmap for this project. If you are interested, watch this
space: http://nhrefs.org!