Walsh, Norman, and C. M. Sperberg-McQueen. “Interactivity Three Ways.” Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Walsh01.
Balisage: The Markup Conference 2021 August 2 - 6, 2021
Balisage Paper: Interactivity Three Ways
Norman Walsh
Saxonica
Norm Walsh is a Senior Software Developer at Saxonica. He has also been an active
participant in international standards efforts at both the W3C and OASIS. At the W3C,
Norm was chair of the XML Processing Model Working Group, co-chair of the XML Core
Working Group, and an editor in the XQuery and XSLT Working Groups. He served for
several years as an elected member of the Technical Architecture Group. At OASIS,
he was chair of the DocBook Technical Committee for many years and is the author of
DocBook: The Definitive Guide. Norm has spent more than twenty years developing commercial and open source software.
C. M. Sperberg-McQueen
Black Mesa Technologies
C. M. Sperberg-McQueen is the founder and principal of Black Mesa Technologies, a
consultancy specializing in helping memory institutions improve the long term preservation
of and access to the information for which they are responsible.
He served as editor in chief of the TEI Guidelines from 1988 to 2000, and has also
served as co-editor of the World Wide Web Consortium’s XML 1.0 and XML Schema 1.1
specifications.
One of the most obvious differences between documents physically printed on pages
of paper and documents displayed on electronic devices is that the latter can be interactive
in ways that the former cannot. More than 50 years ago, this is what convinced Ted
Nelson and others that when used well computers would dramatically change our relation
with text. What kinds of interactivity are possible, and to what extent interactivity
adds value to a document, are challenging questions that require careful analysis.
Deciding that some specific interactive feature would add value immediately raises
a new challenge: how is that feature going to be realized? In this paper, we look
at three different technologies that can be used to add interactivity to a document
presented on the web: “plain old JavaScript”, Saxon-JS, and XForms. We examine a specific
feature and compare the differences between similar implementations across these three
platforms.
Background
When a conference is an in-person event at a particular location,
there’s relatively little ambiguity in the schedule. If the clock on
the wall at breakfast reads 9:45am and the first talk is scheduled for
10:00am, you have fifteen minutes.
The schedule for a virtual conference with attendees in many
different time zones is another matter. In principle, you can publish
the schedule in a single time zone and everyone can add or subtract
the necessary offset to get “local time”. If the first talk is
scheduled for 10:00am in Rockville and the clock on the wall at
breakfast reads 9:45am, you have fifteen minutes, but only if you are
in Rockville or anywhere else on Eastern Daylight Time. If the clock
is on Mountain Time because it and you are located in Denver, you’ve
missed that talk!
Ask anyone who has participated regularly in activities
organized simultaneously across several time zones and they will tell
you stories about when they got it wrong or their colleagues
did.
It’s just easier if you can see the times in your time
zone.
For Balisage 2020, Norm attempted to make the Balisage schedule
interactive such that a user could pick a timezone and the “schedule
at a glance” table would be updated so that it displayed the times of
the talks translated into that timezone.
The challenge at that time was to see if this could be done
using entirely native JavaScript APIs (no third party libraries
allowed), in a relatively straightforward way, that would work on many
modern browsers.
It turned out that, a couple of decades into the twenty-first
century, this was quite a practical task. The complete solution was
achieved in just under 200 lines of code. (Lines of code is arguably a
poor metric generally, but this gives a sense of the magnitude. You
could just about squeeze the whole thing onto the front and back of a
single sheet of paper and it would still be legible.)
This year, the challenge is to achieve the same results using
Saxon-JS and XForms. Then we’ll examine how those solutions compare.
Constraints
The first and most significant constraint is that adding the
interactive schedule must not change the editorial process for
producing the schedule page in any significant
way. This is a perfectly fair and reasonable constraint. There is an
existing process and retraining everyone who updates that page is
beyond the scope of this project.
Luckily, as we’ll see in the next section, the Balisage
organizers understand the value of markup and the page is already
structured for reuse. And we aren’t forbidden from making any
changes to the markup on the page, only those that would significantly interfere
with the editorial process.
There are two additional constraints that stem from our requirement
not to change the editorial process:
The rendering has to be done in the browser. The data source is
markup in an otherwise static HTML page. Storing the schedule in a
database and rendering it from a server would open up new and
different avenues of approach, but we aren’t considering those here.
The table itself is authored by hand. The talks have semantic
markup that we can use, but there are other things in the schedule
with no corresponding description: the markup of the “Description”
column, the schedule entry for Balisage Bard, closing comments, etc.
If we were starting from scratch, it might be practical to define
declarative markup for the whole schedule and generate the table directly
from that markup, but we are
not starting from scratch.
The freedom we’ve given ourselves to make “insignificant”
changes to the page allows us to configure the schedule-at-a-glance
table so that it will be easier to process. These changes only have to
be done once and have no impact on the other editorial changes to the
table.
First, we can add some markup to each cell in the “time” column
to identify the time of the talk in a machine-readable way:
The datetime attribute contains an ISO 8601 value for the
time. This value could include a timezone, but we
can assume that all of the values are in the timezone of Rockville, MD.
(The content of the time element is the displayed value, which will
change as a function of the selected local timezone.)
Additionally, we can add a data-slot attribute to
each table cell where a talk occurs. The presence of that attribute
identifies that a talk should be injected into this cell and the value
of the attribute identifies the talk.
Markup for the table cell for the talk on Monday at 10:00am is
<td rowspan="2" data-slot="Monday/10:00"></td>
The data-slot attribute identifies the talk by day and
time, not by its XML identifier, to protect the schedule-at-a-glance
against errors if the schedule changes at the last minute.
Requirements
For the purposes of this paper, the requirements below have
been identified.
Each talk is to be summarized with its title and author (including a link to
their bio, if it’s available). The abstract should appear as a popup or “tool tip”.
The title and author are separated by an em-dash (—). If multiple authors are
present, they are separated by commas.
If the talk identified by the data-slot attribute has no authors,
the class “lb” must be added to the class attribute for its cell.
(This identifies “late breaking” slots which are colored differently by the CSS.)
If there is no talk corresponding to the data-slot value,
the class “none” must be added to the class attribute for its cell.
Additionally, the contents of the cell must be “No talk scheduled.”
The table has two controls, a select with options for the
timezone and a checkbox for selecting am/pm or 24 hour time display.
The times in the first column of the table must be adjusted according to these
values. The value of each select option is the
timezone offset expressed as +/-HH:MM.
The select has the id “tz”, the checkbox has the id
“clock24”.
When the timezone is changed, all of the times must be adjusted for the
newly selected timezone. If the checkbox is changed, all of the times must be
adjusted for the corresponding am/pm or 24 hour time display.
When the timezone is changed, the value of the “clock24” updates
automatically. It is unselected for timezones that have a negative
offset and selected for all others. This corresponds roughly to “use
am/pm time in the Americas and 24 hour time everywhere else.” In practice,
that’s a locale setting with more nuance, but this simplification probably
satisfies most of the likely audience.
If the schedule includes 12:00 in local time, it is displayed as either
“midday” or “noon” depending on the time display: midday if the 24 hour display
is used, noon otherwise.
If the schedule includes 00:00 in local time, it is displayed as “midnight”.
If midnight occurs within the schedule, “<br> (the next day)”
must be added to the displayed times that occur after midnight.
If the table is successfully constructed, the content of the p
element on the page that has the id “schedlink” must be changed to:
Historically, the schedule page was the “single source of truth”
used to generate both web pages and a printed conference program. Consequently,
the existing editorial process already adds sufficient markup to correctly
identify the salient details about each talk: when it occurs, the title,
authors, and so forth. You can easily explore this markup with “view source”
on the schedule page.
In brief: there’s a div for each day. Each event has
its own div within the day. The date and time, title, speakers,
and abstract are encoded within that div. By way of example,
Figure 1 shows how a talk scheduled for Monday at
10:00am would be tagged.
Observations about browser architecture
All three of the solutions discussed here ultimately run in the
browser, using the architecture and features exposed there. XForms
takes a robustly declarative position, freeing the user from many of
the details, but the XForms implementation is
still operating within the browser architecture.
The programming language environment supported by the browser is
JavaScript. (There are XSLT 1.0 implementations as well in some
browsers, but XSLT 1.0 stylesheets by themselves, as supported by
browsers, aren’t sufficient to write interactive applications.)
JavaScript is an interesting language. A full exploration of the
features of JavaScript is well beyond our scope here. For our
purposes, in this paper, it’s sufficient to think
of JavaScript as a procedural programming language with a syntax in
the style of C and Java. It is much more than “just” a procedural
programming language, but we don’t need its advanced features
(objects, prototypes, promises, etc.) for this simple application.
Browser implementations of JavaScript provide complete and
robust access to the HTML documents rendered by the browser. They do
not, natively, provide especially robust access to XML documents or
XML document APIs. Most JavaScript applications use JSON to store
data, not XML. The easiest data structures to create, query, and
update in the browser using JavaScript are JSON objects and arrays.
There are rich JavaScript APIs for accessing information about
the state of the browser, the web page being displayed, and the state
of individual controls on the web page. You can easily, for example,
ask if a checkbox is or is not checked. You can easily change its
state as well. There are also APIs for querying and updating the
values of class attributes, CSS properties, etc.
JavaScript is single threaded: if your
browser is doing one thing in JavaScript, it can’t simultaneously be
doing something else. In addition, the scheduler, the part of the
browser that decides what bit of JavaScript to run, isn’t preemptive.
If you write JavaScript code that “busy waits” for something to
finish, you’re not just blocking your thread, you’re blocking
all the threads. If you’ve ever seen the browser
dialog box that says “this page isn’t responding, do you want to kill
it or continue waiting?”, you’ve seen what happens if a thread busy
waits for too long.
In order to avoid this, JavaScript programmers have to use
“callback functions”. Imagine that your JavaScript code is going to
run a query to find something on the page, a particular
div, for example, and then process it. A naive
implementation would ask the browser to run the query, wait for it to
return the result, and then process it.
// This looks a little bit like JavaScript,
// but it's just pseudo-code
let myDiv = document.find("div.ProgramEvent");
processMyDiv(myDiv);
There are lots of other
JavaScript functions waiting in the wings to run (even if you didn’t
write them, chunks of the browser itself are waiting for their turn to
run). Waiting for something to finish is bad practice.
It’s your responsibility to give up control and let someone else
have a turn!
What you do instead of waiting, and what many JavaScript APIs force you to do,
is tell the engine what function to “call back” when the query results are ready.
// More pseudo-code
document.find("div.ProgramEvent", processMyDiv);
Execution of your script stops at the point where you call
document.find. Someone else gets their turn, and when the
results are ready, your processMyDiv function will be called.
This mostly works just fine, even if it’s a little confusing at first.
The thing is, this is so common that it would
be tedious if you had to write a separate, named function (like
processMyDiv) for every callback you used. JavaScript
lets you just inline the function. Again, this may look a little
confusing, but you get used to it.
// One final bit of pseudo-code
document.find("div.ProgramEvent", function() {
  // just write the code that processes the
  // div right here.
});
Another area where callback functions come into play is in
handling events. From the perspective of your application, the browser
is “out there” displaying HTML elements, scrolling the page, reacting
to mouse clicks, and doing all the things that the browser does. These
are all represented in the browser as events that occur: a button was
clicked, the page scrolled, a web request finished, etc. There are
lots and lots of kinds of events; we won’t try to enumerate them all
here.
If your application wants to participate, it needs to tell the browser
what events it cares about. It does that by registering a special kind of callback
function, an event listener:
a_button.addEventListener('change', function(event) {
  // do something in response to the event
});
When the “this item changed” event occurs on the page element
identified by “a_button”, the “listening” callback
function will be run.
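The listener pattern can be tried without a web page at all. The following is a minimal sketch, assuming an environment that provides the standard EventTarget and Event constructors (browsers and recent versions of Node.js do); the names control and seen are hypothetical stand-ins, not anything from the schedule page.

```javascript
// A bare EventTarget stands in for a page element such as a checkbox.
const control = new EventTarget();
const seen = [];

// Register the callback; nothing runs yet.
control.addEventListener("change", function (event) {
  seen.push(event.type);
});

// In a browser, user interaction would fire the event for us.
// Here we dispatch it by hand, twice.
control.dispatchEvent(new Event("change"));
control.dispatchEvent(new Event("change"));

console.log(seen.length);
```

The listener body runs once per dispatched event, not when addEventListener is called.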
There’s lots more that could be said about JavaScript, including new features
that are syntactically nicer than all those callback functions, but that’s enough
of the conceptual framework for now.
Plain old JavaScript
At a glance: satisfies all the requirements, +6K download.
Two browser APIs form the heart of the plain old JavaScript
solution: query selectors and “inner HTML”. Query selectors come in
two forms: querySelector and querySelectorAll.
Each takes a CSS selector as an expression and returns matching
nodes. When called from the document object, the selector
applies to the whole document. When called from some other node, such as one
returned by an earlier query, the selector applies to the
descendants of that node. Where querySelector returns
the first match, querySelectorAll returns all the matches.
For example,
document.querySelectorAll("div.ProgramEvent")
returns all of the div elements with the ProgramEvent class.
If you look back at the markup example in Figure 1, you’ll see that
that is the wrapper for each scheduled talk. You can process those elements
with a JavaScript forEach and our friend, the callback function:
document.querySelectorAll("div.ProgramEvent")
  .forEach(function(item) {
    // ProgramEvent processing goes here
  })
You can nest these API calls and callbacks arbitrarily.
To finish our little detour into the mechanics of the query
selector APIs, here’s an example that will process each event and write the
day of the week to the console:
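A minimal sketch of such a loop follows. It assumes the div.ProgramEvent and span.Day markup described in this paper; the function name eventDays is hypothetical, and reading textContent (rather than logging the span itself) is a choice made here for illustration.

```javascript
// Collect the day of the week for every event under `root`.
// `root` is anything that offers querySelectorAll, normally `document`.
function eventDays(root) {
  const days = [];
  root.querySelectorAll("div.ProgramEvent").forEach(function (item) {
    // Rooted at the event, so this finds that event's own day span.
    const day = item.querySelector("span.Day");
    if (day) {
      days.push(day.textContent);
    }
  });
  return days;
}

// In a browser console on the schedule page:
if (typeof document !== "undefined") {
  eventDays(document).forEach(function (day) {
    console.log(day);
  });
}
```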
The only difference is that
querySelector("span.Day") is rooted at the
item we’re currently processing to get the first
span element with the Day class. That’s the
day of the week for this event.
If you run that code in the browser console, you’ll get all of the day spans:
The other API that plays a central role is “innerHTML”. The
inner HTML of an element is its HTML content, as a string. You can read from
innerHTML to get the content, but you can also write
to it to change the content.
In other words,
document.querySelector("table caption").innerHTML
returns “Interactive schedule-at-a-glance” (if you’re on the
schedule page). But assigning the string “Hello everyone!” to that same
innerHTML property will change the caption to “Hello everyone!”.
If you’ve got your browser handy,
try it and see!
Armed with these JavaScript APIs, the task isn’t too difficult.
JavaScript is neither especially declarative nor necessarily functional.
The cheap-and-cheerful approach here is to make a few passes over the document
with querySelectorAll, querySelector and innerHTML
to build a summary data structure.
For each talk, we construct a little JSON object containing the relevant details.
For example:
const talk = {
  "day": "Monday",
  "dtstart": "12:00",
  "dtend": "12:30 EDT",
  "title": "XSLT 3.0 on ordinary prose",
  "blurb": "You work with text and documents for a living,…",
  "authors": ["Norman Tovey-Walsh"]
};
And we insert those into a global object, talks, using the
value that will be in the data-slot attribute as the key:
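A minimal sketch of that step, assuming the key is simply the day and the start time joined by a slash (which matches the data-slot value “Monday/10:00” shown earlier). The helper names slotKey and addTalk are hypothetical.

```javascript
// Global lookup table: data-slot value -> talk summary.
const talks = {};

// Assumed key format: "<day>/<start time>", e.g. "Monday/12:00".
function slotKey(day, dtstart) {
  return day + "/" + dtstart;
}

function addTalk(talk) {
  talks[slotKey(talk.day, talk.dtstart)] = talk;
}

addTalk({
  day: "Monday",
  dtstart: "12:00",
  title: "XSLT 3.0 on ordinary prose",
  authors: ["Norman Tovey-Walsh"]
});

console.log(Object.keys(talks));
```

Looking up a cell’s talk is then a single property access, talks[cell.getAttribute("data-slot")], which yields undefined when no talk is scheduled for that slot.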
Next, we use querySelector to find the table and nested
querySelectorAll calls to process each row and cell.
If the cell has a data-slot attribute, we populate it by
writing to innerHTML, otherwise we just copy its contents.
There’s a little bit of fiddling with the class attribute, but the browser
has simple APIs for that sort of thing.
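The cell rules from the requirements can be sketched as a small pure function plus a loop that applies it. This is an illustration under stated assumptions, not the code on the schedule page: the names cellFor and fillCells are hypothetical, the title is assumed to be safe to insert as HTML, and the bio links and abstract tool tip are left out.

```javascript
// Decide what a data-slot cell should contain and which class it needs.
function cellFor(talk) {
  if (!talk) {
    return { className: "none", html: "No talk scheduled." };
  }
  const authors = talk.authors || [];
  if (authors.length === 0) {
    // Late-breaking slot: a title but no authors yet.
    return { className: "lb", html: talk.title };
  }
  // Title and authors are separated by an em-dash (U+2014).
  return { className: "", html: talk.title + " \u2014 " + authors.join(", ") };
}

// Apply cellFor to every cell that carries a data-slot attribute.
function fillCells(root, talks) {
  root.querySelectorAll("td[data-slot]").forEach(function (cell) {
    const result = cellFor(talks[cell.getAttribute("data-slot")]);
    if (result.className) {
      cell.classList.add(result.className);
    }
    cell.innerHTML = result.html;
  });
}
```

On the page this would be called as fillCells(document, talks). Keeping the decision in cellFor separate from the DOM update makes the rules easy to check in isolation.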
We have to do two more things: we have to adjust the times to the local
time zone, and we have to arrange for the browser to run the adjustment whenever
the user changes the timezone or the 24 hour clock checkbox.
The actual adjustment is straightforward. If we know what the offset is between
the time in Rockville and the time zone to display, we simply add that offset (possibly
negative) to the time, do a few small cosmetic adjustments (12 or 24 hour clock, etc.)
and display the result:
function adjustTime(item) {
  let timez = item.getAttribute("datetime");
  // We assume timez is in ISO 8601 format
  let hours = timez.substring(11, 13);
  let minutes = timez.substring(14, 16);
  // Apply the offset selected by the user
  hours = parseInt(hours, 10) + OFSHOURS;
  minutes = parseInt(minutes, 10) + OFSMINUTES;
  while (minutes >= 60) {
    hours += 1;
    minutes -= 60;
  }
  // Times past midnight roll over to the next day
  let plusday = "";
  if (hours >= 24) {
    plusday = "<br />(the next day)";
    hours -= 24;
  }
  let ampm = 'am';
  if (!CLOCK24 && hours >= 12) {
    if (hours > 12) {
      hours -= 12;
    }
    ampm = 'pm';
  }
  if (CLOCK24) {
    hours = hours.toString().padStart(2, "0");
  } else {
    hours = hours.toString().padStart(2, " ");
  }
  minutes = minutes.toString().padStart(2, "0");
  let timel = hours + ":" + minutes;
  if (!CLOCK24) {
    timel = timel + ampm;
  }
  // Midnight and noon are displayed by name
  if (hours == 0 && minutes == 0) {
    timel = "midnight";
    plusday = "";
  }
  if (hours == 12 && minutes == 0) {
    if (CLOCK24) {
      timel = "midday";
    } else {
      timel = "noon";
    }
    plusday = "";
  }
  item.innerHTML = timel + plusday;
}
The key ingredients in that function are the OFSHOURS,
OFSMINUTES, and CLOCK24. Those values come from the user’s
selections. We listen for when the timezone is changed:
let timezone = document.querySelector("#tz");
timezone.addEventListener('change', function() {
  adjustOffset(this);
}, false);
The adjustOffset function is passed the select
control. The selected option is identified by the selectedIndex
property on that control. That gives us access to the selected timezone.
From there, we compute the offsets and the default value of the
24 hour clock, then we run adjustTime on each time value:
function adjustOffset(item) {
  // The option value is the timezone offset, +/-HH:MM
  let tz = item.options[item.selectedIndex].value;
  let pos = tz.indexOf(":");
  let hours = tz.substring(0, pos);
  let minutes = tz.substring(pos+1);
  // Offset of the selected zone relative to Rockville
  OFSHOURS = parseInt(hours, 10) + ROCKVILLE_OFFSET;
  OFSMINUTES = parseInt(minutes, 10);
  // Choose the default clock style for the new zone
  if (OFSHOURS > 0) {
    document.querySelector("#clock24").checked = true;
    CLOCK24 = true;
  } else {
    document.querySelector("#clock24").checked = false;
    CLOCK24 = false;
  }
  document.querySelectorAll("time").forEach(adjustTime);
}
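The offset and time arithmetic can also be sketched as pure functions that return values instead of updating the page, which makes them easy to try in a console. This is an illustrative variant under stated assumptions, not the code used on the schedule page. The names parseOffset, defaultClock24, and localTimeLabel are hypothetical; the datetime values are assumed to look like "2021-08-02T10:00:00"; and Rockville is assumed to be on Eastern Daylight Time (UTC-04:00). Working in signed total minutes means an offset such as -03:30 moves the minutes in the same direction as the hours.

```javascript
// Convert "+HH:MM" or "-HH:MM" to signed minutes, so that "-03:30"
// becomes -210 rather than -3 hours plus 30 minutes.
function parseOffset(tz) {
  const match = /^([+-])(\d{2}):(\d{2})$/.exec(tz);
  if (!match) {
    throw new Error("Expected +/-HH:MM, got: " + tz);
  }
  const sign = match[1] === "-" ? -1 : 1;
  return sign * (parseInt(match[2], 10) * 60 + parseInt(match[3], 10));
}

// Requirement as stated: am/pm for negative offsets, 24 hour otherwise.
function defaultClock24(tz) {
  return parseOffset(tz) >= 0;
}

// Assumption: Rockville is on Eastern Daylight Time, UTC-04:00.
const ROCKVILLE = parseOffset("-04:00");

// Return the label for one schedule time in the selected zone.
function localTimeLabel(datetime, tz, clock24) {
  const hh = parseInt(datetime.substring(11, 13), 10);
  const mm = parseInt(datetime.substring(14, 16), 10);
  // Work in total minutes so hour and minute offsets carry correctly.
  const total = hh * 60 + mm + parseOffset(tz) - ROCKVILLE;
  const dayShift = Math.floor(total / 1440);     // -1, 0 or 1
  const inDay = ((total % 1440) + 1440) % 1440;  // minutes after local midnight
  const hours = Math.floor(inDay / 60);
  const minutes = inDay % 60;
  const mins = String(minutes).padStart(2, "0");
  let label;
  if (hours === 0 && minutes === 0) {
    label = "midnight";
  } else if (hours === 12 && minutes === 0) {
    label = clock24 ? "midday" : "noon";
  } else if (clock24) {
    label = String(hours).padStart(2, "0") + ":" + mins;
  } else {
    // Same convention as adjustTime above: hours after noon lose 12.
    label = (hours > 12 ? hours - 12 : hours) + ":" + mins +
      (hours >= 12 ? "pm" : "am");
  }
  return { label: label, dayShift: dayShift };
}

console.log(localTimeLabel("2021-08-02T10:00:00", "+02:00", true).label);
```

A caller would append “(the next day)” when dayShift is 1 and the label is not “midnight”. What to show when dayShift is -1 is not covered by the requirements, so it is left to the caller.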
A similar listener function adjusts the CLOCK24 value
when the checkbox changes and recomputes the times.
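That listener is not shown in this paper; a minimal sketch of what it might look like follows. The function name watchClock is hypothetical, the state object stands in for the CLOCK24 global, and the redraw callback stands in for rerunning adjustTime on every time element.

```javascript
// Keep a clock24 flag in step with a checkbox and redraw on change.
function watchClock(checkbox, state, redraw) {
  checkbox.addEventListener("change", function () {
    state.clock24 = checkbox.checked;  // remember the user's choice
    redraw();                          // recompute every displayed time
  }, false);
}

// In the browser this would be attached to the real checkbox:
if (typeof document !== "undefined") {
  const box = document.querySelector("#clock24");
  if (box) {
    watchClock(box, { clock24: box.checked }, function () {
      // e.g. document.querySelectorAll("time").forEach(adjustTime);
    });
  }
}
```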
Saxon-JS
At a glance: satisfies all the requirements, 6K stylesheet (42K compiled),
plus Saxon-JS (3M) download.
If you are familiar with XSLT, you will probably recognize a
common design pattern in this task. What we want to perform is an
“almost identity transformation” on the table, adjusting the content
of just those cells that have a data-slot attribute.
Having an XSLT processor in the browser lets us leverage those common
design patterns directly.
It’s easy to match the cells that have a data-slot
attribute. Finding the corresponding ProgramEvent object can
be accomplished directly with a key:
So the heart of our table transformation is just this simple template:
<xsl:template match="td[@data-slot]">
  <xsl:variable name="event" select="key('slot', @data-slot)"/>
  <xsl:choose>
    <xsl:when test="$event">
      <!-- copy the event details into the table -->
    </xsl:when>
    <xsl:otherwise>
      <!-- mark this cell as "No talk scheduled" -->
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
Like JavaScript, Saxon-JS has APIs for interacting with the
browser. Idiomatically, they’re exposed as XSLT extension functions
and extension elements, but they’re very similar to JavaScript. This
shouldn’t be a surprise as Saxon-JS is JavaScript
running in the browser using the underlying JavaScript APIs on your
stylesheet’s behalf.
Let’s step back and look at how those APIs work.
At the top of our web page, we need to load Saxon-JS and a small JavaScript hook
to start the ball rolling. (In practice, the hook function can be inserted directly
in the
HTML page with a script element, but Norm has chosen to keep it outside to
satisfy his aesthetic sensibilities.)
The startup script is just a function assigned to the browser’s
“onload” event so that it runs automatically when the page is
loaded:
function render_schedule() {
  let options = { stylesheetLocation: "xslt/schedule.sef.json" };
  SaxonJS.transform(options, "async");
}
window.onload = render_schedule;
Saxon-JS evaluates a compiled form of XSLT stored in a “SEF”
file, most often in a JSON document. XSLT stylesheets can be compiled
with Saxon-EE or with Saxon-JS running on Node. That step just becomes
part of your build process, analogous to minifying CSS or JavaScript
for web delivery.
The Saxon-JS APIs that form the heart of the solution are
ixsl:page() and xsl:result-document.
Execution of the stylesheet will begin in the initial template.
The HTML page isn’t the global context item because the HTML data
model and the XPath Data Model are not exactly the same. Instead, the
ixsl:page() function is used to access the HTML page.
This function returns an object which “shims up” the HTML page for XDM
access.
With the ixsl:page() document in hand, you have the full power
of XPath and XSLT to query and transform it. But where does the output go?
Enter xsl:result-document. The content of xsl:result-document
is output to the location identified by the href attribute.
In Saxon-JS, this concept has been extended such that if the
href attribute begins with a “#” sign, the output goes into
the browser page in the element with the corresponding ID. This instruction:
adds the “new content” div to the end of the
element with the id “main” in the document. There’s an
extension attribute as well, method. This instruction:
Will change the content of the element with the id “caption”
to “Hello everyone!”.
(Keen observers may have noticed that the table caption on the schedule
at a glance page doesn’t have an ID. That’s ok. We can use
xsl:result-document to update any element on the page. If it doesn’t
have an ID, we can arrange for that element to be the context node and use
href="?." to change it.)
Saxon-JS manages event listeners with special modes. These templates fire when
a user changes the timezone or clicks on the 24 hour clock checkbox:
It’s certainly nice to have access to the XPath datetime
functions! Also noteworthy are a couple of calls to
ixsl:get, another extension function. This one gives you
access to JavaScript properties. And ?. as the
href value for xsl:result-document when you
want to write to the current context node, whether it has an id or
not.
XForms
At a glance: XForms resists some aspects of the task as
specified, but satisfies all the requirements, mostly in pure standard
XForms with some implementation-specified extensions, 800K download
for the XSLTForms implementation of XForms (partly XSLT, partly
JavaScript).
XForms is designed as a declarative language for making forms in
web pages. It is XML-based: it uses XPath to identify elements and
specify values, and when it submits data to a server, it does so in
XML. If there is not too much mixed content, XForms can be thought of
as a general tool for making special-purpose XML editors.
Interactivity in existing documents is not its main goal, but its
declarative mechanisms for user interaction make it a useful tool for
such tasks. From an XForms point of view, the core tasks imposed by
the constraints outlined above are, first, to maintain information
about the schedule and modify it in response to user actions, and,
second, to display the current form of that information to the
user.
Since the usual XForms mechanisms for displaying information to
the user do not involve modifying the form’s host document, the parts
of the problem description that prescribe that mechanism for conveying
information to the user don’t fit comfortably in the usual XForms
patterns. The result is that the goal of simplifying comparison by
making all three solutions as similar as possible is in tension with
writing the form in normal idiomatic XForms.
In an XForm, the information displayed to the user comes in part
from the form document itself and in part from XML instance documents which are managed and
manipulated by the form. The form is written in a host document language chosen by the XForms
processor, extended with XForms elements. For this exercise the host language is
XHTML, but XForms can be and has been implemented for other host XML
vocabularies instead. The XForms extensions to the host vocabulary
fall into two classes: elements for specifying the XML document
instances to be worked on and constraints that govern the documents
and their manipulation (xf:model,
xf:instance, xf:bind, and others) on the one
hand, and on the other hand elements for specifying, within the displayable
part of a form, that particular parts of the XML instance documents
should be displayed or made accessible for interaction
(xf:input, xf:output and other controls, as
well as markup for specifying event-handling rules). As may be
inferred, XForms makes a strict distinction between the XML documents
being manipulated and the user interface through which that
manipulation occurs, in a way reminiscent of the Model / View /
Controller idiom.
In XHTML-based XForms, the document instances and constraints
are usually specified within the HTML header, wrapped in an
xf:model element. The interaction controls, by contrast,
are placed at appropriate locations in the HTML body. We’ll start by
showing the XForms implementation of the timezone selector and the
24-hour-clock checkbox, which are straightforward applications of core
XForms ideas, and then turn to other aspects of the problem: how to inject
HTML into the host document of a form, how the timezone calculations
can be implemented in XForms, and a simple application of XForms
facilities for event handling.
Before we do, some notes on the implementation may be in order
and may as well be taken care of now.
First, there are multiple XForms implementations, which vary in
approach. Some have gaps in their coverage of the specification, and
many offer extensions to the spec. This implementation uses
XSLTForms, a well known open source XForms engine which uses the
browser’s XSLT 1.0 engine to transform the user-written XHTML + XForms
source document into pure HTML in the browser, and a JavaScript
library to support XForms functionality within the browser. XSLTForms
offers very simple setup: the only thing the form author has
to do is install the stylesheet and JavaScript library on their server
and point to the stylesheet from the form. Other implementations use
a different approach, often providing a server-side component which
handles some tasks. The implementation described here mostly uses
standard XForms 1.1 and would look the same whatever XForms engine was
used; when extensions specific to XSLTForms are used, the fact will be
noted.
Second, there are multiple versions of XForms. The form
described here uses XForms 1.1, which became a W3C Recommendation in
2009. Some implementations have also implemented parts of XForms 2.0,
which is currently under development in a W3C community group, but the
authors’ knowledge of XForms has not yet caught up to XForms
2.0; where we are aware that an XForms 2.0 solution to this problem might take a
different form, that fact may be noted.
Third, there are multiple ways to satisfy the given
requirements in XForms. The implementation currently on hand is not
necessarily the best. Some of the more obvious alternative methods
will be mentioned below, but they have not yet been implemented.
Between now and August, some of them may be. The current plan is to
leave the current implementation in place, and put one or more
alternative implementations beside it on the public server. The paper
will be updated accordingly.
Simple user-interaction controls and the ui instance
In this exercise, the direct user interaction is very simple and
rather limited: the user gets a way to select a time zone, and a way
to specify whether a 24-hour clock or a 12-hour clock is to be
used. For selecting one value from a controlled list of values, XForms
uses the xf:select1 control. In rough outline, the
time zone control looks like this:
The basic structure will be familiar to many who have written
forms: the control contains a list of possible values, each with the
label to be used in offering the choice to the user. The list of
options is wrapped in a xf:select1 element, which
describes not the user interface widget to be presented to the user,
but the semantic effect to be achieved: the user should select one
value from among those listed. Whether the display engine uses a
pull-down menu, a fully expanded menu, radio buttons, or some other
mechanism to achieve that effect is not specified in the form or in
the XForms spec. (An appearance attribute with possible
values full, compact, and
minimal can be used to give hints about what would work
best. At that level of abstraction, a form processor working in a
voice browser can try to make appropriate adjustments.) Note that the
xf:label element appears at both levels: one label for
the selection control as a whole and one for each item. By
associating the label structurally with the thing labeled, XForms
makes it easier for software to understand the logical structure of
the form, which is why it provides a better experience for users
employing voice browsers and assistive technologies.
But where is the selected time zone actually stored, once the
user selects one?
XForms 1.0 and 1.1 do not define variables; all controls are
bound to nodes in XML document instances. A common idiom, when the
form needs to keep track of information that is not in any of the XML
documents the form works with, is to add a document instance to
contain that information and allow the form to manage it. In this
case, the information is all concerned with the state of the user
interface, so the instance has been given the name ui. Like the other instances used, the
ui instance appears in the HTML
header, inside the xf:model element, which in outline
looks like this:
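A sketch of that outline, under stated assumptions: the model ID m1, the root element ui-info, the clock24 attribute, and the tz child are all named in this paper, but the initial values shown here are guesses.

```xml
<xf:model id="m1">
  <!-- state of the user interface; xmlns="" keeps the instance data
       out of the host document's XHTML namespace -->
  <xf:instance id="ui">
    <ui-info xmlns="" clock24="false">
      <!-- initial offset assumed: the conference's own time zone -->
      <tz>-04:00</tz>
      <!-- further content omitted -->
    </ui-info>
  </xf:instance>
  <!-- other instances, xf:bind elements, and event handlers omitted -->
</xf:model>
```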
As can be seen, the ui instance
has a clock24 attribute on the root
element and a child element named tz,
and a bit more not shown here.
The ref attribute on the xf:select1
control binds the control to the tz
element using the XPath expression instance('ui')/tz.
The XForms function instance() identifies the
outermost element of an instance document; instance('ui')
thus refers to the ui-info element shown in the
model.[1]
The result of the binding is that when the control is used to select a
timezone, the value in the instance document is updated
automatically.
The 24-hour clock checkbox is simpler. It uses the
xf:input element:
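A sketch of such a control; the label text is an assumption.

```xml
<xf:input ref="instance('ui')/@clock24">
  <xf:label>Use 24-hour clock</xf:label>
</xf:input>
```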
If the form knew nothing about the clock24
attribute, this would be roughly equivalent to an HTML
input element of type text. But if we tell
the form that the attribute holds a Boolean value, the form can use a
more appropriate interface like a checkbox to present the value to the
user. This we do with an xf:bind element which
identifies a node in an instance document and identifies its (XSD)
datatype.
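Such a bind might look like the following sketch, which assumes that the xsd prefix is bound to the XML Schema namespace. XForms 1.1 uses the nodeset attribute to select the nodes to which a bind applies.

```xml
<!-- declare the attribute to be Boolean so that the processor
     can choose a suitable widget, such as a checkbox -->
<xf:bind nodeset="instance('ui')/@clock24" type="xsd:boolean"/>
```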
The xf:bind element has other uses, some of which
will be seen later.
Injecting HTML into the host document
Because XForms is designed to display and allow user interaction
with nodes in the document instances of the model, and not to modify
the nodes of the host document itself, the parts of the exercise which
specify that various pieces of information be injected into specific
places in the conference program require a little adjustment in
XForms.
The simplest example is perhaps the final item in the requirements list. In the Javascript
and Saxon-JS solutions, an HTML p element in the program
is given HTML content. The element originally looks like this:
<p id="schedlink"></p>
In the browser, it is to be modified to take the following form,
or an equivalent:
<p id="schedlink">
☞ Interactive
<a href="#schedule">schedule at a glance</a>.
</p>
This particular implementation tactic is not open to XForms, but
we can do something conceptually similar. In the XForms solution to
the task, the conference program is modified so that the paragraph in
question contains an xf:output element, which XForms uses
to display data from an instance document.[2]
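A sketch of the modified paragraph, assuming that the HTML fragment is stored as a string in a schedlink child of the ui-info element and that the media type is given as text/html:

```xml
<p id="schedlink">
  <!-- the string value of the schedlink element is rendered as HTML -->
  <xf:output ref="instance('ui')/schedlink" mediatype="text/html"/>
</p>
```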
The mediatype attribute on xf:output
indicates the data format of the data to be displayed. (Operationally,
it says “take this string and write it to the
innerHTML property of the containing element.”)
The mediatype attribute is
particularly helpful for allowing the display of image data, but as
can be seen here it can also be used to inject HTML elements into the
display of the form. The ref attribute links this output
control to the schedlink element in the ui
instance document.
The same pattern is used in the main part of the tabular
display. Since an XForm cannot change elements in its host document,
we need to introduce xf:output elements to show where the
titles and authors of talks need to be displayed. So the
td elements of the framework table are no longer empty.
In the current workflow, for example, the first slot on Monday takes
the following form.
<td rowspan="2" data-slot="Monday/10:00"></td>
In the XForm, it contains XForms elements to specify what needs
to be displayed there:
As may be seen, this complicates the input significantly. A
simpler approach is to use the transform() function, an
extension offered by XSLTforms, which allows an XSLT stylesheet to be
invoked on a specified node in an instance document. Using that
approach, the same slot looks like this:
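A sketch of such a cell, reconstructed from the description that follows. The convention of passing the stylesheet parameter as a name/value pair after the third argument is an assumption about the XSLTforms extension and should be checked against its documentation.

```xml
<td rowspan="2" data-slot="Monday/10:00">
  <!-- third argument false(): the second argument is a URI to be
       dereferenced, not the stylesheet itself;
       the trailing name/value pair is an assumed convention -->
  <xf:output mediatype="text/html"
             value="transform(instance('program'),
                              'single-item.xsl',
                              false(),
                              'slot', 'Monday/10:00')"/>
</td>
```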
The xf:output element here essentially says that
the value to be displayed (as HTML) is whatever is produced by
invoking the XSLT 1.0 stylesheet single-item.xsl on the
program instance, with the stylesheet parameter
slot given the string value
Monday/10:00.[3]
The program instance is, as the name suggests, a
document containing authoritative information about the conference
program. As explained above,
that is the HTML document in which the form is embedded. But an XForm
cannot extract data from its host document any more than it can modify
the host document. To make the code just given work, therefore, it is
necessary to make the host document available to the XForm as an instance document, which is done using an
xf:instance element with the appropriate URI.
<xf:instance id="program" src="./Program.xhtml"/>
This is not quite as self-referential as it may look. In
practice, the browser loads two copies of the file: one is the XHTML
page actually displayed by the browser,[4] and the other is an instance document accessible to the
form. Changing the HTML in the instance document does not affect the
document being displayed by the browser, except by means of the
xf:output elements shown above.
An alternative implementation approach would resemble the
Javascript and Saxon-JS examples more closely. In this approach:
The skeleton of the schedule-at-a-glance table is left as
is, without the insertion of xf:output
elements.
When the program is loaded as an instance document, the form
updates the program-event cells in the table (in the instance
document) by locating the talks and injecting the appropriate HTML
into the table elements.
This processing is triggered automatically when the form
processor reports that the instance document has been loaded (by
dispatching the event xforms-model-construct-done
to the model).
When the user selects a new timezone, the resulting time
displays are injected into the table in the instance document
and not (as described below)
in the ui instance.
The schedule table in the form’s host document remains
unchanged and empty, and it is not displayed.
Instead, an xf:output element in the program
displays the schedule table from the instance document.
This approach is probably preferable, since it involves less change to
the current program document structure. At the time this submission
was written, however, it had not yet been implemented, mostly because
it was not the first approach found. (As mentioned already, the idea
of modifying elements in the host document is foreign to normal
practice in XForms. It took a while for the idea described above to
take shape.) A second reason is that it requires a firm grasp of
XForms event handling, which is sometimes elusive.
Timezone calculations
The core idea of this exercise is to make the document
interactive by making it react appropriately to the user’s choice of
timezone. To do this, we:
Replace the literal time indications of the program document
with xf:output elements pointing to elements in an
instance document, so they can be changed dynamically.
Add a slot-times
element to the ui instance document, containing a
slot element for each time slot in the table.
Calculate new values for the slot elements, whenever the
user changes the timezone.
In the XForm version of the program document, the table cells
containing the time for each event take the following general
form. This is the same as the current structure, except for
the addition of the xf:output element.
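A sketch of one such cell and of the matching slot element in the ui instance. The ID mon-1000, the date, and the initial text are hypothetical; the attribute names on slot are those used in the calculations described in this section.

```xml
<!-- in the program document: the time cell for one slot -->
<td id="mon-1000">
  <time datetime="2021-08-02T10:00:00-04:00">
    <xf:output ref="instance('ui')/slot-times/slot[@id = 'mon-1000']"
               mediatype="text/html"/>
  </time>
</td>

<!-- in the ui instance: one slot element per time slot -->
<slot-times>
  <slot id="mon-1000" datetime="2021-08-02T10:00:00-04:00"
        default="10:00"
        user-datetime="" user-time="" uh24="" uh12="" wrapped="">10:00</slot>
</slot-times>
```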
The id and datetime attributes are
copies of those on the table cell and the time element
inside it. The default attribute holds the original content
of the table cell, in case we ever need it. (We don’t, as it turns
out, but we didn’t know that at first.) The content of the element is
what will ultimately be displayed to the user. The other attributes
are there to hold intermediate values, partly as a way of simplifying
the XPath expressions used in the calculation of the element content,
and partly as a way of dividing the time zone calculation up into
manageable pieces. It may be observed that they are initially blank,
because they are only needed when calculating the time for a new time
zone.
The calculation of the event’s time in the user’s time zone is
handled by a series of xf:bind elements with
calculate attributes. The calculate
attribute is used, as its name suggests, to calculate the value of
some node from other information already available. Its value is an
XPath expression.
In an XForms implementation which had full support for XPath 2.0
functions, most of the intermediate values would not be needed and the
entire process could be handled with a simple call to
adjust-dateTime-to-timezone(), followed by a call to
format-dateTime(). Since the current version of
XSLTforms does not have an XPath 2.0 version of
adjust-dateTime-to-timezone(), we cannot do it this way;
we need to do the timezone arithmetic ourselves.
An alternative approach would be to implement the required XPath
2.0 functions in Javascript and install them as extension functions.
This is possible, but details of how to do it vary among XForms
processors, and it requires using Javascript, which some people try to
avoid when possible.
Since calculating intermediate values by specifying XPath
expressions and storing them in nodes in an XML document is easy to
understand for any XSLT programmer, the xf:bind elements
used to perform the time zone calculations will need hardly any
commentary. But first we need to explain briefly how the calculations
work.
XForms 1.1 specifies a number of functions for the XPath
function library, including a seconds-from-dateTime()
function which converts a dateTime expression into the number of
seconds separating the dateTime indicated from the epoch at
1970-01-01T00:00:00Z.[5] Its opposite number is
seconds-to-dateTime(), which converts an integer number
of seconds into a dateTime in UTC. We will compute the user’s local
time by (1) converting the timezone offset into seconds, (2)
converting the scheduled time of the event into seconds, (3) adding
the two, and (4) converting the resulting number of seconds back into
a dateTime expression.[6]
First, we calculate the timezone offset in seconds by extracting
the number of hours and minutes in the timezone offset value. To hold
these, we add some blank attributes to the tz
element: <tz h="" m="" sign="-" secs="">. The values
to be assigned to those attributes are specified in the following bind
elements.
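A sketch of what such bind elements could look like, assuming that the value of tz always has the form of a sign followed by hh:mm (for example -05:00). The expressions are illustrative, not necessarily those of the form itself.

```xml
<!-- for each bind, the context node is the attribute being calculated,
     so ".." is the tz element, whose string value is e.g. "-05:00" -->
<xf:bind nodeset="instance('ui')/tz/@sign" calculate="substring(.., 1, 1)"/>
<xf:bind nodeset="instance('ui')/tz/@h" calculate="substring(.., 2, 2)"/>
<xf:bind nodeset="instance('ui')/tz/@m" calculate="substring(.., 5, 2)"/>
<!-- offset in seconds, negated for zones west of Greenwich -->
<xf:bind nodeset="instance('ui')/tz/@secs"
         calculate="if(../@sign = '-', '-1', '1')
                    * (../@h * 3600 + ../@m * 60)"/>
```

For the value -05:00, these expressions give a sign of -, 05 hours, 00 minutes, and an offset of -18000 seconds.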
It may need to be mentioned that the context item for
interpretation of the calculate attribute is the node
whose value is being calculated. Thus the calculation of the
h and m attributes must refer to
.. to fetch the string value of the tz
element.
The last calculation uses the XForms function
if(), which allows conditionals to be expressed inside
XPath 1.0 expressions. It takes three arguments: the condition to be
tested, the result if the condition evaluates to true, and the result
if the condition evaluates to false.
Whenever the value of the tz element changes, all
other values which are known to depend on it are recalculated. The
dependencies expressed in the calculate expressions are
known, and the XForms processor uses the same algorithm as a
spreadsheet processor to decide on the order in which to calculate the
values in any dependency chain. So no effort is needed on the part of
the form author to specify that h and m and
sign need to be calculated before secs. The
processor takes care of all that.
The user-datetime value on any slot is calculated
as described above.
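In terms of the two XForms functions, that calculation might be sketched as follows, assuming that the slot-times element is a child of the root element of the ui instance:

```xml
<!-- context node is the user-datetime attribute, so "../@datetime"
     is the scheduled time recorded on the same slot element -->
<xf:bind nodeset="instance('ui')/slot-times/slot/@user-datetime"
         calculate="seconds-to-dateTime(
                      seconds-from-dateTime(../@datetime)
                      + instance('ui')/tz/@secs)"/>
```

For a slot scheduled at 10:00 with offset -04:00 and a user offset of +09:00, the result is a dateTime whose time part is 23:00.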
From the user-datetime value, we can extract
several other useful values: the hours-and-minutes part of the
dateTime expression (user-time), the hours part of the
expression on a 24-hour clock (uh24), the hours part on a
12-hour clock (uh12), and finally a Boolean flag to show
whether the clock has moved to the next day or not
(wrapped).
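A sketch of these four calculations, relying on the fixed layout of the dateTime string. The expressions are assumptions; in particular, the date comparison used for wrapped does not distinguish the next day from the previous one.

```xml
<!-- in "2021-08-02T23:00:00Z", characters 12 to 16 are hh:mm -->
<xf:bind nodeset="instance('ui')/slot-times/slot/@user-time"
         calculate="substring(../@user-datetime, 12, 5)"/>
<xf:bind nodeset="instance('ui')/slot-times/slot/@uh24"
         calculate="number(substring(../@user-time, 1, 2))"/>
<!-- maps 0 to 12, leaves 1 to 12 unchanged, maps 13 to 23 onto 1 to 11 -->
<xf:bind nodeset="instance('ui')/slot-times/slot/@uh12"
         calculate="(../@uh24 + 11) mod 12 + 1"/>
<!-- true when the user's calendar date differs from the scheduled one -->
<xf:bind nodeset="instance('ui')/slot-times/slot/@wrapped"
         calculate="substring(../@user-datetime, 1, 10)
                    != substring(../@datetime, 1, 10)"/>
```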
The final calculation of the string to be displayed to the user
is complex enough to be worth explaining in more detail. The core
calculation is: if the clock24 flag is set to
false,[7] then concatenate the 12-hour version of the
hour value with a colon and the minutes value, and add either “am” or
“pm” as appropriate. Otherwise (clock24 is true), just
use the user-time value, which is on a 24-hour
clock.
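The core calculation might be sketched like this, assuming that the bind is attached to the slot element itself, so that its attributes can be addressed directly, and that “am” and “pm” are appended without a space:

```xml
<xf:bind nodeset="instance('ui')/slot-times/slot"
         calculate="if(instance('ui')/@clock24 = 0
                       or instance('ui')/@clock24 = 'false',
                       concat(@uh12, ':', substring(@user-time, 4, 2),
                              if(@uh24 >= 12, 'pm', 'am')),
                       @user-time)"/>
```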
But the time 12:00 should be displayed as “noon” or as “midday”,
so the code just given is wrapped in a test for 12:00:
if(@user-time = '12:00',
   if(instance('ui')/@clock24 = 0
      or instance('ui')/@clock24 = 'false',
      'noon',
      'midday'),
   ... else-case as above ...
)
A second complication is that if the time is past midnight in
the user’s timezone, we are to add an HTML br element and
the notice “(the next day)”. So the expression just given is wrapped
in a call to concat(), together with a test for the
past-midnight case.
concat(
   ... time expression as given above ...
   ,
   if(@wrapped = 'true' or @wrapped = 1,
      ' <br />(the next day)',
      '')
)
The final complication is that if the user time is 00:00, then
(a) the time should be displayed as “midnight”, and (b) the phrase
“(the next day)” should not be used, since it would raise more
questions than it answers. The full expression of the final step in
the calculation is thus as follows.
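A sketch of how the pieces described above could be combined into a single bind. The nesting is an assumption; inside an XML attribute, the angle brackets of the br element must be escaped.

```xml
<xf:bind nodeset="instance('ui')/slot-times/slot"
         calculate="if(@user-time = '00:00',
                       'midnight',
                       concat(
                         if(@user-time = '12:00',
                            if(instance('ui')/@clock24 = 0
                               or instance('ui')/@clock24 = 'false',
                               'noon', 'midday'),
                            if(instance('ui')/@clock24 = 0
                               or instance('ui')/@clock24 = 'false',
                               concat(@uh12, ':', substring(@user-time, 4, 2),
                                      if(@uh24 >= 12, 'pm', 'am')),
                               @user-time)),
                         if(@wrapped = 'true' or @wrapped = 1,
                            ' &lt;br /&gt;(the next day)',
                            '')))"/>
```

Testing for 00:00 first means that the midnight case never reaches the “(the next day)” branch.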
With the help of the xf:bind element, the core
interactivity of the document can be implemented in a purely
declarative way. As can be seen from the first steps in the sequence,
using instance-document nodes to hold intermediate values can help
make the expressions at each step simpler; as can be seen from the
final calculation, intermediate nodes are not strictly necessary, if
one is willing to work with a more complex expression.
Event handling
The final part of the XForms implementation to be explained is
the dependency of the clock24 flag on the timezone.
Whenever the timezone changes, the clock24 flag should be
updated: for time zones in (roughly) the Western Hemisphere,
clock24 should be set false; for timezones in the Eastern
Hemisphere, it should be set true.[8]
We could use xf:bind with a calculate
attribute for this task, but that would effectively prevent the user
from changing the clock24 flag: calculate
expresses an invariant relation between values in the instance
document. We want to change the value, but let the user change it
back.
XForms has an event model similar to (and derived from) that of
the HTML document model as specified in DOM level 2. At various times
during the life of a form, events are
dispatched to appropriate targets to be handled by event handlers.
Form authors can register new event handlers which observe specified
locations in the form and capture and handle the events dispatched there.
For the task at hand, the simplest thing to do is to listen for
the event xforms-value-changed to be dispatched in
response to a change to the tz element, and to handle the
event by setting the value of the clock24 attribute
appropriately. To do this, it suffices to add two
xf:setvalue elements to the xf:select1
element bound to the tz element, thus:
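A sketch of what such handlers could look like, assuming that the ev prefix is bound to the XML Events namespace and that the choice between the two actions is made with the if attribute on each action, testing the sign of the selected offset:

```xml
<xf:select1 ref="instance('ui')/tz">
  <xf:label>Time zone</xf:label>
  <!-- items omitted -->
  <!-- offsets beginning with "-" lie west of Greenwich: 12-hour clock -->
  <xf:setvalue ev:event="xforms-value-changed"
               if="starts-with(instance('ui')/tz, '-')"
               ref="instance('ui')/@clock24"
               value="'false'"/>
  <!-- all other offsets: 24-hour clock -->
  <xf:setvalue ev:event="xforms-value-changed"
               if="not(starts-with(instance('ui')/tz, '-'))"
               ref="instance('ui')/@clock24"
               value="'true'"/>
</xf:select1>
```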
Here, the ev:event attribute, in the XML Events
namespace, gives the name of the event for which the
xf:setvalue element is listening. (It does not need to
specify which element it is interested in, because its parent select
control will only receive events related to the tz
element.)
The ref attribute specifies the node whose value is
to be updated. The value attribute specifies the new
value.
In principle, this simple approach should solve the problem.
For reasons thus far unexplained (probably an error somewhere in the
form, or possibly in the development version of the engine), however,
this simple approach does not work. The event is dispatched and
captured, but setting the new value of the clock24
attribute appears to cause an infinite loop. So until the cause of
the problem is found, another approach is needed.
Since the problem appears to be an infinite loop caused by the
event xforms-value-changed being raised by the handler
for that event, one plausible approach is to try to find some other
event to be used as the trigger for resetting the clock24
flag.
When an item in an xf:select1 element is selected,
or de-selected, appropriate events are dispatched to the item. So we
should be able to reach our goal by putting an event handler on each
item in the timezone selection control and making it listen for the
event xforms-select. At the same time, partly to
illustrate the use of user-defined events and partly because it makes
it easier to insert debugging messages into the form to report on what
is going on, the xf:setvalue action will be replaced by
an xf:dispatch action which dispatches an appropriate
user-defined event, whose handler will (possibly among other things)
update the clock24 flag.
So instead of adding xf:setvalue elements as
children of xf:select1, we add one xf:dispatch
element to each item in the select control. The item for Central
Daylight Time (Chicago), for example, looks like this:
<xf:item>
  <xf:label>CDT Chicago, IL, USA (UTC-05:00)</xf:label>
  <xf:value>-05:00</xf:value>
  <xf:dispatch ev:event="xforms-select"
               targetid="m1"
               name="bmt:set-12h-clock"/>
</xf:item>
As before, the ev:event attribute specifies which
events this handler listens for. The name attribute gives
the name of the event to be dispatched by the handler, and the
targetid gives the ID of the element which is the target
of the event. Here, m1 is the ID assigned by the form
author to the xf:model element, and
bmt:set-12h-clock is a user-defined event (given as a
QName) meaning the clock24 flag should be set
false.[9] A
different event (bmt:set-24h-clock) is dispatched by
handlers for time zones east of Greenwich.
Because the event handlers are now embedded in each item in the
selection control, it is easy to make them follow a more complex
pattern than “west, 12-hour clock; east, 24-hour clock”.
Some sources claim, for example, that most Anglophone countries
normally use 12-hour clocks, and other countries normally use 24-hour
clocks (again, with exceptions and complications not covered here).
If we believe this claim, we can make the selection of Australian or
New Zealand time set the flag to a 12-hour clock, even though they
have a positive timezone offset.
The user-defined events are dispatched to the model, and the
handlers for those events are given as children of the
xf:model element.
<xf:model id="m1">
  ...
  <xf:action ev:event="bmt:set-12h-clock">
    <xf:message if="instance('ui')/@debugging = 'true'"
                level="modal"
      >Clock should change to 12h.</xf:message>
    <xf:setvalue ref="instance('ui')/@clock24"
                 value="'false'" />
    <xf:message if="instance('ui')/@debugging = 'true'"
                level="modal"
      >Check it out! The 24h clock switch <!--
      -->should have changed!</xf:message>
  </xf:action>
  <xf:action ev:event="bmt:set-24h-clock">
    <xf:message if="instance('ui')/@debugging = 'true'"
                level="modal"
      >Clock should change to 24h.</xf:message>
    <xf:setvalue ref="instance('ui')/@clock24"
                 value="'true'" />
    <xf:message if="instance('ui')/@debugging = 'true'"
                level="modal"
      >Did the 24h clock switch?</xf:message>
  </xf:action>
</xf:model>
The xf:action element is a wrapper for other XForms
actions. It executes them one at a time in turn. The practical
effect is that the ev:event attribute can be specified
once on the enclosing xf:action and does not need to be
repeated on its children. The xf:message element
displays a message to the user (here displayed in a popup window, as
suggested by level="modal"). The
xf:setvalue element has been seen before.
The if attribute seen here on the
xf:message element specifies an arbitrary condition on
the execution of the action. If the condition holds, the action is
performed, and if not, it is not performed. Here, a Boolean
debugging attribute on the ui-info element
is tested; if debugging is turned on, the messages are displayed, and
otherwise not. Messages can in general display useful information
about the course of events; these messages mostly serve to signal that
the event handler has been activated.
Concluding remarks
We are fortunate enough to have multiple ways to provide simple
interactivity for a document that can use it.
The way requirements are specified can influence what solutions
seem most applicable. They’re described in this paper in terms that
make more sense in JavaScript and Saxon-JS than they do in XForms,
though as we’ve demonstrated, they can certainly be accomplished in
XForms.
Javascript is well established both in web browsers and elsewhere;
it possesses a very large user base and its implementations and
libraries correspondingly benefit from intensive development
efforts. It's not especially declarative, but it can be used easily
for simple tasks like that shown here.
Saxon-JS allows XSLT programmers to leverage their knowledge of
XSLT to provide the required functionality here in a much more
declarative and correspondingly more compact form. The key XSLT 3.0
stylesheet described here is only 163 lines long (nine templates,
two keys, and an output specification). By providing a comprehensive
XSLT 3.0 implementation in the browser, Saxon-JS brings familiar
design patterns and the XDM’s extensive function library within easy
reach for fully interactive browser applications or server-side
solutions on Node.js.
XForms also allows a highly declarative description of the
desired interaction; its xf:bind constraints allow
dynamic recalculation of document values, and its support for the
event system implemented in Web browsers makes it possible for an
XForm to react to user actions in dramatic ways. And if the
transform() extension function is used, XForms also
allows the browser's built-in XSLT 1.0 processor to be invoked
interactively in ways similar to those of Saxon-JS.
The XForm presented here requires more modification of the
existing HTML document than the Javascript or Saxon-JS solutions of
the task. This is due partly to the more declarative approach taken
by XForms and partly to the fact that the implementor was slow to
hatch the alternative implementation approaches that would have
involved fewer such changes.
We have provided a quick tour of the three sample
implementations of the task; the source code will be available for
inspection, and (assuming we get the continuous-integration pipeline
working) functional versions of all three implementations will be
available for users to experiment with. Ted Nelson's ideas about
hypertext and interactive documents have not, of course, all been
fully achieved. But current technologies bring some of them closer to
realization than they have ever been.
[1] Users of XPath in other contexts may find themselves expecting
instance() to bind to the instance’s document node
instead of to that node’s single element child; a common error for
such users would be to write instance('ui')/ui-info/tz
instead of the correct XPath, in this example.
[2] This may have the advantage of making more obvious to casual
observers that the schedlink paragraph is not a mistake
or tag abuse (as empty paragraphs often are in HTML) but a container
for information to be supplied at display time. New collaborators may
be less likely to delete the element as part of ‘cleaning up the
accumulated cruft’ in the program document.
[3] Some readers will wonder what the third argument of the function
call means. It’s a Boolean flag indicating (if true) that the second
argument contains a string representation of the stylesheet, or (if
false) that the second argument is a URI which must be dereferenced to
get the stylesheet.
The stylesheet could be invoked on the specific ProgramEvent
div needed, but as seen above the XPath expressions
needed to identify the appropriate element are rather long; it’s
simpler to pass the entire program to the stylesheet and find the
appropriate div there.
[4] If we are being pedantic, we should note that what the browser
displays is not the document Program.xhtml, which is an
XHTML document with embedded XForms elements, but the result of
processing that document with the xsltforms.xsl
stylesheet, which translates the XForms elements into elements better
understood by Web browsers. As support for non-HTML elements has
improved, the XSLTForms translation of XForms elements has
changed.
[5] It is thus not at
all similar to the XPath 2.0 function defined later under the same
(local) name.
[6] The resulting dateTime expression will use timezone Z (UTC
itself), but it will not be the UTC
time of the event. It will be the UTC time at the moment when a
UTC clock shows the same time as the user’s local clocks at the
time the event takes place.
The reader can either accept this fact as a kind of black-box
magic, or an example may help make it clear. Suppose we wish to
calculate the JST equivalent for the 10:00am morning slot. The event
time of 10:00am is given in a timezone with offset -04:00; the
corresponding UTC time is 14:00 (10:00 + 04:00). Japan Standard
Time has an offset from UTC of +09:00. If we add 9 hours to the UTC
time of 14:00, we get a time of 23:00. Whether we interpret this
23:00 as (a) the time shown by a clock in UTC timezone Z, 9 hours
after the event time of 14:00 UTC, or (b) the time shown by a clock in
JST at 14:00Z, is immaterial. The expression returned by the
seconds-to-dateTime() function assumes interpretation
(a), but we can just extract the hours and minutes fields and apply
interpretation (b). And we do.
[7] The slightly awkward form of the test, checking for both 0 and
“false”, is an attempt to protect against any flaws in
the form processor’s implementation of the xsd:boolean
datatype. Trying to meet a deadline, it was easier to write this
awkward expression than to establish with complete certainty whether
there is some
simpler formulation of the test that will always work.
[8] This reflects the
mostly true generalization that conference programs in the
U.S. normally use 12-hour times, and conference programs in the rest
of the world are more likely to use 24-hour times. The reality, of
course, is more complex. But this gives us the excuse to require
another dynamic change in the document.
[9] The sequence of events will thus be, roughly: the user
selects CDT as the value of the timezone (element tz in
instance document ui); the forms engine dispatches
event xforms-select to the matching xf:item
in the selection control; the listener (here the
xf:item element) detects the event and invokes any
handlers it has for the event (here, the xf:dispatch
element); the xf:dispatch element dispatches a
bmt:set-12h-clock event to the element with ID
m1, which is the xf:model element in
the header; the model element detects the event and activates
whatever handlers it has for that event, which in this case
is the xf:action element described below.