How to cite this paper
Biezunski, Michel. “Moving sands: Adventures in XML e-book-land.” Presented at Balisage: The Markup Conference 2012, Montréal, Canada, August 7 - 10, 2012. In Proceedings of Balisage: The Markup Conference 2012. Balisage Series on Markup Technologies, vol. 8 (2012). https://doi.org/10.4242/BalisageVol8.Biezunski01.
Balisage: The Markup Conference 2012
August 7 - 10, 2012
Balisage Paper: Moving sands: Adventures in XML e-book-land
From XML to XML. Is that enough?
Michel Biezunski
Consultant in Information Technology, Specialized in Publishing Applications,
Creator of the Topic Maps standard.
Copyright © 2012 Michel Biezunski
Abstract
Converting XML to e-books supported by multiple devices is a significant task. Deficiencies
in the EPUB standard, undocumented bugs in some devices, and issues such as fixed
layout vs. flowable content add complexity. Fortunately, the first experts to
experience weird effects are sharing their findings. This paper continues this
flow
of information by describing experience in a US IRS project. The first step is
scheduled for completion in July 2012. When fully deployed, this project will
target
multiple devices, some announced but not yet available, which differ in size,
features, the formats they can read, and how much of the EPUB standard they
implement, if any. This paper analyzes issues that have been encountered while
working on automatic production of e-books using XML technologies and also suggests
a new vision to ensure a sustainable, minimally intrusive, long term solution
for
producing e-books on various devices. Relying on standards is necessary but not
sufficient and some issues that are at the core of structured markup are coming
back.: the difference between structure and presentation has never been so acute.
This project reminds us that there is no short circuit to a sound approach of
analysis not only of how information looks, but also of what it means.
Table of Contents
- Introduction — E-books go mainstream
- XML-based standards
-
- EPUB
- Amazon
- Rendering remains a challenge
- IRS
- Moving Forward
- References
Note
Note. In this article, the word “e-reader” is used to
designate a reading device, for example a tablet, while the word “reader”
(except in the phrase “e-ink reader”) is used to designate a human being
performing the action of reading.
Introduction — E-books go mainstream
Contemporary publishers of books, journals, and magazines are moving away from a publishing
model based on printed pages. Printed books are industrial products that, once
written,
edited and formatted, undergo several steps: printing, binding, storing, distributing,
displaying, selling. The process of selling e-books is much simpler: once composed,
an
e-book can be put on a server in an online store, where customers can buy it
instantaneously. e-books will never be “out-of-print”, nor will e-books need to
be
destroyed because they don’t sell enough. The term “e-book” encompasses content
beyond
text and graphics. Audio and video can be included, as well as software components
using
scripts.
There are many devices that let us read e-books, including e-ink readers, tablets,
smart
phones, and computers. E-ink readers are best fit for reading plain books, tablets
for
magazines and audio/video content. Smart phones, even though they don’t offer the
same
reading “real estate”, can be used to read the same material. And computers can
do all,
but they don’t necessarily offer the comfortable experience that users want for
leisurely reading. e-books can be downloaded in their entirety and therefore can
be read
off-line.
The e-book business has become big business. In addition to traditional booksellers
such as
Barnes & Noble, most book publishers, and, increasingly, news-“papers”, key players
now include Apple, Google, Microsoft, Adobe, Samsung, and Amazon. Actors in the
sector
are both information technology companies and content owners.
E-book formats are basically a set of compressed XML documents, where the main content
is stored in XHTML. The EPUB delivery standard has become the prevalent means of
document delivery. Amazon is an exception and has adopted its own formats —
MOBI/AZW and now KF8. Use of XML in e-books is more tightly controlled than it
is in web
browsers. In particular, web browsers tolerate XML-like HTML in which validity
to a DTD
or a schema is almost irrelevant. In contrast, e-books must be entirely conformant
to
XML schemas, including XHTML for the main content. One would think that this rigor
would
guarantee that e-books will properly display on most e-readers and eliminate tweaking
of
content according to the e-reader the way web pages must be tweaked according to
the
browser. Unfortunately, this is not the case. The primary goal if this article
is to
show that XML is insufficient to guarantee smooth interoperability for rendering.
A
secondary goal is to remind us not to underestimate the distinction between a logical
structure and its presentation, a distinction that is at the core of structured
markup.
XML-based standards
EPUB
Why is this paragraph a comment?
The EPUB standards are a set of XML vocabularies including the document content (XHTML),
the “manifest”, declaring the document entities present in one e-book, the “table
of
contents” for navigating, the front cover page, and a mimetype declaration. Fine
tuning the formatting is achieved through the use of cascading style sheets (CSS).
The XML documents are compressed together with the style sheets, the images, any
videos, and when applicable, Javascript programs. A compressed package is called
an
“EPUB” document.
EPUB is created by an industry consortium named IDPF, the International Digital Publishing
Forum. The two versions in use of EPUB are 2.0.1 and 3.0. EPUB 2.0.1 was approved
in May, 2010, and EPUB 3.0 in October, 2011. In practice, they are augmented with
proprietary features that are only available on one particular device; the most famous
is the Apple extension to CSS for iPads. The conformance level to a given version
of a standard also varies.
EPUB 2.0.1 comprises three specifications: the Open Publication Structure (OPS), the
Open Packaging Format (OPF), and the Open Container Format (OCF). The OPS is based
on XHTML 1.1 and DTBook (DAISY Digital Talking Book). The style language is a required
subset of CSS 2. It allows the inclusion of so-called “XML Islands” which are any
complete XML document with a structure different from DTBook or XHTML. This feature
was not widely implemented and “out-of-line XML islands” have been dropped in EPUB
3.0.
EPUB 3.0 contains the following specifications: Publications 3.0, Content Documents
3.0,
Open Container Format 3.0, Media Overlays 3.0, and EPUB Canonical Fragment
Identifiers. The main difference between EPUB 2.0.1 and EPUB 3.0 is that the latter
contains more provisions for incorporating diverse media, such as audio and video,
and parallels the evolution of HTML towards XHTML 5. EPUB3 was first implemented
in
Apple’s iBooks, Readium, and Azari. Recent progress has been reported for other
devices. EPUB 3 is gaining a lot of momentum. Its adoption is accelerating
noticeably in Japan due to its enhanced capabilities for global language support,
particularly handling writing directions including vertical writing. Furthermore,
EPUB 3 books can be read on an EPUB 2 reader, providing an EPUB 2.0.1 table of
contents is provided.
Support for other EPUB 3 features varies. There are only a few signs of implementation
of
video and audio support and pragmatists in the industry prefer to “wait and see”
before guaranteeing that it actually delivers on its promises. Although Javascript
is enabled, its support is being largely discarded because of potential security
breaches. It may be possible to move forward by restricting Javascript to a set
of
industry-supported scripts that could be implemented in most e-readers. However,
SVG
and MathML are now incorporated, and there seems to be more traction on that front.
Specific extensions to CSS3 for e-books are also defined in the
specification.
In addition to the general framework the EPUB 3 specification provides, satellite
specifications are now being defined to offer support for multimedia and indexes.
An
important one will support fixed layout. Fixed layout contrasts to the flowable
content addressed by the base EPUB standard but is needed for documents such as
comic books, illustrated books, children books, knitting patterns, textbooks,
and
magazines. A single publication should include both flowable content and fixed
layout, but this requirement seems to be unavailable at this point. Fixed layout
must support multiple columns, and a fixed height to width ratio in different
page
sizes. Rendering fixed layout on a variety of e-readers is a challenging
task.
Amazon
Amazon e-book formats differ from the EPUB standards and are optimized for rendering
on
Kindle devices. They also contain provisions for support of audio and video directly
inspired by HTML 5. The main Amazon formats used today are MOBI/AZW and KF8. MOBI
is
an open standard that was bought by Amazon. AZW was developed by Amazon specifically
for the Kinder e=ink reader and is DRM (Digital Rights Management)- restricted
and
is locked to a specific device
KF8 uses a subset of HTML5 and CSS3, and is designed to be backward-compatible
with
MOBI.
Rendering remains a challenge
There are two kinds of e-books: flowable content and fixed layout e-books. Flowable
content
can be reformatted on the fly, depending on the device and the user configuration.
Furthermore, some devices automatically redisplay the content depending on the
orientation in which they are held. Under these conditions, the traditional notion
of a
page no longer holds. This obvious difference from the world of published documents
consisting of the well-defined pages of paper or PDF, also distinguishes e-books
from
web pages. Even if browser windows can be reformatted using user-defined local
settings,
the notion of a web page is pervasive. And fragment identifiers help locate the
exact
location of any information on a page, regardless of rendering. This location addressing
facility is not currently available on an e=book. Technically, flowable content
in an
e-book is displayed by pages, which depend on the local conditions, and is divided
into
units which correspond more or less to the chapters of a traditional printed book.
Each
of these units may correspond to a different physical XHTML file, although the
page-break property of a CSS may separate one file into multiple pages. However,
an
actual page of flowable content on an e-reader is rendered purely dynamically and
depends on the conditions set for it. Some e-readers assign page numbers, but these
are
modified each time there is a change in the display style. Thus, e-books have no
notion
of an absolute page number nor of a specific URL for a chapter. (While the file
structure within the e-book could be used to locate a particular page, the file
structure is only visible when the e-book is decompressed, information that e-book
developers but not readers are expected to see.)
The second category of e-books are those for which the layout is partially if not
strictly
fixed. The contents of each page is well defined, and can not be reflowed without
constraints. For example, magazines, comic books, children books, art books, and
text
books often need to link tightly illustrations, audio, or video content to surrounding
text. Many pages must be presented in a way that does not dramatically alter the
aesthetics of the publication and the meaning of the content. Some e-books also
contain
software which provide a more interactive experience to the reader, with supplementary
constraints about what needs to be displayed under various conditions.
A mechanism called media queries allows an e-book to contain conditional statements,
but is
insufficient to guarantee that the resulting EPUB will be usable on a variety of
e-readers. Thus, the diversity of e-readers, as well as the existence of multiple
standards, producing several output versions of many e-books is practically impossible
to avoid.
Professionals also advice of not trusting the result displayed when the rendering
of
one e-reader is emulated on another one. For example, a Kindle application on an
iPad,
Android tablet, or on a smart phone doesn't reflect in all details the rendering
obtained on an actual Kindle device.
Implementation of software for rendering EPUB has specific issues on each e-reader.
For example, there is an e-reader-specific limit on the size of acceptable files.
As a result, a Kindle 3 may freeze while trying to display a document which an iPad
can absorb without difficulty.
The physical characteristics of e-readers vary greatly and have an effect on the way
they
are handling documents. For example, e-ink Kindles and Nooks truncate tables that
don’t
fit their screens. The rightmost part of a wide table is simply not displayed,
typically
after the fourth column. In landscape mode the device may display more than in
portrait
mode, but there still may not be sufficient space to render the full table even
when the
font is set to is minimal size. Readers may well be surprised by this loss of
information, especially since it contrasts to their experience with the web pages
and
PDF documents they are accustomed to magnifying or reducing as desired.
Should rumors that Apple is preparing a 7-inch iPad prove true, one would expect that
it will
render the same information that can be displayed on the current versions. But
so far,
that remains wishful thinking, because the actual size is often used as a guide
for an
optimized rendition.
As far as flowable content is concerned, the font size and margins can be customized
by every user. If the layout of the page is not fluid enough, some unpredictable
things
may occur, making the text meaningless. This is for example the case with presentation
in which horizontal layout is critical. If a line wraps where it was not designed
to do,
then the content becomes unreadable. Some e-readers dynamically calculate the page
numbers, and those numbers will change dynamically, according to the font size
or to
whether the device is hold vertically or horizontally. Others only assign “location
values”. Some users want to be able to read the documents on their smart phones,
which
offer a much smaller space than the tablets or e-ink readers.
Another problem is the nature of the documents accepted in an EPUB. Text and images
must be treated differently. The rendition of graphics may vary according to the e-reader.
For example, some e-readers, such as the iPad, enable full-screen display of graphics,
and allow the reader to enlarge even a full-screen image by pinching. Other e-readers,
for example the Kindle 3, do not offer this possibility. Some graphics may therefore
be unreadable on some e-readers. The new iPad, with the Retina display, has a much
higher resolution than the other models, and therefore demands images with greater
resolution. These images will not fit on many other e-readers, and may slow them to
a halt.
So far, there is no universal way of guaranteeing accessibility and each e-reader
handles
accessibility requirements (such as Section 508 compliance) differently. Some e-readers
include software that enables text to be read aloud. Others do not. EPUB3 provides
guidelines for accessible documents, but it has not been implemented yet, and it
is too
early to say which features are going to be actually used.
By definition, footnotes are displayed in books at the bottom of a page. Since e-books
have
no notion of a fixed page in flowable content, they have no analogous location.
Each
e-book or e-book series could specify a location for displaying footnotes (such
as the
end of a chapter or of the entire book). Such conventions blur any logical distinction
between footnotes and endnotes. Another possibility is for footnotes to pop up
when the
reader moves the cursor over a reference to a footnote.
Bugs in some e-reader software causes improper rendering of some HTML code. For example,
a notorious iPad bug, known as the “span bug”, ignores any font change on the HTML
<span> element. If several fonts are needed within one iPad document the HTML <span>
element should not be used to specify fonts. Apple may well solve this bug in a future
release of iBooks software and when that occurs this restriction will not be needed.
On other platforms, this bug does not occur, and the <span> element can reliably used
for font changes.
Another category of problems results from initial settings that vary among devices.
Relying
on the default values can therefore produce undesirable differences that would
be
eliminated if all formatting requirements were explicitly stated. For example,
the Kobo
e-reader centers some headings that other e-readers left align. Amazon uses different
default values in MOBI than in KF8. Using Amazon’s conversion tool to upgrade can
therefore break the alignment, indentation, and other formatting features. These
issues
can be solved if all formatting requirements are explicitly stated.
The tables of contents are encoded according to a given structure, which differs between
EPUB2 and EPUB3. The EPUB3 format uses more XHTML tags. Multi-level tables of contents
can be created, but it is not always possible to render them as folded. For example,
on the iPad, the tables of contents are displayed, with all levels, each sublevel
being indented appropriately. On e-readers software such as Sigil, by Google, the
tables of contents are “folded”. At first, one sees only the first level elements,
and they can be unfolded to reveal the next level headers, etc.
In general, the information on these details is scarce, and much trial and error is
required to obtain satisfactory results. Fortunately, the most prominent consultants
in the field share information about their successes and failures through web pages
and blogs; such resources also provide a forum for exchanging pertinent questions
and answers. One such blog, maintained by Liz Castro, is called “Pigs, Gourds and
Wikis” (http://www.pigsgourdsandwikis.com/) and includes tweets posted at #eprdctn.
IRS
The US Internal Revenue Service publishes documents that help taxpayers compute their
taxes.
These include Taxpayer Information Publications (TIP) as well as instructions for
forms
and the associated schedules. These documents are updated each year; they are available
in print and on the Web (in PDF, HTML, and also as a topic map called “Tax Map”).
For decades, the IRS has served as a laboratory of innovation for testing new
technology in information delivery. It is now launching its first e-books. The
IRS is as
agnostic as possible in hardware choice and gives preference to formal and de facto
standards over proprietary formats. As a result, the IRS plans to make its e-books
available on most e-readers currently on the market. However, it has encountered
the
same issues as those faced by other publishes of e-books for multiple platforms.
As a
starting point, the work has therefore focused on a single platform, the iPad.
Extensions to other platforms are planned to follow quickly.
Seen from the perspective of XML application development, the e-book project seems
straightforward. XML source files conforming to a limited number of schemas must
be
transformed into another XML structure (namely, XHTML and the other EPUB structures).
XSLT is a natural method for transforming this material. The XSLT process includes
slicing the sources into chapter-like divisions that are small enough to prevent
e-readers from crashing, converting static text to cross-references and links to
the
Web, tagging footnotes and graphics, building the table of contents, and (in a
later
stage of the project) creating an index.
The road has been bumpier than expected. Limiting initial output to a single e-reader
reduced the number of e-reader-dependent quirks that had to be addressed for a proof-of-concept
project, but there was no way to plan for undocumented idiosyncrasies. An easy workaround
to the <span> bug mentioned above is to use HTML elements such as <cite>, or the rarely-used
<samp> element instead of <span>.
The problem with this kind of bug is that it may just be temporary. Perhaps Apple
will issue a software upgrade that will solve it. Or not. Or perhaps the upgrade will
introduce a new side effect that didn’t exist before. And there is no way to know,
the documentation is almost non-existent, and the only source of information is the
community of practitioners that exchange this kind of information on specialized forums.
And of course, the problem highlighted here is just one of them, and it’s only for
the iPad. Other e-readers have their own issues, and these issues are only known at
the time of testing.
To provide products that work reliably on a variety of environments, a project must
establish the least common denominator among the formatting requirements, and minimize
use of platform-specific features. This recommendation is easier to follow for
flowable
content than for fixed layout, where interoperability between e-readers is even
more
challenging. Apple, for example, has released software that allows an author to
create
an e-book that may contain text, images, video, etc., and may be beautifully laid
out.
However, the result is only usable on Apple platforms. Apple’s commercial motivation
for
this product is obvious. Until vendors have greater motivation to support
platform-independent standards, the technical features that would ensure good quality
rendering on various devices are pretty daunting, and we are clearly not there
yet.
Moving Forward
Publishing high quality e-books on multiple e-readers requires quality assurance on
every
platform. The scope of this tedious process is open-ended, since new e-readers
appear
regularly, although others are discarded. The work area where testing is performed
can
resemble a toy store with shelves containing numerous devices. Media queries and
similar
mechanisms for anticipating how e-books will appear on various devices can bring
interesting issues to light, but do not substitute for real tests on real devices.
Testing is a learning process for accumulating the experience for tracking,
categorizing, and fixing in the most possible generic way any detected problems.
Long term solutions require conceptual, abstract thinking about the e-book environment
and
nature of the issues involve.
There are several kinds of traditional books, not all consisting of flowable content
or
fixed layout. Flowable content includes novels, essays, academic books, textbooks,
reference books, technical manuals, etc. Most books have tables of contents, some
have
indexes, cross-references and references to parts of other books. Some are illustrated,
others are not. In addition to the graphics and photographs, there can be structures
with complex layout, such as tables. Tables are sometimes automatically composed
from a
source in XML (for example, the content may be provided in DocBook). In addition,
e-books can be interactive and can contain audio and video clips. Almost every
item
listed here poses a challenge to e-book publishing.
A novel is typically easy to publish as an e-book. Most e-reader devices support tables
of
contents. In principle, such mechanisms could work well as well for essays and
academic
books, but there is a supplementary problem here. How should academic citations
to part
of an e-book be made? Literate references to books typically identify the cited
book and
the relevant pages. Since e-books have no fixed page numbers, new conventions for
academic citations must be developed. Publishers advise authors of academic material
to
use numbered sections, and create references by section number. If sections are
automatically numbered, however, section numbers can easily become obsolete in
a later
edition of a work. Perhaps the URI reference system that is pervasive on the Web
can be
used for citations to content within an e-book. Such conventions will only be useful
if
widely adopted.
Few current e-books include an index because little e-book publishing software makes
it easy
to build an index. While the XML community is experienced in building indexes from
“invisible tags” stored within an XML source document, knowledge of XSLT in the
publishing industry is quite limited. Publishers are much more likely to tweak
a CSS
stylesheet to create desired formatting than to consider the design of XML source
material.
Some presentational effects, such as multi-column layout, are dropped in flowable
content
within the e-book edition of a book. Such degradation of original formatting may
be an
aesthetic loss mourned by purists, but is usually not a tremendous loss of information
content.
It is a completely different story with tables. As previously noted, tables as represented
in printed documents or web pages do not display well in the restricted real estate
of
an e-reader screen. Even when manufacturers offer the ability to zoom in and out,
it is
not convenient to view tabular information in limited space. Worse, as previously
discussed, e-readers merely crop the tables to fit the maximum width of the device.
The
amount of information available is proportional to the font size, and depends on
whether
the device is held in portrait or landscape mode. This problem is much more acute
with
e-books than traditional publishing because paper-based publishing relied on an
implicit
notion of using paper of a reasonable width for the content. As more and more material
has moved to computer screens, readers change font size or scroll to view tabular
material successfully in the vast majority of cases. Now that we are reading on
smaller
devices, including smart phones, the situation has changed, and the notion of adjustment
in the width becomes crucial.
These issues take us back to the early days of logical markup, and the difference
between semantic and presentational structures. Practitioners in the XML community,
and earlier, in the SGML community, have been aware of these issues from the beginning
of generic markup. Tables survive because they are a compact and visually compelling
presentation for some kind of information. They will continue to be used as long as
there are methods for fitting them on a display of “reasonable width”.
Alternatively, some tables could be presented as lists. For example, a customer list
in
which the first column contains the first name, the second column the last name,
and the
third column an email address, could very well be represented as an indented list
that
can be more easily rendered in an e-book. The same information could still be rendered
as a table for print, PDF, or web pages.
We also know that tables—and this is especially visible when managed within spreadsheet
software—are views into a database. This model considers each row to be a record and
each cell a field. A header row expresses the database schema. A cell therefore is
the response to a query about a specific property. In database terminology, the word
“table” is actually used interchangeably for “relation”. This database model suggests
the possibility of software that offers a form-like user interface for searching for
information within a table.
List and database solutions for rendering tabular material require document sources
that use semantic markup rather than visual markup to store this type of information.
Other navigational devices need to be adjusted for e-book publishing: footnotes, running
headers, indexes, can be rendered a number of ways, and there are provisions in
the EPUB
3.0 specification (or in one of its satellites currently under development) to
handle
these features in interoperable ways. Such features will remain a factor of
differentiation among e-reader device manufacturers, because supporting these features
are not required.
As we still are in the infancy of the e-book industry, we are in the period of the
“Whoa!”
effect. It’s great to be able to carry thousands of books on a device for which
the
battery life can span over a month or to exchange bookmarks with “friends” on a
social
network. We still accept that e-books do not yet have all the features we have
in
printed books. As readers become more familiar with e-books, they will demand more
of
the sophistication and quality they deserve. Serious readers will require the
nitty-gritty details that have evolved over centuries of book publishing to become
standard in print. How quickly this demand will develop is still difficult to predict.
The XML community, especially publishers within the XML community, clearly will
have a
big part in the future of e-books.