Flynn, Peter. “Your Standard Average Document Grammar: just not your average standard.” Presented at Balisage: The Markup Conference 2017, Washington, DC, August 1 - 4, 2017. In Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies, vol. 19 (2017). https://doi.org/10.4242/BalisageVol19.Flynn01.
Balisage Paper: Your Standard Average Document Grammar
just not your average standard
Peter Flynn
Peter Flynn manages the Academic and Collaborative
Technologies Group in IT Services at University College
Cork, Ireland. He trained at the London College of Printing
and did his MA in computerized planning at Central London
Poly (now the University of Westminster). He worked in the
UK for the Printing and Publishing Industry Training Board
as a DP Manager and for United Information Services of
Kansas as an IT consultant before joining UCC as Project
Manager for academic and research computing. In 1990
he installed Ireland’s first Web server and now concentrates
on academic and research publishing support. He has been
Secretary of the TeX Users Group, Deputy Director for
Ireland of EARN, and a member both of the IETF Working Group
on HTML and of the W3C XML SIG; and he has published books
on HTML, SGML/XML, and LaTeX. Peter also runs the markup
and typesetting consultancy Silmaril, and is editor of the
XML FAQ as well as an irregular contributor to conferences
and journals in electronic publishing, markup, and
Humanities computing, and a regular speaker and session
chair at the XML SummerSchool in Oxford. He completed a
late-life PhD in User
Interfaces to Structured Documents with the Human
Factors Research Group in Applied Psychology in UCC. He
maintains a fairly random technical blog at http://blogs.silmaril.ie/peter
Most document XML applications adopt or adapt one of a
small number of well-known public document grammars. These are
basically all expressions of a shared and accepted fundamental
logical view of document structure. There are variants and
outliers and long tails, but despite differences in detail,
they form a Standard Average Document Grammar,
which lets us describe the overwhelming majority of text
documents.
The grammar includes a hierarchy of nested, headed
sections; arbitrarily recurring groups of common
components; links between places in and out of the document;
signifiers of importance, relevance, or sequence;
and restrictions on what may and may not occur in different
places. The modifications and customizations users make to
these document grammars are informative both in their variety
and their similarity, and in the fact that they all fit
relatively comfortably within the Standard Average Document
Grammar.
There appears to be a set of structural features common to
the majority of text documents that have become a part of the
way the human race has recorded textual information over the
millennia. As I have shown elsewhere, it is apparent that from
clay tablets to PDFs, we have slowly evolved various models of
a document that have many features in common
[Flynn14, Ch.1]. Part of this may be due to
the need — until recently — to agree upon a generalized physical
representation for the document that others would recognize, but
this could not have been done without there being a mental model
of the document to start from. It is not known if anyone
actually sat down at the dawn of writing, or even at the dawn of
printing, to decide that certain features are what make up a
document,[1] but we can see evidence of such decisions in the
design of commands and structures in older markup systems such
as RUNOFF, Scribe, [S]GML, LaTeX, and others which inherit
their paradigms.
Strictly speaking, a document grammar (in the case of XML,
for example, a DTD, W3C Schema, or RNG Schema) is a set of
definitions and declarations for modeling a class of text
documents. It defines the components of the documents it
describes, as well as the rules governing their presence in the
documents of that class [Tekli11] — a similar
application has been noted in linguistics [Power03]. However, we are more concerned here with
the document components themselves, and with the rules governing
their arrangement, than with the expressive power of the
particular grammatical notation used to describe them.
Core features
In comparing the features of text document markup
vocabularies for earlier research, the existence of a core set
of features became evident because it recurred in one form or
another in virtually every system examined. Not only were the
functions replicated, but the associations between them, and
the rules under which they operated, were extremely similar.
These features have been observed and discussed many times,
and are used as examples in our theories of document grammars,
but they do not appear to have been codified across multiple
instances of their occurrence. To test the feasibility of
codification, an experimental Table of the
Elements fragment was constructed from a small
sample of document types of varying age and popularity [Table I], looking principally for obvious evidence
of common requirements such as metadata (principally the
document identity), hierarchical structure, non-hierarchical
categorization, and object reference. Although incomplete and
unrefined, the table showed the existence of some common
features, as well as numerous gaps.
Table I
(Non-Periodic) Table of the Elements from selected XML grammars (LaTeX has been included for comparison)
Columns, in order: HTML, DocBook, DITA, TEI, 12083, JATS, Briefing, Bulletin, LaTeX. Each row gives the feature followed by the names found, in column order; a grammar with no counterpart is skipped, so rows with fewer than nine names do not align one-to-one with the columns.
title: title, title, title, title, title, article-title, title, title, \title
author: author, author, author, author, briefeditors, author, \author
summary: abstract, shortdesc, abstract, abstract, abstract, abstract, abstract
preface: preface, front, preface, \frontmatter
part: part, section, div|div0, part, sec, \part
chapter: h1, chapter, section, div|div1, chapter, sec, report, \chapter
section: h2, sect1, section, div|div2, section, sec, story, section, \section
subsection: h3, sect2, section, div|div3, subsect1, level3, sub.section, \subsection
subsubsection: h4, sect3, section, div|div4, subsect2, level4, \subsubsection
appendix: appendix, appendix, afterwrd, \appendix
bibliography: bibliography, listBibl, biblist, Ref-list, biblist, thebibliography
index: index, index, index, index
glossary: glossary, glossary, glossary, glossary, glosslist, glossary
paragraph: p, para, p, p, p, p, para, ptxt, \par
quotation: blockquote, blockquote, lq, quote, bq, block.quote, quotation
numbered list: ol, orderedlist, ol, list, list, list, numberlist, list, enumerate
bulleted list: ul, itemizedlist, ul, list, list, list, bulletlist, itemize
dictionary list: dl, variablelist, dl, list, deflist, list, defnlist, description
figure: img, figure, fig, figure, fig, fig, illus, figure, figure
table: table, table, table, table, table, table, table, table, table
mathematics: equation, formula, formula, mml:math, formula, $$
cross-reference: a, xref, link, ref, secref, xref, eiro.ref, \ref
bibliographic reference: a, biblioref, cite, ref, citeref, biblio, \cite
external link: a, link, xref, ptr, weblink, external.ref, \hyperref
emphasis: em, emphasis, emph, emph, emph, emph1, \emph
language: lang, foreignphrase, foreign, language.phrase, \selectlanguage
From this data the features of a common grammar begin to
emerge:
document models provide for self-labelling: in
concrete terms, titles, authors, and other [meta]data
within the document;
the models provide for an ordered hierarchical
division of the information;
within those divisions, there is a non-hierarchical
sequence of text-bearing components (and some for
graphical content);
at the level of the discourse itself (text), there may
be interspersed identifiers which describe relationships
between objects or which signify some special quality to
be observed, and which may themselves contain further
text, identifiers, or signifiers.
I have so far avoided assigning the
conventional labels of markup theory or the names used in any
specific system to these features (element, attribute,
environment, etc; or title, para, or
sect1, etc). However, for practicality and convenience
in discussion, the grouping of the features in Table I corresponds with terminology commonly used:
metadata, hierarchy, pool, and flow.[2]
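As a rough illustration of the clustering, the entries in Table I can be tallied by group. The Python sketch below is not part of the original analysis. The counts are simply the number of names each row of Table I offers across the nine systems; the assignment of rows to the four groups is an assumption inferred from the order of the rows, and the function name is hypothetical.

```python
# Number of the nine systems in Table I that supply a name for each feature,
# read off the table rows. The grouping into metadata/hierarchy/pool/flow is
# an assumption inferred from the row order, not stated row by row in the paper.
SYSTEMS = 9

COVERAGE = {
    "metadata": {"title": 9, "author": 7, "summary": 7},
    "hierarchy": {
        "preface": 4, "part": 6, "chapter": 8, "section": 9,
        "subsection": 8, "subsubsection": 7, "appendix": 4,
        "bibliography": 6, "index": 4, "glossary": 6,
    },
    "pool": {
        "paragraph": 9, "quotation": 7, "numbered list": 9,
        "bulleted list": 8, "dictionary list": 8, "figure": 9,
        "table": 9, "mathematics": 6,
    },
    "flow": {
        "cross-reference": 8, "bibliographic reference": 7,
        "external link": 7, "emphasis": 7, "language": 5,
    },
}


def group_coverage(coverage, systems=SYSTEMS):
    """Return, for each group, the fraction of its table cells that are filled."""
    return {
        group: sum(rows.values()) / (len(rows) * systems)
        for group, rows in coverage.items()
    }


if __name__ == "__main__":
    for group, share in group_coverage(COVERAGE).items():
        print(f"{group:10s} {share:.0%} of cells filled")
```

On these counts the pool is the most fully populated group and the hierarchy the least, which is consistent with the gaps noted in the table, although a sample of nine systems is too small to support more than an impression.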
Standard Average?
The human race seems to like to categorize things. We do
it on the basis of perception (loud|quiet, bright|dark,
hot|cold), cognition (cheap|expensive, fast|slow, wet|dry),
and even guesswork (bull|bear [market]) — ultimately it’s a
survival trait (dangerous|harmless) [Lakoff90]. More experienced humans have more
points on their scales:
flooded|sodden|wet|damp|moist|dry|bone-dry|parched|desert,
because it’s more useful that way. It’s also possible to
measure on a sliding scale, for example 100%=flooded and
0%=desert, or any point in-between. But as most of us live
neither under water nor in a desert, neither in perpetual
daylight nor perpetual night, neither on top of a mountain nor
at the bottom of a canyon, there is a tendency for most humans
to have an affinity for somewhere between the extremes. This
clustering, or central tendency, is a hallmark of natural
behavior, and has been known since antiquity, although
formalized in statistics only since the late 1600s.[3]
Average therefore seems to be an
appropriate way to describe the clustering observed in the way
in which documents are constructed (at least in SGML/XML
and LaTeX), even if it is not used in the strictly
mathematical sense required by statistics. There is a cluster
of recognizable types of information around the title and
author, another around the hierarchy, another around the pool,
and another around the flow.
The standards we use daily, whether formalized by ISO or
just accepted as patterns of behavior, have been formed from a
similar principle to the average: a degree of genericness or
commonality has been seen to be useful as a model because it
is representative or descriptive of the whole. In effect, we
are unconsciously applying the duck test of
abductive reasoning: if it [repeatedly] looks useful, it
probably is.
The suggested term Standard Average
Document Grammar is derived from (but entirely
unassociated with) the linguistic term Standard Average
European coined in the late 1930s to describe a set
of grammatical similarities which characterize Indo-European
languages.[4] The term Standard Average on its
own has to some extent become a portmanteau phrase in everyday
language for acceptably common behaviour.[5]
Feature set
The set of features for the derived grammar is expanded
below, but we should first deal with what it does
not describe.
There are many classes of document structures that do not or
cannot follow a generic model but have their own: those which
are too short to exhibit much in the way of structure; those
which are intended as ephemeral or singular; and those which by
convention of their nature require a specialist structure.
But even amongst these, some of the
features may be present, even if (for example) in the metadata
rather than the text body.
The point of standard and
average as described above is that such a grammar
should be able to cover enough of the spectrum to be a useful
pattern or model in a majority of cases, and that this
fit should be generally accepted by the user
community. There will nevertheless be some specific factors
which must be considered in testing this acceptance:
there must be broad agreement between users on
semantics;
not all features have to be present: there can be rules
about requirement and optionality;
if features are present, then they
must be used in the manner generally accepted.
Naming is also important, and has been the
topic of much discussion over the years on XML-related mailing
lists. Not only are names a prerequisite of any concrete
instantiation, but we need them informally as handles during
discussion, so they may as well be meaningful in the language of
that discussion. This raises other linguistic and cultural
questions, but in essence we are simply requiring agreement that
the feature we refer to as a title is in fact the
title of a document (or section, or whatever) as commonly
understood, and not a mosquito or a bottle of beer.
Because of the traditional separation of concerns between
logical and physical in dealing with document markup, the visual
appearance of a grammatical feature is not generally relevant.
However, for the purposes of usability and — as here —
illustration, when features are given an appearance, it is
common to use one of the widely-accepted styles.
The salient features of a Standard Average Document Grammar
are summarized in Figure 1 to Figure 4. There may be disagreement over the presence
or absence of some specifics, but enough of these appear to
occur in enough instances of otherwise disparate types of
document to make them worth including.
Figure 1: Identification
Naming and explaining
Title
Subtitle
Author
Summary
The features in Figure 1 are often regarded
as metadata, as they typically stand outside the running text.
It nevertheless seems to be accepted as part of the function
of the grammar that it should label the document (title), link
to an authority outside the document (author), and provide an
overview or synopsis (summary).
Figure 2: Formation
Hierarchical structure
Preamble
Major Division
Subdivision
Minor Divisions
Postamble
The core structure of a document appears most commonly as a
hierarchical nesting of divisions, with each level able to
reoccur as siblings (Figure 2). As encoders of
documents are well aware, this does not hold true for many early
documents, and even for some contemporary ones, but it is
sufficiently true elsewhere for it to be useful as a model, and
is sometimes imposed upon otherwise unstructured or
semi-structured documents to make them usable in conventional
modern contexts. In formally-published documents, especially
books, there is usually material preceding and following the
hierarchical structure (prefaces, forewords, indexes,
appendices).
Figure 3: Text Content
Recurrent, reusable, ordered
Paragraph
List Item
Table
Figure
Quotation
Image
Notation
While the function of a hierarchical structure is to provide
a referential framework within which the author can develop or
express an argument (at the least, something like introduction,
exposition, analysis, and conclusion), the text itself uses a
set of building-blocks to present that argument (Figure 3), of which a small subset seems to be widely
used.
The most basic seems to be the paragraph (a novel
consists largely just of these and nothing else apart from
chapter headings).
A list is a collection of thoughts or topics in some way
related by order or concept.
Tables and figures are ways of expressing or relating
more complex collections of information in such a way that
they do not interrupt the flow of the argument but remain
available for consultation.
Images and other notations (mathematics, music) are
specialist ways of presenting collections of information
that cannot reasonably be given in normal textual form
because they need their own language.
Quotations are arguably a form of external link (see
Figure 4), but reproduce the content of the
target verbatim so that it becomes part of the author’s
argument.
The critical point about these building-blocks is that they
occur and reoccur many times. While the components of the
hierarchical structure which contain them may reoccur as often
as needed as siblings (that is, at their
own level), they cannot occur out of depth
(that is, you cannot have a subsubsection as a child of a
chapter), whereas the building-blocks of content can occur and
reoccur at any level within the hierarchical structure. Whatever
about the constraints imposed by the hierarchical model, this
distinction seems to be a key aspect of document grammars.
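The distinction can be made concrete with a small check. The following Python sketch assumes a toy document model: the level names, the pool names, and the `Node` and `depth_errors` identifiers are all hypothetical and do not belong to any of the grammars discussed. It reports hierarchical components that occur out of depth, while letting building-blocks appear at any level.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical names for illustration only; real grammars use their own.
LEVELS = ["chapter", "section", "subsection", "subsubsection"]
POOL = {"paragraph", "list", "table", "figure", "quotation"}


@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)


def depth_errors(node: Node, path: str = "") -> List[str]:
    """List hierarchical children that are not exactly one level below their parent."""
    here = f"{path}/{node.kind}"
    errors: List[str] = []
    expected = LEVELS.index(node.kind) + 1  # the root must be a hierarchical level
    for child in node.children:
        if child.kind in POOL:
            continue  # building-blocks may occur at any depth
        if child.kind not in LEVELS or LEVELS.index(child.kind) != expected:
            errors.append(f"{here}: {child.kind} is out of depth")
        else:
            errors.extend(depth_errors(child, here))
    return errors


# Siblings may repeat, and pool components sit at several depths.
well_formed = Node("chapter", [
    Node("paragraph"),
    Node("section", [Node("table"), Node("subsection", [Node("paragraph")])]),
    Node("section", [Node("paragraph")]),
])

# A subsubsection directly inside a chapter skips two levels.
out_of_depth = Node("chapter", [Node("subsubsection", [Node("paragraph")])])

if __name__ == "__main__":
    print(depth_errors(well_formed))
    print(depth_errors(out_of_depth))
```

A grammar with recurrent containers of the same name at every depth, as in HTML, would need no such check, which is one reason the rule is better seen as a property of particular grammars than of documents in general.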
Figure 4: Reference
Completion
Internal Link
External Link
Signifiers
importance
relevance
sequence
Unlike the other features in Figure 1 to
Figure 3, where at least one of them must be
present, otherwise you have no document at all, the reference
features are entirely optional, and are used at the author’s
discretion according to sense (Figure 4).
In the detail of running text, there may be a need to link
components within the document for reference or to link to other
documents elsewhere. While these features perform a closely
related function, an internal reference can be checked
immediately, so it is dependent, whereas a
link to another document is independent, as
it cannot be known at the time of writing if the reader will
have access to the document concerned.
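The difference between the two kinds of link shows up as soon as one tries to verify them. The sketch below uses Python's standard `xml.etree.ElementTree` module on an invented sample; the element and attribute names (`xref`, `linkend`, `link`, `href`, `id`) are assumptions chosen for illustration rather than the vocabulary of any one grammar. Internal targets can be resolved on the spot, whereas external targets can only be listed.

```python
import xml.etree.ElementTree as ET


def check_links(xml_text: str):
    """Return (dangling internal targets, external targets that cannot be verified here)."""
    root = ET.fromstring(xml_text)
    # Every id declared anywhere in the document is a valid internal target.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    dangling = [
        x.get("linkend") for x in root.iter("xref") if x.get("linkend") not in ids
    ]
    external = [
        link.get("href") for link in root.iter("link") if link.get("href") is not None
    ]
    return dangling, external


# Invented sample document: one good internal reference, one dangling one,
# and one external link whose availability cannot be known from here.
SAMPLE = """<doc>
  <section id="intro">
    <para>See <xref linkend="method"/> and <xref linkend="missing"/>.</para>
  </section>
  <section id="method">
    <para>Details at <link href="https://example.org/spec"/>.</para>
  </section>
</doc>"""

if __name__ == "__main__":
    dangling, external = check_links(SAMPLE)
    print("dangling internal references:", dangling)
    print("external links (unchecked):", external)
```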
Signifiers are ways to express some special nature of a
feature, so that it takes on a quality which impresses itself on
the reader. Emphasis or terminology are probably the most
frequently-used in continuous text; specifiers of sequence occur
in structures like numbered lists and the titles of
sections.
In this author’s experience, the core set of features, or
one very similar, is where most concrete instantiations of
document grammars appear to have started, as far back as the
days of SGML DTDs. Additional features, and deviations from the
norm, are legion, and may be specialist within a field or topic,
or introduced for practical, technical, or political reasons —
it is these which distinguish one implementation from another.
The ease (or otherwise) with which a particular type of document
can be modified seems to depend largely on the original authors’
intentions:
some structures are designed to be modified, and therefore
provide facilities for doing so, such as parameterization;
some certainly can be modified, and
occasionally are, but it’s a big effort and it’s usually
easier to put up with the occasional semantic mismatch;
some are not intended to be modified at all.
Not all parts of a document grammar may be equal to the
task: in some cases it may be hard to modify the metadata but
easy to modify the hierarchy; in others the reverse. There is
also significant debate (not a part of this analysis) about the
extent to which modifications should allow or deny a user the
right to continue to claim that they are [still] using
the type of document they started with.
Adoption
The simplest use case is no changes. This implies that the
requirements of the documents to be created or encoded are
identical to those envisaged by the creators of the grammar,
or at least so similar that the differences can be ignored.
Using an existing document grammar in this way, without any
modification at all, seems to this author to be relatively
rare in the long run, with some specific exceptions noted
below; but collecting hard data on numbers would be difficult
to undertake. Certainly it makes an excellent starting-point
for those with no history of structured-document usage, but
the process needs to be managed in order to avoid rejection
because of unexpected conflicts between the provisions of the
grammar and the view that users have of their own document
types.
One obvious exception is a need to adhere to a
de facto standard, and HTML is the most
prominent example. It is something of a special case because
it was implemented by software (browsers and editors) that
ignored or even encouraged syntactic errors. While XHTML and
HTML5 are sometimes now well-formed, the uncounted millions of
earlier HTML web pages remain in use and are likely to do so
for the foreseeable future. HTML itself has been adapted on
occasions for specialist use, but usually just in restricted
forms like the subset of XHTML used in EPUBs rather than
extending the grammar in other directions; and this author
(and separately, the ISO HTML committee) did produce versions
which used a hierarchical structure in the body of the
document.
Another exception is the mandated use of specialist
document types in a vertical market such as a single industry.
The success of many industrial document types relies either on
agreement that their use between companies in their industry
is, effectively, grammatically identical, or it relies on an
obvious advantage such as common software.
JATS, for example, while parameterized and open to
modification, is seldom changed much except by very large
organizations (and even then mostly only in the metadata)
because significant change would break the shared model of
an article in journal publishing, as well as
the toolset. Some extensive modification has been
done to produce BITS (book interchange) and NISO STS
(standards), but these are more in the nature of forks or
full-scale derivatives.
Adaptation
Three commonly-adapted grammars are TEI,
DocBook, and DITA. All provide extensive facilities for
adaptation, implemented in different ways, and all can
generate DTDs, W3C Schemas, or RNG Schemas.
TEI is generated by the ODD system (One Document Does
it all), and user modifications can be created via the Roma web
tool by adding features to a minimal core or subtracting
them from an all-in version. More
specialist modifications can also be done manually by
creating customized ODD files and generating the schema
afresh.
DocBook is maintained in RNG, and features (specified
as RNG patterns) can be selectively disabled and enabled
in a customization layer, and additional features
introduced. The documentation is careful to distinguish
between creating subsets, which remain valid DocBook
instances, and extensions, which can no longer be called
DocBook [Walsh16b].
DITA is maintained in RNG and allows for adding and
removing topic or element types, as well as applying
effectivities (conditionalizations). Specializations can
be managed centrally by the sponsoring agency which
maintains the standard (OASIS) or locally by users or
industry groups.
Despite enquiry, I have failed to identify any modified
version of any of these three which has involved changing any
of the element type names shown in Table I, or their structure relative to one another.
Additions and exclusions occur in more specialist areas, as
noted above, but the basic grammar of a hierarchical structure
containing sequences of text blocks containing mixed text and
referential signifiers appears to satisfy that particular core
of demand for what constitutes a
document.
However, from discussions among developers of document
types and classes (for example, on the TEI, DocBook, HTML,
XML, LaTeX, and other related forums), it is clear that
there have been questions of structural relationships and
content modeling in the grammar at the design
level which appear largely to have been resolved,
at least within the encoding communities served by each
system. A few examples:
Should further discursive block-level content be
permissible after the close of the
last hierarchical child in a hierarchical
container?
After the end of the last sect1 in a
DocBook chapter? Yes, but limited to
simplesect;
After the end of the last div1 in a
TEI div0? No, perhaps oddly, given that
the TEI is designed to be able to model historical
documents which often do not conform to rigid modern
hierarchical structures;
After the end of the last div in an
HTML5 div? Sure, no problem.
Should hierarchical containers be numbered (by level)
or not?
DocBook provides names for Parts and Chapters but
sections within them are numbered by level; but there
is an unnumbered section which can be
used instead;
TEI provides level-numbered divisions and keeps
naming to attributes; but it too provides an
undistinguished div;
ISO 12083 names the components down to the section
level but numbers the levels beneath;
HTML and others simply use recurrent containers of
the same name at all depths.
To what extent should block-level (pool) components
occur within themselves, alongside normal unmarked
text?
Not at all — TEI (in SGML, one of the most notable
victims of pernicious mixed content);
Within limits — DocBook (not those with complex
internal structure);
Go for it — HTML (as implemented).
(Some systems — Microsoft Word, for example — go to
extreme lengths to avoid mixed content entirely.)
Is it the responsibility of the grammar to
describe or
prescribe the possible types of
content of a document?
TEI is largely descriptive, in that it was
designed to cope with the planet’s literary,
historical, and cultural
Nachlaß;
DocBook is mildly prescriptive (no lists in an
Abstract, for example);
Specialist grammars can be almost completely
prescriptive in structure, although rarely in text
content.
The degree to which the chosen grammar offers acceptable
constraints, or fails to offer sufficient descriptive
accuracy, will largely determine the level of adaptation
needed. This is not a failing on either side, simply an
acknowledgement that both sides are close enough to the
standard average to get along together except for a few areas
where they need to go their own way.
Build
The decision to write your own document type or class — to
design your own grammar, often from scratch — seems to me to
be less common than before, when the public offerings were
more limited, document-grammar analysis skills were rare, and
a full understanding of ISO 8879 itself rarer still.
Specialist requirements continue to mean that vertical-market
document type grammars will still need to be written. Maler and el Andaloussi (1999) and others are clear about the
commitment of time and effort required to undertake the task
at an industrial level, but there must be many hundreds,
possibly thousands, of personal or localized schemas
originally written for ad hoc purposes
which have become embedded into workflows and still continue
to function.
In the original analysis for this paper, four small
examples were used: EIRO Bulletin and Croner Briefing, which
appear in Table I because they show some
commonality with the rest; and BiBTeXML and Daybook, which
have no correlation with the Standard Average Document
Grammar.
Bulletin
This was written for the publishing workflow of a
European Union labor research institution. The design is
not easily extensible: it has an abbreviated hierarchy
and pool, simply enough for the practicalities of
publishing; and a curious selection of inline signifiers
aimed at the requirements of the publishing process
which needed to be able to identify many different
aspects (locations, organizations, people, documents,
and three different styles of emphasis) for indexing and
retrieval as well as visual formatting.
Briefing
Croner Publications had this developed for a
frequently-issued series of business briefings. There is
a simple hierarchical structure, but it is remarkable
for the pool having 12 different element types for lists
(surely some kind of record). There is a significant
amount of metadata for document control in a publishing
workflow, even for a relatively small unit of writing.
Some of the inlines are clearly designed to be
retro-fitted after formatting (position and page
number).
BiBTeXML
This shows one possible way of tackling the naming
problem when the field is (by design) very narrow. It
would, of course, have been perfectly possible to encode
the referenced document types (eg article,
book, inproceedings, etc) in
an attribute of the entry element, but this
would mean either attempting a hugely complex content
model to restrict the element types, or making the
content model elements unconstrained and leaving it to
the encoder to make the right choice.
The designers opted for the more pragmatic route of
constraining the content model with an element type for
each referenced document type, so that the element types
available within them reflect exactly those a user would
expect from any other interface to a BiBTeX file. This
is in some ways an exercise in obviousness: part of the
solution in usability is sometimes making the
affordances so obvious that they minimize
training.
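The trade-off can be sketched in a few lines of Python. The required fields below follow the usual BibTeX conventions for two entry types, but the element names and the `missing_fields` helper are illustrative assumptions and are not taken from the BiBTeXML schema itself. With one element type per entry type, the constraint becomes a simple lookup on the element name.

```python
import xml.etree.ElementTree as ET

# Required fields per entry type, following standard BibTeX conventions.
# The element names are illustrative, not copied from the BiBTeXML schema.
REQUIRED = {
    "article": {"author", "title", "journal", "year"},
    "inproceedings": {"author", "title", "booktitle", "year"},
}


def missing_fields(entry_xml: str) -> set:
    """Report required child elements that are absent from a typed entry element."""
    entry = ET.fromstring(entry_xml)
    present = {child.tag for child in entry}
    return REQUIRED[entry.tag] - present


if __name__ == "__main__":
    incomplete = "<article><author>A</author><title>T</title><year>2017</year></article>"
    print(missing_fields(incomplete))
```

Had the entry type been carried in an attribute instead, the same check would need either a conditional content model or a second validation pass, which is the complexity the designers avoided.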
Daybook
This was designed for the transcription of
parliamentary proceedings. Legislative records not only
have to be exact (perhaps in some jurisdictions even
when the truth has been redacted) but for retrieval, an
attempt has to be made to represent the class of
material being debated, so there are element types for
General Debate, Oral Answers, Written Answers, and
Private Notice Questions. They can be nested, so the
structure is discrete; class within class, rather than
hierarchical in the normal chapter—section—subsection
manner.
Ultimately, the write or adapt decision has
to be made on many grounds: accuracy, practicality, security
(independence), ease of use, speed, convenience, software
availability, skill requirements, and others. Not all of these
can necessarily be measured directly with money: there may be
less-quantifiable aspects such as human relations and
organizational politics involved.
Drawing the line
If there is anything we can learn from a Standard Average
Document Grammar, it seems to be that it’s a convenient term for
a phenomenon which needs more accurate measurement. One way of
looking at it would be to pursue the pseudo-statistical theme
and construct values for concrete use cases, with their distance
from the theoretical SADG as a measure of divergence.
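One possible form for such a measure is a set distance. The sketch below uses the Jaccard distance, which is this example's choice and not a metric proposed in the paper. The core set is taken from Figure 1 to Figure 4; the use case is invented, and the `divergence` function is hypothetical.

```python
# Core features taken from Figure 1 to Figure 4.
SADG = frozenset({
    "title", "subtitle", "author", "summary",
    "preamble", "major division", "subdivision", "minor division", "postamble",
    "paragraph", "list item", "table", "figure", "quotation", "image", "notation",
    "internal link", "external link", "signifier",
})


def divergence(features, core=SADG):
    """Jaccard distance from the core: 0.0 is identical, 1.0 is disjoint."""
    features = frozenset(features)
    union = features | core
    if not union:
        return 0.0
    return 1 - len(features & core) / len(union)


# Invented use case: a grammar that drops three core features and adds
# two of its own for workflow purposes.
briefing_like = (SADG - {"subtitle", "notation", "postamble"}) | {"page number", "position"}

if __name__ == "__main__":
    print(f"unadapted grammar: {divergence(SADG):.2f}")
    print(f"adapted grammar:   {divergence(briefing_like):.2f}")
    print(f"unrelated grammar: {divergence({'entry', 'field'}):.2f}")
```

A set distance treats every feature as equally important and ignores structural rules, so it is at best a first approximation; weighting features by how widely they are shared would be an obvious refinement.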
When an organization or individual considers using an
existing document grammar, there will eventually be a pain point
at which they in effect say, “No, that really isn’t how we
see things here, we need something closer to how we
work.” From that point on, it’s a case of adaptation:
new names, perhaps, or a new structure, or an extended or
contracted content model. If such a fork is public, it may
attract additional users, particularly if it is designed for a
vertical market. Takeup and the amount of divergence from the
original can be measured.
Some will never get to that point, and will use an existing
grammar unadapted, or perhaps with only the most trivial of
changes to, say, attribute value lists. In these circumstances,
we are effectively adding to the number of use cases at the
mode (the most commonly-occurring value in a distribution).
Those who elect to build their own grammar are in effect
initially located beyond some as yet undetermined measure of
deviation, although if the resulting structures end up bearing
enough similarity to the SADG, the grammar may be considered
to have added to the base of contributory systems.
In this author’s experience, the adaptations of existing
grammars are undertaken for multiple reasons, but often related
to “not enough” or “too many” or
“not what we call it”:
insufficient or over-complex metadata requirements
(some people need more, others need less);
too many or too few restrictions on the formation of the
hierarchy: a modeling mismatch with the way the organization
or individual works;
missing or excessive provision for pool components which
lie at the heart of structured document writing and editing;
similar problems with the inline flow components.
Cutting back on the richness of some of the standard
offerings is likely to ease editing complexity, but there can
also be extra work if some components are named in a way that
causes ambiguity or uncertainty in the circumstances of use.
When this reaches frustration point among document users, there
may be a rise in tag abuse or other inaccuracy, leading to calls
for adaptation or writing a new grammar.
Given that the creation of a new document grammar and new
document type or class is non-trivial, it would be useful to
have some measure of how far off-piste you have to be to justify
it.
[Lakoff90] Lakoff, George (1990)
Women, Fire, and Dangerous Things.
University of Chicago Press, Chicago, IL,
9780226468044.
[Maler and el Andaloussi (1999)] Maler, Eve; and el Andaloussi, Jeanne (1999)
Developing SGML DTDs: from Text to Model to
Markup. Prentice-Hall, Upper Saddle River, NJ,
0-13-309881-8.
[Oppenheim67] Oppenheim, A Leo (1967)
Letters from Mesopotamia: Official, Business, and
Private Letters on Clay Tablets from Two Millennia.
University of Chicago Press, Chicago, IL.
[Southall (1989)] Southall, Richard (1989) Interfaces between the
Designer and the Document. In André, Jacques; Furuta,
Richard; and Quint, Vincent; Structured
Documents, CUP, Cambridge, England pp.119-131,
0521365546.
[Tekli11] Tekli, Joe; Chbeir, Richard; Traina,
Agma JM; and Traina Jr, Caetano (2011) XML
document-grammar comparison: related problems and
applications. In Central European Journal of Computer
Science (Springer, Versita), 1:1, pp.117–136, doi:https://doi.org/10.2478/s13537-011-0005-1
[OMalley64] Vesalius, Andreas (1554)
Letter to Johannes Oporinus. In O’Malley,
Charles Donald (1964) Andreas Vesalius of Brussels,
1514–1564. University of California Press, Berkeley
CA (text at http://vesalius.northwestern.edu/books/FA.b.html,
retrieved May 2017).
[Walsh16a] Walsh, Norman (2016)
Underlying Technologies. In XML and
Publishing, XML Summer School, St Edmund Hall,
Oxford, p.19
[1] Although in the first case, the authors of clay-tablet
business documents do appear to have settled on shared modes
of expression [Oppenheim67]; and in the
second case, Vesalius came fairly close [OMalley64].
[2] The terms pool and flow
are taken from the design conventions of Document Type
Definitions as used in SGML and XML: Maler and el Andaloussi (1999) derive them from an Open Software
Foundation DTD design committee. They are in widespread
use and occur in the specifications for both DocBook and
HTML, although they appear much earlier under the terms
hierarchy, containment, and
sequence in Southall (1989). The terms blocks
(for pool) and inlines (for flow) are also
in common use.
[3] The word average derives from the Latin
havaria, which was the sharing of the
expense of lost cargoes between shipping merchants which
ultimately gave us the concept of insurance.
[4] I am indebted to Michael Sperberg-McQueen for the
suggestion.
[5] As in Mark Clifton’s 1952 story about the father of
an exceptionally bright young daughter warning her
against feigning stupidity in order to be accepted in
school: “Now, look,” I cautioned,
“don’t overdo it. That’s as bad as being too quick.
The idea is that everybody has to be just about standard
average. That’s the only thing we will
tolerate.” […]
[Flynn14] Flynn, Peter (2014) Human
Interfaces to Structured Documents, PhD Thesis,
University College Cork, Cork, Ireland, https://cora.ucc.ie/handle/10468/1690.
[Power03] Power, Richard; Scott, Donia;
and Bouayad-Agha, Nadjet (2003) Document
Structure. In Computational Linguistics 29:2, p.223 et
seq. doi:https://doi.org/10.1162/089120103322145315.