How to cite this paper
Marcoux, Yves, C. M. Sperberg-McQueen and Claus Huitfeldt. “Formal and informal meaning from documents through skeleton sentences: Complementing formal tag-set descriptions with intertextual semantics and vice-versa.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). https://doi.org/10.4242/BalisageVol3.Sperberg-McQueen01.
Balisage: The Markup Conference 2009
August 11 - 14, 2009
Balisage Paper: Formal and informal meaning from documents through skeleton sentences
Complementing formal tag-set descriptions with intertextual semantics and vice-versa
Yves Marcoux
Associate professor
Université de Montréal, Canada
Yves Marcoux has been a faculty member at EBSI, University of Montréal,
since 1991. He is mainly involved in teaching and research activities in the
field of document informatics. Before his appointment at EBSI, he worked
for 10 years in systems maintenance and development in Canada, the U.S., and
Europe. He obtained his Ph.D. in theoretical computer science from the University
of Montréal in 1991. His main research interests are document semantics,
structured document implementation methodologies, and information retrieval in
structured documents. Through GRDS, his research group at EBSI, he has been
principal architect for the Governmental Framework for Integrated Document
Management, a project funded by the National Archives of Québec and by the
Québec Treasury Board.
C. M. Sperberg-McQueen
Black Mesa Technologies LLC
C. M. Sperberg-McQueen is an independent consultant for Black
Mesa Technologies LLC. He currently serves as an editor of the W3C XML Schema
Definition Language (XSD) 1.1.
Claus Huitfeldt
Associate professor
University of Bergen, Norway
Claus Huitfeldt is Associate Professor at the Department of
Philosophy of the University of Bergen. His research interests are within
philosophy of language, philosophy of technology, text theory, editorial
philology and markup theory. He was founding Director (1990-2000) of the
Wittgenstein Archives at the University of Bergen, for which he developed the
text encoding system MECS as well as the editorial methods for the publication
of Wittgenstein's Nachlass - The Bergen Electronic Edition (Oxford University
Press, 2000). He has been active in the Text Encoding Initiative (TEI) since 1991,
and was centrally involved in the foundation of the TEI Consortium. Huitfeldt
was Research Director (2000-2002) of Aksis (Section for Culture, Language and
Information Technology at the Bergen University Research Foundation).
Copyright © 2009 by the authors. Used with
permission.
Abstract
In [Sperberg-McQueen et al. 2000a], Sperberg-McQueen et al. describe a
framework in which the semantics of a structured document is represented by the
set of inferences (statements)
licensed by the document, that is, statements
which can be considered to hold on the basis of the document.
The authors suggest that an adequate set of basic inferences can be
generated from the document itself by a fairly simple skeleton sentence and deictic
expression mechanism. These ideas were taken up and developed in
various ways and contexts in later work (see for example [Sperberg-McQueen et al. 2002]) and came to be called the “Formal tag-set
description” approach (FTSD). The approach is independent of any
particular logical system, and the possibility that the statements licensed by
a document be in natural language has been mentioned and exemplified, though
not to a large extent.
With a different set of preoccupations in mind (namely, providing
semantic support to an author during the document creation process), Marcoux
introduced in [Marcoux 2006] intertextual
semantics (IS), a framework in which the meaning of a document is
entirely and exclusively represented by natural language segments.
In this paper, we compare the IS and FTSD approaches, and argue
that the insights into the meaning of a document supplied by the two approaches
actually complement each other. We give a number of concrete examples of
increasing complexity, including the set of formal and informal statements
derivable in each case, to substantiate our claim.
Table of Contents
- Introduction
- Formal tag-set descriptions
- Intertextual semantics
- Comparison of FTSD and IS
- Examples
  - A single paragraph
    - Intertextual semantics
    - FTSD
  - Phrase-level markup
    - FTSD
    - Intertextual semantics
  - A sonnet
    - Intertextual semantics
    - FTSD
- Conclusion
- Appendix A. Fragment of a formal tag set description
Introduction
What is the “meaning” of markup? How is the meaning of a
document augmented or otherwise affected by the presence of markup? Those
questions have preoccupied markup theorists (and many others) for probably as
long as markup conventions have existed.
Fundamentally, two approaches can be taken. First, one can devise a
formal framework in which the meaning of a
document is represented by a set of formal statements. Second, one can seek an
informal framework in which the meaning of a
document is represented by a set of sentences in an informal language. An
example of a suitable formal framework is first-order logic; an example of a
suitable informal framework is any natural language. In both cases, the
statements may or may not say something about “the world” beyond
the document as such.
The two approaches do not serve the same goals. If automatic
inferencing (through an inference engine) is the aim, then the formal approach
probably has the edge. However, if some other use of the
“meaning” of the document is envisioned, one which involves, for example,
showing that meaning to humans, then the informal approach may well have the
edge.
In [Sperberg-McQueen et al. 2000a], Sperberg-McQueen et al. describe a
framework in which the semantics of a structured document is represented by the
set of inferences (statements)
licensed by the document, that is, statements
which can be considered to hold on the basis of the document. The authors
suggest that an adequate set of basic inferences can be generated from the
document itself by a fairly simple skeleton
sentence and deictic expression
mechanism. These ideas were taken up and developed in various ways and contexts
in later work (see for example [Sperberg-McQueen et al. 2002] and [Sperberg-McQueen & Miller 2004]), which we here call the “Formal tag-set
description” approach (FTSD). The approach is independent of any
particular logical system, and the possibility that the statements licensed by
a document be in natural language has been mentioned and exemplified, though
not systematically.
With a different set of preoccupations in mind (namely, showing a
“preview” of the meaning of a document to an author during the
writing process), Marcoux introduced in [Marcoux 2006]
intertextual semantics (IS), a framework in
which the meaning of a document is entirely and exclusively represented by
natural language segments.
In this paper, we compare the IS and FTSD approaches, and argue that
the insights into the meaning of a document supplied by the two approaches
actually complement each other. After a brief review of each approach (this
paper is not meant to be a complete introduction to either), we give a number
of concrete examples of increasing complexity, including the set of formal and
informal statements derivable in each case, to substantiate our claim.
Formal tag-set descriptions
The essential ideas of the FTSD approach are:
-
The meaning
of a markup construct M in an
instance document can be identified with the set of sentences S true because
of M, or (equivalently) the set of sentences that can be inferred from
M. When necessary, we distinguish the sentences in S
from other sentences by calling the former instance
sentences.
-
The meaning
of a markup construct M in the
abstract can be captured effectively by skeleton sentences,
sentence schemata with blanks to be filled in appropriately for each instance
of construct M in a document instance.
The skeleton sentences are generalizations of the instance
sentences mentioned in the preceding point; each instance sentence should be an
instantiation of some skeleton sentence.
-
For existing colloquial XML vocabularies, when the inferences
licensed by a particular element instance are being tabulated, the values to be
inserted for the blanks in the appropriate skeleton sentences often vary with
the element's position; XPath expressions can be used to specify a concise rule
for finding the appropriate values, given a particular element as context node.
Because the value of the XPath expressions varies with context, they are (in
the linguistic sense) deictic
expressions.
-
Skeleton sentences, together with the deictic expressions used
to specify how to fill in their blanks, can provide useful documentation of a
markup vocabulary. They could be integrated, for example, into the Tag Set
Documentation (TSD) vocabulary defined by the Text Encoding Initiative. If the
skeleton sentences are written in a formal notation like predicate calculus,
the conventional tag set documentation (TSD) becomes a formal
tag set documentation (or FTSD), which can provide the kind of formal
definition of the semantics of an XML vocabulary which some observers have
occasionally desired, and which some others (who give signs of wishing to
displace colloquial XML and replace it with RDF or some other formalism
instead) have simply claimed does not or cannot exist.
Intertextual semantics
The intertextual semantics (IS) approach is based on a view of which
traces can be found in, among other places, the works of Wierzbicka [Wierzbicka 1992], Smedslund [Smedslund 2004] and even
Wittgenstein [Wittgenstein 1953]. This is the view that humans
ultimately make sense
of artefacts through the use of
natural language, or rather, that to the
extent that they can make sense of an artefact, this sense can be expressed in
natural language (NL). Thus, in designing artefacts such as markup, one should
be concerned with how, how easily, and with how much ambiguity (or
unambiguity) humans can understand those artefacts in NL terms. No matter how
useful intermediate formal representations of meaning (including marked-up
documents) may be for conciseness, machine processing, etc., they must
ultimately be translatable (not necessarily translated) into NL, and are only ever as “meaningful”
as such NL expressions of them are.
In the realm of markup, IS suggests that the creators of tag-sets
(modelers) must be concerned with how markup can be translated into NL. Even if
“end users” never see any marked-up document, some other humans
(for example, developers of processing software, or archivists) will have to deal
with them directly or indirectly, unless the documents are totally pointless.
Arguably, that translation deserves even more attention as the number of
intermediate representations increases, because each added representation
creates new opportunities for misinterpretation. Dubin et al. have recently
illustrated some difficulties that can arise from failures in automatic
translation from one representation to another [Dubin et al. 2006].
IS proposes a mechanism by which NL passages (or whole documents) are
generated from marked-up documents, according to an IS
specification for the tag-set. So far, only very weak NL generation
mechanisms have been explored, and it is extremely
important that those mechanisms be weak, because overly powerful
mechanisms would “hide under the carpet” inherent interpretation
complications which IS, in contrast, seeks to uncover.
In the current state of the IS framework, an IS specification takes
the form of a table giving, for each element type, two NL segments: a
“text-before” segment and a “text-after” segment
(generically called “peritexts”). Attributes are handled by the
possibility of including in the peritexts “guarded segments”:
segments guarded by an attribute name, which are included only if the
corresponding attribute is specified on the element, and which can refer to the
attribute value. “Local” elements (in the sense of W3C schemas) are
supported, so that different peritexts can be assigned depending on the
ancestors of the element. The IS generation process is similar to styling the
document with the peritexts: peritexts and element contents are concatenated
as the document tree is traversed depth-first. The IS, or IS-meaning, of the
document is the resulting character string.
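The generation process just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the implementation described in [Marcoux 2009]: rules are keyed by slash-separated path suffixes (mimicking the paths attribute of the rule format used in the examples of this paper), the most specific matching rule wins (our approximation of local elements), elements without a rule contribute no peritexts, and guarded segments are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: path suffix -> (text-before, text-after).
# "titleStmt/title" applies only to title elements whose parent is titleStmt;
# "title" applies to any title element.
Rules = dict[str, tuple[str, str]]


def find_peritexts(path: tuple[str, ...], rules: Rules) -> tuple[str, str]:
    """Return the peritexts of the most specific rule matching the element's path."""
    best, best_len = ("", ""), 0
    for rule_path, peritexts in rules.items():
        steps = tuple(rule_path.split("/"))
        if path[-len(steps):] == steps and len(steps) > best_len:
            best, best_len = peritexts, len(steps)
    return best


def generate_is(elem: ET.Element, rules: Rules, ancestors: tuple[str, ...] = ()) -> str:
    """Depth-first traversal concatenating peritexts and element content."""
    path = ancestors + (elem.tag,)
    before, after = find_peritexts(path, rules)
    parts = [before, elem.text or ""]
    for child in elem:
        parts.append(generate_is(child, rules, path))
        parts.append(child.tail or "")  # text after a child belongs to elem
    parts.append(after)
    return "".join(parts)


rules = {
    "doc": (" This is a document: ", " End of the document. "),
    "para": (" This is a paragraph: ", " End of the paragraph. "),
}
doc = ET.fromstring("<doc><para>Elizabeth went to Sussex.</para></doc>")
print(" ".join(generate_is(doc, rules).split()))
```

Whitespace is collapsed in the last line only for display. Letting the longest matching path win is one simple way to let a rule for titleStmt/title override a general rule for title.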
IS has similarities with various mechanisms aimed at presenting
markup in more or less explicit or explicated forms, such as Piez's
false-color proofs
[Piez 2006, slide 12].
However, it is important to stress that the preoccupations of IS are not at the
presentational level, but really at the semantic level. The
“presentation” obtained through the IS mechanism
defines the meaning of a document. In the
other approaches we are aware of, the presentation (if successful) accurately
represents the meaning of a document, but that
meaning is defined elsewhere.
It is also important to mention that IS is not first and foremost
intended to give interpretations of existing tag-sets, but is mostly meant to
assist in the development of new tag-sets. Applying it to existing tag-sets
often gives rise to improbable or awkward formulations in the IS (meaning) of
documents, in part because such tag-sets were not in the first place designed
with IS preoccupations in mind. In our view, this only brings to light the
inherent complexities of the tag-set, or the difficulty or possible variability
(sometimes deliberate, it is important to say) in interpreting conforming
documents.
A full presentation of IS in general can be found in [Marcoux & Rizkallah 2009]. For structured documents, it is defined in [Marcoux 2006] and [Marcoux & Rizkallah 2007].
Comparison of FTSD and IS
Suppose document D conforms to a certain tag-set TS, to which
corresponds a collection F of formal skeleton sentences. We will denote by
F(D) the set of actual formal sentences (not skeleton ones) generated by
applying F to D. Now, let I be an IS specification for TS, and let us
denote by I(D) the set of (natural language, or “informal”)
sentences generated by I when applied to D.
What can we say about how I(D) compares to F(D)? Of course,
it all depends on exactly how F and I are constructed, that is, ultimately,
on what the actual meaning of markup is intended to be. However, we can say
something about what I and F would typically look like.
-
Ordering
Typically, F(D) is an unordered set of discrete statements in some formal
language. In our examples (as in most previous work on FTSD), we will use
first-order logic sentences. Even when natural language is suggested as a
potential language for statements, F(D) is first and foremost envisioned as
an unordered set of sentences.
In contrast, I(D) is typically a single string of characters,
possibly forming multiple sentences (in natural language), in which case,
however, the order of the sentences matters.
I(D) is first and foremost meant to be readable sequentially, as normal
text (as opposed to hypertext). That being said, I(D) can contain hypertext links, but they must only be used
to point to “background” or “complementary” material,
which more or less forms a whole, and not in a way that disrupts
sequentiality.
-
Universe of discourse and target community
In the FTSD approach, the actual set of predicates used in the
sentences for a given tag-set depends on the “universe of
discourse” of the documents, that is, the collection of things and
concepts the documents in that tag-set “talk about.” For example,
in defining the meaning of the OAI 2.0 tag-set [Sperberg-McQueen 2005],
predicates to the effect that something “is an OAI-server,”
“is an OAI-request,” or “is a response sent by an
OAI-server,” are naturally introduced. In addition to defining predicates
(which include types and relations), characterizing the universe of discourse
in the FTSD approach involves making assertions about that universe (facts or
inference rules), e.g., assertions that certain individuals satisfying certain
predicates exist.
In IS, the rough equivalent of defining the universe of discourse
is identifying the target community of users
of the documents (“users” is used here in a generic sense, which
includes authors, readers, analysts, processing software developers,
information managers, archivists, etc.). Intuitively, one can view the universe
of discourse as the intersection of what the community members know or, at
least, can name. In identifying the target community, one is required to make
(preferably explicit, but at least implicit) assumptions about what vocabulary
and level of language is appropriate for the community members, what their
previous knowledge is, what profiles they have, through which use cases they
will interact with the documents, etc. Note how similar assumptions are
involved in making a sensible and useful selection of predicates and other
elements in the FTSD approach.
-
Deixis and locality of references
In the context of markup, deictic
expressions are expressions pointing to various
“locations” within a document (usually in a relative way). Relative
XPath expressions provide a good approximation of what deictic expressions are.
For example, a deictic expression evaluated at some given element in a document
may point to a specific attribute of that element, or to the first child of
that element, or to a specific attribute of the last child of that element,
etc.
Although far from exploiting the full expressive power of
XPath 1.0 (let alone XPath 2.0), deictic expressions in the FTSD
approach often point outside of “the current element.” For example,
they might point to the parent, a child, or a sibling of the current element.
In contrast, if we were to express the “pointing” power of the IS
generation mechanism as deictic expressions, the only expressions allowed would
be “the current element,” or “the attribute named X of the
current element.” So, the reach of a
skeleton sentence in IS is very limited. But that limitation is quite
deliberate; in a nutshell, it stems from the assumption that the closer the
artificial (marked-up) form of knowledge is to its informal (natural-language)
form, the higher the odds it will be properly understood. Any complexity in the
deictic expressions used in the skeleton sentences translates (or, at least, so
goes the IS story) into complexity for anyone required to comprehend the
tag-set (whether they be readers, authors, archivists, software developers, or
what have you).
Examples
We now compare FTSD and IS through examples.
A single paragraph
For simplicity, we start with a very simple example, perhaps
trivial. (But its simplicity allows the machinery to be more readily
understood.) Let D be the following document:
Example 1:
<doc>
<para>Elizabeth went to Sussex.</para>
</doc>
We have just two tags in the tag-set, doc and para. With such a simple example, the similarity between FTSD and
IS can be quite high. The minimal universe of discourse for this example is
that of documents, paragraphs, and character strings. We assume for purposes of
the example that these are primitive notions that convey interesting
information about the nature of certain objects. Documents contain sequences of
paragraphs. Paragraphs have character-string values.
Intertextual semantics
An IS specification for our tag-set just has to specify a
text-before segment and a text-after segment for the two elements
doc and para. We will present IS specifications using
the format adopted in [Marcoux 2009], which is pretty much
self-explanatory:
<rule paths="doc"
text-before=" This is a document: "
text-after=" End of the document. " />
<rule paths="para"
text-before=" This is a paragraph: "
text-after=" End of the paragraph. " />
which would produce the following IS for our document:
This is a document:
This is a paragraph:
Elizabeth went to Sussex.
End of the paragraph.
End of the document.
Note that the peritexts
(text-before and text-after segments) are shown differently from actual
contents coming from the document; this is an integral and essential feature of
the IS framework (formally, we could say the strings forming the IS of
documents comprise characters from two different alphabets, or of two different
colors). Note also that some indentation is performed, for increased
readability. This is not at the moment an
integral feature of the framework, but it has been the usual presentation of IS
so far [Marcoux 2006] [Marcoux & Rizkallah 2007]. In fact, the
implementation described in [Marcoux 2009] does perform an
automatic indentation of the IS.
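One way to keep the two alphabets apart in software (our assumption, not a feature prescribed by the framework) is to produce the IS as a list of segments tagged as peritext or content, each with its nesting depth, and to leave presentation to a renderer. The hypothetical renderer below sets peritexts off with guillemets as a stand-in for a second colour and indents one level per element, approximating the display of Example 1 above. For simplicity, rules are keyed by tag name only and whitespace around segments is trimmed.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, NamedTuple


class Segment(NamedTuple):
    kind: str   # "peritext" or "content": the two alphabets of the IS
    depth: int  # nesting level, used only for indentation
    text: str


def segments(elem: ET.Element, rules: dict, depth: int = 0) -> Iterator[Segment]:
    """Depth-first traversal yielding tagged segments (rules keyed by tag name)."""
    before, after = rules.get(elem.tag, ("", ""))
    if before.strip():
        yield Segment("peritext", depth, before.strip())
    if elem.text and elem.text.strip():
        yield Segment("content", depth + 1, elem.text.strip())
    for child in elem:
        yield from segments(child, rules, depth + 1)
        if child.tail and child.tail.strip():
            yield Segment("content", depth + 1, child.tail.strip())
    if after.strip():
        yield Segment("peritext", depth, after.strip())


def render(segs) -> str:
    """One segment per line, indented by depth; peritexts set off by guillemets."""
    lines = []
    for seg in segs:
        text = f"«{seg.text}»" if seg.kind == "peritext" else seg.text
        lines.append("  " * seg.depth + text)
    return "\n".join(lines)


rules = {
    "doc": (" This is a document: ", " End of the document. "),
    "para": (" This is a paragraph: ", " End of the paragraph. "),
}
doc = ET.fromstring("<doc><para>Elizabeth went to Sussex.</para></doc>")
print(render(segments(doc, rules)))
```

Because the kind of each segment is kept until rendering, the same IS can be shown in colour, in two fonts, or with any other visual distinction without changing the generation step.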
FTSD
In all our examples, we will use normal first-order logic as a
formalism for FTSD. For this first example, we need only a few predicates to
capture the documented meaning of the markup:
Predicate | Gloss
is_document(x) | x is a document.
document_content(x,y) | Document x contains y (a sequence of paragraphs or, in larger vocabularies, of sections, headings, tables, and other paragraph-level objects).
is_paragraph(x) | x is a paragraph.
paragraph_string(x, y) | The character-string value of the paragraph x is the string y. We will write strings enclosed in quotation marks in the conventional way.
In order to write out the second argument of document_content, we will
need a way to write a sequence of objects (or rather, of expressions denoting
objects) as a sequence. Where possible, we adopt the convention that sequences
are written with commas separating the expressions denoting the items in the
sequence, and enclosed in parentheses: the sequence consisting of a, b, and c,
in that order, is written (a, b, c). In some circumstances, it proves simpler
to give the sequence a name and specify the position of its items with a
predicate like seq_pos_item(x, y, z), read as “item z occupies position y of
sequence x.” (We will start counting at 1.)
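The two notations carry the same information, as the following minimal Python sketch illustrates; the function names are our own, chosen for illustration only. It expands a parenthesized sequence into seq_pos_item facts, counting from 1, and rebuilds the sequence from facts given in any order, rejecting gaps or duplicate positions.

```python
def to_seq_pos_items(seq: str, items: list[str]) -> list[tuple[str, int, str]]:
    """Expand the sequence (a, b, c) named seq into seq_pos_item facts, counting from 1."""
    return [(seq, pos, item) for pos, item in enumerate(items, start=1)]


def from_seq_pos_items(seq: str, facts: list[tuple[str, int, str]]) -> list[str]:
    """Rebuild the ordered sequence named seq from seq_pos_item facts in any order."""
    positioned = sorted((pos, item) for name, pos, item in facts if name == seq)
    if [pos for pos, _ in positioned] != list(range(1, len(positioned) + 1)):
        raise ValueError(f"positions of {seq} must be exactly 1..n")
    return [item for _, item in positioned]


def render(facts: list[tuple[str, int, str]]) -> str:
    """Write facts in the notation used in the text, e.g. seq_pos_item(s,1,p)."""
    return "\n".join(f"seq_pos_item({s},{pos},{item})" for s, pos, item in facts)


print(render(to_seq_pos_items("s", ["p"])))  # seq_pos_item(s,1,p)
print(from_seq_pos_items("t", [("t", 2, "b"), ("t", 1, "a"), ("t", 3, "c")]))  # ['a', 'b', 'c']
```

The check on positions reflects the fact that an unordered set of facts only determines a sequence if every position from 1 to n is filled exactly once.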
Assuming two individuals to which we assign the arbitrary
identifiers d and p, we can write the instance
sentences for this document instance thus:
is_document(d)
is_paragraph(p)
document_content(d, (p) )
paragraph_string(p,"Elizabeth went to Sussex.")
or, equivalently (assuming an individual s):
is_document(d)
is_paragraph(p)
document_content(d, s)
seq_pos_item(s,1,p)
paragraph_string(p,"Elizabeth went to Sussex.")
A more rigorous and detailed account might include character
tokens and character types in the universe of discourse, so that if (for
example) two paragraphs in the same document had the same text, the formal
representation of the document could make clear that while the two different
paragraphs had the same string-value at the character type level, they were
realized by different sequences of character tokens. Such
rigor is necessary to achieve clarity and satisfactory treatment of some topics
(e.g., the relation between a transcription and its exemplar), but it requires
a great deal of machinery to achieve results that were intuitively obvious to
start with, and we omit it here to spare our readers the ennui of working
through it.
For similar reasons, we refrain here from offering a fuller
development of character strings, with definitions of length, concatenation,
and substring functions, which we do not need for now. Some universes of
discourse may need them. At this moment, all we have are string individuals,
denoted by the usual straightforward notation "a string".
If we decide the document means no more than that the content of
the para element is a paragraph, which in turn makes up the sole
content of the document, then we can be happy to say that F(D), the meaning
of the document, is the set of sentences given above.
For this purpose, a small set F of formal skeleton sentences
will suffice. For convenience, we will write skeleton sentences as literals,
filling in blanks with their associated deictic expressions and distinguishing
the deictic expressions from their context by enclosing them in braces (in the
style of XSLT attribute-value templates).
Our F for this vocabulary might contain these skeleton
sentences:
for doc elements:
    is_document( {generate-id()} )
    document_content( {generate-id()} , {concat(generate-id(),'-children')} )
for para elements:
    is_paragraph( {generate-id()} )
    seq_pos_item( {concat(generate-id(..),'-children')} , {1 + count(preceding-sibling::*)} , {generate-id()} )
    paragraph_string( {generate-id()} , {string(.)} )
In general, we assume that each of the skeleton sentences given
will be instantiated once for each element that matches the pattern. Here, each
doc element will generate one is_document sentence and one
document_content sentence, and each para element will generate
three sentences. As each skeleton sentence is instantiated, each deictic
expression will be evaluated with the current element instance as the context
node, and the instance sentence will be written out with the value replacing
the deictic expression.
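The instantiation procedure just described can be sketched in Python with the standard library's ElementTree. This is an illustration under our own assumptions, not the FTSD tooling of the cited work: each deictic expression is modelled as a small Python function of the context element rather than as an XPath expression, because ElementTree's limited XPath support offers neither generate-id() (an XSLT function, not an XPath one) nor the preceding-sibling axis. The identifiers e1, e2, ... are hypothetical stand-ins for generate-id(), whose actual values are processor-dependent.

```python
import xml.etree.ElementTree as ET
from typing import Callable


class Context:
    """Stand-ins for generate-id() and the parent axis for one document."""

    def __init__(self, root: ET.Element):
        # Hypothetical ids e1, e2, ... in document order; real generate-id()
        # values are processor-dependent.
        self.ids = {el: f"e{i}" for i, el in enumerate(root.iter(), start=1)}
        self.parent = {child: el for el in root.iter() for child in el}

    def gid(self, el: ET.Element) -> str:
        return self.ids[el]


# A deictic expression: a function of the context element (plus Context).
Deictic = Callable[[ET.Element, Context], object]


def me(el: ET.Element, c: Context) -> str:
    return c.gid(el)  # {generate-id()}


# Skeleton sentences per element type: (template with blanks, deictic expressions).
SKELETONS: dict[str, list[tuple[str, dict[str, Deictic]]]] = {
    "doc": [
        ("is_document({me})", {"me": me}),
        ("document_content({me}, {kids})",
         {"me": me,
          "kids": lambda el, c: c.gid(el) + "-children"}),  # concat(generate-id(),'-children')
    ],
    "para": [
        ("is_paragraph({me})", {"me": me}),
        ("seq_pos_item({seq}, {pos}, {me})",
         {"seq": lambda el, c: c.gid(c.parent[el]) + "-children",  # generate-id(..)
          "pos": lambda el, c: 1 + list(c.parent[el]).index(el),  # 1 + count(preceding-sibling::*)
          "me": me}),
        ('paragraph_string({me}, "{text}")',
         {"me": me,
          "text": lambda el, c: "".join(el.itertext())}),  # string(.)
    ],
}


def instantiate(root: ET.Element) -> list[str]:
    """Instantiate each skeleton sentence once per matching element, in document order."""
    ctx = Context(root)
    sentences = []
    for el in root.iter():
        for template, blanks in SKELETONS.get(el.tag, []):
            values = {name: expr(el, ctx) for name, expr in blanks.items()}
            sentences.append(template.format(**values))
    return sentences


doc = ET.fromstring("<doc><para>Elizabeth went to Sussex.</para></doc>")
print("\n".join(instantiate(doc)))
```

Up to the choice of identifiers, the printed sentences are those given above for this document, with the sequence written in the seq_pos_item form. Keeping the skeleton sentences as data, separate from the code that evaluates them, is what would allow them to live in tag-set documentation.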
Phrase-level markup
The “challenges” of our next example are phrase-level
markup and the use of attributes.
Example 2:
<doc>
<para>
<person key="E.I.Regina">Elisabeth</person> went to
<place key="getty:7008133">Sussex</place>.
<person>Elizabeth</person>, on her part, went to
<person>Sussex</person>, and told him the whole story.
</para>
</doc>
The doc and para elements here have the
same meaning as in the preceding example; the person and
place elements mark personal names and place names in the running
text.
The optional key attribute, used for both person and
place, introduces a notion of registry of persons and places. The
value of that attribute is the “access key” of a person or place in
some known “registry,” which establishes a univocal correspondence
between keys and entities (persons or places, in our case). A single entity
can have many different keys “pointing” to it, but any given key
points to only one entity of a given type. It would be possible to introduce
registries as individuals in our universe of discourse; however, it is not
necessary and, for simplicity, we will not do it.
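The constraint on registries (a key identifies at most one entity of a given type, while an entity may have several keys) can be pictured as a mapping from (type, key) pairs to entities. The Python class below is purely illustrative: the class, its methods, the entity labels person#1 and place#1, and the second key hyp:0042 are hypothetical; only the keys E.I.Regina and getty:7008133 come from Example 2.

```python
class Registry:
    """Hypothetical registry: a (entity type, key) pair identifies exactly one entity."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], str] = {}

    def register(self, entity_type: str, key: str, entity: str) -> None:
        current = self._entries.get((entity_type, key))
        if current is not None and current != entity:
            # A key may not point to two entities of the same type.
            raise ValueError(f"{key!r} already identifies another {entity_type}")
        self._entries[(entity_type, key)] = entity

    def resolve(self, entity_type: str, key: str) -> str:
        return self._entries[(entity_type, key)]

    def keys_for(self, entity_type: str, entity: str) -> list[str]:
        """All keys pointing to an entity (there may be several)."""
        return sorted(k for (t, k), e in self._entries.items()
                      if t == entity_type and e == entity)


reg = Registry()
reg.register("person", "E.I.Regina", "person#1")   # key from Example 2
reg.register("person", "hyp:0042", "person#1")     # hypothetical second key
reg.register("place", "getty:7008133", "place#1")  # key from Example 2
print(reg.keys_for("person", "person#1"))  # ['E.I.Regina', 'hyp:0042']
```

Keying the mapping on the pair rather than on the key alone reflects the phrase “of a given type”: the same key string could, in principle, identify a person in one registry and a place in another.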
FTSD
The predicate-calculus sentences for this document will use the
following predicates (in addition to those defined in the preceding section):
Predicate | Gloss
is_personname(s) | s (typically a string of characters) is (here) a proper noun denoting a person.
is_person(x) | x is a person.
is_placename(s) | s (typically a string of characters) is (here) a proper noun denoting a place.
is_place(x) | x is a place.
denotes(s,x) | The string of character tokens s here denotes the object or individual x.
person_dbkey(x, y) | The person x is denoted by the identifier y.
place_dbkey(x, y) | The place x is denoted by the identifier y.
Note that the formulations of is_personname, is_placename, and
especially of denotes, are not entirely satisfactory. Earlier, we
simplified the discussion by not distinguishing systematically between
sequences of character tokens and sequences of character types. Here, we pay
the price for that simplification. Strictly speaking, what is needed here is a
way to specify that a particular instance or occurrence of string s (i.e., a
particular sequence of character tokens) is used as a proper noun and denotes
individual x. Not all occurrences of the string s will necessarily be proper
nouns (consider the personal name Brown and the place name Bath), nor will
they all denote the same individual. Without a rather tedious treatment of
tokens and types, it is not possible to make the necessary distinction
properly; we content ourselves with the hand-waving visible in the glosses
above and in this explanatory paragraph.
The predicates person_dbkey and place_dbkey, by contrast, need an
identifier (viewed as a sequence of character types), not a sequence of
tokens, as their second argument.
Armed with these predicates, we can say in predicate calculus
terms not only that the string Elisabeth is (here) a personal name, but also
that that name denotes a particular individual, also identified by a particular
prosopographical key in some known registry. And similarly, we can say that
Sussex here is used once to denote the county, and
once the nobleman.
The skeleton sentences for the new element and attribute types
can be formulated thus:
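person:
    is_personname( {string(.)} )
    is_person( {concat('ref-',generate-id(.))} )
    denotes( {string(.)} , {concat('ref-',generate-id(.))} )
person/@key:
    person_dbkey( {concat('ref-',generate-id(..))} , {string(.)} )
place:
    is_placename( {string(.)} )
    is_place( {concat('ref-',generate-id(.))} )
    denotes( {string(.)} , {concat('ref-',generate-id(.))} )
place/@key:
    place_dbkey( {concat('ref-',generate-id(..))} , {string(.)} )
In the rules for person/@key and place/@key, the context node is the key
attribute itself, so the identifier of the referent is computed from its
parent element (..); this makes the first argument of person_dbkey and
place_dbkey identical to the one produced for the element by is_person and
is_place.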
The result of instantiating the skeleton sentences for the
example document includes the following (the sentences generated for the doc element are not shown):
is_paragraph(id17806)
seq_pos_item(id19125-children, 1, id17806)
paragraph_string(id17806, "
Elisabeth went to Sussex.
Elizabeth, on her part, went to Sussex, and told him the whole story.
")
is_personname("Elisabeth")
is_person(ref-id17651)
denotes("Elisabeth", ref-id17651)
person_dbkey(ref-id17651, "E.I.Regina")
is_placename("Sussex")
is_place(ref-id19390)
denotes("Sussex", ref-id19390)
place_dbkey(ref-id19390, "getty:7008133")
is_personname("Elizabeth")
is_person(ref-id19224)
denotes("Elizabeth", ref-id19224)
is_personname("Sussex")
is_person(ref-id19558)
denotes("Sussex", ref-id19558)
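To see how the person, place, and @key rules combine, here is a small Python sketch that instantiates them for Example 2. Assumptions of ours: identifiers id1, id2, ... assigned in document order stand in for generate-id() (so they differ from the processor-generated identifiers in the listing above), only the person and place rules are applied, and because ElementTree has no attribute nodes, the @key rules are applied while visiting the owning element. As in the listing above, person_dbkey and place_dbkey then share their first argument with is_person and is_place.

```python
import xml.etree.ElementTree as ET

EXAMPLE_2 = """<doc><para>
<person key="E.I.Regina">Elisabeth</person> went to
<place key="getty:7008133">Sussex</place>.
<person>Elizabeth</person>, on her part, went to
<person>Sussex</person>, and told him the whole story.
</para></doc>"""

# Predicate names per element type: (name predicate, kind predicate, key predicate).
PREDICATES = {
    "person": ("is_personname", "is_person", "person_dbkey"),
    "place": ("is_placename", "is_place", "place_dbkey"),
}


def name_sentences(root: ET.Element) -> list[str]:
    # Hypothetical ids id1, id2, ... in document order stand in for generate-id().
    ids = {el: f"id{i}" for i, el in enumerate(root.iter(), start=1)}
    out = []
    for el in root.iter():
        if el.tag not in PREDICATES:
            continue
        name_pred, kind_pred, key_pred = PREDICATES[el.tag]
        name = "".join(el.itertext())  # {string(.)}
        ref = "ref-" + ids[el]         # {concat('ref-',generate-id(.))}
        out += [f'{name_pred}("{name}")', f"{kind_pred}({ref})", f'denotes("{name}", {ref})']
        key = el.get("key")
        if key is not None:
            # @key rule: the referent id comes from the element owning the attribute.
            out.append(f'{key_pred}({ref}, "{key}")')
    return out


print("\n".join(name_sentences(ET.fromstring(EXAMPLE_2))))
```

The two occurrences of the string Sussex yield two distinct referents, one a place and one a person, which is exactly the distinction the markup was meant to record.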
Intertextual semantics
The IS specification is as follows:
<rule paths="doc"
text-before="This is a document:"
text-after="End of the document." />
<rule paths="para"
text-before="This is a paragraph:"
text-after="End of the paragraph." />
<rule paths="person"
text-before="THE PERSON NAMED "
text-after=" @key[ (identified by the registry record
{{http://my.person.registry/?@}})]" />
<rule paths="place"
text-before="THE PLACE NAMED "
text-after=" @key[ (identified by the registry record
{{http://my.place.registry/?@}})]" />
The strings "{{" and "}}" delimit
hyperlinks in peritexts. Passages of the form @attrib-name[...@...] are
“guarded,” and only appear in the IS if the named attribute is
present on the element; within such a passage, @ stands for the value of that
attribute.
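The guarded-segment and hyperlink conventions can be sketched as a single expansion step applied to each peritext before it is emitted. In this Python illustration, the regular expressions and the function name are our own, a guarded passage is assumed not to contain "]", and rendering a hyperlink as <target> is merely a placeholder for whatever link presentation an actual implementation uses.

```python
import re

GUARD = re.compile(r"@([\w:.-]+)\[([^\]]*)\]")  # @attrib-name[ ... ]
LINK = re.compile(r"\{\{(.*?)\}\}")             # {{target}}


def expand_peritext(peritext: str, attributes: dict[str, str]) -> str:
    """Expand guarded segments and hyperlink delimiters in one peritext."""

    def guarded(m: re.Match) -> str:
        name, body = m.group(1), m.group(2)
        if name not in attributes:
            return ""  # the guarded passage is dropped when the attribute is absent
        return body.replace("@", attributes[name])  # @ stands for the attribute value

    text = GUARD.sub(guarded, peritext)
    # Placeholder presentation: render each hyperlink as <target>.
    return LINK.sub(lambda m: f"<{m.group(1)}>", text)


after = " @key[ (identified by the registry record {{http://my.person.registry/?@}})]"
print(repr(expand_peritext(after, {"key": "E.I.Regina"})))
print(repr(expand_peritext(after, {})))  # ' '
```

Guards are expanded before links so that an attribute value can become part of a link target, as in the registry URLs of the specification above.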
Note that two text-before segments have been written in uppercase
to make them independent of their position in a sentence.
Here is the resulting IS:
A sonnet
Here is a more realistic example, a TEI P5-encoded sonnet by the
Québécois poet Émile Nelligan (1879-1941).
Example 3:
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="fr-CA">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Le Vaisseau d'or</title>
<author>Émile Nelligan</author>
<editor>Luc Lacourcière</editor>
</titleStmt>
<publicationStmt>
<pubPlace>Montréal (Québec, Canada)</pubPlace>
<publisher>Fides</publisher>
<date>1952</date>
</publicationStmt>
<sourceDesc>
<bibl>
<author>Émile Nelligan</author>
<title>Poésies complètes 1896-1899</title>
<edition>Texte établi et annoté par Luc Lacourcière</edition>
<editor>Luc Lacourcière</editor>
<pubPlace>Montréal (Québec, Canada)</pubPlace>
<publisher>Fides</publisher>
<date>1952</date>
<biblScope>page 44</biblScope>
</bibl>
</sourceDesc>
</fileDesc>
</teiHeader>
<text>
<front>
<head>LE VAISSEAU D'OR</head>
</front>
<body>
<lg>
<l>Ce fut un grand Vaisseau taillé dans l'or massif :</l>
<l>Ses mâts touchaient l'azur, sur des mers inconnues ;</l>
<l>La Cyprine d'amour, cheveux épars, chairs nues,</l>
<l>S'étalait à sa proue, au soleil excessif.</l>
</lg>
<lg>
<l>Mais il vint une nuit frapper le grand écueil</l>
<l>Dans l'Océan trompeur où chantait la Sirène,</l>
<l>Et le naufrage horrible inclina sa carène</l>
<l>Aux profondeurs du Gouffre, immuable cercueil.</l>
</lg>
<lg>
<l>Ce fut un Vaisseau d'Or, dont les flancs diaphanes</l>
<l>Révélaient des trésors que les marins profanes,</l>
<l>Dégoût, Haine et Névrose, entre eux ont disputés.</l>
</lg>
<lg>
<l>Que reste-t-il de lui dans la tempête brève ?</l>
<l>Qu'est devenu mon cœur, navire déserté ?</l>
<l>Hélas! Il a sombré dans l'abîme du Rêve!</l>
</lg>
</body>
</text>
</TEI>
Intertextual semantics
The IS specification is:
<rule paths="TEI"
text-before="This electronic document is a TEI document. @xmlns[It obeys
the general structure and definitions associated with the XML
namespace {{@}}.] @xml:lang[Its textual contents are written (except
where otherwise stated) in the natural language which, according to the
IETF RFC 1766 specification (accessible at
{{http://www.ietf.org/rfc/rfc1766.txt}}), is denoted by &quot;@&quot;.]"
text-after="This concludes the TEI document." />
<rule paths="teiHeader"
text-before="This section gives general information about how the
document came into existence, the way it is identified, its status,
and trail of modifications."
text-after="This concludes the section giving information about how
this document came into existence, the way it is identified, its
status, and trail of modifications." />
<rule paths="fileDesc"
text-before="The document, as a computer file, can be described as
follows:"
text-after="This concludes the description of the document as a
computer file." />
<rule paths="titleStmt"
text-before="The key identifying elements of this document are:"
text-after="End of the key identifying elements." />
<rule paths="titleStmt/title"
text-before="its title, which is "
text-after=" " />
<rule paths="titleStmt/author"
text-before="its author name, which is "
text-after=" " />
<rule paths="titleStmt/editor"
text-before="its editor name, which is "
text-after=" " />
<rule paths="publicationStmt"
text-before="This document corresponds to a published work"
text-after=" " />
<rule paths="pubPlace"
text-before="which has been published in the place "
text-after=" " />
<rule paths="publisher"
text-before="by the publisher "
text-after=" " />
<rule paths="date"
text-before="on the date "
text-after=" " />
<rule paths="sourceDesc"
text-before="This document is derived from another document, called
&quot;the source&quot;."
text-after="End of the identification of the source." />
<rule paths="sourceDesc/bibl"
text-before="That source corresponds to the following bibliographic
data:"
text-after=" " />
<rule paths="author"
text-before="Author: "
text-after=" " />
<rule paths="title"
text-before="Title: "
text-after=" " />
<rule paths="edition"
text-before="Edition: "
text-after=" " />
<rule paths="editor"
text-before="Editor: "
text-after=" " />
<rule paths="bibl/pubPlace"
text-before="Publication place: "
text-after=" " />
<rule paths="bibl/publisher"
text-before="Publisher: "
text-after=" " />
<rule paths="bibl/date"
text-before="Publication date: "
text-after=" " />
<rule paths="biblScope"
text-before="Part used as a source: "
text-after=" " />
<rule paths="text"
text-before="The document &quot;per se&quot; starts here."
text-after="End of the document &quot;per se&quot;." />
<rule paths="front"
text-before="Front matter:"
text-after=" " />
<rule paths="front/head"
text-before="General heading: "
text-after=" " />
<rule paths="body"
text-before="Main body of the document:"
text-after="End of the main body of the document." />
<rule paths="l"
text-before="Line: "
text-after=" " />
<rule paths="lg"
text-before="Stanza:"
text-after=" " />
Here is the resulting IS:
Note that we have taken advantage of the fact that
http://www.tei-c.org/ns/1.0 is a dereferenceable URL in order to
convert it into a clickable link in the IS.
This may be an appropriate place to note that a given IS
specification (and the same is true of FTSDs) need not be tied to a tag-set
in the absolute. It can instead mirror a particular usage of a given tag-set
(e.g., tag-set + writing protocol). The current example illustrates this in
several ways, for example in that the IS specification takes for granted
that all lg elements are stanzas.
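How a set of rules like the one above turns a document into its IS can be sketched in a few lines of Python. This is a minimal illustration, not the XSLT implementation described in [Marcoux 2009]: the (path, text-before, text-after) triples, the policy that the longest matching path wins (so titleStmt/title beats title), and the reading of a bare @ inside a guarded passage as the attribute's value are all assumptions made here. Attribute names are matched literally, so a namespaced attribute such as xml:lang would need extra mapping when parsed with ElementTree, and {{...}} link markers are passed through untouched.

```python
import re
import xml.etree.ElementTree as ET

# Guarded passage @name[...]: kept only if attribute `name` is present.
# Assumed convention: inside the passage, a bare "@" stands for the value.
GUARD = re.compile(r"@([\w:.-]+)\[([^\]]*)\]")


def expand_guards(template, attrs):
    """Expand or drop the guarded passages of a text-before/text-after string."""
    def repl(m):
        name, body = m.group(1), m.group(2)
        return body.replace("@", attrs[name]) if name in attrs else ""
    return GUARD.sub(repl, template)


def match_rule(rules, path):
    """Return (before, after) for the longest rule path that is a suffix of
    `path` (a list of element names from the root), or None."""
    best, best_len = None, 0
    for rule_path, before, after in rules:
        steps = rule_path.split("/")
        if len(steps) > best_len and path[-len(steps):] == steps:
            best, best_len = (before, after), len(steps)
    return best


def generate_is(elem, rules, ancestors=()):
    """Wrap each element's content in its peritexts, recursively.
    Tail text (mixed content) is ignored in this sketch."""
    path = list(ancestors) + [elem.tag]
    before, after = match_rule(rules, path) or ("", "")
    parts = [expand_guards(before, elem.attrib)]
    if elem.text:
        parts.append(elem.text)
    parts += [generate_is(child, rules, path) for child in elem]
    parts.append(expand_guards(after, elem.attrib))
    # Normalize the whitespace between segments.
    return " ".join(p.strip() for p in parts if p.strip())


if __name__ == "__main__":
    rules = [
        ("titleStmt", "The key identifying elements of this document are:",
         "End of the key identifying elements."),
        ("titleStmt/title", "its title, which is ", " "),
        ("title", "Title: ", " "),
    ]
    doc = ET.fromstring("<titleStmt><title>Le Vaisseau d'or</title></titleStmt>")
    print(generate_is(doc, rules))
```

Run on the small titleStmt fragment, this prints "The key identifying elements of this document are: its title, which is Le Vaisseau d'or End of the key identifying elements.", showing the more specific titleStmt/title rule taking precedence over the generic title rule.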
FTSD
Like many vocabularies, the one shown here (a simple adaptation
of the TEI) divides a document into a header providing metadata and the text
proper. Taking TEI documents, metadata, and text proper as primitive notions,
we can express the overall structure of a TEI document using these predicates:
is_TEI_document(x) | The individual x (an XML document) is a TEI document (i.e., it is encoded following the TEI Guidelines).
TEIdoc_metadata(x, y) | The individual y (a TEI header) provides the metadata for the individual x (a TEI document).
TEIdoc_textproper(x, y) | The individual y is the text-proper portion of the individual x (a TEI document).
Skeleton sentences for this information are straightforward; as
in the preceding examples, we use the XSLT generate-id() function to generate
arbitrary identifiers for various individuals, with or without concatenating
various prefixes or suffixes.
TEI elements | is_TEI_document({generate-id()})
teiHeader elements | TEIdoc_metadata({generate-id(..)}, {generate-id()})
text elements which are children of TEI elements | TEIdoc_textproper({generate-id(..)}, {generate-id()})
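The three skeleton sentences in the table above can be instantiated mechanically. The Python sketch below assumes ElementTree as a stand-in for an XSLT processor and numbers elements in document order as a hypothetical substitute for generate-id(), whose actual values are processor-dependent; the helper names (local, make_ids, header_sentences) are illustrative only.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix, roughly as XPath local-name() would."""
    return tag.rsplit("}", 1)[-1]


def make_ids(root):
    """Hypothetical stand-in for generate-id(): number elements in document order."""
    return {el: f"id{n}" for n, el in enumerate(root.iter(), start=1)}


def header_sentences(root):
    """Instantiate the skeletons for TEI, teiHeader, and TEI/text elements."""
    ids = make_ids(root)
    parent = {child: p for p in root.iter() for child in p}
    out = []
    for el in root.iter():
        name, up = local(el.tag), parent.get(el)  # up is None for the root
        if name == "TEI":
            out.append(f"is_TEI_document({ids[el]})")
        elif name == "teiHeader" and up is not None:
            out.append(f"TEIdoc_metadata({ids[up]}, {ids[el]})")
        elif name == "text" and up is not None and local(up.tag) == "TEI":
            out.append(f"TEIdoc_textproper({ids[up]}, {ids[el]})")
    return out


if __name__ == "__main__":
    doc = ET.fromstring(
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/><text/></TEI>')
    for sentence in header_sentences(doc):
        print(sentence)
```

For the minimal document in the usage example this yields is_TEI_document(id1), TEIdoc_metadata(id1, id2), and TEIdoc_textproper(id1, id3); a text element nested deeper (e.g., inside a group) produces no TEIdoc_textproper sentence, matching the restriction in the table.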
If, as is plausible, we assume that every TEI document is a document in the
more general sense, as well as being an XML element, we could also infer that
is_document(x) and is_XML_element(x) for any x which is a TEI document. These
could be added to the skeleton sentences in the FTSD, or we could assume (as
background knowledge) an inference rule, which can be given in the following
form:
is_TEI_document(x)
________________________________________
is_document(x)
is_XML_element(x)
This is a relatively simple example of what proves to be a
general fact about the specification of FTSDs (and also of IS specifications):
there is a certain latitude about what is said where, so that producing a
formal tag-set description requires choices and judgement.
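The background-knowledge option can be illustrated by applying the inference rule above to a set of ground facts. In this Python sketch the encoding of facts as tuples (predicate, arg1, ...) is an assumption made for illustration; because the rule's conclusions never match its premise, a single pass already reaches the fixpoint, so no general forward-chaining loop is needed.

```python
def apply_tei_rule(facts):
    """Apply  is_TEI_document(x) => is_document(x), is_XML_element(x)
    to a set of ground facts and return the enlarged set."""
    derived = set(facts)
    for pred, *args in facts:
        if pred == "is_TEI_document":
            derived.add(("is_document", *args))
            derived.add(("is_XML_element", *args))
    return derived


if __name__ == "__main__":
    facts = {("is_TEI_document", "id20965"),
             ("TEIdoc_textproper", "id20965", "id21050")}
    for fact in sorted(apply_tei_rule(facts)):
        print(fact)
```

Writing the two extra skeleton sentences into the FTSD instead would produce the same facts directly; the sketch only shows that the choice affects where the knowledge lives, not what can be concluded.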
The actual text of the document has a simple regular structure,
readily representable with the predicates:
is_textproper(x) | The individual x is the textual part of a TEI document (as opposed to the metadata in the TEI header).
text_contents(x, y) | The text x contains y (a sequence of objects).
is_linegroup(s) | The sequence s is a group of verse lines (possibly with nested line groups, and possibly with a title or other heading material). (The most common form of line group is a stanza, but in itself, without a type attribute, the is_linegroup predicate says nothing about stanza structure.)
lg_contents(x, y) | The line group x contains y (a sequence of lines, line groups, etc.).
is_verseline(x) | The individual x is one line of verse (not necessarily a typographic line!).
line_string(x, s) | The verse line x has (can be realized as) the character string s.
These are used in the obvious way. A small sample of instance
sentences will illustrate the result:
is_textproper(id21050)
TEIdoc_textproper(id20965, id21050)
is_sequence(id21050-children)
text_contents(id21050, id21050-children)
seq_pos_item(id21050-children, 1, id21053 )
is_title("LE VAISSEAU D'OR")
doc_title(id20965, "LE VAISSEAU D'OR")
seq_pos_item(id21050-children, 2, id21060 )
is_linegroup(id21062)
lg_contents(id21062, id21062-children)
seq_pos_item(id21060-children, 1, id21062)
is_verseline(id21064)
line_string(id21064, "Ce fut un grand Vaisseau taillé dans l'or massif :")
seq_pos_item(id21062-children, 1, id21064)
is_verseline(id21069)
line_string(id21069, "Ses mâts touchaient l'azur, sur des mers inconnues ;")
seq_pos_item(id21062-children, 2, id21069)
is_verseline(id21074)
line_string(id21074, "La Cyprine d'amour, cheveux épars, chairs nues,")
seq_pos_item(id21062-children, 3, id21074)
is_verseline(id21080)
line_string(id21080, "S'étalait à sa proue, au soleil excessif.")
seq_pos_item(id21062-children, 4, id21080)
is_linegroup(id21085)
lg_contents(id21085, id21085-children)
seq_pos_item(id21060-children, 2, id21085)
is_verseline(id21088)
line_string(id21088, "Mais il vint une nuit frapper le grand écueil")
seq_pos_item(id21085-children, 1, id21088)
is_verseline(id21093)
line_string(id21093, "Dans l'Océan trompeur où chantait la Sirène,")
seq_pos_item(id21085-children, 2, id21093)
...
is_verseline(id21136)
line_string(id21136, "Hélas! Il a sombré dans l'abîme du Rêve!")
seq_pos_item(id21125-children, 3, id21136)
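Sentences like these follow mechanically from the lg and l skeletons given in the appendix. The Python sketch below reproduces that step under stated assumptions: ElementTree stands in for an XSLT processor, a document-order counter stands in for generate-id() (so the identifiers differ from those shown above), and 1 + count(preceding-sibling::*) is computed as the element's 1-based position among its element siblings. As in the skeletons, double quotes inside a line are not escaped.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an element name."""
    return tag.rsplit("}", 1)[-1]


def verse_sentences(root):
    """Instantiate the lg and l skeleton sentences for every non-root lg/l."""
    # Hypothetical generate-id() stand-in: document-order numbering.
    ids = {el: f"id{n}" for n, el in enumerate(root.iter(), start=1)}
    parent = {child: p for p in root.iter() for child in p}
    out = []
    for el in root.iter():
        name = local(el.tag)
        if name not in ("lg", "l") or el not in parent:
            continue
        me, up = ids[el], parent[el]
        # 1 + count(preceding-sibling::*): position among element siblings.
        pos = 1 + list(up).index(el)
        if name == "lg":
            out += [f"is_linegroup({me})", f"lg_contents({me}, {me}-children)"]
        else:
            text = "".join(el.itertext())  # XPath string(.)
            out += [f"is_verseline({me})", f'line_string({me}, "{text}")']
        out.append(f"seq_pos_item({ids[up]}-children, {pos}, {me})")
    return out


if __name__ == "__main__":
    body = ET.fromstring(
        "<body><lg><l>Ce fut un grand Vaisseau taillé dans l'or massif :</l>"
        "<l>Ses mâts touchaient l'azur, sur des mers inconnues ;</l></lg></body>")
    print("\n".join(verse_sentences(body)))
```

The output has the same shape as the sample above: for each line group, is_linegroup, lg_contents, and seq_pos_item; for each line, is_verseline, line_string, and seq_pos_item.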
The TEI header can contain a great deal of metadata, but it would
be tedious to work through all the details needed even for this simple example,
let alone to work through the variations in structure and semantics allowed by
the TEI vocabulary. So we will pass over the TEI header almost in silence. A
fragment of an FTSD for this example is given in the appendix; it covers the
elements and attributes used in the example, including those in the header.
Conclusion
What can we conclude from the exercises we have been going through in
this article? Obviously, FTSD and IS have quite different goals. Yet, as we
hope to have shown, they are strikingly similar, especially with respect to the
type of intellectual effort that goes into writing a specification. Empirical
“evidence” in support of this view is that, in the FTSD approach,
the names chosen for predicates often have the look-and-feel of very compact
peritexts, such as is_document, seq_pos_item, and paragraph_string. We think we
have brought out the fact that the same kind of knowledge (of the “user
community,” of its profiles, and of the use cases through which it interacts
with the documents) is necessary to write both a useful FTSD and a useful IS
specification for a given tag-set.
We suggest the following complementarity between IS and FTSD: if the
IS approach is used in the process of developing a tag-set, then much of the
work needed to devise a suitable universe of discourse for FTSD will have been
done already, and the task of mapping that universe to predicates and other
formal objects will be much simplified. It is even possible that the IS
specification worked out might constitute valuable material for documenting the
formal apparatus developed for the FTSD.
Appendix A. Fragment of a formal tag set description
This fragmentary FTSD includes entries for the elements and
attributes used in the third example of the paper and provides skeleton
sentences covering simple straightforward uses of those elements and
attributes. For simplicity's sake, however, it does not attempt to cover all
the cases foreseen in the full TEI Guidelines.
The basic structure of the FTSD is as given in [TEI P4], and the descriptions
of elements and attributes are taken from that source, but detailed information
has been omitted for brevity. The skeletons and ss elements have been added as
extensions; it is hoped that after the discussion above their syntax and
semantics will be clear enough without further documentation.
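To make the intended reading of ss and deixis concrete: each deixis holds an XPath expression, evaluated with the matched element as context node, whose value replaces the deixis in the skeleton. Python's ElementTree has no XPath 1.0 evaluator covering generate-id() or upward steps, so the sketch below hand-evaluates only the expression shapes used in this appendix (generate-id() with an optional chain of .. steps, string(.), and concat('literal', ...)); anything else raises an error. The function name instantiate and the document-order ids are hypothetical stand-ins, not part of the FTSD proposal.

```python
import re
import xml.etree.ElementTree as ET


def instantiate(ss, node, ids, parent):
    """Replace each <deixis> child of `ss` by its value for context `node`.

    Hand-written subset, not XPath: generate-id() with optional '..' steps,
    string(.), and concat('literal', expr).
    """
    def ancestor(steps):
        n = node
        for step in steps.split("/"):
            if step == "..":
                n = parent[n]  # KeyError if we climb above the root
        return n

    def evaluate(expr):
        expr = expr.strip()
        m = re.fullmatch(r"generate-id\(([./]*)\)", expr)
        if m:
            return ids[ancestor(m.group(1))]
        if expr == "string(.)":
            return "".join(node.itertext())
        m = re.fullmatch(r"concat\('([^']*)',\s*(.+)\)", expr)
        if m:
            return m.group(1) + evaluate(m.group(2))
        raise ValueError(f"unsupported deixis expression: {expr}")

    parts = [ss.text or ""]
    for deixis in ss:
        parts.append(evaluate(deixis.text or ""))
        parts.append(deixis.tail or "")
    return "".join(parts).strip()


if __name__ == "__main__":
    doc = ET.fromstring("<TEI><teiHeader><fileDesc><titleStmt>"
                        "<title>Le Vaisseau d'or</title>"
                        "</titleStmt></fileDesc></teiHeader></TEI>")
    ids = {el: f"id{n}" for n, el in enumerate(doc.iter(), start=1)}
    parent = {c: p for p in doc.iter() for c in p}
    ss = ET.fromstring('<ss>doc_title(<deixis>generate-id(../../../..)</deixis>,'
                       ' "<deixis>string(.)</deixis>")</ss>')
    title = doc.find("teiHeader/fileDesc/titleStmt/title")
    print(instantiate(ss, title, ids, parent))
```

In the usage example, ../../../.. climbs from title through titleStmt, fileDesc, and teiHeader to the TEI element, so the sentence produced is doc_title(id1, "Le Vaisseau d'or").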
<tsd xmlns:t="http://www.tei-c.org/ns/1.0">
<tagDoc id="TEI.2">
<gi>TEI</gi>
<rs>TEI document</rs>
<desc>Contains a single TEI-conformant document,
comprising a TEI header and a text, either in isolation
or as part of a <gi>teiCorpus</gi> element.</desc>
<skeletons>
<ss lang="pc">is_document(<deixis>generate-id()</deixis>)</ss>
<ss lang="pc">is_TEI_document(<deixis>generate-id()</deixis>)</ss>
<ss lang="pc">is_XML_element(<deixis>generate-id()</deixis>)</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="teiHeader">
<gi>teiHeader</gi>
<rs>TEI Header</rs>
<desc>supplies the descriptive and declarative information
making up an <soCalled>electronic title page</soCalled>
prefixed to every TEI-conformant text.</desc>
<skeletons>
<ss lang="pc">is_XML_element(<deixis>generate-id(.)</deixis>)</ss>
<ss lang="pc">TEIdoc_metadata(<deixis>generate-id(..)</deixis
>, <deixis>generate-id()</deixis>)</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="fileDesc">
<gi>fileDesc</gi>
<rs>File Description</rs>
<desc>contains a full bibliographic description of an
electronic file.</desc>
<skeletons>
<ss lang="pc">is_XML_element(<deixis>generate-id(.)</deixis>)</ss>
<ss lang="pc">is_bibliographic_description(<deixis
>generate-id(.)</deixis>)</ss>
<ss lang="pc">is_isbd(<deixis>generate-id(.)</deixis>)</ss>
<ss lang="pc">doc_bibldesc(<deixis
>generate-id(ancestor::t:TEI[1])</deixis
>, <deixis>generate-id()</deixis>)</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="titleStmt">
<gi>titleStmt</gi>
<rs>title statement</rs>
<desc>groups information about the title of a work and
those responsible for its intellectual content</desc>
<skeletons>
<ss lang="pc">isbd_titlestatement(<deixis>generate-id(..)</deixis
>, <deixis>generate-id()</deixis>)</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="title">
<gi>title</gi>
<desc>contains the title of a work, whether article, book,
journal, or series, including any alternative titles or
subtitles.</desc>
<attList>
<attDef>
<attName>level</attName>
<rs>bibliographic level (or class) of title</rs>
<desc>indicates whether this is the title of an article,
book, journal, series, or unpublished material</desc>
<datatype>(a | m | j | s | u)</datatype>
<valList>
<val>a</val>
<desc>analytic title (article, poem, or other item
published as part of a larger item)</desc>
<val>m</val>
<desc>monographic title (book, collection, or other item
published as a distinct item, including single volumes
of multi-volume works)</desc>
<val>j</val>
<desc>journal title</desc>
<val>s</val>
<desc>series title</desc>
<val>u</val>
<desc>title of unpublished material (including theses
and dissertations unless published by a commercial
press)</desc>
</valList>
<default>#IMPLIED</default>
<skeletons>
</skeletons>
</attDef>
</attList>
<skeletons>
<ss lang="pc" match="t:fileDesc/t:titleStmt/t:title">
is_title("<deixis>string(.)</deixis>")
doc_title(<deixis>generate-id(../../../..)</deixis
>, "<deixis>string(.)</deixis>")
</ss>
<ss lang="pc" match="t:bibl/t:title">
is_title("<deixis>string(.)</deixis>")
doc_title(<deixis>concat('ref-',generate-id(..))</deixis
>, "<deixis>string(.)</deixis>")
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="author">
<gi>author</gi>
<desc>in a bibliographic reference, contains the name of the author(s),
personal or corporate, of a work; the primary
<term>statement of responsibility</term> for any bibliographic item.</desc>
<skeletons>
<ss lang="pc" match="t:fileDesc/t:titleStmt/t:author">
is_authorname("<deixis>string(.)</deixis>")
is_author(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>",<deixis>concat('ref-',generate-id())</deixis>)
doc_author(<deixis>generate-id(../../../..)</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
<ss lang="pc" match="t:bibl/t:author">
is_authorname("<deixis>string(.)</deixis>")
is_author(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>",<deixis>concat('ref-',generate-id())</deixis>)
doc_author(<deixis>concat('ref-',generate-id(..))</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="editor">
<gi>editor</gi>
<desc>secondary <term>statement of responsibility</term>
for a bibliographic item, for example the name of an
individual, institution, or organization (or of several
such) acting as editor, compiler, translator, etc.</desc>
<skeletons>
<ss lang="pc" match="t:fileDesc/t:titleStmt/t:editor">
is_editorname("<deixis>string(.)</deixis>")
is_editor(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>", <deixis>concat('ref-',generate-id())</deixis>)
doc_editor(<deixis>generate-id(../../../..)</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
<ss lang="pc" match="t:bibl/t:editor">
is_editorname("<deixis>string(.)</deixis>")
is_editor(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>", <deixis>concat('ref-',generate-id())</deixis>)
doc_editor(<deixis>concat('ref-',generate-id(..))</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="publicationStmt">
<gi>publicationStmt</gi>
<rs>publication statement</rs>
<desc>groups information concerning the publication or
distribution of an electronic or other text.</desc>
<skeletons>
<ss lang="pc">isbd_pubstatement(<deixis>generate-id(..)</deixis
>, <deixis>generate-id()</deixis>)</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="pubPlace">
<gi>pubPlace</gi>
<rs>place of publication</rs>
<desc>contains the name of the place where a bibliographic
item was published</desc>
<skeletons>
<ss lang="pc" match="t:fileDesc/t:publicationStmt/t:pubPlace">
is_placename("<deixis>string(.)</deixis>")
is_place(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>", <deixis>concat('ref-',generate-id())</deixis>)
doc_pubplace(<deixis>generate-id(../../../..)</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
<ss lang="pc" match="t:bibl/t:pubPlace">
is_placename("<deixis>string(.)</deixis>")
is_place(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>", <deixis>concat('ref-',generate-id())</deixis>)
doc_pubplace(<deixis>concat('ref-',generate-id(..))</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="publisher">
<gi>publisher</gi>
<desc>provides the name of the organization responsible for the publication
or distribution of a bibliographic item.</desc>
<skeletons>
<ss lang="pc" match="t:fileDesc/t:publicationStmt/t:publisher">
is_orgname("<deixis>string(.)</deixis>")
is_organization(<deixis>concat('ref-',generate-id())</deixis>)
is_publisher(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>", <deixis>concat('ref-',generate-id())</deixis>)
doc_publisher(<deixis>generate-id(../../../..)</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
<ss lang="pc" match="t:bibl/t:publisher">
is_orgname("<deixis>string(.)</deixis>")
is_organization(<deixis>concat('ref-',generate-id())</deixis>)
is_publisher(<deixis>concat('ref-',generate-id())</deixis>)
denotes("<deixis>string(.)</deixis
>", <deixis>concat('ref-',generate-id())</deixis>)
doc_publisher(<deixis>concat('ref-',generate-id(..))</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="date">
<gi>date</gi>
<desc>contains a date in any format.</desc>
<skeletons>
<ss lang="pc" match="t:fileDesc/t:publicationStmt/t:date">
doc_publicationdate(<deixis>generate-id(../../../..)</deixis
>, "<deixis>string(.)</deixis>")
</ss>
<ss lang="pc" match="t:bibl/t:date">
doc_publicationdate(<deixis>concat('ref-',generate-id(..))</deixis
>, "<deixis>string(.)</deixis>")
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="sourceDesc">
<gi>sourceDesc</gi>
<rs>source description</rs>
<desc>supplies a bibliographic description of the copy text(s)
from which an electronic text was derived or generated.</desc>
<skeletons>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="bibl">
<gi>bibl</gi>
<desc>contains a loosely structured bibliographic citation of which the
sub-components may or may not be explicitly tagged.</desc>
<skeletons>
<ss lang="pc" match="t:teiHeader/t:fileDesc/t:sourceDesc/t:bibl">
is_document(<deixis>concat('ref-',generate-id())</deixis>)
doc_bibldesc(<deixis>concat('ref-',generate-id())</deixis
>, <deixis>generate-id()</deixis>)
is_transcription(<deixis>generate-id(../../../..)</deixis>)
transcribes(<deixis>generate-id(../../../..)</deixis
>, <deixis>concat('ref-',generate-id())</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="edition">
<gi>edition</gi>
<desc>describes the particularities of one edition of a text.</desc>
<skeletons>
<ss lang="pc" match="t:bibl/t:edition">
doc_edition_desc(<deixis>concat('ref-',generate-id(..))</deixis
>, "<deixis>string(.)</deixis>")
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="biblScope">
<gi>biblScope</gi>
<desc>defines the scope of a bibliographic reference, for example
as a list of page numbers, or a named subdivision of a larger work.</desc>
<skeletons>
<ss lang="pc">// omitting biblScope for now ... </ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="text">
<gi>text</gi>
<desc>contains a single text of any kind, whether unitary or composite,
for example a poem or drama, a collection of essays, a novel,
a dictionary, or a corpus sample.</desc>
<skeletons>
<ss match="t:TEI/t:text" lang="pc">
is_textproper(<deixis>generate-id()</deixis>)
TEIdoc_textproper(<deixis>generate-id(..)</deixis
>, <deixis>generate-id()</deixis>)
is_sequence(<deixis>concat(generate-id(),'-children')</deixis>)
text_contents(<deixis>generate-id()</deixis
>, <deixis>concat(generate-id(),'-children')</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="front">
<gi>front</gi>
<desc>contains any prefatory matter (headers, title page,
prefaces, dedications, etc.) found before the start of a
text proper.</desc>
<skeletons>
<ss lang="pc">
seq_pos_item(<deixis>concat(generate-id(..),'-children')</deixis
>, <deixis>1 + count(preceding-sibling::*)</deixis
>, <deixis>generate-id()</deixis> )</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="head">
<gi>head</gi>
<desc>contains any heading, for example, the title of a section,
or the heading of a list or glossary.</desc>
<skeletons>
<ss match="t:front/t:head" lang="pc">
is_title("<deixis>string(.)</deixis>")
doc_title(<deixis>generate-id(ancestor::t:TEI)</deixis
>, "<deixis>string(.)</deixis>")
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="body">
<gi>body</gi>
<desc>contains the whole body of a single unitary text, excluding
any front or back matter.</desc>
<skeletons>
<ss lang="pc">
seq_pos_item(<deixis>concat(generate-id(..),'-children')</deixis
>, <deixis>1 + count(preceding-sibling::*)</deixis
>, <deixis>generate-id()</deixis> )</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="lg">
<gi>lg</gi>
<desc>contains a group of verse lines functioning as a
formal unit, e.g., a stanza, refrain, verse paragraph,
etc.
</desc>
<skeletons>
<ss lang="pc">
is_linegroup(<deixis>generate-id()</deixis>)
lg_contents(<deixis>generate-id()</deixis
>, <deixis>concat(generate-id(),'-children')</deixis>)
seq_pos_item(<deixis>concat(generate-id(..),'-children')</deixis
>, <deixis>1 + count(preceding-sibling::*)</deixis
>, <deixis>generate-id()</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
<tagDoc id="l">
<gi>l</gi>
<desc>contains a single, possibly incomplete, line of verse.</desc>
<skeletons>
<ss lang="pc">
is_verseline(<deixis>generate-id()</deixis>)
line_string(<deixis>generate-id()</deixis
>, "<deixis>string(.)</deixis>")
seq_pos_item(<deixis>concat(generate-id(..),'-children')</deixis
>, <deixis>1 + count(preceding-sibling::*)</deixis
>, <deixis>generate-id()</deixis>)
</ss>
</skeletons>
<elemDecl>...</elemDecl>
</tagDoc>
</tsd>
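Because several tagDoc entries above carry one ss per context, a mechanical check is useful when maintaining an FTSD of this kind, in the spirit of the XSLT-based quality checking discussed in [Piez 2006]. The Python sketch below is our own illustration, not part of the FTSD proposal: it flags any tagDoc in which two or more ss elements share the same match pattern, a typical sign that a skeleton was copied without its pattern being updated. It assumes, as in the listing, that the tsd vocabulary itself is in no namespace.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_matches(tsd_xml):
    """Return (tagDoc id, pattern) pairs where two or more <ss> elements
    in the same tagDoc carry the same match pattern."""
    root = ET.fromstring(tsd_xml)
    problems = []
    for tag_doc in root.iter("tagDoc"):
        patterns = Counter(
            ss.get("match") for ss in tag_doc.iter("ss") if ss.get("match"))
        problems += [(tag_doc.get("id"), p) for p, n in patterns.items() if n > 1]
    return problems


if __name__ == "__main__":
    sample = """<tsd xmlns:t="http://www.tei-c.org/ns/1.0">
      <tagDoc id="date"><gi>date</gi><skeletons>
        <ss lang="pc" match="t:fileDesc/t:publicationStmt/t:date">a</ss>
        <ss lang="pc" match="t:fileDesc/t:publicationStmt/t:date">b</ss>
      </skeletons></tagDoc>
    </tsd>"""
    print(duplicate_matches(sample))
```

Patterns are compared as literal strings, so two differently written XPath patterns that select the same nodes would not be caught; a fuller check would need an XSLT or XPath processor.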
References
[Dubin et al. 2006] Dubin, D.,
Futrelle, J., & Plutchak, J. “Metadata Enrichment for Digital
Preservation.” Proceedings of Extreme Markup
Languages 2006 (Montréal, Canada, August 2006).
http://conferences.idealliance.org/extreme/html/2006/Dubin01/EML2006Dubin01.html
[Marcoux 2006] Marcoux, Y. “A
natural-language approach to modeling: Why is some XML so difficult to
write?” Proceedings of Extreme Markup Languages
2006 (Montréal, Canada, August 2006).
http://conferences.idealliance.org/extreme/html/2006/Marcoux01/EML2006Marcoux01.html
[Marcoux 2009] Marcoux, Y.
“Intertextual semantics generation for structured documents: a complete
implementation in XSLT.” To appear in Proceedings
of the 12th Colloque International sur le Document Electronique
(Université de Montréal, Canada, October 2009).
[Marcoux & Rizkallah 2007] Marcoux, Y. & Rizkallah, É.
“Exploring intertextual semantics: a reflection on attributes and
optionality.” Proceedings of Extreme Markup
Languages 2007 (Montréal, Canada, August 2007).
http://conferences.idealliance.org/extreme/html/2007/Marcoux01/EML2007Marcoux01.html
[Marcoux & Rizkallah 2009] Marcoux, Y. & Rizkallah, É.
“Intertextual semantics: A semantics for information design.”
Journal of the American Society for Information Science
& Technology, Volume 60, Issue 9, 2009, pp. 1895-1906.
Published Online: 21 Aug 2009.
doi:https://doi.org/10.1002/asi.21134.
[Piez 2006] Piez, W. “XSLT
for Quality Checking in the Publication Workflow.” Online presentation,
Mulberry Technologies, Inc., 2006.
http://www.mulberrytech.com/papers/XSLTforQA/
[Smedslund 2004] Smedslund, J.
Dialogues about a new psychology. Chagrin
Falls, Ohio: Taos Institute. 2004.
[Sperberg-McQueen 2005] Sperberg-McQueen, C. M. “The meaning of
OAI 2.0 Markup: An exercise in markup interpretation.”
http://www.w3.org/2004/04/em-msm/ioai.xml
[Sperberg-McQueen et al. 2002] Sperberg-McQueen, C. M., Dubin, D.,
Huitfeldt, C., & Renear, A. “Drawing inferences on the basis of
markup.” In Proceedings of Extreme Markup Languages
2002 (Montréal, Canada, August 2002), B. T. Usdin and S. R. Newcomb,
Eds.
http://conferences.idealliance.org/extreme/html/2002/CMSMcQ01/EML2002CMSMcQ01.html
[Sperberg-McQueen et al. 2009] Sperberg-McQueen, C. M., Huitfeldt,
C., & Marcoux, Y. “What is Transcription? (part 2)” In
preparation. Abstract available in Conference Abstracts
of Digital Humanities 2009 (University of Maryland, College Park,
June 2009), Claire Warwick, Ed.
http://www.mith2.umd.edu/dh09/wp-content/uploads/dh09_conferencepreceedings_final.pdf
[Sperberg-McQueen et al. 2000a] Sperberg-McQueen, C. M., Huitfeldt,
C., & Renear, A. “Meaning and Interpretation of Markup: Not as Simple
as You Think.” Proceedings of Extreme Markup
Languages 2000 (Montréal, Canada, August 2000).
[Sperberg-McQueen & Miller 2004] Sperberg-McQueen, C. M. &
Miller, E. “On mapping from colloquial XML to RDF using XSLT.”
Proceedings of Extreme Markup Languages 2004
(Montréal, Canada, August 2004).
http://conferences.idealliance.org/extreme/html/2004/Sperberg-McQueen01/EML2004Sperberg-McQueen01.html
[TEI P4] The TEI Consortium / The
Association for Computers and the Humanities (ACH); The Association for
Computational Linguistics (ACL); The Association for Literary and Linguistic
Computing (ALLC). TEI P4: Guidelines for Electronic Text
Encoding and Interchange XML-compatible edition. Ed. C. M.
Sperberg-McQueen and Lou Burnard; XML conversion by Syd Bauman, Lou Burnard,
Steven DeRose, and Sebastian Rahtz. Oxford, Providence, Charlottesville,
Bergen: TEI Consortium, December 2001.
http://www.tei-c.org/release/doc/tei-p4-doc/html/
[Wirzbicka 1992] Wierzbicka, A.
Semantics, culture, and cognition : universal human
concepts in culture-specific configurations. Oxford University
Press. 1992.
[Wittgenstein 1953] Wittgenstein, L.
Philosophical investigations. Oxford:
Blackwell. 1953.
[Wrightson 2001] Wrightson, A.
“Some Semantics for Structured Documents, Topic Maps and Topic Map
Queries.” Proceedings of Extreme Markup Languages
2001 (Montréal, Canada, August 2001).
http://conferences.idealliance.org/extreme/html/2001/Wrightson01/EML2001Wrightson01.html
[Wrightson 2005] Wrightson, A.
“Semantics of Well Formed XML as a Human and Machine Readable Language:
Why is some XML so difficult to read?” Proceedings
of Extreme Markup Languages 2005 (Montréal, Canada, August 2005).
http://conferences.idealliance.org/extreme/html/2005/Wrightson01/EML2005Wrightson01.html