How to cite this paper
Sperberg-McQueen, C. M. “Calling things by their true names: Descriptive markup and the search for a perfect
language.” Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). https://doi.org/10.4242/BalisageVol15.Sperberg-McQueen02.
Balisage: The Markup Conference 2015
August 11 - 14, 2015
Balisage Paper: Calling things by their true names
Descriptive markup and the search for a perfect language
C. M. Sperberg-McQueen
Founder and principal
Black Mesa Technologies LLC
Technische Universität Darmstadt
C. M. Sperberg-McQueen is currently (summer semester 2015) a
visiting professor in the department of linguistics and literature at
the Technical University of Darmstadt, teaching in the program for
digital humanities. He is also the founder and principal of Black Mesa
Technologies, a consultancy specializing in helping memory
institutions improve the long term preservation of and access to the
information for which they are responsible.
He served as editor in chief of the TEI Guidelines from 1988 to
2000, and has also served as co-editor of the World Wide Web
Consortium’s XML 1.0 and XML Schema 1.1 specifications.
Copyright © 2015 by the author. Used with permission.
Abstract
One of Leibniz’s many projects for improving the world
involved the construction of an encyclopedia which would lay out the
body of existing knowledge and enable the systematic development of
more. Ideally, the encyclopedia should be formulated in a
philosophical language and written in a real
character (a set of symbols, a universal character set,
whose symbols denote not the words of a natural language but the
objects of the real world). Properly constructed, the real character
would enable a calculus of reasoning:
a set of mechanical rules for logical inference. We may smile at
Leibniz’s idealism; few modern minds can share his optimism that
we can reduce all complex concepts to uniquely determined combinations
of primitive atomic ideas. But there is a reason Leibniz’s ideas
continue to inspire modern readers, and many of the same ideals
motivate some of our best work in markup languages.
Some of you will have heard of, and some of you may have read,
Umberto Eco’s book The Search for the
Perfect Language [Eco 1993]. I read the book some time ago (I think in
1998), because it sounded like a book about SGML and XML. And, of
course, it is. Because everything is about SGML and XML, because SGML
and XML are about everything, and if SGML and XML are not attempts at
creating perfect languages, I’d like to know what they are. I’d
like to talk, today, about some aspects of descriptive markup and
their relation to the idea of a perfect language.
There’s a certain satisfaction in thinking that the work
we do is following in the footsteps of people like Leibniz and others
who have worked on the perfection of language. Leibniz, as it happens,
is the last person (in the Western tradition, at least) who took
seriously the notion of developing a perfect language and whom
philosophers still take seriously as a philosopher. So we’re also, as
Eco’s account makes clear, following in the footsteps of a lot
of crazy idealists like L.L. Zamenhof, the inventor of Esperanto, and
Giuseppe Peano, who in addition to being a great mathematician was the
creator of the Peano Axioms, which are at the foundation of modern
mathematics and which play a crucial role in
Gödel’s Incompleteness Proof, among other
things. Peano published the Peano Axioms in a book written in Latin,
because he was trying to encourage the use of Latin as an
international scientific language. That was a few years before he
decided that no one wants to use Latin because no one outside Italy
actually studies Latin hard enough in school, and concluded that the
right thing to do was to make Latin easier to write. So he proposed
Latino sine Flexione, that is to say,
Latin without any inflections, declensions, or conjugations —
Latin that makes any classicist who sees it weep.
But let’s focus on Leibniz. I would rather think
of myself as following in the footsteps of Leibniz than as following
in the footsteps of the amusing but eccentric collection of characters
who follow Leibniz in Eco’s account of perfect
languages. Leibniz’s perfect language is always really perfect, in part because it’s
always a project, and never a product. When he writes about it he is
almost always seeking support to work out the details — he never
actually got the funding to do it — so his vision remains
unrealized and perfect in the way that our plans are always perfect,
even when our implementations aren’t. His basic idea is that we
need two things.
First, we want what Leibniz calls a character realis or characteristica realis. It’s hard to know
exactly what he meant by either of these phrases because the
translators — both the translators into English and the
translators into any other modern language — render these
phrases unhelpfully as “real character”
or “real characteristic” (or the
etymological equivalents in other modern languages). That is, they
shift the word forms from Latin to English (or French, or German), but
otherwise leave his Latin alone. If we ask “What does that
mean?” — it clearly doesn’t mean “characteristic” in
the modern English sense — the translator essentially looks at
us and says, “You get to figure that out because I
couldn’t; that’s why I left it alone.”
The phrase “real character”
sometimes appears to denote a system of categories, an analysis of
thought that resolves composite ideas into simpler ideas. And sometimes it seems
to mean a writing system, a symbol system, an ideography for writing
down references to such ideas. So as a whole the real character seems
to be some kind of combination of what we would call an ontology and
an ideographic script. Leibniz mentions explicitly the notion,
widespread in the 17th and early 18th centuries, that Chinese is a
model for the kind of thing he is talking about.
As it happens, Chinese is not a model after all, because Chinese
is not an ideographic script; like every other known human writing
system, it represents language rather than ideas directly — in its
case, it is a logographic script. And it’s clear that Leibniz
did not want a logographic script. With logographic scripts, as
you can perhaps guess from the name, the basic idea is that if you
point at some bit of writing and ask “What does that bit of
writing mean?”, the answer is “That
represents the following word or sequence of words: ...”
Writing systems represent linguistic objects, and the linguistic
object may then denote real things, may talk about reality. But the mapping
from the written word to things goes by way of an intermediate stop in
whatever natural language you are writing in.
In Leibniz’s day, Chinese was often thought to be purely
ideographic because the same script is used all through China, by
people whose natural languages are not in fact mutually intelligible.
Chinese characters are also used to write Japanese and Korean
within limits (very strict limits that essentially defeat the idea
that there is anything relating to ideography here). But the Chinese
script is not in fact ideographic, but logographic.
But there are examples of ideographic scripts. Mathematical
notation is an ideographic script. This is sometimes doubted, because
mathematical expressions can be verbalized. And it is true that we all
learn how to read mathematical notations aloud. But x with a
superscript 2 following it means “x
squared” or “x to the power
2”, however we care to verbalize it. It represents an idea,
not a particular form of English words. Germans learn how to verbalize
it, Anglophones learn how to verbalize it, French speakers learn how
to verbalize it, everybody who learns mathematics learns how to
verbalize it, but here the relationship of script to language and idea
is quite different: the written symbol directly denotes the thing
(or, as Leibniz would say, the res).
So mathematical notation is an example of a real character, or an
ideography. Chemical notation can similarly be seen as
ideographic.
So ideography is not really an impossible idea. Leibniz thought
it would be extremely useful to have one. He thought it would be
useful in part because of the second thing we need for a perfect
language: a perfect language would enable systematic reasoning to be
more reliable. It would, Leibniz suggested, provide a
calculus ratiocinator, a thinking calculus,
or a mechanical technique for thinking.
He argued that if we did things right, we could put controversy
behind us, because once you have translated an argument into this
universal character, checking the reasoning of the argument becomes a
purely mechanical task, like checking a complicated arithmetic
calculation. So that to resolve any controversy, all one disputant
has to do is invite the other to a calculating table and say,
“Let us calculate.” Nowadays, we might say (and sometimes
do say) “Let’s do the math,” which comes down to
something very similar for a particular class of questions.
Why Leibniz thought that a real character as he envisaged it
would support reasoning can best be understood if we take a brief
digression through the history of logic. In Leibniz’s time, the
state of the art for formal logic was Aristotle’s syllogistic.
The art of the syllogism had been cultivated and further developed
through the Middle Ages, and scholastic philosophy had refined
Aristotle somewhat, but the approach remained essentially
Aristotelian. By Leibniz’s time, syllogistic was really rather
old-fashioned. So Leibniz was swimming against fashion in regarding it
as important. He regarded it in fact (rightly) as one of the greatest
monuments of human thought, and as the basis for systematic
reasoning.
Let us consider a simple example syllogism; many of you
will have heard this one. We can reason thus:
- All Greeks are mortal.
- Socrates is a Greek.
- (Therefore) Socrates is mortal.
We have three sentences here: a major premiss, a minor premiss, a
conclusion. Each sentence takes the form of a subject and a predicate.
In XML, we might write it this way:
- <subject>All Greeks</subject> <predicate>are mortal</predicate>.
- <subject>Socrates</subject> <predicate>is a Greek</predicate>.
- (Therefore) <subject>Socrates</subject> <predicate>is mortal</predicate>.
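Leibniz’s hope, remember, was that once premisses are written down
this explicitly, checking an inference becomes a purely mechanical
task — “Let us calculate.” As a minimal sketch of that ideal
(my illustration, in Python, and of course no part of Leibniz’s own
apparatus), here is a checker for the single syllogistic mood Barbara,
treating the singular premiss about Socrates as universal, as
traditional syllogistic does:

    # Sketch of a mechanical validity check for the mood Barbara:
    #   All M are P; S is M; therefore S is P.
    def barbara(major, minor):
        # major = (M, P) for "All M are P"; minor = (S, M) for "S is M"
        m1, p = major
        s, m2 = minor
        if m1 != m2:
            raise ValueError("middle term does not match: not Barbara")
        return (s, p)   # the conclusion "S is P"

    print(barbara(("Greek", "mortal"), ("Socrates", "Greek")))
    # -> ('Socrates', 'mortal'), i.e. "Socrates is mortal"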
Syllogisms can take other forms and operate in other modes. Some
forms and modes lead to valid conclusions, and others don’t. And
the theory of syllogistic is an elaborate account which assigns to
modes names like Barbara and Baralipton and Celarent, where the
vowels e, a, i, and o denote various properties of the premisses
(affirmative or negative, universal or particular). In its late
medieval and early modern flowering, syllogistic is extremely
elaborate and hard to understand.
Leibniz, being among other things a good mathematician and
creator of abstractions, reduced the theory of syllogism to a few
basic principles and, from those basic principles, developed the
entire body of syllogistic theory as it was known in his day. And his
perpetual summary is that the predicate is
present in the subject. This is a little difficult to
understand because our way of thinking about it tends to want to go
the other way. We tend to regard logical statements as extensional
statements. And when we say “All Greeks are mortal,” what
we are saying is: the set of Greeks is a subset of the set of mortal
things. So we would be inclined to say the (set denoted by the)
subject is included in the (set denoted by the) predicate, and
extensionally that’s true. And in about half of Leibniz’s
sketches of logical calculi he uses an extensional approach, and
certain problems get solved.
But he keeps coming back, in the other half of his sketches of
logical calculi, to an intensional (with an s) account
of logic in which, when we say “Greeks,” we mean
“all of those things which have the following set of properties:
...”. And when we’re talking about “mortal
beings,” we’re talking about a set of things that have
another set of properties. And if “all Greeks are
mortal,” what that means is that the set of properties that
distinguish being mortal is a subset of the set of properties that
define being Greek. So (on an intensional view) the predicate is
always (implicitly) present in the subject. And even though he kept flirting with extensional
interpretations of logic, Leibniz stuck by that fundamentally
intensional formulation.
Leibniz’s proposal for a perfect writing system for a real
character starts from the following observation. We have some ideas
that we regard as composite (as made up of other ideas), so if we
analyze those ideas, we have simpler ideas. And if we continue this
process of analyzing composite ideas into simpler components, we
either find an infinite regress (which would seem to suggest that
it’s impossible to talk to one another), or we hit bottom with
a set of atomic ideas. Leibniz observed that we do talk to one another; we
have the perception that we are succeeding in communicating. And
therefore, without actually proving it, he assumed that we don’t
actually have an infinite regress, but that we will reach bottom, so that
we can analyze human thought into a set of atomic ideas. He
didn’t think this would be easy; that was why he was asking for
money. And he thought it might take a generation of work by
Europe’s academies of science to develop an analysis of human
thought into atomic ideas that would suffice for scholarly work. And
he addressed in advance the concern that is already forming itself in
many of your heads: “Well, wait. What if we assume these atoms, and
then later we re-analyze things, and we don’t accept those
concepts as atomic anymore?” He worked out to his own satisfaction that
we don’t have to wait for a perfect analysis to make some
progress; we can go on, as it were, provisionally. Leibniz’s
notion of reasoning also included a lot more on probabilistic
reasoning than is common in modern symbolic logic, but that’s
another story.
Just as we can combine atomic ideas into composite ideas, and
then combine those composite ideas into further composite ideas of
still more complexity, so Leibniz’s conception for the real
character was that the symbols for atomic ideas should be combined to
form symbols for composite ideas, just as the atomic ideas themselves
are combined. So the symbol for any composite idea would be
constructed from (and thus automatically contain) all the symbols for
its component ideas. One could thus see at a glance, looking at the
symbol for any concept, what its fundamental constituents were. That
means that checking whether the premiss “All Greeks are
mortal” is true would be a simple question of checking to see
whether the symbols that make up the defining features of
“mortal” are present in the symbol denoting
“Greeks”. We’ll come back to this a little
later.
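In modern terms, Leibniz’s check is just a subset test. Here is a
minimal sketch of the intensional test that the predicate be present
in the subject — the atomic ideas here are hypothetical inventions of
mine, since Leibniz never produced his inventory:

    # Sketch: concepts modelled as sets of (hypothetical) atomic ideas.
    # "All S are P" holds intensionally when every atom of P is an atom of S.
    GREEK  = frozenset({"animate", "rational", "mortal", "hellenic"})
    MORTAL = frozenset({"animate", "mortal"})

    def predicate_in_subject(subject, predicate):
        return predicate <= subject   # subset test on atomic ideas

    print(predicate_in_subject(GREEK, MORTAL))   # True: all Greeks are mortal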
This notion that the way we write things down should exhibit
their structure may feel very familiar to you. It feels to me like
precisely what we try to do when we develop SGML and XML vocabularies.
Now the developers of SGML did not, as far as I know, think that they
were imposing on their users the task of creating an ontology or of
analyzing human thought into atomic bits. But they were trying to make
it possible to develop vocabularies that allowed the reuse of
data.
Now, why is it that data for a particular application cannot be
reused by another application? In the cases we are familiar with
— in the cases that are suitable for treatment with XML —
markup does three things. Among the things markup does, it identifies
the constituent parts of the document. And by naming things with
generic identifiers, it allows us to say that for our purposes, these
two things are the same, because they have the same name —
they’re both <p>, for example — and
these two things are different, because they do not have the same name
— one is <p> and one is <ul> or <list> or
<section> or what-have-you.
Some names are the same as other names, and some names are different
from other names. Those are the two absolutely fundamental services
that generic identifiers offer us.
And the reason that data intended for one application cannot
always be reused for another application is that many applications
ignore some of the distinctions important for other applications. If
the data are tagged for typographic processing, for example, and if we
say (as we often do) that for the purposes of display or print rendering
some things which are different will be treated the same way —
perhaps we will italicize not only technical terms but also foreign
words and emphatic words and subtitles — then those distinctions
are not needed for typography. And the reason we can’t reuse
typesetting tapes to do all of the things we might want to do with
documents is that we don’t always want to treat those things as
the same. And if you ask yourself, how do we know what things we will
always want to treat the same way and never want to treat differently,
the answer is we will always want to treat things that we think of as
the same in the same way. We may choose to take things that are
different and treat them in the same way, but unless we are engaged in
a kind of experimental typography in which technical terms are italic
on odd-numbered pages and bold on even-numbered pages unless they are
powers of two, then we will always want technical terms, if
that’s one of our basic, primitive categories, to be treated the
same way.
This is such an obvious point that it can take a while to work
out its implications (though in the 1980s and 1990s it didn’t
take very long for people to start acting as if it were obvious). What
that boils down to eventually is that if you want really good
descriptive markup, you need to say what things are: “What do you think this is?”
This led many SGML and XML practitioners to a beautiful sense of
freedom and power: it’s like being Adam and getting to name all
of the animals. That’s incidentally another reason that XML and
SGML have always felt to me like a search for a perfect
language, because the Adamic language was for many centuries regarded
as quite obviously the most perfect of all languages, if we could only
recover it. Eco is very good on that part of the history. And even now
when, for reasons I’ll come to shortly (I hope), we don’t
always aim quite as high as Leibniz was aiming, we are interested in
using markup to exhibit the structure of things, to visualize things,
to make diagrams of things, to make it easier to look at particular
aspects of things. Liam Quin’s paper on diagramming XML and
Priscilla Walmsley’s paper on detecting and visualizing
differences between complex schemas both give good examples of that
continuing interest [Quin 2015, Walmsley 2015]. The elevator pitches Tommie Usdin recommended
to us in her opening talk seem also to require that we identify the
essentials of our story, as a way of making things clear to the people
we are talking to [Usdin 2015].
Leibniz was not the first person to worry about these things.
And Leibniz was able to be confident that from a set of atomic ideas
we can generate all of the huge variety of concepts that we use in
natural language in part because he had read Ramón Llull.
Ramón Llull was a 13th century Majorcan minor nobleman who led
— as so many 13th century saints seem to have done — a
life of dissolution and sin until he had a conversion experience,
joined a friary, and became a theologian. And in a vision on the side
of a mountain, God revealed to him the method by which he could find
the truth and convert the heathens. Llull’s method, which he
refined in book after book after book over a period of decades, with
variations over time, basically involves identifying what, for
purposes of my exposition, I will call fundamental concepts and
assigning symbols to them and combining the symbols to make other
concepts. So, in the final exposition of his great art, the work
called the Brief Art (or ars brevis,
although it’s not actually all that brief), Llull uses the
alphabet consisting of the letters B C D E F G H I K. (The letter A is
left out because being the first letter of the alphabet it should
denote the divinity and the unity of all of those things. B through K
are aspects of A, as it were.) And he assigns to each letter a
principle: goodness, magnitude, eternity or duration, power, wisdom,
will, strength, truth, glory. And then to each symbol he assigns a
relation: difference, agreement, contrariety, beginning, betweenness,
end, being greater, being equal, being less than. Then, to each symbol
he further assigns a question, and a subject, and a virtue, and a vice.
Once this infrastructure of symbols and meanings is set up,
from a simple combination of letters like BCD, Llull can
create essentially all of the sentences that we can create by
combining the words “goodness,” “difference,”
“whether?,” “God,” “justice,” or
“avarice,” with “magnitude,”
“agreement,” “which?,” “angel,”
“wisdom,” or “gluttony,” and
“eternity” (or “duration”),
“contrariety,” “of what?,”
“heaven,” “fortitude,” or “lust.”
It’s easy to see that the number of sentences we can form is
very large, no matter how exactly we imagine the rules for
sentence construction to be set up (Llull is rather vague
on this topic, by modern standards); the ability to
produce a large number of combinations from a small number
of basic constituents is precisely what is meant by the
phrase “combinatorial explosion.”
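The arithmetic is easy to check. Here is a sketch — my own
illustration; Llull, of course, worked by hand — of how quickly the
chambers multiply; 84 columns, each read in 20 ways, is one common
reconstruction of the figure of 1,680 that Eco cites below:

    # Llull's nine letters yield 84 unordered three-letter "chambers" like BCD.
    from itertools import combinations

    letters = "BCDEFGHIK"
    chambers = ["".join(c) for c in combinations(letters, 3)]
    print(len(chambers))   # 84
    print(chambers[0])     # BCD
    print(84 * 20)         # 1680, one reconstruction of Eco's figure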
Now, to be honest, it’s not entirely clear exactly what
Llull thought he was doing. He gives his treatises titles like
Method of Finding the Truth so that
it sounds like he thinks his method is a way of deciding questions and
learning things, proving true facts about the world. By analogy with
modern systems like context-free grammars, from which a mechanical
process can generate all possible grammatical sentences, and no
non-grammatical sentences, we may be tempted (that is, we are
certainly tempted) to believe that Llull is creating a system which
will generate true sentences and only true sentences. But it’s not
quite so simple. Every now and then Llull remarks that applying the
prescribed procedure to a particular configuration of symbols will
result in a sentence which is false, or suggest an inference which is
invalid. The user of Llull’s technique, that is, must understand which
sentences to accept as true and which to reject as false. This leads
Eco and others to say that Llull and the user of Llull’s technique are
only getting out of the system precisely what they already
know. Eco writes that Llull’s methods
do not generate fresh questions, nor do they furnish new
proofs. They generate instead standard answers to an already
established set of questions. In principle the art only furnishes
1,680 different ways of answering a single question whose answer is
already known. It cannot, in consequence, really be considered a
logical instrument at all.
Eco considers Llull’s treatment of the question “Whether the
world is eternal” and concludes:
At this point, everything depends on definitions, rules, and a
certain rhetorical legerdemain in interpreting the letters. Working
from the chamber BCDT (and assuming as a premise that goodness is so
great as to be eternal), Lull deduces that if the world were eternal,
it would also be eternally good, and, consequently, there would be no
evil. “But,” he remarks, “evil does exist in the
world as we know by experience. Consequently we must conclude that
the world is not eternal.”
This negative conclusion, however,
is not derived from the logical form of the quadruple (which has, in
effect, no real logical form at all), but is merely based on an
observation drawn from experience.
I’m not entirely certain that Llull’s method is quite
as vacuous as Eco suggests. I think it may be possible to view it as
a kind of heuristic. If you want to solve a certain problem, you can
think about it in these ways, and the combinatorics will show you new
ways to think about it. Maybe there is an algorithm for providing all
possible combinations of ideas, so that when you think about them, you
will say, “Oh, wait, that one will help.”
Modern books on heuristics are not much different. If you read
Polya’s book How to Solve It
[Polya 1945], he will not tell you exactly how to solve
your problem. He gives you hints about ways to think about it that may
help. But you are responsible for recognizing that this one will help,
or at least trying it and seeing if it helps. Polya does not generate
fresh questions, nor furnish new proofs. He helps the reader find ways
to think about the problem which may enable the
reader to formulate relevant questions which put the
problem in a new light, and which may, if all goes well, lead to fresh
proofs. If we do not find Llull’s method as helpful as
Polya’s, it may merely be that we are not as interested in
theology as Llull was (or that we are more interested in geometry and
the other branches of mathematics Polya talks about).
Leibniz had other predecessors. The first Secretary of the
Royal Society of Great Britain, John Wilkins, wrote an enormous book
called An Essay towards a Real Character and a
Philosophical Language [Wilkins 1668].
There’s that phrase, real
character again. Now, most people, if they have heard the
name John Wilkins at all, know the name from a short piece by Borges
in which Borges says Wilkins reminds him of a certain Chinese
encyclopedia [Borges 1981]. (Some people have thought
that this Chinese encyclopedia is a real Chinese encyclopedia.
It’s not; Borges made it up.) In this encyclopedia, the
Celestial Emporium of Benevolent
Knowledge,
it is written that animals are divided into: (a) those that
belong to the Emperor, (b) embalmed ones, (c) those that are trained,
(d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs,
(h) those that are included in this classification, (i) those that
tremble as if they were mad, (j) innumerable ones, (k) those drawn
with a very fine camel’s-hair brush, (l) others, (m) those that
have just broken a flower vase, (n) those that resemble flies from a
distance.
This piece became famous in part because Michel Foucault
read it and laughed so hard that he decided to call the entire history
of western philosophy into question. Perhaps what we think looks as
ridiculous from outside as this classification looks to us.
Borges tells us that he has never actually seen Wilkins’
book, because even the national library of Argentina lacked a copy.
We, on the other hand, can read Wilkins because the book has been
scanned as part of the Early English Books Online project. There are
scans available on the web, and it has been transcribed by the Text
Creation Partnership, so that there is even a TEI-encoded version
publicly available.
When you read Wilkins, instead of just the parody of Wilkins in
Borges, I expect that many of you will have the same reaction I did,
which is “Well, no, he’s not crazy at all.”
Wilkins’s work reads like a very complicated spec that involves
a lot of serious work and a number of unavoidable compromises.
(Perhaps Wilkins should be regarded as the world’s first Working
Group editor. Except that he had, essentially, a Working Group of
one.) The experience of reading Wilkins is not unlike the experience
of reading, say, any proposal for a top-level ontology written by
people in artificial intelligence or in the semantic web. Actually, it
is slightly different: I feel more sympathetic towards Wilkins;
I’m not quite sure why.
Ontologies in the sense of AI and the semantic web are also a
continuation of Leibniz’s concerns, a continuation that for all
of his hundreds of pages and hundreds of bibliography entries Umberto
Eco doesn’t talk about. But they show us that the notion of
perfect languages is alive and not dead after all. The idea of perfect
languages has, however, been split in two. People developing
ontologies don’t normally expect to make them into languages or
make them components of languages. They are there to enable reasoning,
but not necessarily to capture arbitrary utterances.
The other branch of modern work that descends from
Leibniz’s concerns is, of course, further work on the
systemization and automatization of reasoning: logic. One of the
creators of modern logic, Gottlob Frege, explicitly identified his
goal as the creation of a language in the spirit of Leibniz [Frege 1879]. Now, to my great astonishment, he did not regard
himself as creating what Leibniz called a calculus
ratiocinator, or thinking calculus. He thought he was
creating a universal character, and his belief is a source of
continuing puzzlement to me, both because Frege makes such a sharp
(and value-laden) distinction between the two, and because, if one
does want to make the distinction, Frege’s work looks very much
more like a thinking calculus (it is, after all, a system for logical
inference) than like a language or set of atomic ideas (since for all
non-logical concepts Frege has recourse to conventional mathematical
notation). Perhaps Leibniz was not, after all, the last person
philosophers take seriously as a philosopher to try to build or want
us to build a perfect language; maybe that was Frege.
Now, when they hear talk about identifying the atomic units of
human thought and defining things explicitly so that we can reason
about them, a lot of people get nervous, because surely that amounts
to an attempt to banish ambiguity and vagueness, and make everything
purely regular. And it might. But in fact, one of the great (and
occasionally surprising) things about modern logic is that it has far
more capacity (or at least tolerance) for vagueness and
underspecification than we sometimes give it credit for. At the heart
of this mystery is the fact that modern logic is developed without any
fixed vocabulary: it is, if you will, Leibniz’s
calculus ratiocinator without his
characteristica realis. The only thing modern
logicians say about vocabulary is that, yes, there are identifiers;
they mean whatever they mean — which is to say, they mean what
the person using them says they mean. The actual
interpretation, that is (formally speaking) the
mapping from identifiers to objects in the domain of discourse, is
completely outside of scope for formal logic. Half of the books on
formal logic I have on my shelf don’t actually talk about the
structure of an interpretation; they just say “That’s out
of scope.”
Modern logic says, in effect, “You have these ideas. You
can reason about them this way.” What that means is that you can
make them as vague or as underspecified as you need. So the kind of
ambiguity and vagueness that Yves Marcoux was talking about as being
essential parts of the formalization of his application domain [Marcoux 2015] — that’s consistent with modern
logic. It’s not actually a contradiction of Leibniz’s
goal. It is possible to have logic that follows, if you will accept
the metaphor, the cowpath of human thought rather than imposing a sort
of rectangular system of paved sidewalks.
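To make the point concrete, here is a small sketch — my example, not
anything from the logic books — of the sense in which the
interpretation lies outside the calculus: the same sentence “all
G are M” is checked against two different mappings from
identifiers to objects, and the logic itself is indifferent to which
mapping is the “right” one:

    # An interpretation: a mapping from identifiers to sets of objects.
    def all_g_are_m(interp):
        return interp["G"] <= interp["M"]   # extensional subset check

    interp1 = {"G": {"Socrates", "Plato"},
               "M": {"Socrates", "Plato", "Cicero"}}
    interp2 = {"G": {"Socrates", "Zeus"},
               "M": {"Socrates", "Plato", "Cicero"}}

    print(all_g_are_m(interp1))   # True under the first interpretation
    print(all_g_are_m(interp2))   # False under the second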
Another aspect may be worth mentioning. Many of the attempts at
perfect languages that Eco talks about really will work only if they
are universally successful. They depend crucially on the network
effect to have a reason for being. If everybody learns Esperanto, then
anybody can talk to anybody else in Esperanto, and we will never,
any of us, need to learn a third language. We’ll
have our native language, we’ll have Esperanto, and that will
suffice. And in the long run, anyone who has ever compared an N-to-N
translation problem with a translation into an interlingua and then
back out (which gives you a 2-times-N translation problem; with ten
languages, that is 20 translation steps instead of 90 directed pairs)
will know it would be better — the overall cost to society would be much
lower — if everybody would learn an intermediate language. But
such a choice would require the same willingness to ignore the short
term in favor of the long term that Sam Hunting was talking about the
other day [Durusau and Hunting 2015]. The long-term payback only
accrues if the entities involved have survived through the short term.
And so a lot of entities really want short-term return, and if
you’re given the choice between learning a language that would
be useful if and only if everybody else in the world learns it and
learning a language which is useful now because a lot of people in the
world already know it, then you will learn Chinese or English or
whatever the lingua franca is in your geographic region. Maybe you
will learn Esperanto for other reasons. But if you’re learning
it because you hope to use it as a universal language, you will
probably be disappointed for a few more centuries.
Jean Paoli, who was one of the co-editors of the XML spec and
who performed the signal service of persuading the product groups
within Microsoft to support XML, had a very straightforward way of
saying this, which I call the Paoli Principle: “If
someone invests five cents of effort in learning your technology, they
want a nickel’s return, and they want it in the first 15
minutes.” If they do see an advantage within 15 minutes, then okay.
Maybe they will persevere with your new technology. If they
don’t see a return within 15 minutes, you’ve probably lost
them.
Now, many of us have struggled with managers with 15-minute
attention spans, and many of us probably think the world would be a
better place if they had longer attention spans (say, at least 30
minutes). But people are the way they are, and if we want to persuade
them, we need to meet them where they are rather than demanding
that they change.
Another way in which what we do is sometimes different from what
Leibniz was talking about is that we have learned that vocabularies
are often a lot simpler when they do not attempt absolutely universal
coverage, so we get simplification efforts like the one Joe
Wicentowski was talking about the other day [Wicentowski and Meier 2015]. I think it is probably a common experience
within the room that really complicated schemas that attempt to
blanket an entire application domain tend to be really, really big and
really, really hard to learn, and to spawn simplification efforts left
and right. So, we often straddle this divide; we create those big
schemas, but then we also create partial schemas because partial
schemas are easier to understand, easier to use, and easier to teach.
And as long as they suffice for a particular sub-application area,
they’re extremely useful. We don’t place the burden of
supporting all of scholarship on every vocabulary that we write, only
on a few of them.
Another difference, at least as of this conference: some of us
will say, “Wait, not everything needs to be explicit.”
David Birnbaum taught us that sometimes things don’t have to be
explicit [Birnbaum and Thorsen 2015]. Even when
they’re clearly relevant, we may get by without
tagging them, without making them explicit. I still have to think
about that, because I’ve always thought the purpose of markup is
to make things explicit, and it does make things explicit. David has
now pointed out that it does not follow that we must use markup to
make explicit representations of everything we wish to think about. We
may be able to get by without such explicit representations, and if we
are worrying about return on investment, the resulting reduction of
effort may make all the difference.
And we don’t normally actually reduce all of the concepts
in our vocabularies to atomic thoughts. Some of us think it would be
really interesting as an intellectual exercise, and possibly as a tool
in documentation, to say what atomic ideas go together to make up the
notion of “chapter,” say, but in practice the public
vocabularies we use don’t actually
define those atomic ideas. And they don’t need to. All we say is
“we’re going to need chapters; we’re going to need
paragraphs. They have some things in common; they are
different in other ways.”
Mostly we are happy that we have been successful over the last
decades describing concepts like those purely in natural language,
without trying to identify their atomic constituents. Partly
that’s laziness — sorry, intelligent use of resources.
Partly, however, it’s that for whatever reason — possibly
because we actually are sitting in working groups, some of us —
we no longer share Leibniz’s faith that every time we analyze a
composite idea into its constituent atoms, we will get the same
result. Leibniz used the analogy of prime and composite numbers, and
some of you will remember that a proposition called the fundamental
theorem of arithmetic tells us that every number greater than one has
a unique decomposition into primes. Every time we factor the number
728, we will get the same decomposition into primes, and there is only
one such decomposition.
Can any of us believe that everybody who decomposes the concept of
chapter or section into its constituent parts will get the
same answer every time they do it? I don’t believe it, and our
practice tells me that none of us believe it. So, leaving some of
those things inexplicit is one of the ways we achieve agreement and
inter-communicability.
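For the arithmetic half of the analogy, at least, the claim is easy to
verify mechanically; here is a small sketch (my illustration) of
factoring by trial division:

    # Trial division: 728 = 2 * 2 * 2 * 7 * 13, and the fundamental theorem
    # of arithmetic guarantees that no other multiset of primes will do.
    def factor(n):
        factors, d = [], 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)   # whatever remains is prime
        return factors

    print(factor(728))   # [2, 2, 2, 7, 13]

The disanalogy, of course, is that no comparable theorem guarantees a
unique decomposition of a concept like chapter into atoms.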
The biggest difference, though, between what we do and what
Leibniz wanted to do is that the entire field of markup since before
ISO 8879 was a work item is founded on saying “no” to the
idea that we will have a single language. The Gencode Committee,
formed by the Graphic Communications Association in the late 1960s,
was chartered, as I understand it (I wasn’t there), to design a
set of generic codes that everyone could use. And I don’t know
how many meetings they had before they said, “No.” And,
like many a Working Group after them, they rebelled against their charter and
said, “We’re not going to do that. We’re going to do
something better; we’re going to do something different.”
The Gencode Committee escaped to the meta level. They said, “We
will define a metalanguage that allows you to define the tags you
want.” (Then we do not have to endure the hours of disagreement that
are necessary to reach agreement on whether to call it chapter or section or div.)
So, maybe we’re not actually following in the footsteps of
Leibniz. We don’t seem to aim for languages that exhaustively
categorize the atomic units of our thought, or that absolutely anyone
can use without change for their own purposes. Sometimes we don’t even
aim for vocabularies that make explicit all of the features of our
texts, even the ones we think are relevant.
And yet sometimes when we struggle long enough with a particular
problem in document analysis and modeling, we achieve solutions that
just feel right. And that is an exhilarating experience. That
exhilaration is a lot like the feeling offered by some poetry, which
is perhaps appropriate. When I took a course in the writing of poetry
as an undergraduate, the instructor told us that, in her view,
poetry is “calling things by their true names.”
When we design our systems — our languages and their
supporting software — some of what’s needed is technique,
and some of what’s needed is inspiration. From other
people’s work, we can improve our own technique, and from other
people’s examples, we can often draw inspiration. With luck,
Balisage this year has provided you both, with tips on technique and
inspiration for your own work. Thank you for coming.
References
[Borges 1981]
Borges, Jorge Luis.
The Analytical Language of John Wilkins,
tr. Ruth L. C. Simms.
In Borges: A Reader, ed. E. R. Monegal and
A. Reid.
New York: Dutton, 1981, pp. 141-143.
(Frequently reprinted.)
[Birnbaum and Thorsen 2015] Birnbaum, David J., and Elise Thorsen. Markup and
meter: Using XML tools to teach a computer to think about
versification.
Presented at Balisage: The Markup Conference
2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference
2015. Balisage Series on Markup Technologies, vol. 15
(2015). https://doi.org/10.4242/BalisageVol15.Birnbaum01.
[Couturat 1901]
Couturat, Louis.
La logique de Leibniz,
d’après des documents inédits.
Paris: Felix Alcan, 1901.
On the Web in
Gallica: Bibliothèque numérique
and at
archive.org.
[Durusau and Hunting 2015] Durusau, Patrick, and Sam Hunting. Spreadsheets - 90+
million End User Programmers With No Comment Tracking or Version
Control.
Presented at Balisage: The Markup Conference 2015,
Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference
2015. Balisage Series on Markup Technologies, vol. 15
(2015). https://doi.org/10.4242/BalisageVol15.Durusau01.
[Eco 1993]
Eco, Umberto.
La ricerca della lingua perfetta nella cultura europea.
Bari: Laterza, 1993.
English translation by James Fentress as
The search for the perfect language.
Oxford: Blackwell, 1995; paperback London: HarperCollins, 1997.
[Frege 1879]
Frege, Gottlob.
Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens.
Halle: Louis Nebert, 1879.
Reprinted since by a variety of publishers.
On the Web in
Gallica: Bibliothèque numérique.
[Leibniz 1982a]
Leibniz, Gottfried Wilhelm.
Generales inquisitiones de analysi notionum et veritatum.
Allgemeine Untersuchungen über die Analyse der Begriffe und Wahrheiten.
Hsg., übers. und mit einem Kommentar versehen von
Franz Schupp.
Lateinisch — Deutsch.
Hamburg: Felix Meiner, 1982.
Philosophische Bibliothek Band 338.
[Leibniz 1982b]
Leibniz, G. W.
New essays on human understanding.
Translated and edited by Peter Remnant
and Jonathan Bennett.
Abridged Edition.
Cambridge: CUP, 1982.
[Leibniz 1996]
[Leibniz, Gottfried Wilhelm.]
Leibniz.
Ausgewählt und vorgestellt von
Thomas Leinkauf.
München: Diederichs, 1996.
[Leibniz 2000]
Leibniz, Gottfried Wilhelm.
Die Grundlagen des logischen Kalküls.
Hsg., übers. und mit einem Kommentar versehen von
Franz Schupp,
unter der Mitarbeit von Stephanie Weber.
Lateinisch — Deutsch.
Hamburg: Felix Meiner, 2000.
Philosophische Bibliothek Band 525.
[Llull 1993]
[Llull, Ramon].
Ars brevis.
In
Doctor Illuminatus: A Ramon Llull reader.
Ed. and tr. by
Anthony Bonner.
Princeton: Princeton University Press, 1993,
pp. 289-364.
[Llull 1999]
Lullus, Raimundus.
Ars brevis.
Übers., mit einer Einführung hsg. von
Alexander Fidora.
Lateinisch — deutsch.
Hamburg: Felix Meiner, 1999.
Philosophische Bibliothek Band 518.
[Marcoux 2015] Marcoux, Yves.
Applying intertextual semantics to Cyberjustice: Many reality
checks for the price of one.
Presented at Balisage: The Markup
Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference
2015. Balisage Series on Markup Technologies, vol. 15
(2015). https://doi.org/10.4242/BalisageVol15.Marcoux01.
[Peano 1889]
Peano, Ioseph.
Arithmetices principia, nova methodo exposita.
Romae, Florentiae: Bocca, 1889.
[Polya 1945]
Polya, G.
How to solve it: A new aspect of mathematical method.
Princeton: Princeton University Press, 1945; second
edition Garden City, NY: Doubleday Anchor Books, 1957.
[Quin 2015] Quin, Liam R. E.
Diagramming XML: Exploring Concepts, Constraints and
Affordances.
Presented at Balisage: The Markup Conference
2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference
2015. Balisage Series on Markup Technologies, vol. 15
(2015). https://doi.org/10.4242/BalisageVol15.Quin01.
[Russell 1900]
Russell, Bertrand.
A critical exposition of the philosophy of Leibniz,
with an appendix of leading passages.
London: George Allen & Unwin, 1900; new edition 1937, rpt.
several times since.
[Sampson 1985]
Sampson, Geoffrey.
Chapter 2, Theoretical preliminaries
in his
Writing systems: a linguistic introduction.
Stanford, California: Stanford University Press, 1985,
pp. 26-45.
[Usdin 2015] Usdin, B. Tommie. The art of the elevator pitch.
Presented
at Balisage: The Markup Conference 2015, Washington, DC, August 11 -
14, 2015. In Proceedings of Balisage: The Markup
Conference 2015. Balisage Series on Markup Technologies,
vol. 15 (2015). https://doi.org/10.4242/BalisageVol15.Usdin01.
[Walmsley 2015] Walmsley,
Priscilla. Comparing and diffing XML schemas.
Presented
at Balisage: The Markup Conference 2015, Washington, DC, August 11 -
14, 2015. In Proceedings of Balisage: The Markup
Conference 2015. Balisage Series on Markup Technologies,
vol. 15 (2015). https://doi.org/10.4242/BalisageVol15.Walmsley01.
[Wicentowski and Meier 2015] Wicentowski, Joseph C., and Wolfgang Meier. Publishing
TEI documents with TEI Simple: A case study at the U.S. Department of
State’s Office of the Historian.
Presented at Balisage: The
Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In
Proceedings of Balisage: The Markup Conference
2015. Balisage Series on Markup Technologies, vol. 15
(2015). https://doi.org/10.4242/BalisageVol15.Wicentowski01.
[Wilkins 1668]
Wilkins, John.
An essay towards a real character,
and a philosophical language.
London: Printed for Sa. Gellibrand, and for John Martyn, 1668.
Scanned pages are available from multiple sources on the Web, including:
Early English Books Online,
Bayerische Staatsbibliothek,
Google Books, and
second copy (Munich) at Google Books.
The TEI encoding made by the EEBO Text Creation Partnership mentioned in the text
is at
the EEBO TCP site.