Welcome to Balisage
The first thing I have to say is “Welcome to Balisage, and welcome to Montréal.” This is my favorite week of the year, and I want to be
sure that you all enjoy it too.
Before I start talking about what’s really on my mind and what I said I was going
to talk about, which is “semantics”, I want to talk about something that should go
without saying, but two people caught me yesterday and asked me to say a few words
about expected behavior this week — as I did last year and the year before and probably
every year before that — it’s what I think of as the “Conference Mommy Moment”. I’m
talking about courtesy for everyone at this conference. As far as I’m concerned,
every one of you is my guest, and that means I expect every one of you to treat each
other as my guests. I don’t expect you to agree with each other, and in fact a large
part of what we’re here to do is to disagree. That’s a lot of the fun; we frequently
disagree on how to address problems that we agree are important. And we sometimes
disagree on what is important and why. I hope such disagreements are heartfelt, articulate,
and clear. I insist they be respectful.
This means challenging the idea, not the person. By the way, “That’s the dumbest
idea I have ever heard” is not challenging the idea. That’s challenging the person
whether the sentence seems to use the word “idea” as its subject or not. “I think
there are some important factors that you should consider that might change your mind,
for example, …” is challenging the idea.
Another aspect of treating each other with courtesy is using the microphones. Many
of you are used to speaking to large groups. Many of you believe you can be heard
well without a microphone. I am telling you that there are people in this room who
will not be able to hear you or understand you if you don’t use a microphone. If
you have anything more to say than “Yea, verily, yea,” get up out of your seat and
go stand at the microphone. (Shouting “Yea, verily, yea” from your seat will do just
fine.)
Eschewing the microphone means that people with less than perfect hearing may have
trouble understanding you, and people with relatively little English — or with an
English accent dramatically different than yours — will have a difficult time understanding
you. And it wouldn’t be a bad idea to eschew words such as “eschew” which many people
don’t understand.
I like Balisage for a lot of reasons. I like Balisage because I think of it as a playground where a group of really smart people gather
together to learn from each other. This means it is a gathering of friends, but it
is also a gathering of “might be” friends. I suggest that those of you who know
twenty or thirty people in this room consider the fact that there are probably twenty
or thirty other people in this room who might become friends if you spent time talking
with them. It is easy for those of us who know and like each other and see each other
once a year to concentrate on old friends. I’m not saying you shouldn’t do that —
I’m not saying you shouldn’t enjoy the people you know — but meet some of these other
people because it’s a good bet they’re pretty smart.
“Tell them where to meet!” First of all, there’s the riskiest thing to do — which
I highly recommend: sit down next to somebody you don’t know at lunch. What’s the
worst thing that’s going to happen? They’re going to be boring. Odds are, they won’t.
Because they’re here. And we don’t have a lot of boring people here.
However, we also have a conference office downstairs. You can tell it’s the conference
office because there’s a Balisage logo on the door. And when that door is open, come on in! We have chairs, we have
tables. We’ve also got a printer there, so stop in if you need to print a few pages.
And we’ve got a refrigerator with some water in it. So come by! This evening go
across the hall to dinner. After that, if you’re looking for a group of people to
go out with, check the office; in the early evening, people will gather there and
make groups. It’s a good place to be.
This is sort of a week-long party. I don’t actually go to parties where people have
projectors and throw their slides up on the wall, but I do go to parties where people
climb on metaphorical soapboxes and make speeches about all sorts of things that
they care a lot about, which is one of the things we’re going to do at Balisage.
I actually go to dinner parties where things get a little weirder than that. I remember
one group — I think there were eight of us — in a very, very fancy French restaurant
in the Georgetown area of Washington where somebody pulled out a dataflow diagram
and passed it around the table. The first person who got it looked at it for
a minute and snickered, then passed it to the next person, who laughed. It went all
the way around the table, and everybody laughed. After everybody had had a chance
to look at the joke, we discussed where the error had to be and how silly this diagram
was. It said there was one office that was going to fill with paper documents because
there were all these paper documents being printed, and they went to this one person,
and they never left. And then the guy who brought this diagram pulled a photograph
of that office out of his pocket. It was stacked to the ceiling with paper, with
little paths through it. We laughed at that, and it occurred to somebody that there
we were sitting in a restaurant drinking wine … and passing around a dataflow diagram.
Some of those people are here, and many of the rest of you would have enjoyed the
evening. It’s that kind of crowd.
I like Balisage because I learn a lot here. I learn about interesting things people are doing with
marked-up content and interesting applications people are developing to create and
manipulate and display marked-up content. But that isn’t surprising. It is the markup
conference, after all.
More interesting, I learn about the sorts of problems that people are paying attention
to in the first place. As I read the submissions to this conference, I read about
people using markup to solve problems I hadn’t even realized were problems. People
are doing work I never thought anybody cared about, solving problems I had not thought
about at all. That’s really interesting.
Every year I leave here knowing that I know less than I thought I did when I arrived.
I recommend that approach. I suppose that’s the reason my house is filling with books,
though now that I’m buying both print books and electronic books, I keep hoping that
will mean the rate at which I acquire physical books is going to be reduced.
I leave Balisage with a long list of things I need to learn more about: details of things I didn’t
know about, aspects of specifications and tools I hadn’t known about, big things I
need to know about, whole new specifications, capabilities, problems. That’s a lot
of what you’re going to pick up here: things to go learn about.
One of the things that I keep learning about is the limitations of my knowledge and
the way that I reveal those in my use of language. I, for example, cannot have an
informed conversation with my nephew about Pokémon because I use the language incorrectly
and it is clear that I don’t know what I’m talking about. (I’m actually okay with
not being able to have an informed conversation about Pokémon with a five-year-old.)
But I’m not so okay about not being able to have what sounds like an informed conversation
about some of the things that I think matter to me, for example, semantics. (Hey,
I finally got to what I said I was going to talk about.)
The word “semantic”
I recently learned that I don’t know what the word “semantics” means. This is interesting
because I thought I did. I thought I had known that for a long time. Way back in
pre-history, before I had ever heard of markup, when the only use I knew for pointy
brackets was in mathematical expressions (some of us were not born knowing what chicken
lips are good for, you know), I thought I knew what semantics meant. I thought semantics
was the study of what words meant, and the study of semantics focused on how the meanings
of words were created and changed and, occasionally, lost. I had more than a casual
interest in words and terms and their meanings; my first real job — after I escaped
from the educational establishment — was as a lexicographer. A lexicographer?? A
person who creates dictionaries, or in my case, controlled vocabularies for search-and-retrieval
systems. Semantics was the heart of my work. I thought I knew what semantics meant.
We were very careful to create controlled vocabularies that had a minimum of homophones
in them and a minimum of ambiguity — notice we never promised we would get rid of
either — we were very careful with selecting our terms and defining them to control
the semantics of the datasets that were going to be indexed with them. So, a long
time ago I knew what semantics meant.
And then I got involved with markup. I got there because the way we were creating
these full-text databases that we were indexing with the controlled vocabularies I
was making was that people typeset and printed this material, and the first two copies
to come from the printer were then sent immediately to be processed into the full-text
system. So we waited until we had bound books, then we cut off the bindings, and
we sent them to two different rekeying shops to be retyped. Those two files were
compared, and then we built a searchable database from them.
This was problematic. Among other things, people had already proofread it once to
make the print, and they didn’t want to do it again, besides which it took a long
time and was expensive. The full-text databases might take up to a year or year and
a half after the print was out. This was imperfect. Very imperfect.
So, we started getting involved in ways we could make both the print and the database
from the same source, and we had these pointy-bracket things that we were putting
around parts of the document. We had documents where parts of them were laid out in a
grid shape. And there were two different ways that we could identify the information
we wanted to see in a grid. We could identify it by what kind of information it
was, for example, “condition,” “patient age,” and “dosage.” Or we could identify it in
rows and columns; how did you want it laid out in that grid shape? You could do a
lot more with the information if you identified it by what kind of information it
was — “condition,” “patient age,” and “dosage” — than if you had “first column,” “second
column,” “third column.” But it was a lot harder to set things so you displayed it
in the grid shape you wanted it in. You only did that for stuff that you had a lot
of and it was worth spending effort on.
I learned that when you identified the kind of information it was, we called that
semantic tagging. And when you identified what cell in the table you wanted it in,
we called that syntactic tagging.
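
To make the distinction concrete, here is a minimal sketch in Python, assuming a tiny invented document. The element names follow the "condition, patient age, dosage" example, but the wrapper elements, the values, and the `to_grid` helper are all hypothetical. It shows that content tagged by kind of information can be laid out in whatever grid you like after the fact.

```python
import xml.etree.ElementTree as ET

# Hypothetical, made-up data tagged by what the information IS,
# not by which column it should appear in.
SEMANTIC = """<dosages>
  <entry><condition>condition A</condition><patient-age>adult</patient-age><dosage>10 units</dosage></entry>
  <entry><condition>condition A</condition><patient-age>child</patient-age><dosage>5 units</dosage></entry>
</dosages>"""


def to_grid(xml_text, columns):
    """Lay out each <entry> as one row, with one cell per requested column."""
    root = ET.fromstring(xml_text)
    return [
        [entry.findtext(column, default="") for column in columns]
        for entry in root.iter("entry")
    ]


# The same source yields different grids; the layout is decided at display time.
grid = to_grid(SEMANTIC, ["condition", "patient-age", "dosage"])
reordered = to_grid(SEMANTIC, ["dosage", "condition"])
```

Going the other way is harder: a document tagged only as "first column," "second column," "third column" does not say which column holds the dosage.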
Okay, this was related to what I thought I knew. Semantics means “what it means,”
and syntax means “what you want it to look like.” Okay, we can work with this.
And then I learned that the line between syntax and semantics is a squishy line; it
slides around. Somebody explained it to me: “If you understand how to cope with
it, it’s syntax. And if you can’t quite manage it — there’s a little magic in the
formula — it’s semantics.” (Actually, I think there may be something to that.)
I have recently learned that semantically-rich content is the key to future wealth:
precise searching, high quality retrieval, good health, great weather. Or at least
that’s what it seems like. The “semantic web,” for example, is going to solve all
of our economic, social, and perhaps even climate problems. Some of the people promising
these wonders berate me because I’ve advised clients to manage their content in ways
that are not “future friendly,” specifically in ways that are not semantic. Okay,
what do we mean by that?
Well, I’ve advised them to identify names and given names and family names and to
associate them with institutional and public identifiers; to identify institution
names, street names, drug names; to identify if a drug name is a brand name, a generic
name, or a street name. I think identifying all that stuff in a document collection
is creating a semantically-rich document collection, so why are these people so cross?
Because there’s no semantic tagging in those documents! What?? Well, there isn’t
a triple in sight. Oh, I understand. There is only one appropriate syntax for semantics
now. Whoops.
That’s just a little bit narrow-minded. I wish it were really news; it actually isn’t.
A few years ago — actually, quite a few years ago, now that I think about it — someone
asked for permission to include the proceedings of one of Balisage’s predecessor conferences in a topic map — Remember topic maps? Topic maps were
cool. — about things related to SGML (Does anybody remember SGML?) So, after all
the appropriate logistical details had been taken care of and the lawyers had all
waved their hands and blessed this activity, I provided the necessary SGML files
and instantly received back a howl of anger. What could we have been thinking??
This data was unusable. How stupid could we possibly be? Didn’t we know that the
point of creating SGML was to make it repurposable? (Actually, I had thought that
the point of creating that particular SGML was to create the proceedings for that
event, not to accommodate unknown future users and to work with products that had
not yet been invented, but perhaps not.)
So, what was the problem that caused this outrage? The keywords were hierarchically
structured; this was already problematic. Worse, the nesting was indicated with colons!
This was outrageous. It was unprofessional. It was unacceptable. It was slovenly.
It meant that they would have to preprocess the data before they could pour it into
their tool.
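
For a sense of scale, the preprocessing in question might look something like the sketch below. This is only an illustration in Python: the actual keyword data and the tool involved are not shown here, so the sample keywords and the `nest_keywords` helper are hypothetical.

```python
def nest_keywords(keywords):
    """Turn colon-delimited keyword strings into a nested dict of dicts."""
    tree = {}
    for keyword in keywords:
        node = tree
        for part in keyword.split(":"):
            part = part.strip()
            if part:  # ignore empty segments such as a trailing colon
                node = node.setdefault(part, {})
    return tree


# Hypothetical keywords, invented for this example.
sample = ["markup:SGML:DTDs", "markup:SGML:entities", "topic maps"]
nested = nest_keywords(sample)
```

Each colon becomes one level of nesting, which is the whole of the transformation.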
I smell a failure of imagination.
These sorts of disagreements on the meaning of semantic don’t actually end there,
and they didn’t end many years ago when we were talking about inconceivably badly
structured SGML files because they contained colons. My colleague, Debbie Lapeyre,
who is sitting over there, was the JATS expert in a group of people who recently wrote
a paper entitled “From Markup to Linked Data: Mapping NISO JATS version 1.0 to RDF
using the SPAR Ontologies.”
Her co-authors were astonished that, in this day and age, a respectable and respected
vocabulary such as JATS could be totally lacking in semantics. Her co-authors were
shocked to learn that in JATS, for example, the names and the definitions of the tags
embedded in the document identified what the information meant; there wasn’t a separate
ontology that identified what the tags meant.
Perhaps even more peculiar to them was the fact that structures nest and context affects
meaning. For example, the tag <article-title> in the header of the document contained
the title of the article, and the tag <article-title> in a citation in the reference
list contained the title of an article being referenced by this article. That was
very peculiar to them.
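
A small sketch of that point, in Python and under stated assumptions: the document below is a drastically simplified, JATS-like fragment with invented titles, not a valid JATS article. The same tag name is read differently depending on where it sits.

```python
import xml.etree.ElementTree as ET

# Simplified JATS-like sample; the titles are hypothetical.
SAMPLE = """<article>
  <front>
    <article-meta>
      <title-group><article-title>A Hypothetical Article</article-title></title-group>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref><element-citation>
        <article-title>A Hypothetical Cited Article</article-title>
      </element-citation></ref>
    </ref-list>
  </back>
</article>"""

root = ET.fromstring(SAMPLE)

# Same tag, different meaning: the ancestor tells us which kind of title it is.
own_titles = [t.text for t in root.find("front").iter("article-title")]
cited_titles = [t.text for t in root.find("back").iter("article-title")]
```

Nothing outside the document is consulted; the context carried by the nesting is enough to tell the two kinds of title apart.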
That and a variety of other oddnesses convinced them that either Debbie was from another
planet or there were no semantics in JATS. They were very kind to her. I think they
treated her as if she were an idiot savant, knowing a lot about this peculiar XML
stuff, but knowing nothing about semantics … or anything semantic.
To give them credit, they did include a section in the paper that I think properly
should have been entitled “Peering Through the Looking Glass.” They called it something
like “Philosophical Differences between RDF and XML” — which in fact they didn’t really
address — in which they described the difficulty of describing in RDF terms data that
is based on such a different philosophical point of view. I wish they were here at
Balisage because they were able to recognize that a completely alien viewpoint might not be
stupid and might provide something of interest and importance to them.
We at this conference — each of us — are going to encounter a few completely alien
viewpoints. That’s one of the things the Balisage committee looks for as we select the papers. When we get peer reviews back for Balisage papers, the papers we take instantly are the ones where some of the peer reviewers
say “Yes, that’s brilliant!” and other peer reviewers say “No, that makes no sense.”
We like those papers!! We like the alien viewpoints.
To circle back: it’s not that I don’t think ontologies are useful. I think they
can be. In some circumstances, they are very valuable indeed. And I’ll even acknowledge
that it’s possible — although in my opinion it’s far from a sure thing — that widely
shared ontologies will be of increasing value. I’m suspicious because I used to write
ontologies, and I know how good many of them aren’t. I don’t think links to ontologies
are the only way semantically rich information can and should be created or managed.
I don’t think RDF triples are the only syntax in which semantic information can be
created, managed, or stored.
I’m reminded of long passionate arguments at predecessor conferences to this one about
what kind of information should be stored in elements and what kind of information
should be stored in attributes. People cared. People got red in the face. They
banged on tables.
There were people who said all element content should be banned and all content should
be stored as attribute values. They were serious, and they thought it was important.
There were people who said attributes were nothing but syntactic sugar and should
be banned from the language. Everything should be element content; it would be much
easier to process. They were equally serious.
There were people who came up with complex formulations for what should be in which
place. My favorite one — which in some circumstances may be reasonable — was: If
the end user should see it, then it should be element content. But if you use it
to control the display or for other back room uses, it should be attribute content.
That makes sense until you realize that you can’t actually decide for all time for
any content what users will see and what they won’t because the line between data
and metadata is really fuzzy and keeps moving.
But this mattered to them. It mattered to them a lot. And then XSLT came along,
and it seemed to occur as a flash of insight to huge numbers of people simultaneously
that it was actually a pretty trivial transformation to take something that was attribute
content and make it element content, and take something that was element content and
make it an attribute value if you really cared. We can change the syntax if it isn’t
what we want for our tool at this moment.
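
To show how small that transformation is, here is a minimal sketch. XSLT is the tool credited above; this version uses Python's standard `xml.etree.ElementTree` purely for illustration, and the sample record and both helper functions are hypothetical. It handles only simple, data-like content (no mixed content, no repeated child names).

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Replace each attribute of elem with a child element of the same name."""
    for position, (name, value) in enumerate(list(elem.attrib.items())):
        child = ET.Element(name)
        child.text = value
        elem.insert(position, child)
    elem.attrib.clear()
    return elem


def simple_children_to_attributes(elem):
    """Replace each text-only, attribute-free child with an attribute on elem."""
    for child in list(elem):
        is_simple = len(child) == 0 and not child.attrib
        if is_simple and child.tag not in elem.attrib:
            elem.set(child.tag, child.text or "")
            elem.remove(child)
    return elem


# Hypothetical record, invented for this example.
record = ET.fromstring('<dose drug="drug-x" amount="10" unit="mg"/>')

attributes_to_elements(record)
element_tags = [child.tag for child in record]

simple_children_to_attributes(record)
as_attributes = dict(record.attrib)
```

The round trip returns the same three name/value pairs, which is the sense in which the choice is a matter of syntax.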
Suddenly, that argument became moot. Fortunately, I haven’t heard it in a while.
I suggest that much, if not most, of the posturing about the appropriate syntax for
semantic information is or should be similarly moot. It’s just not important. And
if you disagree with me, I guess I can’t punch you in the nose — we’re at Balisage. And I’m not going to call you a nasty name because we’re at Balisage. So, I’m going to ask you to explain why you think it matters in a clear and coherent
fashion. And I will remind you that if you explain it in a clear and coherent and
respectful fashion, I might even change my mind and agree with you. And if your argument
is “because that’s the way my application wants to receive it,” I will suggest that
you go and buy a book on XSLT.
Which brings me back to Balisage. This is a week that seriously stretches my imagination. There are talks on the
program that are on topics in which I have absolutely no interest; I’ll listen to
some of them. Perhaps I’ll find that I should be interested. Perhaps I’ll find that
I am interested.
In general, I have found that the more I know about anything, the more interesting
it is. (Except perhaps basketball. I can’t manage to get interested in basketball
despite working with a basketball fan. You know, to me it’s just basketball.)
I suspect you will find that some of the talks at Balisage are “just basketball.” You don’t have to be interested in everything, and you don’t
have to pretend you are. But consider going to some things you don’t think you’re
interested in because you might be surprised.
This week I expect us to hear about new languages, new projects, new uses for old
languages, new names for old technologies. A colleague of mine recently told me that
all the new ideas in computing are recycled and renamed from work the engineers at
IBM did in the 1950s. “They invented everything,” said my colleague. For example,
he insisted there is no such thing as “big data” — it’s just “volume.” And all the
things they’re inventing now to deal with big data such as linked lists and indexes
were invented in the 1950s by the guys at IBM.
I think he was exaggerating just a little. But he does have a point. We as a community
are doing a lot of relearning, reinventing, and recycling. I also think we are doing
some new thinking, combining some old clothes into new outfits, knitting some new
fabric — have I mixed enough metaphors in that sentence? Can I get a few more in
there if I try hard?
I have noticed the lifecycle of many languages seems to be similar, including many
of the languages we’re going to be talking about here. They grow out of a need for
an easy way to do something. A person or a group of people find that it’s unreasonably
difficult to do something that would be very easy with a tool that was designed to
support what it is they want to do. So, they design it, and some people build it,
and users say “That’s wonderful! But it also needs to do this and this and this to
meet our needs.” So they extend it. And then more users show up and start using
this wonderful tool … and say “But it also needs to do this and this and this.” And
it gets a little more flexible. “And we need it to be a little more abstract so it
has a little more flexibility.”
And before you know it, you have a Turing-complete language. Cool, right? This is
great. And people start using it for things the originators never imagined and that
are completely unrelated to the original mission. And at this point, two things happen.
First, a group of enthusiasts start talking about using this tool or language for
everything, eliminating the need for other languages that are older and clearly competing.
And at the same time, some people — sometimes the same people, and sometimes different
people — start talking about the need to simplify the language, removing features
that are only used by a few fringe cases … like the original point of the language.
Watch for that at Balisage this year. I think we have three or four papers that are really addressing just
that.
Yesterday I heard someone say “I know I can do anything I need if I have a
Turing-complete language, especially one that I like.” Well, to be more precise, it was
Michael Sperberg-McQueen, and what he said was “I know that I can do anything I need
if I have a Turing-complete language, especially one I like, like XSLT.” So, the
question is: How do we decide which languages to like? Or perhaps, how do we convince
other people to use the languages we like instead of the languages that they already
know and already like?
It seems to me that we at Balisage have a joint interest in markup, marked-up documents, and tools that deal with marked-up
documents. And one of the challenges that many of us face is convincing people who
have had this bizarre XML stuff thrust upon them to use tools that are markup smart.
To use the markup instead of treating an XML document as if it were a string. Not
that they can’t do everything they need to do using string processing tools — they
most assuredly can. But because that has them doing the same work two times, three
times, four times … and maybe introducing errors. When they are using our technologies
incorrectly, we tend to get very angry at them. Or we tend to feel sorry for them
because they just haven’t been enlightened.
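
One way to make the case is with a small side-by-side, sketched here in Python under stated assumptions: the document and the pattern are invented, and the regular expression stands in for string processing in general. The string approach has to rediscover things the parser already knows.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
DOC = '<article><title id="t1">Markup &amp; Meaning</title></article>'

# String processing: a pattern written for a bare <title> tag.
# It misses this title because of the attribute, and even a looser
# pattern would still return the undecoded "&amp;".
naive = re.findall(r"<title>(.*?)</title>", DOC)

# Markup-aware processing: the parser handles attributes and entities.
parsed = [title.text for title in ET.fromstring(DOC).iter("title")]
```

The pattern can be patched for attributes, then for entities, then for whatever comes next, which is the same work done two, three, or four times.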
One of the things I challenge all of you to think about is how to persuasively explain
to them that if they have markup, they ought to use it even if that means learning
a new tool or a new way of thinking.
Balisage is a place to say what you think and think about what you think and why you think
it. My brother recently asked me what I thought about genetically modified food crops.
I said I didn’t know enough about them to have an informed opinion. He said, “Most
people don’t. But what do you think?” He was fishing for an uninformed opinion.
One of the joys of Balisage is that we will have a mixture of informed and uninformed opinions, probably both
equally passionately stated. One of the things I challenge you to do is to detect
the difference, not only in other people’s informed and uninformed opinions — it really
isn’t as simple as if they agree with you, they’re informed, and if they don’t agree
with you, they’re uninformed — but also to challenge that in yourself. How many of
your opinions are informed, and how many of them are uninformed? Can you tell the
difference?
As you listen at Balisage, remember that the speaker may have a dramatically different point of view than you
do. Try to understand it. Question the premises behind their point of view and their
methods, processes, and conclusions. But start from the assumption that they are
smart people working on important problems and that they have made educated decisions
that put them where they are, even if you find that surprising. Maybe especially
if you find it surprising.
When a speaker says something that contradicts something you know to be true, do not
leap to your feet, rush to the floor microphone, and shout “You idiot!” Figure out
a polite way to ask a question that unearths the reason this person so clearly disagrees
with you. If possible, ask it in a way that convinces them to investigate the tool,
technique, or position you hold. You will be more persuasive if you are polite.
You will not agree with everything you hear. It is my opinion that any statement
that absolutely everyone agrees with is so bland there’s no point in stating it.
I recently went to a concert sponsored by the local folklore society where I live,
held in a local church. I got there early so I could get a good seat. I was sitting
with nothing to do for a few minutes, and being me, I picked up the local reading
material and opened it. What was there to read, sitting in the church? There was
the hymnal which had in its front cover the creed of this particular organization.
A full page of fairly small print in which I decided that there was not one statement
with which any human being on planet Earth could disagree. In other words, there
was no content on that page. This was what they believe — so does everybody else.
They weren’t telling me who they were.
We’re not going to have any of those content-free talks at Balisage. I don’t want us to aim for them; I don’t want the speakers to give them. I want the
audience to respect the fact that there will be content in the talks and therefore
there will be things we disagree with. That’s a good thing.
State your positions, describe the reasons for them, allow for the possibility that
there are things you don’t know — yeah, even you — and have a great time at Balisage.