Usdin, B. Tommie. “Explicit markup: a fool’s errand or the next big thing?” Presented at Balisage: The Markup Conference 2019, Washington, DC, July 30 - August 2, 2019. In Proceedings of Balisage: The Markup Conference 2019. Balisage Series on Markup Technologies, vol. 23 (2019). https://doi.org/10.4242/BalisageVol23.Usdin01.
Balisage Paper: Explicit markup: a fool’s errand or the next big thing?
B. Tommie Usdin is President of Mulberry Technologies, Inc., a consultancy specializing
in XML and SGML. Ms. Usdin has been working with SGML since 1985 and has been a supporter
of XML since 1996. She chairs the Balisage conference. Ms. Usdin has developed DTDs, Schemas, and XML/SGML application frameworks
for applications in government and industry. Projects include reference materials
in medicine, science, engineering, and law; semiconductor documentation; historical
and archival materials. Distribution formats have included print books, magazines,
and journals, and both web- and media-based electronic publications. She is co-chair
of the NISO Z39.96, JATS: Journal Article Tag Suite Working Group and a member of
the NISO STS Standing Committee. You can read more about her at
http://www.mulberrytech.com/people/usdin/index.html
In 1998, at a Balisage predecessor conference, Brian Reid told us we couldn’t have
the world we wanted. XML wouldn’t deliver. He used twenty-year-old slides, slides
that he had originally presented at a conference in 1981, to make his point. I still
want the world that Brian Reid told us we could not have; I still want Brian Reid
to have been wrong. I still believe that separating meaning from format will enable
our documents to be displayed in many forms and media, that a markup format that makes
hierarchy explicit makes complex documents tractable, that when content creators author
in systems that make declarative markup visible and use the author’s knowledge to
add value to their content, we will be able to make documents sing! And I have the
twenty-year-old slides to prove it.
In a recent conversation with a fellow markup enthusiast, I found myself saying: “Yes, I agree that making the information products they want from their content would
be easier, and the products would be better, if they authored in a way that captured
the structure and some of the meanings as they wrote. But it isn’t going to happen.
Ever. Because the average person writing a document is not thinking about the process
of writing the document, or the structure of the document, or how they might want
to use or reuse the document; they are thinking about the subject matter of the document.
They want to use a process to create the document that is as transparent as possible.
This means that not only will they use the popular-at-the-moment authoring tool, they
will use it as thoughtlessly as possible.”
I do not say this as a criticism of these people; it is simply an observation of the
way they do, and probably should, write.
That conversation brought back a conference paper I heard many years ago, from Brian
Reid.
Brian Reid
How many of you were at Markup Technologies ’98 in Chicago? At that conference we invited Brian Reid to keynote.
Aside: do you know who Brian Reid is? If you play in the markup space you probably
should.
As I recall, it took some persuasion to get him to go to Chicago and talk to a markup
conference. He said that he had spoken at the Conference on Research and Trends in Document Preparation Systems in Lausanne, Switzerland, in 1981 and had already said what he had to say to the SGML world.
But … he paused … he had been wrong in 1981 and would be happy to tell us why in 1998.
He found the plastic transparencies he had used as the visuals for his talk in 1981,
scanned them, and used these scans as the basis for his 1998 talk. They are on his
website to this day at: “20 years of abstract markup - Any progress?” [Reid, 1998]. (We’ll look at a few of them in a moment, and the whole set is the appendix to this paper.)
In 1981 he was singing the praises of what we now call declarative markup. He talked
about the goals of Scribe:
These slides would fit perfectly into the XML Basics for Text Processing class my group will be teaching in a few weeks. This is EXACTLY the song many of
us sing for a living in 2019. This talk was in 1981. He was talking about a working
tool that:
Used markup to identify parts of documents by what the information was
Separated content from format
Was platform independent
I believe that that conference in 1981 was the conference at which the SGML effort
was announced. That is, work on SGML had just begun, and Brian Reid had a working
tool, and the philosophy underlying the tool, that met many of the goals of SGML and
now XML. (Do you begin to see why I think we should all know about the work this man
has done?)
Note, however, that these are Brian Reid’s 1981 slides.
He revisited the ideas in 1998:
He continued with:
Declarative markup (SGML, XML, Scribe, or some other syntaxes) was a lost cause. People
were not going to do it. People did not want to think about structure. People were
not capable of thinking about structure.
He proved this by sharing some user interface research that some well-known organization
(I don’t remember who and his slides don’t say) had done.
People were shown this image:
Then they were asked what would happen if a user clicked on the X and hit delete.
He told us that most average users expected this:
Aaaargh! PROOF that people are NOT capable of understanding declarative markup. Proof
that we are wasting our time talking about, and wishing for, success of generic-markup-based
tools.
End of Story. Game Over. Go Home!
Really. End of Story. Game over. Go home! Time to give it up as a bad investment.
Teaching Reading
Perhaps the reason I remember his paper so well is that I don’t want to agree with
him. I don’t think declarative markup is impossible, I haven’t given up, and as far
as I’m concerned the game is NOT over.
It reminds me of that time, a very long time ago, when I was a student, and I volunteered
with an adult literacy program. The organizers were mostly social workers, and most
of the teaching was done by college students; the participants were mostly men in
their 50s who were making their way in the world without the ability to read — and
generally successfully hiding that fact from their families, friends, and employers.
They were smart, hard-working, and generally affable people. It usually took some
breakdown in their lives to get them to admit that they couldn’t read, to make the
time to go to the literacy workshop, and probably most difficult, to accept help from
college students. Some had been unable to complete the tests at the end of a court-ordered
educational program, some couldn’t apply for drug rehabilitation or financial assistance,
some were shamed by children or grandchildren who asked them to “read me a story.” They were highly unusual; most of their peers (adults in the United States who
cannot read) lie, cheat, bluff, and hide to avoid admitting that they cannot read,
and (even if detected) will find every possible excuse not to do anything as humiliating
as going to a literacy program.
The program started with the alphabet, followed by what I think of as phonics-lite.
That is, a rough guide for how to pronounce many letters and letter combinations,
and how to look at a word and figure out what it probably sounds like and say it.
After the first day or two, we were down to about half the participants. Most of them
had either decided that they were too stupid to learn to read, or that it was too
late for them, or that they had more important things to do with their Tuesday evenings,
or that they knew what the word on the Stop sign was, and that was reading, so they could already read and didn’t need this program.
We had one or two who insisted that it was all a trick and nobody could actually make
sense of those little letters on the page, and it didn’t matter anyway. “If I cook good enough that you pay $2 for a bowl of my chili, why do I need to read?” one asked me.
They fought English phonics like it was a monster. It is not fair, we were told, that
some letters were silent — sometimes — and that the same letter combination would
make different sounds in different contexts. They wanted reading to be like a game:
they wanted clear rules, and they wanted it to be fair. I sympathized. But it isn’t.
English may be especially challenging in this respect, but other languages have their
own challenges.
For many of our participants, heteronyms (words that are spelled the same, pronounced
differently, and have different meanings) were the last straw. (He has a tear in his
eye when told to tear a page out of the book.) They couldn’t, they wouldn’t, NOBODY
could, do this. It was impossible. End of story. Game over. Go home!
And yet … there were pressures. The grandchildren who wanted stories, the judge who
had suspended a sentence on condition of completing the program, the need for independence,
the heavy burden of pretending and lying and bluffing to get by. Some of them stuck
it out.
The participants were paired with reading partners (the college students) and a book. We started, word by word, to sound out the words,
then sentences, in the book. We would figure out each word in a paragraph, then go
back and read the whole paragraph. The first few pages took days. It was excruciating — but by
the third or fourth session the participants were beginning to get comfortable with
the process. They were encouraged!
What were we reading?
Not the first readers their grandchildren were using in grade school! We were reading
Mickey Spillane detective novels. (For those of you who have missed these literary
classics, Mickey Spillane is the name under which a LOT of detective novels and short
stories were published, starting in the 1950s. They are short, trite, usually violent,
sexy but not explicit, and sexist even for the time in which they were written. They
use short words, short sentences, have a fairly small vocabulary, and are definitely
not appropriate for children.) The Big Kill starts:
It was one of those nights when the sky came down and wrapped itself around the world.
The rain clawed at the windows of the bar like an angry cat and tried to sneak in
every time some drunk lurched in the door. The place reeked of stale beer and soggy
men with enough cheap perfume thrown in to make you sick.
Two drunks with a nickel between them were arguing over what to play on the juke box
until a tomato in a dress that was too tight a year ago pushed the key that started
off something noisy and hot. …
In retrospect, I think selecting hard-boiled detective fiction as the reading material for these farmers, mechanics, painters, and laborers was
an act of genius.
Each session ended with a pep talk: “Look how far you’ve come, look how much you got through today, you are making real
progress.” And these wonderful patient men would nod, smile, and mutter “I still can’t read and don’t think I ever will.”
One evening, usually about 6 or 7 sessions into the guided reading part of the program,
the team would get to a spot in the story where something exciting was just about
to happen. The bad guy was going to shoot the good guy, the detective was about to
expose the girl as a spy, or the girl was going to climb into bed with the detective.
The bomb was about to go off if our hero didn’t disarm it quickly enough, or the car
was heading for a cliff. The teacher would excuse himself or herself for a moment, and
vanish for a very long time.
When we got back we asked “Did he buy her a drink or shoot her?” And the no-longer-totally-illiterate participant knew the answer! The would-be victim’s
mother had come to the door, the sheriff stepped out from a shadow, or … something.
The participant had stopped thinking about reading and started thinking about the
story, and suddenly was reading!
The big problem in teaching reading is that as long as you are thinking about reading
you are not, you cannot be, reading. Try it. Try reading something while concentrating
on the activity of reading. While you are thinking about reading, you are not reading;
in order to read, you have to let go of the process and focus on the content you are
reading.
Once the participants had made that leap once, it was relatively easy to get them
to do it again and again. From there all it took was a few practice sessions reinforcing
the lesson, reading a newspaper, a few government forms, and a children’s book or
two. (Books with nonsense words were particularly challenging for new readers.) They
had been exposed to the written word their whole lives and had probably picked up
a lot of reading basics without knowing it. All they needed was help over one (huge)
logical hurdle.
Back to Declarative Markup
So, it wasn’t End of Story. Game Over. Go Home! It was time to learn a new way of
thinking and practicing that enough to be able to do it without thinking about it.
We KNOW people can do this; most (probably all) of the people in this room can read.
You can read without thinking about it, and you take that ability for granted.
The people Brian Reid was talking about were the declarative markup equivalent of
my non-readers. They were smart, they probably started as willing users of this new
computer-aided writing tool, but even if they understood the premise behind it (and
I suspect some of them did), they hadn’t internalized the concepts of generic markup.
I have spent a lot of time with XML users, or would-be XML users, who have a similar
experience. We spend a lot of time with them, learning what the parts of their documents
are, and selecting, customizing, or occasionally writing, a vocabulary appropriate
to their documents.
It is not unusual for a group of subject matter experts and professional writers,
when asked to identify the parts of one of their documents, to start talking about
the format of the document. “What is this?” I ask. The answer is sometimes “Times 24 Bold” or “Head 1” or “Bell 24.” “No,” I ask, “not just the beginning of the thing I circled, the whole thing.” “Head 1 followed by several paragraphs” is the usual answer. If I push it, I can often get “Head 1, paragraph, paragraph, Head 2, paragraph, paragraph.” And a room full of people who don’t understand why I am being so dense.
With just a little coaching they can learn, or they learn in the process of doing document
analysis, to identify structures: sections, titles, lists, list items, and footnotes.
They learn to name, define, and identify in documents subject matter that is important
to them and their activity: drug name, ferrous alloy, terrestrial location, ammunition
caliber, street address, conference start date. I think of this as the equivalent
of learning the alphabet.
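The gap between the two answers can be made concrete. What follows is a minimal sketch in Python, not a description of any tool used in these projects: it takes the flat sequence the room offers (Head 1, paragraph, paragraph, Head 2, paragraph, paragraph) and nests it into sections with titles. The input list, the function name nest, and the element names document, section, title, and p are all assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the flat view of a document, as a run of
# (style name, text) pairs. Names and text are invented for illustration.
FLAT = [
    ("Head 1", "Introduction"),
    ("paragraph", "First paragraph."),
    ("paragraph", "Second paragraph."),
    ("Head 2", "Background"),
    ("paragraph", "Third paragraph."),
    ("paragraph", "Fourth paragraph."),
]


def nest(flat):
    """Turn a flat style sequence into nested <section> elements."""
    root = ET.Element("document")
    # stack[i] is the open container at heading depth i (0 is the document).
    stack = [root]
    for style, text in flat:
        if style.startswith("Head "):
            level = int(style.split()[1])
            # A new heading closes any open section at its depth or deeper.
            del stack[level:]
            section = ET.SubElement(stack[-1], "section")
            ET.SubElement(section, "title").text = text
            stack.append(section)
        else:
            # Body text belongs to the innermost open section.
            ET.SubElement(stack[-1], "p").text = text
    return root


doc = nest(FLAT)
print(ET.tostring(doc, encoding="unicode"))
```

The printed result places the second-level section inside the first one, which is the containment that the flat style list never states.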
As with learning the alphabet, it is necessary to learn to see structures and subject
matter content in your documents. Just as learning the alphabet is no place close to
sufficient to enable reading, learning how to name structures in documents is no place
close to sufficient to enable writing in a structure-based tool.
Note: this is a sufficient level of knowledge to enable tagging existing documents.
But if authors with this level of understanding are expected to produce declaratively
marked-up documents, we are expecting them to write without the markup and then go
back and add it. We are adding a time-consuming process to the act of writing. One
that they don’t see as integral to the process of writing, and that in fact is not
integral to the writing as they are doing it.
Worse than that, we are asking them to do a process that is rife with negative feedback.
It is all about errors and warnings. In many structured document editing environments,
the most positive feedback you get is silence. And sometimes it is difficult to tell
the difference between “Victory; you did it,” “Still processing,” and “Ooops, application crashed, possibly because of your bewilderingly bad data.”
Once the leap is made to thinking about what you are writing as the structures it
is, this becomes not just habit-forming but addictive. I have a colleague who can
no longer write with a simple text editor without screaming at it. She wants, or perhaps
needs, to identify sections, headings, lists, and such as they are created. She prefers
to write in a model-driven XML editor, but can set up word processor styles to meet
the need. She wants to identify a code block, not specify 10 point courier indented
3 m-spaces. She thinks of it as a code block, not as what it might look like. She
thinks of list items as items in a list, or in a nested list, not as starting with
a solid bullet or a hollow bullet; not about how far they are indented on the page.
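The distinction between naming a thing and specifying its appearance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample document, the element name code-block, the two style tables, and the function render are hypothetical and do not come from any particular editor or formatter. The point is that the content names what each piece is once, and each output medium decides separately what that should look like.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaratively marked-up content: it says what each piece is.
SOURCE = "<doc><p>Run this:</p><code-block>print('hi')</code-block></doc>"

# Two hypothetical presentation mappings, kept outside the content.
PRINT_STYLE = {"p": "Times 10pt", "code-block": "Courier 10pt, indented 3 em"}
SCREEN_STYLE = {"p": "sans-serif 16px", "code-block": "monospace 14px"}


def render(xml_text, style):
    """Pair each child element's text with the formatting chosen for its tag."""
    return [(style[el.tag], el.text) for el in ET.fromstring(xml_text)]


for name, style in (("print", PRINT_STYLE), ("screen", SCREEN_STYLE)):
    for formatting, text in render(SOURCE, style):
        print(f"{name}: [{formatting}] {text}")
```

Changing how code blocks look in print means editing one entry in one table; the document itself is untouched.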
Amongst the people in this room, I don’t think that is unusual. Amongst the literate
people on planet earth, it is very unusual indeed. Even amongst the people who write
using computer-based tools (don’t forget that many people still write using ink on
paper), this is a very unusual point of view.
I believe that Brian Reid was talking about people who have learned the alphabet of structured documents but who have not learned to read them. People who are sounding out one word after another, thinking about reading
instead of thinking about the content of the document, and struggling with the process.
Those people, with that level of comfort with declarative markup, will never adopt
it. They cannot.
But I don’t believe that this is Game Over! Just as those farm hands, cooks, and mechanics could make the leap from the alphabet
to sounding out words to reading, so can the authors of today make the leap from presentation
to identifying structures in existing texts to composing thoughts in structural terms.
I did it the way most of you probably did. Through exposure, and repetition, and working
with systems I was fighting tooth and nail but had to use anyway. It was actually
a fairly discouraging process, but it happened so long ago I can barely remember it.
I can completely understand someone trying to write a document refusing to use a tool
that forces them to stop thinking about their subject matter and think about something
unfamiliar in the process of trying to capture their thoughts on some topic.
Declarative Markup is A Good Thing
I believe that we, as a society, would be better off if many, perhaps most, of our
documents were encoded with declarative markup. It would make them more discoverable,
more accessible, and probably better organized and understandable.
I believe that there is information about many documents that the author knows that
would add significant value to the document if it were captured when the author creates
the document. Some of this information can be added later, usually at significant
cost. Some of it is simply unavailable once the document is separated from the author.
(As an example, it is possible for third parties to write descriptions of graphics,
but they cannot be sure that they describe the aspect of the graphic that is the most
important point the author wanted to make in using that image.)
How Do We Get There?
The success of that adult literacy program I talked about was based on three things:
Motivation
Instruction on fundamentals
Absorbing first materials
Let’s talk about these one at a time.
Motivation. There are a few very special circumstances in which people have the motivation
to learn to write in a declarative way. I have worked with helicopter pilots turned
helicopter documentation specialists who learned to write in an SGML editor. There
are professional editors who work comfortably in grammar-driven editors, and technical
writers, and people who are working with very stable structured documents of many
types.
But for most writers of most documents, there is no reason to care. Even if their
documents will be published in several media, and even if they would be more discoverable
if they were better structured, that is generally hidden from the author.
They will only make the investment in learning something new if we make the reasons
clear. No, if we make the reward for overcoming the significant hurdle in the path
more than worth the effort.
I suspect that that reward will have to vary from user community to user community.
I worked with the support-analysis group of a very large organization. Decision-makers
would ask this group to research topics of interest. Sometimes they wanted a one-page
summary of options and advantages and disadvantages of each, sometimes they wanted
in-depth studies that looked like research reports or even dissertations. Sometimes
they specified that they needed the information within the next few months, but more
often they wanted it as soon as possible. Using the word-processing based systems,
there was usually about a 3-day lag between completion of the analysts’ work and presentation
of the document to the requestor. Once we brought in the markup-based application,
the analysts had a choice: they could continue to use their familiar word processor,
and the publication team would convert the content and format it in the new tools
— which would take about 2 days. Less than the old system; this was a win! But if
they used the new authoring environment and created marked-up documents, their content
would be approved by legal and formatted for delivery within 24 hours. Sometimes faster
than that. Lo and behold: most of them came to the classes on how to use the new
editor, and most of them learned to use the new tool right away. By several months
in there was only one hold-out, and when it became clear that the users preferred
the other analysts because their documents were not only delivered more quickly but
were also better organized — that one asked for private coaching in the @#(*&^ new
system.
Instruction on fundamentals. Well, we have a fair amount of material for this. I don’t think most of it is appropriate
to most users. I say this as someone who has written, and taught, classes on SGML
and XML for years. We need better instructional material. I’ve seen some that was
much worse than mine, but I haven’t seen any that really impressed me. It would be
good if we figured out how to teach this stuff.
But I don’t think that is really the problem; the amount and quality of instruction
on the alphabet and pronunciation in my literacy class was almost embarrassing. That
didn’t matter.
Absorbing first materials. What we REALLY need is the functional equivalent of a Mickey Spillane novel for
beginners with declarative markup.
Not reading material, but some instant gratification application that works quickly
and gracefully if provided with well marked-up text. I don’t know what that application
is. I charge you to start thinking about it. Actually, I hope to see several, dramatically
different, applications that provide instant value for the effort of explicitly marking
up the structure and content of documents. It is these applications that might, just
might, help a significant number of document creators over that big barrier to graceful
and comfortable use of declarative markup.
With luck, the presentations and conversation here at Balisage will help one or more of us create just that barrier-breaking tool.
Appendix A. Presentation Slides from Brian Reid’s Markup Technologies ’98 Keynote Address