Balisage Paper: YAMC? Why are we here? Why are we here again?
B. Tommie Usdin
B. Tommie Usdin is President of Mulberry Technologies, Inc., a consultancy specializing
in XML and SGML. Ms. Usdin has been working with SGML since 1985 and has been a supporter
of XML since 1996. She chairs the Balisage conference. Ms. Usdin has developed DTDs, Schemas, and XML/SGML application frameworks
for applications in government and industry. Projects include reference materials
in medicine, science, engineering, and law; semiconductor documentation; historical
and archival materials. Distribution formats have included print books, magazines,
and journals, and both web- and media-based electronic publications. She is co-chair
of the NISO Z39-96, JATS: Journal Article Tag Suite Working Group and a member of
the NISO Board of Directors. You can read more about her at
http://www.mulberrytech.com/people/usdin/index.html
Copyright ©2018, Mulberry Technologies, Inc. Used with permission.
Abstract
There is nothing new about markup, or even generic markup. (I have been working with
generic markup for 40 years!) So what is there to talk about after all this time?
What are we accomplishing by gathering at Balisage: The Markup Conference? Why do some of us find events like this one valuable? What can you do to make it
valuable to you and to the others here? Not only is markup old hat, XML is 20 years
old, and some people in the outside world keep trying to tell us that its time has
passed.
Groups are still gathering to create shared markup vocabularies in order to enable
high quality information sharing. Scholars are using bespoke markup vocabularies to
enable them to focus on the works they are reading, interpreting, and writing. Trendy
end user displays are being populated by solid maintainable XML content. An ever-improving
tool set is available to users of marked-up documents. We learn from each other’s
projects, tools, techniques, and experiences — and enjoy the process!
Table of Contents
- Welcome to Balisage
- SGML and XML: Where we have been
- XML and HTML
- XML and Generic Markup
- Prose Documents
- Aside: Bad XML
- XML as a Carrier Format
- Aside: Eulogy for XML at the W3C
- Why Generic Markup
- What is Generic Markup
- Separate Content from Format/Processing
- Identifying What the Content Is
- We Need to TALK about Generic Markup
- Markup at Balisage
Welcome to Balisage
I want to welcome you to a few days when we can focus, not on the urgent, but on the
important. Not on the popular, but on the useful. Not on the cool, but on the effective.
Not on the surface, but on the fundamentals. And on both the how and the why.
To those of you who are new to Balisage: welcome! I hope you will find the next few days interesting, enlightening, and fun.
I hope you will push yourselves to talk with people you don’t yet know and to learn
about them and their markup projects. Alumni, welcome back! I hope you are prepared
to earn your listener ribbons!
Listener ribbons. I should tell you a little about these purple ribbons. Most conferences
identify chairs, the conference committee, and staff — so you know who to ask when
you need something. Some also beribbon first time attendees, which seems like a good
idea so the old hands can seek them out and chat with them, but has never made me
comfortable when I was the recipient of one. And, of course, they flag speakers. It
is the speakers who make a conference; they have done a lot of work to prepare for
the conference; we are all here to listen to them. Wait … what did I just say? We
are all here to listen! Even, perhaps especially, the speakers, are here to listen.
If we don’t spend more time listening than speaking, we are totally wasting the opportunity
… and being total bores! (While I am talking about listener ribbons I should tell
you that “listener” ribbons were the suggestion of a conference speaker, who groused
when given a speaker ribbon that he intended to listen a lot more than he intended
to talk and that he should get a listener ribbon. We didn’t give him one that year; we
didn’t have them, but we have been offering them ever since.)
SGML and XML: Where we have been
So, why are we here? Why “Yet Another Markup Conference”? Isn’t this old news? Like
REALLY OLD NEWS? SGML failed. Right? And XML, which was supposed to take over the
way the world handles information, is SOOO over. All those grannies and shopkeepers
who were going to be using XML just didn’t get the message. The lawyers in the office
next to mine don’t know what XML is, and don’t care.
I remember when SGML was shiny, new, and exciting. I remember conferences opening
with announcements of new SGML tools and projects. I remember when rooms full of people
cheered when an organization announced that it was experimenting with SGML. I remember demonstrations
in which we created a document on one computer using one authoring tool, put the document
on a floppy disk, walked it over to a different computer, and opened and edited the
document using different software from a different company. Then we put the edited document
back onto a floppy disk and had ANOTHER company’s software on a different computer format
the document and print it. Wow! That was exciting. 25 or 30 people were running all
over an exhibit hall to watch this silliness, several times a day!
And then came XML, which was so much easier to understand and use that it would be
the lingua franca of the masses in no time! How easy was it? Instead of the great
big book that was the SGML standard, it was a little pamphlet! (For the moment we
won’t think about the volume of all of the specifications that have grown to support
the specification in that booklet.) XML was going to let us tell the difference between
blue jeans and chocolate chip cookies on the web. Remember?
XML and Generic Markup
However, as I was writing this, I thought about centering this talk around the proposition:
XML isn’t important. It’s the generic markup that’s important, XML is just a carrier
for the generically marked up information. XML is just a tool, or really just a syntax
around which a toolset has been, and continues to be, developed.
Prose Documents
I do believe that … with a few caveats. I work in the world of prose documents. Documents
in which the meaning and significance cannot be conveyed by a bunch of yes/no or even
short-answer fields. For these documents it is not only impossible to answer questions
of the form “What is the maximum number of occurrences of a structure?” or “What
is the maximum length of a structure?”; we must also be able to work with the documents
without needing to ask those questions. For the applications in which I am interested,
XML is the carrier syntax for generically marked up documents. Note however, the addition
of “the applications in which I am interested”.
“The applications in which I am interested” are far from all XML applications. There
are a lot of people doing important and useful things with XML that are not based
on generic markup, and while I do know a little bit about what they are doing, I don’t
know very much and, frankly, am not really interested. That doesn’t make what they
are doing “bad” XML by any means. It makes it not what I am interested in.
Aside: Bad XML
Aside: I often hear about XML that is not based on generic markup, or XML tag sets
that are, by someone’s taste, insufficiently semantic (whatever that means to them
at that moment) characterized as “bad” XML. It usually isn’t bad XML. The person making
the disparaging remarks is usually conflating XML, generic markup, and semantic tagging.
In the document universe I like to work in, these things often co-occur. And in that
environment, generally speaking, the more tightly they are intertwined the more flexible,
powerful, and useful the marked up documents are. That is, we want to mark what the
parts of the documents are, not how they are to be processed for some particular use.
And we prefer to identify the meaning of the content rather than the structure of
our information … when we can. But we are not the XML universe.
XML as a Carrier Format
Back to my, now edited, proposition:
In the context of the types of documents that interest me, XML isn’t important. It’s
the generic markup that’s important; XML is just a carrier for the generically marked
up information. XML is just a tool, or really just a syntax around which a tool set
has been, and continues to be, developed.
That’s a little better, but I really need to remove the “just”. There is nothing “just”
about a syntax that is clearly defined, that is associated with well-defined syntaxes for
specifying associated processing, and for which there are tools to support
creation of marked-up documents as well as a lot of good tools to
help us do things with those documents once we have them.
Because XML is (at least for the moment) the syntax and tool set of choice for markup
and manipulation of marked up documents, we frequently take a verbal shortcut and
equate the two. As in “OH, cool, you’re using XML too! I love XML, except I have a
lot of trouble understanding when to use the <q> tag and when to use the <quote> tag.
You know?” (For those of you who don’t know, <q> and <quote> are TEI tags, although
tags by those names may also occur in other vocabularies.)
Or perhaps “XML is fabulous! I love being able to convert spreadsheets directly into
graphics, although the visualizations tend to look sort of plaid.” Said by a fan of
SVG with better XSLT than design skills.
So, the edited proposition is now:
In the context of the types of documents that interest me, XML isn’t important. It’s
the generic markup that’s important; XML is a carrier for the generically marked up
information. XML is a tool, or really a syntax around which a tool set has been, and
continues to be, developed.
Aside: Eulogy for XML at the W3C
And while I am editing myself, perhaps I should re-think my proposition again: Do
I really mean that XML isn’t important? Of course not! XML is enormously important!
(If you haven’t read Liam Quin’s eulogy for XML work at W3C I highly recommend it:
https://www.w3.org/blog/2018/07/the-world-wide-success-that-is-xml/.
Note: It is not a eulogy for XML — it is a celebration of the success of XML; it is
a eulogy for work on XML at the W3C. Not the same thing at all!) To me, XML is important
because it provides the foundation for an ecosystem of specifications, tools, events,
and document collections that is both fascinating and important in my world.
Another aside (This talk is about half asides. Does that mean it is well-structured,
or does it mean it is a total mess? I don’t know.): Anyway, I have heard some XMLers
(and in this case I mean users of XML, not necessarily users of generic markup) wondering
how we can survive, wandering the wilderness, now that W3C has kicked XML out of the
house. I would like to remind you that the W3C was an active, and at that time very
rich, ongoing, and busy organization before Jon Bosak and his band of itinerant standards-makers
knocked on their door and asked for a roof under which to do a little collaborative
work. They were, as I recall, looking mostly for an organization in which competitors
could legally work together to write a subset of SGML. And, as I recall, that was
expected to be a very limited activity. And now, 22 or so years later, that “fairly
short” activity is deemed to be complete. So the organization that found themselves
hosting a large, sometimes fractious, activity, has said “Good-bye, we are going back
to who we always were and focusing our now far more limited resources on our primary
mission”.
Why Generic Markup
So, back to my edited proposition, which might now be:
In the context of the types of documents that interest me, XML is important because
it is a powerful and convenient carrier for generically marked-up information.
Or, perhaps:
XML is important because, among other uses, it is a powerful and convenient carrier
for generically marked-up information.
Which could lead you to ask: “So what? Why are you so fixated on generically marked
up information?”
Documents, by which I mean both prose documents and other text-like material such
as business documents, are generally designed and created for human readers. There
are non-human users of, for example, journal articles, but the primary user of a journal
article, a text book, a set of conference slides, a novel, a newspaper, or a manual
is a human reader. In creating this content we work very hard to make it clear, understandable,
and readable by the human. There are a lot of tools (some so popular that many people
can’t imagine any alternatives) available to help the creator of such content make
it look the way they want it to look to their readers. And, interestingly, those tools
are generally (perhaps by now, all) XML under the covers.
Some, perhaps many, XML fanciers consider it a big win for XML when popular software
is using XML. And I suppose it is, if you think XML in and of itself is important.
But I don’t. I think XML is important BECAUSE it is a powerful and convenient carrier
for generically marked-up information.
What is Generic Markup
What do I mean by generically marked-up information? I mean documents in which the
parts of the information that matter (to the creator or expected user) are identified
by what they are or what they mean, rather than how they should be processed by a
particular application. I mean identifying something as a TITLE, rather than putting
a “bell-R01” code at the start of that text and formatting it as medium size and bold
until the next “bell” code is encountered. In this world view, even labeling it as
EMPHASIZED is better than “bell-R01”. Identifying it as a TITLE, a DRUGNAME, or a
PARTNUMBER is better yet.
I am not just winging it, and it is not just my opinion that:
- identifying a DRUGNAME is significantly different from identifying a word or phrase
that should be displayed in italics, or that
- identifying the beginning and the end of a section is significantly different from
identifying where there should be a bit of extra space on the page or where the display
should start to be in 24-point Goudy bold, and in fact
- identifying the beginning and the end of the phrase that should be italic is significantly
different from identifying where code-page B4300-B should start.
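The first of these distinctions can be made concrete with a small sketch. It assumes a hypothetical vocabulary (the element names `para`, `drugname`, and `emph` are invented for illustration, as is the sample sentence) and uses Python's standard `xml.etree.ElementTree` module only because it is convenient. The same sentence is tagged once by what its parts are and once by how they should look.

```python
import xml.etree.ElementTree as ET

# Generic markup: the parts are identified by what they are.
# <drugname> and <emph> are hypothetical element names.
GENERIC = (
    "<para>The trial compared <drugname>warfarin</drugname> with "
    "<drugname>aspirin</drugname> and found <emph>no</emph> difference.</para>"
)

# Presentational markup: the same sentence, tagged only for appearance.
PRESENTATIONAL = (
    "<para>The trial compared <i>warfarin</i> with "
    "<i>aspirin</i> and found <i>no</i> difference.</para>"
)


def texts(xml_source, tag):
    """Return the text of every element named `tag`, in document order."""
    root = ET.fromstring(xml_source)
    return [el.text for el in root.iter(tag)]


# Asking "which drugs are mentioned?" has a direct answer in the generic version.
drug_names = texts(GENERIC, "drugname")

# In the presentational version the best we can ask is "what is italic?",
# which mixes the drug names with an emphasized word.
italic_phrases = texts(PRESENTATIONAL, "i")

print(drug_names)
print(italic_phrases)
```

Both versions would probably look the same on a printed page. Only the generic one lets a later user ask about drug names without guessing which italics were which.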
Separate Content from Format/Processing
Identifying what the content is, and by implication, separating content from formatting
or processing, is at the heart of generic markup. It is this practice that empowers
many of the rich document uses on which many of us base our work. It is this separation
of content from format that allows us to make many different formats of the same content
without editing the content. It is this that allows us to make high quality print
documents, useable on-screen documents, and accessible documents from the same content.
It is this that allows us to create content with tools from one vendor, without even
knowing what tools would be used to render the content, and to typeset, display, or
voice the content with tools from other vendors.
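As a rough illustration of making different formats from the same content without editing the content, here is a minimal sketch. The vocabulary (`section`, `title`, `para`, `drugname`), the sample text, and the two rendering functions are all assumptions made up for this example; a real project would more likely use XSLT or a publishing tool chain than hand-written Python.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source document in a hypothetical generic vocabulary.
SOURCE = (
    "<section><title>Method</title>"
    "<para>The study used <drugname>ibuprofen</drugname> as a control.</para>"
    "</section>"
)

# Formatting decisions for the browser live here, outside the content.
HTML_WRAPPERS = {
    "section": ("<div>", "</div>"),
    "title": ("<h2>", "</h2>"),
    "para": ("<p>", "</p>"),
    "drugname": ("<i>", "</i>"),
}


def to_html(el):
    """Render for a browser: titles become headings, drug names italic."""
    start, end = HTML_WRAPPERS[el.tag]
    out = start + escape(el.text or "")
    for child in el:
        out += to_html(child) + escape(child.tail or "")
    return out + end


def to_plain(el):
    """Render for a plain-text channel: titles upper-cased, drug names starred."""
    inner = (el.text or "") + "".join(to_plain(c) + (c.tail or "") for c in el)
    if el.tag == "title":
        return inner.upper() + "\n"
    if el.tag == "drugname":
        return "*" + inner + "*"
    return inner


root = ET.fromstring(SOURCE)
html_out = to_html(root)
plain_out = to_plain(root)

print(html_out)
print(plain_out)
```

The source string is never touched. Adding a third output, such as large print or voice, would mean adding another function rather than re-tagging the document.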
It is “generic markup”, the act of tagging content by what it is, not what you are
going to do with it, that powers many of the claimed virtues of XML. This is, for
example, what “they” are talking about when “they” say that XML is “self-documenting”.
(This is nonsense, but seems to sound good to some people.) It is also generic markup
that makes XML truly platform independent.
Identifying What the Content Is
By identifying what content is rather than how it should look at a specific moment
or what it should do in a specific tool, we enable it to look or act differently at
different times or in different contexts. We have the ability to make well-designed
print AND well-designed large print (which is not well-designed print writ large),
and well-designed displays for large screens, and well-designed displays for small
screens; and to support voice synthesis, and … . We have the ability to create documents,
or collections of documents, on which we can do literary, historical, linguistic,
cultural, and statistical analyses, and on which future researchers can do who-knows-what.
By using generic markup we have made it a bit more likely that future users will be
able to comprehend and process our documents. Note: Although I have heard that XML
is future-proof (meaning, I think, that generically marked-up documents in XML syntax
are future-proof), I have seen too many incomprehensible, or in more cases partly
incomprehensible, tagged documents to buy that!
It is the design of the markup vocabulary, its documentation, and its consistent use,
that give us the benefits of generic markup.
We Need to TALK about Generic Markup
To touch on a topic that was recently discussed at MarkupUK, and which will be discussed
later at Balisage, we have a community problem in that while there are a lot of excellent
references on XML syntax and XSLT, XSL-FO, XPath, and X-this and X-that, they (rightly)
concern themselves with XML and the related specifications. A few of them mention
generic markup briefly, but that is all. And that is appropriate; it is not their
topic. However, there are few documents that focus on generic markup or the principles
behind good generic markup, and because many of us conflate XML and generic markup,
we haven’t articulated this need. There are discussions, I am thinking especially
of threads on XML-Dev, on “Is it better XML to do this or that?” which are actually
about design of markup vocabularies, but these tend to be limited in scope and are
often quite weird.
So, I am a fan of generic markup, and a fan of XML because it has enabled the creation
of tools that make it practical for many people to create generically marked-up documents
which are of great value in many circumstances.
Markup at Balisage
We are at Balisage: The Markup Conference. We will talk about XML, and we should. XML is at the core of our most useful tools.
We will also talk about documents, the design and use of markup vocabularies, and
ways in which we can create, manage, manipulate, and store marked-up documents.
While you are here, I suggest that in addition to listening to the what-I-done-good
aspect of the presentations, you listen to how the various speakers approached identifying
their goals, problems, and opportunities. Think about how much of a speaker’s subject
matter is markup, how much is XML or some other carrier syntax, how much is the background
or context of their project, and how much is something else. (Hint to speakers: If
the audience concludes that a significant proportion of your talk is self-promotion
or advertising your company or product, you will not be well-received at Balisage.
This is an environment in which technical talks are more effective sales tools than
sales talks.)
If a talk is about the markup of a document, think about where that markup falls on
the continuum from formatting/processing through generic to semantic markup. That
is, is it identifying what to do with the content, what it is, or what it means —
or most likely some combination of all of these?
Think about the logic of how speakers selected, tested, and implemented solutions
to those problems. Think about, and perhaps question, their unstated assumptions and
possible blind spots. Learn as much as possible about the assumptions behind the technology
as well as learning the particulars of the technology being discussed.
If you took the time and spent the money to come to Balisage you are, whether you think of yourself that way or not, a member of the Markup Cognoscenti.
Think about how you approach markup-related tasks. Think about how much of what you
do is really ABOUT XML, and how much of it is ABOUT generic markup and happening to
use XML. Make that distinction, at least in your own mind. And share what you know.
Not just the tools and techniques: the thought processes behind them.