Welcome to Balisage
I want to welcome you to a few days when we can focus, not on the urgent, but on the important. Not on the popular, but on the useful. Not on the cool, but on the effective. Not on the surface, but on the fundamentals. And on both the how and the why.
To those of you who are new to Balisage: welcome! I hope you will find the next few days interesting, enlightening, and fun. I hope you will push yourselves to talk with people you don’t yet know and to learn about them and their markup projects. Alumni, welcome back! I hope you are prepared to earn your listener ribbons!
Listener ribbons. I should tell you a little about these purple ribbons. Most conferences identify chairs, the conference committee, and staff, so you know who to ask when you need something. Some also beribbon first-time attendees, which seems like a good idea so the old hands can seek them out and chat with them, but has never made me comfortable when I was the recipient of one. And, of course, they flag speakers. It is the speakers who make a conference; they have done a lot of work to prepare for the conference; we are all here to listen to them. Wait … what did I just say? We are all here to listen! Even, perhaps especially, the speakers are here to listen. If we don’t spend more time listening than speaking, we are totally wasting the opportunity … and being total bores! (While I am talking about listener ribbons I should tell you that “listener” ribbons were the suggestion of a conference speaker, who groused when given a speaker ribbon that he intended to listen a lot more than he intended to talk and he should get a listener ribbon. We didn’t give him one that year; we didn’t have them, but we have been offering them ever since.)
SGML and XML: Where we have been
So, why are we here? Why “Yet Another Markup Conference”? Isn’t this old news? Like REALLY OLD NEWS? SGML failed. Right? And XML, which was supposed to take over the way the world handles information, is SOOO over. All those grannies and shopkeepers who were going to be using XML just didn’t get the message. The lawyers in the office next to mine don’t know what XML is, and don’t care.
I remember when SGML was shiny, new, and exciting. I remember conferences opening with announcements of new SGML tools and projects. I remember when rooms full of people cheered when an organization announced that it was experimenting with SGML, and demonstrations in which we created a document on one computer using one authoring tool, put the document on a floppy disk, walked it over to a different computer, and opened and edited the document using different software from a different company. Then we put the edited document back onto floppy disk and had ANOTHER company’s software on a different computer format the document and print it. Wow! That was exciting. 25 or 30 people were running all over an exhibit hall to watch this silliness, several times a day!
And then came XML, which was so much easier to understand and use that it would be the lingua franca of the masses in no time! How easy was it? Instead of the great big book that was the SGML standard, it was a little pamphlet! (For the moment we won’t think about the volume of all of the specifications that have grown to support the specification in that booklet.) XML was going to let us tell the difference between blue jeans and chocolate chip cookies on the web. Remember?
XML and HTML
There were people believing that XML was intended to, or was going to, replace HTML. (Of course, there were also people who believed that HTML was a failure and that XML, being better, would naturally ease it out of its niche and take over.) The fact that XML has not displaced HTML (which was never one of the goals of the XML activity), combined with the fact that most computer users know nothing about XML, is seen by some as “proof” that XML is a failure. So, just to be clear, in my opinion, XML is not a failure; XML is such a success that it has become boring. Most computer users don’t know anything about it just in the same way that most people who ride on trains don’t know what kind of steel the train tracks are made of.
XML and Generic Markup
However, as I was writing this, I thought about centering this talk around the proposition:
XML isn’t important. It’s the generic markup that’s important, XML is just a carrier for the generically marked up information. XML is just a tool, or really just a syntax around which a toolset has been, and continues to be, developed.
Prose Documents
I do believe that … with a few caveats. I work in the world of prose documents. Documents in which the meaning and significance cannot be conveyed by a bunch of yes/no or even short-answer fields. Documents for which it is not only impossible to answer questions of the form “What is the maximum number of occurrences of a structure” or “What is the maximum length of a structure”, but with which we must be able to work without needing to ask these questions. For the applications in which I am interested, XML is the carrier syntax for generically marked up documents. Note, however, the addition of “the applications in which I am interested”.
“The applications in which I am interested” are far from all XML applications. There are a lot of people doing important and useful things with XML that are not based on generic markup, and while I do know a little bit about what they are doing, I don’t know very much and, frankly, am not really interested. That doesn’t make what they are doing “bad” XML by any means. It makes it not what I am interested in.
Aside: Bad XML
Aside: I often hear about XML that is not based on generic markup, or XML tag sets that are, by someone’s taste, insufficiently semantic (whatever that means to them at that moment) characterized as “bad” XML. It usually isn’t bad XML. The person making the disparaging remarks is usually conflating XML, generic markup, and semantic tagging. In the document universe I like to work in, these things often co-occur. And in that environment, generally speaking, the more tightly they are intertwined the more flexible, powerful, and useful the marked up documents are. That is, we want to mark what the parts of the documents are, not how they are to be processed for some particular use. And we prefer to identify the meaning of the content rather than the structure of our information … when we can. But we are not the XML universe.
XML as a Carrier Format
Back to my, now edited, proposition:
In the context of the types of documents that interest me, XML isn’t important. It’s the generic markup that’s important; XML is just a carrier for the generically marked up information. XML is just a tool, or really just a syntax around which a tool set has been, and continues to be, developed.
That’s a little better, but I really need to remove the “just”. There is nothing “just” about a syntax that is clearly defined, that is associated with well-defined syntaxes for specifying associated processing, and for which there are tools available to support creation of the marked-up documents and a lot of good tools available to help us do things with the marked-up documents once we have them.
Because XML is (at least for the moment) the syntax and tool set of choice for markup and manipulation of marked up documents, we frequently take a verbal shortcut and equate the two. As in “OH, cool, you’re using XML too! I love XML, except I have a lot of trouble understanding when to use the <q> tag and when to use the <quote> tag. You know?” (For those of you who don’t know, <q> and <quote> are TEI tags, although tags by those names may also occur in other vocabularies.) Or perhaps “XML is fabulous! I love being able to convert spreadsheets directly into graphics, although the visualizations tend to look sort of plaid.” Said by a fan of SVG with better XSLT than design skills.
So, the edited proposition is now:
In the context of the types of documents that interest me, XML isn’t important. It’s the generic markup that’s important; XML is a carrier for the generically marked up information. XML is a tool, or really a syntax around which a tool set has been, and continues to be, developed.
Aside: Eulogy for XML at the W3C
And while I am editing myself, perhaps I should re-think my proposition again: Do I really mean that XML isn’t important? Of course not! XML is enormously important! (If you haven’t read Liam Quin’s eulogy for XML work at W3C I highly recommend it: https://www.w3.org/blog/2018/07/the-world-wide-success-that-is-xml/. Note: It is not a eulogy for XML — it is a celebration of the success of XML; it is a eulogy for work on XML at the W3C. Not the same thing at all!) To me, XML is important because it provides the foundation for an ecosystem of specifications, tools, events, and document collections that is both fascinating and important in my world.
Another aside (This talk is about half asides. Does that mean it is well-structured, or does it mean it is a total mess? I don’t know.): Anyway, I have heard some XMLers (and in this case I mean users of XML, not necessarily users of generic markup) wondering how we can survive, wandering the wilderness, now that W3C has kicked XML out of the house. I would like to remind you that the W3C was an active, and at that time very rich, ongoing, and busy organization before Jon Bosak and his band of itinerant standards-makers knocked on their door and asked for a roof under which to do a little collaborative work. They were, as I recall, looking mostly for an organization in which competitors could legally work together to write a subset of SGML. And, as I recall, that was expected to be a very limited activity. And now, 22 or so years later, that “fairly short” activity is deemed to be complete. So the organization that found themselves hosting a large, sometimes fractious, activity, has said “Good-bye, we are going back to who we always were and focusing our now far more limited resources on our primary mission”.
Why Generic Markup
So, back to my edited proposition, which might now be:
In the context of the types of documents that interest me, XML is important because it is a powerful and convenient carrier for generically marked-up information.
Or, perhaps:
XML is important because, among other uses, it is a powerful and convenient carrier for generically marked-up information.
Which could leave you to ask: “So what? Why are you so fixated on generically marked up information?”
Documents, by which I mean both prose documents and other text-like material such as business documents, are generally designed and created for human readers. There are non-human users of, for example, journal articles, but the primary user of a journal article, a text book, a set of conference slides, a novel, a newspaper, or a manual is a human reader. In creating this content we work very hard to make it clear, understandable, and readable by the human. There are a lot of tools (some so popular that many people can’t imagine any alternatives) available to help the creator of such content make it look the way they want it to look to their readers. And, interestingly, those tools are generally (perhaps by now, all) XML under the covers.
Some, perhaps many, XML fanciers consider it a big win for XML when popular software is using XML. And I suppose it is, if you think XML in and of itself is important. But I don’t. I think XML is important BECAUSE it is a powerful and convenient carrier for generically marked-up information.
What is Generic Markup
What do I mean by generically marked-up information? I mean documents in which the parts of the information that matter (to the creator or expected user) are identified by what they are or what they mean, rather than how they should be processed by a particular application. I mean identifying something as a TITLE, rather than putting a “bell-R01” code at the start of that text and formatting it as medium size and bold until the next “bell” code is encountered. In this world view, even labeling it as EMPHASIZED is better than “bell-R01”. Identifying it as a TITLE, a DRUGNAME, or a PARTNUMBER is better yet.
I am not just winging it, and it is not just my opinion that:
- identifying a DRUGNAME is significantly different from identifying a word or phrase that should be displayed in italics, or that
- identifying the beginning and the end of a section is significantly different from identifying where there should be a bit of extra space on the page or where the display should start to be in 24-point Goudy bold, and in fact
- identifying the beginning and the end of the phrase that should be italic is significantly different from identifying where code-page B4300-B should start.
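The first of these differences can be sketched in a few lines. This is only an illustration, assuming Python and its standard xml.etree.ElementTree module; the sentence and the tag names (`i`, `drugname`, `title`) are invented for the example and do not come from any particular vocabulary. When the same sentence is tagged by appearance, a program asking for the drug names gets every italic phrase; when it is tagged by what the phrases are, it gets only the drug.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence tagged two ways (tag names invented for illustration).
PRESENTATIONAL = "<p>Take <i>ibuprofen</i> with food; see <i>Warnings</i> first.</p>"
GENERIC = (
    "<p>Take <drugname>ibuprofen</drugname> with food; "
    "see <title>Warnings</title> first.</p>"
)


def texts(xml_source, tag):
    """Return the text of every element named `tag` in the fragment."""
    root = ET.fromstring(xml_source)
    return [el.text for el in root.iter(tag)]


# Italics cannot tell a drug name from a section title.
italic_phrases = texts(PRESENTATIONAL, "i")

# Generic markup lets us ask for exactly what we mean.
drug_names = texts(GENERIC, "drugname")

print(italic_phrases)
print(drug_names)
```

Both versions could be displayed identically; only the second can answer the question "which drugs does this document mention?" without guessing.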
Separate Content from Format/Processing
Identifying what the content is, and by implication, separating content from formatting or processing, is at the heart of generic markup. It is this practice that empowers many of the rich document uses on which many of us base our work. It is this separation of content from format that allows us to make many different formats of the same content without editing the content. It is this that allows us to make high quality print documents, useable on-screen documents, and accessible documents from the same content. It is this that allows us to create content with tools from one vendor, without even knowing what tools would be used to render the content, and to typeset, display, or voice the content with tools from other vendors.
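A minimal sketch of that separation, again assuming Python's standard library and an invented four-element vocabulary (`section`, `title`, `para`, `drugname` are hypothetical names, as are the two rendering functions). One unedited source is turned into an HTML rendering and a plain-text rendering; each renderer decides for itself what a title or a drug name should look like.

```python
import xml.etree.ElementTree as ET
from html import escape

# One source document; the vocabulary is invented for this illustration.
SOURCE = (
    "<section><title>Dosage</title>"
    "<para>Take <drugname>ibuprofen</drugname> with food.</para></section>"
)

# Formatting decisions live in the renderer, not in the content.
HTML_TAGS = {"section": "section", "title": "h2", "para": "p", "drugname": "em"}


def to_html(el):
    """Render an element and its mixed content as HTML."""
    tag = HTML_TAGS[el.tag]
    inner = escape(el.text or "")
    for child in el:
        inner += to_html(child) + escape(child.tail or "")
    return f"<{tag}>{inner}</{tag}>"


def to_plain(el):
    """Render the same content as plain text, with titles in capitals."""
    if el.tag == "title":
        return "".join(el.itertext()).upper() + "\n"
    if el.tag == "para":
        return "".join(el.itertext()) + "\n"
    return "".join(to_plain(child) for child in el)


root = ET.fromstring(SOURCE)
html_out = to_html(root)
text_out = to_plain(root)
print(html_out)
print(text_out)
```

A large-print, small-screen, or voice rendering would be another function over the same source; the content itself is never touched.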
It is “generic markup”, the act of tagging content by what it is, not what you are going to do with it, that powers many of the claimed virtues of XML. This is, for example, what “they” are talking about when “they” say that XML is “self-documenting”. (This is nonsense, but seems to sound good to some people.) It is also generic markup that makes XML truly platform independent.
Identifying What the Content Is
By identifying what content is rather than how it should look at a specific moment or what it should do in a specific tool, we enable it to look or act differently at different times or in different contexts. We have the ability to make well-designed print AND well-designed large print (which is not well-designed print writ large), and well-designed displays for large screens, and well-designed displays for small screens; and to support voice synthesis, and … . We have the ability to create documents, or collections of documents, on which we can do literary, historical, linguistic, cultural, and statistical analyses, and on which future researchers can do who-knows-what. By using generic markup we have made it a bit more likely that future users will be able to comprehend and process our documents. Note: Although I have heard that XML is future-proof (meaning, I think, that generically marked-up documents in XML syntax are future-proof), I have seen too many incomprehensible, or in more cases partly incomprehensible, tagged documents to buy that!
It is the design of the markup vocabulary, its documentation, and its consistent use, that give us the benefits of generic markup.
We Need to TALK about Generic Markup
To touch on a topic that was recently discussed at MarkupUK, and which will be discussed later at Balisage, we have a community problem in that while there are a lot of excellent references on XML syntax and XSLT, XSL-FO, XPath, and X-this and X-that, they (rightly) concern themselves with XML and the related specifications. A few of them mention generic markup briefly, but that is all. And that is appropriate; it is not their topic. However, there are few documents that focus on generic markup or the principles behind good generic markup, and because many of us conflate XML and generic markup, we haven’t articulated this need. There are discussions, I am thinking especially of threads on XML-Dev, on “Is it better XML to do this or that?” which are actually about design of markup vocabularies, but these tend to be limited in scope and are often quite weird.
So, I am a fan of generic markup, and a fan of XML because it has enabled the creation of tools that make it practical for many people to create generically marked-up documents, which are of great value in many circumstances.
Markup at Balisage
We are at Balisage: The Markup Conference. We will talk about XML, and we should. XML is at the core of our most useful tools. We will also talk about documents, the design and use of markup vocabularies, and ways in which we can create, manage, manipulate, and store marked-up documents.
While you are here, I suggest that in addition to listening to the what-I-done-good aspect of the presentations, you listen to how the various speakers approached identifying their goals, problems, and opportunities. Think about how much of a speaker’s subject matter is markup, how much is XML or some other carrier syntax, how much is the background or context of their project, and how much is something else. (Hint to speakers: If the audience concludes that a significant proportion of your talk is self-promotion or advertising your company or product, you will not be well-received at Balisage. This is an environment in which technical talks are more effective sales tools than sales talks.)
If a talk is about the markup of a document, think about where that markup falls on the continuum from formatting/processing through generic to semantic markup. That is, is it identifying what to do with the content, what it is, or what it means — or most likely some combination of all of these?
Think about the logic of how speakers selected, tested, and implemented solutions to those problems. Think about, and perhaps question, their unstated assumptions and possible blind spots. Learn as much as possible about the assumptions behind the technology as well as learning the particulars of the technology being discussed.
If you took the time and spent the money to come to Balisage you are, whether you think of yourself that way or not, a member of the Markup Cognoscenti. Think about how you approach markup-related tasks. Think about how much of what you do is really ABOUT XML, and how much of it is ABOUT generic markup and happening to use XML. Make that distinction, at least in your own mind. And share what you know. Not just the tools and techniques: the thought processes behind them.