Nordström, Ari. “In Defence of Style Guides.” Presented at Balisage: The Markup Conference 2018, Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21 (2018). https://doi.org/10.4242/BalisageVol21.Nordstrom01.
Balisage: The Markup Conference 2018 July 31 - August 3, 2018
Balisage Paper: In Defence of Style Guides
Ari Nordström
Ari Nordström is a Senior XML Specialist (fancy speak for "markup geek") at Karnov
Group, a Scandinavian legal publisher. He is based in Göteborg, Sweden, but has been
known to provide angled brackets across a number of borders over the years.
Ari is the proud owner and head projectionist of Western Sweden's last functioning
35/70mm cinema, situated in his garage, which should explain why he once wrote a paper
on automating commercial cinemas using XML.
This is a paper about that twilight zone beyond schemas, the place where style
guides, those arcane instructions to authors about house style and how to produce
content that is not only valid but stylistically consistent, are supposed to kick
in
but, these days, increasingly don't. It's a paper written in defence of style
guides, why they are needed, and why better tools cannot replace them.
This is a paper about that twilight zone beyond schemas, the place where style guides,
those arcane instructions to authors about house style and how to produce content
that
is not only valid but stylistically consistent, are supposed to kick in but, these
days,
increasingly don't.
It's a soapbox paper, basically, a result of years of irritation, agitation, and
random shouting.
Some Examples
Allow me to begin with a couple of examples to illustrate where I come from.
Here is a favourite pastime of Brits and Swedes:
This, of course, is a queue. For us markup folks, it's basically a list. The semantics
are clear, right?
But let's have a look at a more scary example:
This is McDonald's on Fleet Street in London during lunch hour. There are a number
of
cash registers but, for the most part, no clear queues and no help in sight. People
are
literally all over the place. What are the semantics here? How do you get your food?
Clearly, there should be queues, but none is readily apparent[1].
Let's do a more markup-centric example:
<para>Here are my favourite films:
<list>
<item>Close Encounters of the Third Kind</item>
<item>2001</item>
<item>Amadeus</item>
</list>
</para>
This, a list construction recognisable from schemas such as DocBook, has, on the
surface of it, clear semantics. There's an introductory sentence and a couple of list
items. A lot clearer than the McDonald's chaos, above, right? There is a problem,
though.
If your world is like mine, there are a couple of usual suspects when it comes to
what
are known as block-level elements. Paragraphs, notes,
admonishments, tables... and lists. And on the surface of it, this would qualify as
a
list, except I've always instinctively read the DocBook-style lists as inline because
they are inside a paragraph. Something to be presented like
this:
Here are my favourite films: Close Encounters of the Third Kind, 2001, and
Amadeus.
Here's how it's usually presented, though, with everything on block level:
Here are my favourite films:
* Close Encounters of the Third Kind
* 2001
* Amadeus
Looking at the schema, how does one know what is actually meant here?
In practice, since DocBook and others allow lists both (seemingly) inline and on block
level, I've had plenty of authors write
<para>
Intro to list:
<list>...</list>
</para>
But just as many write
<para>Intro to list:</para>
<list>...</list>
And many write lists in both ways, frequently in the same document, seemingly
oblivious to the difference, or the pain they cause me.
That intro text is part of the list, of course; if you remove the list
during some processing, that processing should remove the intro, too. To illustrate
this
problem:
<para>Here are my favourite films:</para>
<list>
<item>Close Encounters of the Third Kind</item>
<item>2001</item>
<item>Amadeus</item>
</list>
Here, the introductory para clearly belongs to the list if you bother to read the
contents, but isn't actually part of it. The quick fix in a schema would be to add
a
para to the list model and use that:
<!ELEMENT list (para, item+)>
(Yes, required. I hate lists without neither motivation nor explanation[2].)
In real life, there are any number of reasons to want to use a list but not introduce
it with a para, so making the para optional is a prudent first modelling step. In
a lot
of content, though, people like to precede the items with a title rather than a para,
or
a title and a para, and possibly other elements, all of them part
of the list group. Adding all of those to the content model (and making most of it
optional) results in a large model:
<!ELEMENT list (title?, (para|note|admonishment|figure)*, item+)>
Chances are that if your list model looks like this, then quite a few of your other
block-level elements will, too — they'll be complex because you need to cover all
the
use cases. For the author, though, this will increase the risk for markup errors,
with
content ending up in the wrong place, or simply cause a (mostly) unused model. Or
both.
The intended meaning behind the model is far from clear, even though the literal
semantics may be.
Or, to take a different kind of example, have a look at this ATTLIST
depicting the allowed attributes of a list in legal commentary:
Most importantly, there is a type attribubte offering 13 (!) different
list types. There's probably[3] no way for you to know what's going on merely by reading the DTD. In fact,
even if you decide to study their use by looking at actual documents, you'd probably
still miss the point (note the two ordered list types, number and
lower-alpha):
<core:para-grp>
<core:desig value="17">17.</core:desig>
<core:title>General financial arrangements.</core:title>
<core:para>The following are to be paid out ...:</core:para>
<core:list type="number">
<core:listitem>
<core:para>contributory benefit...;</core:para>
</core:listitem>
<core:listitem>
<core:para>guardian’s allowance...;</core:para>
</core:listitem>
...
</core:list>
<core:para>The following are to be paid out ...:</core:para>
<core:list type="lower-alpha">
<core:listitem>
<core:para>any administrative expenses of the Secretary of State...</core:para>
</core:listitem>
...
</core:list>
...
</core:para-grp>
At a quick glance, this might suggest that an ordered list type
is all you need, and that the other types happened because someone thought they would
be
pretty. It's what I, rather lazily, assumed at first.
Not so. The different types are there because in a single unit of the law, what is
known as a paragraph, you are not allowed to use the same type of list
more than once. If you think about it, it makes perfect sense; if you refer to the
second item of a list in a (law) paragraph, the reader will only find the right item
if
the list type used is unique within that paragraph.
Nowhere in the schema is any of this apparent, however, and there was no style guide
available to me.
List types in legal documents are easy to misunderstand, especially if you don't use
them daily and there's no documentation to guide you. Some authors have enough
difficulties understanding the difference between the different types to begin
with.
Which is why it is not uncommon to see a step-by-step instruction that looks like
this:
Follow these steps:
* Do this.
* Then do this.
* Also do this.
This, of course, is simply bad form stemming from inadequate understanding of
semantics. A bulleted list is an unordered list, which is pretty much the
opposite of a step-by-step instruction. The former is a list of
things where the order is of no importance, while the latter is a set of instructions
where order (presumably) matters a lot.
Why Does This Happen?
The question we need to answer first is why does this happen? And
to answer that, we need to define exactly what it is that happens.
Think about the list intro, above, the one that grew to an overcomplicated mess. This
tends to happen because the sources lack consistency[4]. For example, ordered lists are used as procedures and vice versa, and to
cover all the use cases, the schema grows unnecessarily[5] big to allow for cases that should have been identified as edge cases to
begin with. Or, in cases where an existing schema is expanded with new models, the
requirements process is the result of a lack of understanding in the
style the content should follow.
Note
Also, sometimes duplication or near duplication of an existing model happens when
a schema is updated, again because the style that the content should follow is
poorly understood or the sources were inconsistent and poorly modelled to begin
with.
The irony, of course, is that the content resulting from the shiny new (or updated)
schema will rarely or never need everything the schema offers, so all those models
remain either inconsistently used (with one document using one model and another a
different one) or not used at all.
On the other hand, an overly complex model might actually be correct but poorly
understood by its users. Think of those 13 different list types (or rather,
formats; to me, type implies semantics). It's
all too easy to dismiss most of those types, again because the intended style
of the content is poorly understood.
The documentation there is will probably not tell you enough; most schema
documentation I've seen is half auto-generated, the other half not up to date. None
of
it explains the use cases (sometimes because there is not enough room, more often
because the information analysis that resulted in the model wasn't properly
documented.
Looking Pretty
The legal lists above are seemingly about being pretty, but as we saw, their
reasoning was actually far more than that. Maybe it's because so many schemas do
this sort of thing:
Us markup folks see this sort of thing so often that we become jaded. Yes, you
haven't bothered about the semantics here (is it a GUI object? a spare part? an
important word?), just provided the author the means to have the text look pretty.
We see it and lazily assume that formatting in markup is either about lazy modelling
or no actual semantics was needed[6]. The opposite is sometimes true, as seen in the ATTLIST
example, above, but how are we to know without enough information?
I have no problems with using this sort of thing, mind; sometimes it's what you
need. What I don't accept is just letting it all out there. When you say bold
italic, what do you mean? And pretty doesn't count.
It's about consistency. If you do this now, do what you've done before, and what
your co-workers have done before. But, if you've all done it before, what do you
actually mean?
And is it too much to ask that you document what you mean?
What To Do About It
Some of the practical-minded and result-oriented markup folks will now be saying
things like add Schematron rules! Add Schematron Quick
Fixes!
This is true but not nearly enough, in my ever-so-humble opinion. By themselves,
Schematron rules are merely painkillers.
Schematron rules check for patterns, relying on XPath expressions to match a pattern
and offer appropriate messages. Sometimes, these are merely informational, sometimes
they warn against a practice or report an error a schema either can't or shouldn't
warn
about. Sounds useful, right?
But where should the patterns come from? Why do they happen to begin with? Some
developers will now reiterate the last paragraph, emphasising the parts about a schema
being unable to check for condition A or warn against error B. Yes, but what a
Schematron should really check for is adherence to a house style.
In olden days, this style was described in a style guide, and so
that's what you need to look for.
So, what should we really do about the mess outlined in the previous sections? Locate
the style guide, see what it says, and act accordingly. And if there is no style guide,
then write one![7]
Um, What Is A Style Guide?
When I started writing this paper, the conclusion was to use a style guide, and
that's pretty much it. Maybe a little sugar on top — tools such as Schematrons — but
essentially, the paper concluded with use a style guide,
without any explanation of what a style guide is.
So, what is a style guide?
Think of it as a poet's schema. There are rules, such as how many section levels
to use, or how to describe a procedure, including things like what a single step is
and what kinds of things warrant a procedure. But a style guide will also explain
how to write[8] — passive vs active voice, gerunds in headings, that sort of thing — and
what to include in a certain document type. And once upon a time, it would list
explain what an index needs to contain — today, of course, people increasingly
equate indices with search engines, which is just not the same, but search boxes is
what we have, rather than indices.
There was a time when most technical writing departments had a style guide
detailing how their documentation was written, but these days, style guides tend to
only be used by newspapers (although this practice is also disappearing) and
publishers. The reasons, I imagine, are much the same as with indices — for some
reason, the thinking is that just as search engines can replace indices, schemas can
replace style guides. Ugh.
Style Guide Examples
In a former life, I worked as an editor (as opposed to author; see section “Roles”) of a global
telecommunications company. Among other things, I was responsible for editing
and updating their Style Guide[9]. The company produced most of their documents in unstructured
FrameMaker format, but with well-defined paragraph and character formats, a
style guide, and an actual editor — me! — to enforce the content styles[10]. That's a subject for a different paper, or perhaps my memoirs.
Suffice to say that the content produced at the time was more consistent than a
lot of the XML content I see these days, and it was easy to convert to SGML when
the time came.
I do want to highlight some of the instructions in that long-forgotten book,
though, as I feel it still illustrates my points rather well[11]. For example, here's a screenshot from a section that deals with
ordered lists:
Note how ordered sublists should avoided if at all possible. This was about
keeping ordered lists simple enough to process and fit onto a low-res screen
(this was in the 90s), among other things.
Procedures (not to be confused with ordered lists) had different style
instructions:
There were a lot of different procedures in the documentation, and they all
had their own style. The following example is rather long, but should illustrate
how style ties into structure:
As should be apparent above, there is an overlap between the style guide and
the structure, which worked quite well for FrameMaker-based content. Also, when
the time came, the SGML DTD did complement the style guide quite well.
Today, some twenty years after the fact, the style guide is all but forgotten,
and the editors have all left.
Similar Models, No Way To Share
To illustrate how important style is, let me tell you another story. Some
years after the demise of the style guide at the big telecom company, above, I
was tasked with creating an XML production DTD, an exchange format that would
allow two car manufacturers to exchange service information. The two were
already sharing a lot of the hardware; both manufacturers shared platforms,
engines, gearboxes and more to make many of their car models.
The production DTD itself was easy enough to create. There were a couple of
differences in the respective DTDs¸but most differences were about trivialities
like cardinality and different element names, and so the production DTD that
resulted was a superset of the respective DTDs used by each manufacturer. I
think that DTD took me a few days to do, all in all.
But looking at the actual contents from the respective manufacturer, it became
clear that sharing information would be a lot trickier than sharing hardware.
Manufacturer A used a text-based approach to write their service information,
adding a few illustrations where necessary. Manufacturer B, however, used a
comic-book approach — very little or no text, but at least one image per
step.
This was not a modelling problem at all, this was purely a style problem, and
neither side would give up their way of producing content. They never did share
their service information with each other.
Neither manufacturer used a style guide, and it certainly never occurred to
them to ask the other how they wrote their information.
DTDs were sent back and forth during early decision-making[12], but that was about it.
Lists Revisited
So, to return to the list problems that followed the queues, here's a style
guide excerpt that addresses my list examples (drawn from memory; I don't have
the actual pages):
Always introduce a list with a paragraph that
explains what is listed. The introductory paragraph is
not a title; rather, it is a qualifier, giving the
list its proper context. It, just as the list, is an integral part of the
text flow, and should, just as the list, be written to fit the surrounding
text.
Never use an ordered list when you are writing a
procedure (and don't even consider writing it using an unordered
list).
Never insert a list or its introductory text inside a
paragraph unless you intend to present your list inline.
...
That last bit I added here and now; my style guide did not
discuss markup.
OK, So Where (How) Do I Get One?
If you don't have use a style guide but have a lot of XML, plus some schemas and
schematrons, chances are that your documents are inconsistent and would need that
style guide. Is it too late?
Ideally, I think a style guide should be the first result of the information
analysis that will later lead to the schema(s) when starting out with structured
information. This, of course, may not be possible, so I'd settle for the next best
thing: do a new information analysis by looking at the current XML sources and the
Schematron schemas, figure out what the problems are — I'm guessing looking at the
more common Schematron errors will point you in the right direction — and then
having a think about what the content should look like, in terms of style. Define
a
desired house style, in other words. Once there — and this is just as iterative a
process as writing a schema — you should formalise your findings in a style
guide.
This will result in better semantics and more consistent content. Chances are that
you'll be able to tighten the schema(s) and get rid of unused models while improving
the ones you keep. This will help you create better, more focussed, Schematron rules
and achieve a separation of concerns — let the schema enforce the structure and the
Schematron suggest a style defined in the style guide.
Yes, I do think it's worth your while.
Roles
Authors are opinionated people. They care very much about their content, and they
all have very definitive ideas about what makes it good. This, sometimes, can be
bad, because when allowed to do what they want, the documents will differ from one
another; the reader, will suffer.
This is why publishers used to have editors.
Some years ago, before the true state of things was readily apparent to me, I
innocently asked a client of mine if they had editors. Yes, they had a whole
department of them, why? It took me a few moments to realise that they were talking
about authors. Writers. They had no editors, and hadn't had them for years. That's
why they moved to structured information, right?
An editor, of course, is the person who makes sure that everyone follows the style
guide, is the final arbiter of all things style, and frequently the one who edits
the style guide[13].
So, does it make sense to have an editor on the staff, in addition to authors?
Aren't there tools that can do the job, these days?
Tools
The obvious tool beyond a schema is a Schematron — those XPath-based,
context-sensitive soft rules that go beyond what schemas can
express, and what schemas should express.
A Schematron rule can, with a few well-expressed XPaths, make sure that any
ordered list in a law paragraph will use a different list type (see section “Some Examples”). It can
suggest a list to have an introductory paragraph if it lacks one, and, in a
similar way, help out with most other rules. What it can't do is to explain what
a complete procedure should or shouldn't look like. Schematrons are not
instructions, they are a help when validating, and if you don't know how you
should write your content, it won't help you, only point out what's wrong with
what you've already written[14].
Schematrons — and certainly Schematron Quick Fixes — are great for
context-sensitive reminders of what's in a style guide, but they can't replace
one. Nor can they replace an editor — an editor is the guy who will look through
your content and explain, in broad strokes, what doesn't comply with the style
guide and why. If you've created content consistently and with consistent
errors, Schematron warnings could be numerous and therefore overwhelming; an
editor will be able to summarise.
Of course, with enough time and code, there's a lot you can do to convert your
numerous Schematron warnings into summaries, say, by eliminating duplicate
errors, but in the end, an editor will be able to do that much more quickly
while also being able to explain further if you don't understand the finer
points.
And perhaps more importantly, if the style guide changes, the editor can take
this into account without any coding whatsoever, and also spot
why the style guide needs to change.
A Schematron, then, is a tool that aids rules expressed in style guides and
enforced by an editor.
Queues Reinvented
So, what to do about the long queue and the chaos at Mc Donald's on Fleet Street I
started this paper with? Well, if you haven't thought about it already, this is what
everyone should do:
This is a fairly advanced queue numbering system display for a waiting room. Once
you've picked a queue number from the machine, all you have to do is to wait for
your turn. It's multiple lists merged into a single one, really — you won't ever
risk picking the wrong queue, and you won't miss your turn. The semantics are clear
and reasonably unambiguous.
I'm betting that a lot of thought and careful analysis went into designing this
display and its underlying system. Instead of the long line or the chaos that is
McDonald's on Fleet St during lunch hour, this simplifies the model (multiple lists
are merged into a single one) and allows for a separation of concerns where the
business rules help the end user to complete his or her tasks (waiting for your turn
and finding the right counter) while being able to relax.
This, of course, is a paradoxical example, considering that it's a (mostly)
technological solution to the queue problem opening this paper. Where is
the style guide in all this? Glad you asked; it would have been easy
to present the whole thing as a straight list[15]:
148 (6)
293 (8)
774 (3)
694 (4)
616 (10)
102 (9)
X (5)
602 (2)
X (7)
X (1)
This is a made-up example, of course, but my point should be clear. The style
guide is involved:
Don't display any unmanned counters.
Show the latest update in a larger font.
Limit the number of counters shown.
...
See how this works? Yes, it is probably entirely possible to check the above rules
in, um, a Schematron and then enforce the findings by adding some XSLT and CSS[16], but the Schematron only checks what's already been done
rather than telling you what to do before you start. We want to
prevent the bad habits rather than catch them later!
Conclusions
You need to start with the style guide. The style guide should be
an organic part of your information analysis — if you're starting out, it should be
the
first thing produced by the analysis — and later allow you to make informed choices
when
writing the schema. Which should then allow the authors to use the new schema in the
right way and using the correct style.
Ideally, this is how it should be done:
Information analysis
Style guide produced
Schema produced (enforce structure)
Schematron(s) produced (enforce style)
Rinse and repeat until done
Authors can then produce content in the style prescribed by the style guide, the
structure as described by the schema, and with schematron rules highlighting problems
with both. And ideally, with an editor making sure that it's all done properly.
End Note
Hoping to find a few examples of modern style guides by searching Google for
online style guide, the first several results were all about web
design. I rest my case.
William Strunk Jr. and E.B. White. Elements of Style, 3rd
Edition. Simon & Schuster.
[1] The answer is that there are almost no queues. The people waiting
have already ordered; they are waiting for their
burgers to be ready. If it's your first time eating at McD, Fleet St, there's no
way to know without pushing your way through to a counter. If you're like me,
this is very disconcerting.
[2] This, actually, is the kind of thing that belongs in a style guide, not
schema. Let the para be optional but stress its importance in the
style guide. But I'm getting ahead of myself.
[4] I'm not saying there's never a reason for complex models. Of course there is.
It's just that in my experience, overmodelling is more common.
[5] In my experience, FrameMaker sources are especially vulnerable, paradoxically
because FrameMaker templates can be used as semi-structured because of the way
paragraph and character formats are defined.
[6] Although bold italic in a single emphasis type always made
me suspicious.
[7] In a way, this is the easiest paper I've ever written. The one-stop solution
is actually to write a style guide!
[8] How you write content will influence the schema, too, but above all, it's
the kind of thing best explained in a style guide.
[9] This also led to me setting requirements for, and eventually writing,
their SGML DTDs.
[10] Yes, I did use a red marker, and yes, the authors hated me.
[11] I'm not taking credit for all of it; we did work I'm very proud of to
this day, but we also borrowed heavily from other style guides, such as
Chicago Manual of Style, Strunk & White's
Elements of Style, and many others.
[12] I was not part of this — I would have asked for style guides then, and
most of the misery that followed would have been avoided. When I did
come aboard, I asked for them, got some puzzled looks, and was
eventually given the DTDs instead.
[13] And is at least partly responsible of the schema, if you're lucky.
[14] This is not entirely true; a clever Schematron can make things a lot
easier if you have an inkling of the direction in which you need to
go.