How to cite this paper
Sperberg-McQueen, C. M. “Knock Down This Wall.” Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August 4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). https://doi.org/10.4242/BalisageVol28.Sperberg-McQueen02.
Balisage: The Markup Conference 2023
July 31 - August 4, 2023
Balisage Paper: Knock Down This Wall
C. M. Sperberg-McQueen
Founder and principal
Black Mesa Technologies LLC
C. M. Sperberg-McQueen is the founder and principal of Black Mesa Technologies, a
consultancy specializing in helping memory institutions improve the long-term preservation
of and access to the information for which they are responsible.
He served as editor in chief of the TEI Guidelines from 1988 to 2000, and has also
served as co-editor of the World Wide Web Consortium’s XML 1.0 and XML Schema 1.1
specifications.
Copyright 2023 by the author
Abstract
Life can be comfortable inside a walled garden. But if we wish to engage with the
world, we need to knock down those walls.
Table of Contents
- Introduction
- The Right Thing
- Getting to Results
- Artificial Intelligence
- When Things Don’t Go as Expected
- XML Stack
- But … Life Happens
- Praxis
- Albania
- Commerce
- Unicorns
- Just in Time Leather
- Conclusion
Introduction
Thank you all for coming. I am speaking to you today from land
that is historically part of Kah’p’oo Owinge, also known as Santa
Clara Pueblo in New Mexico.
On Monday, Tommie Usdin invited us into a secret garden [Usdin 2023]. It’s secret, and I imagine it as a walled
garden because how else can a garden be secret? Also, there’s a long
tradition of Christian iconography around the image of the
hortus inclusus (or “enclosed garden”).
I hope that you have enjoyed our visit this week to this secret
garden, but what I want to suggest now in closing is that perhaps the
time has come to tear down those enclosing walls. I don’t want to
tear up the garden, but I do want to knock down the wall or at least
put a couple of gates in it.
And to explain why, I think I’m going to need to tell you a
story. Some years ago, when I was a senior in college and avoiding
whatever work I was supposed to be doing at the time, I was wandering
around in the stacks of the main library at my university, picking up
books that looked interesting; and I spent half an hour looking at a
book on Indian logic. The preface explained that the author had for
many years been hoping to write an account of classic Indian logic and
how it compares with western logic derived from Greek practice.
Eventually he had succeeded in putting a first draft together and had
shown it to some friends, and they were enthusiastic about it. One of
the nice things about finishing the first draft was that it gave him a
better overview of the subject matter and allowed him to see that a
different organization would be much better, so he set about the work
of restructuring the presentation. He was writing it again from
scratch when he was interrupted by some of his friends who had
apparently despaired of his perfectionism and his unwillingness to
call anything finished. They presented him with galley proofs
produced by a typesetter to whom they had given the first draft, and
they said, “We’re going to publish this. Would you please
proofread these galleys and write a preface?”
This has been on my mind I think partly because it illustrates a
tension that is visible in many places in the world between a desire
to get something right — to get the right answer to a question no
matter how long it takes — and deadlines — the desire to get some
results, even if they are imperfect.
The Right Thing
The descriptive markup community has plenty of people who
continue to seek the right answer to the question however long it
takes. Patrick Durusau’s paper on Monday is, I think, a good example
[Durusau 2023]. I have been hearing people worry about
overlap since at least the late 1980s; I have done a fair amount of
worrying about overlap myself. And some people are still trying, as
Patrick’s paper illustrates, to find a solution that works as
generally, and is as convincing for the general case, as XML is for
the case where you only really care about one hierarchy. They are
trying to find some solution that has a convincing story about a
serialization form and a data structure and the conception of a
document vocabulary as a document language for which there could be a
grammar and for which you can imagine some kind of validation.
Patrick’s answer may or may not be the general solution we’ve been
looking for. I’m sure the discussion will continue. Of course, many
people have solved this problem in a purely pragmatic way; if they had
deadlines, they chose an approach, and they did it.
But even if what you really need is a result, things can change
— things can get better. Amanda Galtman demonstrated a specific
technique using xsl:accumulator that can help a lot with
the kind of discontinuous elements that sometimes trouble people
[Galtman 2023].
Of course, looking for the right answer no matter how long it
takes works most easily when you don’t have a deadline. I talked the
other day about finally finding a solution to a practical problem that
arose in generating an ebook for Frege [Sperberg-McQueen 2023], but the first time I thought about
that ebook was 2014. And I have had the luxury of being able to spend
off-and-on nine years thinking about it, before iXML came along and I
realized that it was the key to my solution.
Not everybody has that luxury because sometimes you do have a
deadline, and when you do have a deadline, quite often you just need
results, no matter how you get them. I have always thought that
processing instructions in both SGML and XML are a signal that the
designers of those languages included people who knew that sometimes
you might need to resort to impure methods to get the results, but
also people who cared enough about the difference between pure methods
and impure methods to want to mark the places where you had resorted
to impure methods, to processor-specific instructions and so on, in
order to get your results. If you mark those places, you give
yourself the chance to go back later and fix things to work with purer
methods.
Getting to Results
Sometimes, of course, results matter, deadline or no deadline.
Some years ago, I was at dinner with a bunch of people in the XML
Schema working group; and the discussion turned to artificial
intelligence and to the shift in artificial intelligence that Elisa
Beshero-Bondar talked about, away from symbolic computation and
towards purely statistical computation [Beshero-Bondar 2023]. And somebody said, “You know,
it’s really kind of a shame because with symbolic AI once a problem is
solved, the result represents insight; the problem crystallizes things,
and it presents an advance in human understanding.” And Paul
Biron, one of the editors of the data types specification, who worked
at that time for a very large healthcare organization, said, “It
doesn’t matter in the least to me. If a black box that we don’t
understand will give us better patient outcomes, I would much rather
have better patient outcomes than better insight that doesn’t lead to
better patient outcomes. What I care about is the result, not the
process.” And it is a fact that symbolic AI produced some
really nice results, but statistical AI has produced more results and
better results — more impressive results.
Artificial Intelligence
We may or may not be approaching the singularity that some
people have predicted, but clearly things are changing and very
rapidly. I am extremely grateful to Uche Ogbuji and Micah Dubinko, who
each showed us that there is useful work that AIs can do right now,
as well as useful work that is right now perhaps a little beyond them [Ogbuji 2023, Dubinko 2023]. But for how long
will it be beyond them? They persuaded me — and I hope they
persuaded you — that we do need to get our hands dirty, not just to
make sure that we’re not left behind — although that does matter in
some respects — but also to make sure that the artificial
intelligences and those who are developing them engage fully — or
more fully than they have so far — with human diversity in language
and otherwise, and that our work with AI can serve the kind of values
that the people in this community care about.
If what you care about is the process, then
how the results are achieved matters; if what you care about is
product, then how they’re achieved doesn’t really
matter, as long as the results are correct. And that’s why I think
one of the important papers at this conference was the report from
Paul Prescod, Ben Feuer, and others, reporting on a project which is
attempting to provide a method to allow us to produce testable,
reproducible answers to the questions “Are the results of this
auto-markup process any good? How good are they? Where can they be
better?” [Prescod et al. 2023]. I think there is great
promise in the general principle that they described: specifying one
or more target documents and then scoring the result by measuring the
edit distance (for some edit distance or other) between the results
produced by the artificial intelligence (or auto-markup process) and
one or the other of the target documents. Like Elisa Beshero-Bondar,
I wish that we could combine the strengths of large language models
and the statistical approach with the strengths and explanatory power
of the symbolic approach, but I have no idea how that will happen
[Beshero-Bondar 2023]. All I know is that this will
continue to be an important tension and an important challenge.
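The scoring principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of Prescod et al.: it uses a plain character-level Levenshtein distance as the "some edit distance or other", and the names edit_distance and score_against_targets, like the sample documents, are hypothetical.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (one possible edit distance)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def score_against_targets(candidate: str, targets: Sequence[str]) -> int:
    """Score auto-markup output as its distance to the closest target."""
    if not targets:
        raise ValueError("at least one target document is required")
    return min(edit_distance(candidate, target) for target in targets)


# Hypothetical target documents: two acceptable ways to mark up one sentence.
targets = [
    "<p><emph>Hello</emph> world</p>",
    "<p><i>Hello</i> world</p>",
]
candidate = "<p><i>Hello</i> world.</p>"  # hypothetical auto-markup output
print(score_against_targets(candidate, targets))
```

A lower score means the output is closer to one of the acceptable targets. A realistic measure would probably compare element structure rather than raw characters, but the overall shape (several targets, one distance, take the minimum) stays the same.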
When Things Don’t Go as Expected
Sometimes I think that the shift in AI research from symbolic to
statistical computation is just one instance of a much larger and more
complex pattern. What happens when there is a group of people who
expect things to go in one way and find that they don’t go that way?
What happens then? This is particularly visible and interesting in
cases where you have a community of people who would like to change
the world — and who expect to change the world — and then encounter
a certain recalcitrance in the world as the world refuses to change in
the way that they expected. Early AI researchers thought that general
machine intelligence was just around the corner; from the late 1950s
on, there were predictions that general intelligence was maybe five
years away. But the five-year interval never got shorter and
eventually became a joke, until finally AI as a field said,
“We’re not getting anywhere this way; we need other
approaches.”
For a long time, socialists and Marxists the world over expected
world revolution; world revolution was just around the corner. And it
is interesting to look at the history of the various flavors of socialism
to see how different people react to the fact that world revolution
hasn’t happened and now looks less likely than ever. Sometimes people
change gears — they change paths — the way the AI field did. The
old ideas have failed; we’re going to try something new. So there are
plenty of Communists and Bolsheviks who became neoconservatives.
When I first learned SGML and learned about descriptive markup,
I found it very hard not to think that world
revolution was just around the corner. It was so obvious that this
was a better way to do things, so obvious that it was useful both in
the short-term and in the long-term that surely everybody will soon
see that this is a better way to do things. Surely, very soon the
whole world will be using SGML, because it’s so obviously the right
thing.
XML Stack
After all, look at the stuff we can build on top of descriptive
markup and nowadays on top of XML and XML technologies. John Chelsom
showed us the other day that classic symbolic methods of AI like
forward and backward rule-chaining can be done in XForms — not even
calling an external program but writing the executable code in XForms
itself [Chelsom 2023]. Ari Nordström argued this
morning (and I believe him) that the XML stack provides a nicer basis
for content management systems than existing content management
systems have [Nordström 2023]. The XML stack is nice
for the implementers of the content management systems, because the
XML stack provides features useful for document management. And it’s
nice for those who maintain and run the content management system: it’s
very convenient for people like Ari and me and no doubt most
of you, if we can use XML technologies when interacting with the
content management system.
Geert Bormans and Srikanth Venkata Subramanian demonstrated how
XProc, and XSLT, and the web (which for purposes of this talk I’m
going to claim as an application of descriptive markup) can help with
quality assurance in an extremely complex and important area and keep
the complexity of the problems tractable [Bormans and Subramanian 2023]. Eliot Kimber showed us how DITA keys can make
it easier to maintain large bodies of documentation and how you can
use the XML stack to get from here to there — from a world in which
you’re not using the keys to a world in which you are — even if your
conversion window is very, very narrow because you are part of a very
large organization [Kimber 2023].
In order to make the most of the strengths of the XML stack, we
need of course to hone our skills so I’m grateful to Amanda Galtman
for teaching us how to use XSLT accumulators to spark joy in our code
and how to use XSpec to check their work, following the motto “trust,
but verify” [Galtman 2023]. Mary Holstege showed
us how a single programmer suitably guided by laziness and impatience
(I’m sorry, Mary, you may think of yourself as lazy, but we can do the
arithmetic: 150,000 lines of code in three years! You might wish you
were lazy, but you’re not) can build her own tools and make an
impressive library available in both XQuery and XSLT [Holstege 2023].
People can build beautiful things with the XML stack. And they
do.
But as Allen Renear pointed out on Sunday, not everyone is sold
[Renear 2023]. Lots of people don’t want to use XML.
Lots of people who use XML don’t want to see it,
so here is another call-out to Geert Bormans and Srikanth Venkata
Subramanian and to their users who do want to see XML — who do want
an XML editor so they can work with Akoma Ntoso instead of in Word
[Bormans and Subramanian 2023]. Huzzah! May that be an omen of things
to come! But in the meantime, when we are dealing with a world in
which not everyone wants to use XML, what is to be done?
But … Life Happens
It’s important to remember that no single technology or family
of technologies is ever going to be the only thing in the world.
However good the technology is, complications will arise,
complications of very different kinds, sometimes
non-technological.
Praxis
We want to build things that last, either for commercial reasons
or for other reasons. We want to use descriptive markup maybe to
preserve our cultural heritage and public data. We know or hope that
our cultural heritage should outlive any single piece of software, but
it’s easy to forget when we’re planning a project that we also need
the cultural heritage that we’re trying to preserve and the project
that is preserving it to outlive us.
Jeff Beck pointed out the other day that for projects to live
on, succession planning is necessary [Beck 2023]. And
when public vocabularies are so successful that everyone is using a
predefined public vocabulary, that will have consequences because it
means that fewer people will have experience designing and maintaining
vocabularies. We have to be careful not to let our successes poison
our future.
Some things are hard to plan for, even when we plan for them.
Ash Clark reminded us yesterday that even when you know that
performance may be an issue and you plan for it
and you think ahead and you
test in order to detect and solve performance
issues, nevertheless, performance issues can arise when you go into
production — performance issues that weren’t there before you went
into production [Clark 2023]. Now, we can say, having
heard the talk, well, remember the bots. Remember that bots will
crawl your site and they will exercise portions of your code that you
weren’t thinking were going to be heavily exercised because humans
aren’t going to do that. And that’s a good lesson to draw. We should
remember that bots will stress our systems.
But what will it be next time? There will always be something
that we have not foreseen, and when that problem appears, the solution
is very likely to involve the same kind of things that Ash described: studying
the problem, thinking, and being willing to re-architect your
design [Clark 2023].
Sometimes the things that don’t work out the way we expected are
that the world turns out not to match a description that we published.
In publishing, as Debbie Lapeyre pointed out earlier today, it’s easy
to assume that once something is published it jolly well stays
published, and that’s an end to it. That’s almost true, but not
always. As Jessica Hymers and Qinqin Lin showed us this morning, it
is really, really important to think about how to handle retractions and
corrections and how to make sure that they are visible to people who
might otherwise be tempted to rely on faulty science [Hymers and Lin 2023].
Sometimes what you need turns out to have been built in, or at
least there were some facilities for it, but you will still need to
work out interoperable ways of working with those built-in
facilities.
Albania
One of the big challenges when you expected the world to change
and then it didn’t, is that when you are thinking about how the world
is going to change and how things are going to be, you make certain
plans and adopt certain behaviors, and then when the world doesn’t change
the way you expected, you find yourself confronted with the continued
existence of people and institutions whose continued existence was not
part of your original plan or expectation. In the case of AI, those
were, for example, problem areas that resisted solution. In the case
of the Communist International, it was the continued existence of
non-socialist countries. For XML people who expected the entire world
to start using XML, it is perhaps the continued existence of non-XML
formats and non-XML users.
There are several things that people can do in this case.
Sometimes there are people who refuse to give in, who just continue
working for the original goal. Leon Trotsky never gave up on world
revolution; that was one of the serious substantive policy differences
between Trotsky and Stalin. Trotsky wanted world revolution, and
Stalin said, you know, we have control of this country; we really need
to try to make it work within this country. And socialism within one
country required a certain amount of attention that Trotsky didn’t
want to give it because Trotsky wanted to focus on world revolution.
But eventually, for most purposes, Trotsky became irrelevant to the
way the world went.
But even under Stalin, the Soviet Union took an attitude that, I
guess, could be described as hostile co-existence: the non-socialist
world continues to exist, but it won’t forever because we are in a
contest; ultimately one system will outlive the other. And you
get remarks like Khrushchev’s remark when he visited the UN in 1960,
“We will bury you,” which sounded really, really
threatening to many Americans because it sounded as though Khrushchev
had plans to bring about the state of affairs in which it would be
necessary for him to bury us, even though in reality, as I learned
years later, he was just citing an old Russian proverb that means
“We will outlive you.”
At this point, I find it necessary, however, to remind people
that world revolution was not necessarily part of everyone’s original
plan for XML. Converting everyone to XML is not
one of the goals in the XML specification. The database management
people showed up at W3C and at XML meetings not because anybody told
them to, but of their own accord. I remember, again, a Working Group
dinner at which I said, “Don’t take this wrong. I don’t want to
offend you, but why are you here? What does XML have that you need? You
know, XML is for documents, and databases are really not
document-like. And I’m happy that you’re here, but I don’t understand
why.”
And they said, “Well, we have interchange problems.
When we need to move data from one database to the other, we have
trouble.”
And I said, “What do you mean? You’ve got
comma-separated values; surely that’s all you need. I mean, I’m not
really a database guy, but I have worked enough with relational
databases to know that the tables fit really nicely in
comma-separated values.”
And they said, “Yes, but
comma-separated values are responsible for about half of our
support costs because, well, for one thing, the format looks so simple
that many programmers don’t bother to look for a library. They figure
they can write it on their own, and they do, and they don’t always get
it right, so they have quoting problems and white space problems. And
the other thing is that, even if they got it right, comma-separated
value formats have no place to identify the character set being
used.”
So the fact that XML was created in part by character
set geeks who had been struggling with character set issues for years
and therefore built an encoding declaration into XML was partly
responsible for the interest of database people in XML. Because XML
had a way to declare the character encoding.
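The two complaints reported here, quoting errors in hand-written CSV parsers and the lack of any place to name the character set, can be made concrete with a short Python sketch. The sample record and names are invented for illustration. Only the standard csv and xml.etree.ElementTree modules are used, and nothing here reproduces code from the database vendors in the anecdote.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A made-up record with a quoted comma and doubled quotes.
data = 'id,name,note\r\n1,"Smith, Jane","said ""hi"""\r\n'

# Hand-rolled parsing: splitting on commas ignores the quoting rules.
naive_fields = data.splitlines()[1].split(",")

# Library parsing: the csv module handles quoted commas and doubled quotes.
rows = list(csv.reader(io.StringIO(data, newline="")))
parsed_fields = rows[1]

# CSV bytes carry no label for their character encoding ...
name = "Gödel"
csv_bytes = name.encode("iso-8859-1")
misread = csv_bytes.decode("utf-8", errors="replace")  # reader guesses UTF-8

# ... whereas an XML document can declare it, and a parser honors that.
element = ET.Element("name")
element.text = name
xml_bytes = ET.tostring(element, encoding="iso-8859-1")
roundtrip = ET.fromstring(xml_bytes).text

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
print(misread == name, roundtrip == name)
```

The naive split sees four fields where the record has three, and the reader that guesses the wrong encoding silently damages the name. The XML round trip works because the serialized bytes begin with an encoding declaration.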
Now, being part of a movement that may change the world is a lot
of fun. It’s very satisfying; it gives context and significance to
our lives. And giving up on the idea that it’s going to change the
world can be psychologically very difficult. But I submit that being
part of that kind of movement can give our work meaning whether we end
up taking over the world or not. I’ll also note that Charles
Goldfarb, for what it’s worth, was always careful, at least when I
heard him speaking, not to imply that SGML was
out to conquer the world or that SGML must conquer the world or count
as a failure. On the contrary, he sometimes
positioned SGML and descriptive markup as a sort of niche technology for
people who could not afford to take sides in battles between
proprietary formats backed by large organizations that they did not
control. I remember a talk in which one slide read, “When giants
do battle, choose a different location for your picnic.”
There’s a threat to all of us when organizations that are larger
than we are are fighting each other and we risk becoming collateral
damage. But there’s also a threat when others are competing with
us and see our non-existence as advantageous to
them. Even if we don’t seek world domination ourselves, they may be
seeking world domination. After all, a lot of people have learned the
lesson of the network effect and think, “This world will be so
much better if our technology is adopted universally.” And any
technology that’s not theirs is a threat to them.
So what do you do if you find yourself living in a world full
of threats? One approach is to build a wall: ignore them, defend
yourself, and pull back your attention to focus on what is close at
hand. In a political environment, maybe the best example of this is
Albania, which was extremely isolated not just from the non-socialist
West, but also from the rest of the socialist world. East Germany also
spent a lot of resources trying to isolate itself.
On a smaller scale, we can build a wall around our garden to
make it safe for us to tend our own garden, as Voltaire put it. But
sometimes the threats that led us to shield ourselves from the outside
world turn out to be not quite so threatening as we thought. You may
remember that ominous picture of a very threatening-looking snake in
Tommie’s slides [Usdin 2023]. That was a garter snake,
of no conceivable threat to any human being attending this conference
or not attending this conference.
Sometimes we decide there’s really no threat. We can
live in the world; we can interact with the rest of the world in the
same way that Scandinavian Social Democrats do not find it necessary
to shield themselves off from non-socialists. They engage with others
in parliamentary democracies, they enter into coalitions, they form
governments. They’re just another political party. They have certain
policy preferences, but they don’t regard the rest of the political
spectrum as an enemy with whom it is dangerous to enter into any
commerce. In a similar way, there are XML users and tools that
engage with other formats. And we can learn from their
experience.
Commerce
Charles O’Connor and Mark Gross talked about an “XML
early” work-flow where the first step, of course, is moving
data into XML from non-XML formats [Gross and O’Connor 2023]. I was
impressed with how much they could achieve and with their idea of
detecting problems automatically so that problem documents could be
routed for human fix-up. That seemed to me, in its own way, the kind
of self-awareness — awareness of the limitations and of the things that
their system doesn’t do — that Elisa Beshero-Bondar found
missing from the AIs with which she ran her experiments [Beshero-Bondar 2023]. I was also impressed by how much
mileage they got out of early normalization as a way of increasing
their success rate.
Phil Fearon and Gursheen Kaur also exploited early normalization
— in their case, just-in-time normalization — as a way of reducing
the variability in markup and making it easy for their code to focus
on the core task of comparing the two tables rather than comparing the
two tables with one hand while, with the other, desperately trying to
compensate for the wild variation in the explicitness and quality of
the table markup found in some inputs [Fearon and Kaur 2023].
High variability in our input is always a challenge. It is
important in cases where it carries meaning, and
that’s why I have always been amused at the habit the database people
have of attempting (as I like to say) to de-legitimize XML by
describing it as “semi-structured” in contrast to the
much more rigid, much more highly structured (in their view) structure
of a relational table. I think it’s not really a difference between
structured information and semi-structured information; I think it’s a
distinction between the kind of structure you find in table salt and
the kind of structure you find in DNA. The variability is where a lot
of the information is. But if the variability is not carrying
information — if it’s accidental — then it’s
very helpful to normalize it away.
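As a small illustration of normalizing accidental variation before comparing, the following Python sketch canonicalizes two table fragments that differ only in attribute order and indentation. It is a generic example, not the technique used in either of the papers mentioned. The sample tables and the helper name normalize are invented, and ElementTree's canonicalize function (Python 3.8 or later) stands in for whatever normalization a real work-flow would need.

```python
import xml.etree.ElementTree as ET

# Two made-up table fragments: same information, different surface form.
table_a = '<table><tr><td colspan="1" align="left">x</td></tr></table>'
table_b = """<table>
  <tr>
    <td align="left" colspan="1">x</td>
  </tr>
</table>"""


def normalize(xml_text: str) -> str:
    """Canonical form: attributes in sorted order, surrounding whitespace stripped."""
    return ET.canonicalize(xml_text, strip_text=True)


print(table_a == table_b)                        # raw strings differ
print(normalize(table_a) == normalize(table_b))  # normalized forms agree
```

Once the accidental differences are gone, the comparison code only has to deal with differences that carry information.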
In some cases, that normalization will be a real challenge,
especially if one is dealing with some legacy markup formats. Joel
Kalvesmaki gave us a vivid example that will keep some of us awake at
night for some time: a reminder of just how strange and wonderful
formats can be and just how devious and ingenious the inventors of new
formats in the 1960s and ’70s and ’80s could be, especially when they
were single-mindedly aiming at a single application of putting ink on
paper [Kalvesmaki 2023]. In that context, anything that
gets the same ink on paper is fine, and any goal beyond that doesn’t
need to be met.
One of the horrifying things, I think, about large and
far-sighted organizations is that they see the need for
standardization before anybody else does. And if they’re large and
self-confident, they move forward on standardizing things, at least
for themselves. And the result is that they end up committed to
systems that ultimately become utterly, completely unlike anything
anyone else is using, so they are completely isolated and they have a
much harder time learning from other people because other people
started from a different foundation.
Joel’s talk also illustrated, it seemed to me, the huge
potential long-term costs of focusing only on
results and not also on process [Kalvesmaki 2023]:
incomplete documentation, or the complete absence of documentation; no
archival copies of earlier versions. Why? Because those things
didn’t help people get ink on paper in time for the 4:00pm Fedex
pickup.
Unicorns
You know, there is another reason I was thinking about that
Indian logician. The real reason I was thinking about that Indian
logician is a problem related to existential and universal
quantification. By universal quantification, I mean logical sentences
like “All human beings are mortal”; and by existential
quantification, I mean sentences like “Some human beings are
mortal.” And it sticks in my mind, in part, because the Indian
logician taught me something, in that thirty minutes I spent reading a
book that I will never find again (because I can’t remember his name
and I don’t remember enough about the book ever to find it
again).
He said that the treatment of quantification is one of the
crucial differences between Indian logic and Greek logic. The
sentence “All human beings are mortal” is a stronger
sentence than “Some human beings are mortal,” and the
stronger sentence entails the weaker sentence. So, in Indian logic,
if the universal quantification is true, then an existential
quantification will also be true.
Greek logicians made an opposite choice. They said, consider
the sentence “All unicorns have one horn.” Well, that’s
true because there are no counter-examples. There are no unicorns
that don’t have one horn. And so they established the principle that
if the class involved is empty, then the universal quantification is
always true.
But an existential quantification, a sentence like “Some
unicorns have one horn,” the Greeks really wanted to be
true only if there existed at least one unicorn that had one horn. And that
means the truth of the universal statement does not imply the truth of
the existential statement. And in general, in Western logic, an
existential statement is taken to entail, as the term
“existential” suggests, the existence of members of that
class.
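The convention just described, where a universal statement about an empty class is true and the corresponding existential statement is false, is also the one built into many programming languages. A minimal Python sketch, with an invented Animal type and invented sample data:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Animal:
    kind: str
    horns: int


def all_have_one_horn(animals: Iterable[Animal], kind: str) -> bool:
    """Universal: every animal of this kind has exactly one horn."""
    return all(a.horns == 1 for a in animals if a.kind == kind)


def some_have_one_horn(animals: Iterable[Animal], kind: str) -> bool:
    """Existential: at least one animal of this kind has exactly one horn."""
    return any(a.horns == 1 for a in animals if a.kind == kind)


# A made-up world that contains no unicorns at all.
world = [Animal("goat", 2), Animal("rhinoceros", 1)]

print(all_have_one_horn(world, "unicorn"))      # no counter-examples exist
print(some_have_one_horn(world, "unicorn"))     # no witness exists
print(all_have_one_horn(world, "rhinoceros"),
      some_have_one_horn(world, "rhinoceros"))  # a non-empty class
```

Python's built-in all returns True for an empty sequence and any returns False, so for the empty class of unicorns the universal claim holds while the existential one fails. Under the other convention described in this section, the first result would guarantee the second.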
I think of it as natural that we should call a statement using
the word “some” an existential statement, but that’s only
because I’ve spent a lot of time thinking about it within the
framework of Western logic. The word “some” and the word
“existence” have no etymological tie, and their semantic
tie is, as the Indian author made clear, the doing of Greek logic.
And he described the difference in approach this
way: he said, two logicians — an Indian and
a Greek — went for a walk. And there were sharp rocks, and they hurt
their feet on the rocks. And the Greek logician said, “You know
what we should do? We should get a big piece of leather, and we should
cover the entire globe, so when we go for a walk, there is always a
layer of leather between our feet and the rock, and we can’t hurt our
feet.” And the Indian said, “Having leather between the
rock and our feet would be a good idea, but I think it’s simpler if we
just wear sandals.”
Just in Time Leather
You will see, I think, how this applies to markup. It seems to
me that this goes right to the heart of the problem of co-existence
of descriptive markup with other styles of information representation,
or XML and other formats. If we had succeeded in bringing the entire
world into the practice of descriptive markup and the use of XML, it
would be like having leather all over the world; we could go barefoot
anywhere, and we would never have to worry about sharp rocks. But we
haven’t reached that point, so it’s better if we can just wear shoes
when there are rocks.
The approach of making things be XML when we need them
to be XML works very nicely for some people. The other day
somebody asked me, in a discussion of Google Sheets, “Do you
know how to sort the rows of a spreadsheet in Google Sheets?”
And I said, “Sure,” and they said, “How do you do it?”
“I click at the upper left, I export to CSV, I
download, I flip over to BaseX, I load it in BaseX as unparsed data,
and I use the csv:parse command to turn it into XML.
Then, I can sort the XML or do whatever I want with it. And I’m
happy.”
And for some reason, the person I was talking to did not
really think of this as a satisfactory answer to the question
“How do I sort the rows of a spreadsheet in Google Sheets?”
And even for me, I confess that sometimes that’s a
step or two more than I would like to take, and I would like it to be
a little simpler. I would be happier, for example, if Google Sheets
had an XML export.
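As a rough illustration of the workflow described above (export to CSV, turn the CSV into XML, sort the XML), here is a minimal sketch in Python using only the standard library. It is not the BaseX csv:parse function; the element names csv, record, and entry, the function names csv_to_xml and sort_records, and the sample data are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text):
    """Parse CSV text (first row is the header) into a <csv> element."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    root = ET.Element("csv")
    for row in data:
        record = ET.SubElement(root, "record")
        for name, value in zip(header, row):
            # Field names go in an attribute, so headers that are not
            # legal XML names (e.g. "Unit price") cause no trouble.
            entry = ET.SubElement(record, "entry", name=name)
            entry.text = value
    return root


def sort_records(root, field):
    """Sort the <record> children of root in place by the given field."""
    def key(record):
        entry = record.find(f"entry[@name='{field}']")
        return (entry.text or "") if entry is not None else ""
    root[:] = sorted(root, key=key)
    return root


# Made-up sample data standing in for a downloaded spreadsheet export.
sample = "item,qty\npears,3\napples,12\nfigs,7\n"
doc = sort_records(csv_to_xml(sample), "item")
names = [r.find("entry[@name='item']").text for r in doc]
print(names)
```

The sort here compares field values as strings; sorting a numeric column would need a different key function.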
And for the people I work with, the problem is that I become a
sort of informational black hole. I can pull in information and turn
it into XML, and I can do nice things with it. But it’s hard for me to
share the information with them if they can’t deal with XML. I
suppose that, having sorted the records, I could write them back out
to CSV and upload them again and overwrite the spreadsheet. But I’ve
never actually tried that, and I don’t know for sure that it would
work. Life would be better for me and for the people I work with if
we could make it easier to move data into and back out of XML.
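The write-back half of that round trip might look something like the following sketch, again in Python with only the standard library. It assumes a hypothetical XML shape (record elements containing entry elements with a name attribute) and a hypothetical function name xml_to_csv; it shows only the serialization to CSV, not the upload to any spreadsheet service.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(root):
    """Serialize <record>/<entry name="..."> elements back to CSV text."""
    records = root.findall("record")
    # Column order is taken from the first record. This assumes every
    # record has the same fields, which real data may not guarantee.
    header = []
    if records:
        header = [e.get("name") for e in records[0].findall("entry")]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for record in records:
        values = {e.get("name"): (e.text or "")
                  for e in record.findall("entry")}
        writer.writerow([values.get(name, "") for name in header])
    return out.getvalue()


# Made-up sample document; the second item contains a comma on purpose.
xml_text = """<csv>
  <record><entry name="item">apples</entry><entry name="qty">12</entry></record>
  <record><entry name="item">figs, dried</entry><entry name="qty">7</entry></record>
</csv>"""
csv_text = xml_to_csv(ET.fromstring(xml_text))
print(csv_text)
```

The csv module quotes the field that contains a comma, which is the kind of delimiter detail that makes hand-written export code easy to get wrong.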
There is some work going on in this area. We heard Michael Kay
talk this morning about how to do better at XML-to-JSON conversion by
using schema [Kay 2023]. And you will have seen, if you
attended his talk, just how complicated that problem is in the general
case. Ash Clark talked about how to do better by not necessarily
using the standard or default XML representation of JSON but by using
another one that would have better performance characteristics in your
particular work-flow [Clark 2023]. And in invisible
XML we have a general tool for moving data into XML. Norm Tovey-Walsh
talked on Monday about an important issue in the implementation of
invisible XML, an issue that’s important not just for implementers but
also for grammar writers [Tovey-Walsh 2023].
Conclusion
Invisible XML is focused on moving things
into XML. And I think there is continuing pain
and thus a continuing opportunity in the area of export from XML.
iXML simplifies import. You could always get stuff into XML; you
could write a parser in XSLT or XQuery, using the facilities that are
built in. But iXML makes it a lot easier because all you have to do
is specify a grammar, and an iXML processor will parse your input into
XML for you. I wonder if there is an opportunity for a notation that
could similarly define export so that we have a declarative notation
that we could compile to XQuery or XSLT that would produce a suitable
export into a text stream that matches the grammar or obeys the rules
that we’ve laid out for the notation.
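To make the idea of a declarative export notation a little more concrete, here is a toy sketch in Python. It is not an existing notation, not part of iXML, and not a proposal from this talk: the rule table RULES, its keys before, between, and after, and the function export are all hypothetical. The sketch also interprets the rules directly rather than compiling them to XQuery or XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table (an assumption for this sketch): for each
# element name, the literal text to write before the content, between
# child elements, and after the content.
RULES = {
    "date": {"between": "-"},
    "list": {"before": "[", "between": ", ", "after": "]"},
}


def export(element, rules):
    """Serialize an element to text by following the rule table.

    Elements with children emit their children joined by "between";
    leaf elements emit their own text. Mixed content is ignored.
    """
    rule = rules.get(element.tag, {})
    children = list(element)
    if children:
        body = rule.get("between", "").join(
            export(child, rules) for child in children
        )
    else:
        body = element.text or ""
    return rule.get("before", "") + body + rule.get("after", "")


date = ET.fromstring(
    "<date><year>2023</year><month>07</month><day>31</day></date>"
)
items = ET.fromstring("<list><item>XML</item><item>CSV</item></list>")
print(export(date, RULES))
print(export(items, RULES))
```

The point of the sketch is only that the delimiters live in data (the rule table) rather than in code, so the same small interpreter can serve many output formats. A real notation would also have to deal with escaping, mixed content, and attributes.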
It seems to me that there are two difficulties in data
interchange. I have thought this for a long time; it was part of our
thinking when we were working on the Text Encoding Initiative. One
problem is understanding the structure of the data you acquired from
somebody else. Where are the field boundaries? Do some things
contain other things? Can things repeat? What is data here, and what
is delimiter? And stuff like that. And the second arises typically
after you have answered the first question and understood the
structure of the data; that’s the problem that arises when you
discover that the information you wanted is not there, at least not in
the form that you wanted.
Maybe they analyzed the world in a different way so it’s going
to be more complicated to get the information you want. Maybe the
information you want just isn’t there because, oddly enough, their
view of the world is different from your view of the world, and the
things that they are interested in turn out not to be the things that
you’re interested in. The TEI, I always used to say, can help with
the first, but it really cannot help with the second. The only way to
help with the second would be to prescribe that everybody has to care
about the same things. And that’s not something most people want to
do, or even if they do want to do it, it’s not something that would
ever succeed; we’re not ever all going to be interested in the same
things.
And analogously there are two difficulties in moving data into
and out of XML. The first is just a difference in format which we can
solve with better import/export tools. We can get data into XML. But
the second is what I’ll call a difference in worldview: the
people who created that external data format don’t think the
way we do. The people who use it often don’t think the way we do.
And so what you get will be XML, but it will not necessarily be XML
that uses the principles of descriptive markup. And it’s important to
be aware of this, so we can prepare ourselves. There are cases where
we will have to relax our concerns about that second thing. We are
in a position to make it easier to move data into and out of XML; we
are not in a position to make the entire world do descriptive markup.
And just getting the data into XML helps a great deal.
If you doubt that, ask yourself: would you rather work with a
Word 95 binary file or with the XML that you can extract from a .docx
file? I submit to you that no one who has ever worked with either of
those, and absolutely no one who has worked with both of those
formats, will have any doubt that they are much happier working with
the XML than with the binary format. If descriptive markup is as
helpful as we say it is, things will always be easier with descriptive
markup than without it, and the work we do will be better. And if XML
is as helpful as we say it is, then things will be better with XML
than without it, and our work will be better.
Gardens are very nice places to retreat from the world, but
having rested here, we need to go back out into the world in part
because we have things to offer the world. If we
tear down the wall or at least punch a couple of gates in it, there
are new places we can go. But be careful. The world out there is
full of stones, and we’re not going to manage to cover it all with
leather. So we’re going to need leather on our feet; put on your
shoes. Let’s go some places, and then come back next year and tell us
where you’ve been.
Thank you for attending Balisage.
References
[Beck 2023] Beck, Jeffrey. The Future Begins Tomorrow: Succession Planning for XML Infrastructure Resources.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Beck01.
[Beshero-Bondar 2023] Beshero-Bondar, Elisa E. Markup and Migratory Workflows in the Context of AI and Big Data Analytics: Reflections
on the Data Modeling Groundwork of the Digital Humanities.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Beshero-Bondar01.
[Bormans and Subramanian 2023] Bormans, Geert, and Srikanth Venkata Subramanian. Unveiling Linguistic Harmony: Asserting Interlingual Synchronicity in Documents.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Bormans01.
[Chelsom 2023] Chelsom, John J. Artificial Intelligence with XForms.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Chelsom01.
[Clark 2023] Clark, Ash. A Wondrous Historie of Intertextual Networks: Or, How Not to Index Your Data.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Clark01.
[Dubinko 2023] Dubinko, M. Joel. Building Applications with Generative AI.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Dubinko01.
[Durusau 2023] Durusau, Patrick. Hypergraphs: Escaping the Surly Bonds of Syntax.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Durusau01.
[Fearon and Kaur 2023] Fearon, Phil, and Gursheen Kaur. Processing Lax XML Element Trees: Fixing HTML Tables with a Content Model Directed
XSLT Transform.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Fearon01.
[Galtman 2023] Galtman, Amanda. Accumulators in XSLT and XSpec: Developing, Debugging, and Testing XSLT 3 Accumulators.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Galtman01.
[Gross and O’Connor 2023] Gross, Mark, and Charles O’Connor. Pulling All Production Processes Together with an XML-First System.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Gross01.
[Holstege 2023] Holstege, Mary. Adventures in Single-Sourcing XQuery and XSLT.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Holstege01.
[Hymers and Lin 2023] Hymers, Jessica, and Qinqin Lin. Retractions and Corrections at Scholars Portal Journals.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Hymers01.
[Kalvesmaki 2023] Kalvesmaki, Joel. Serializing the Locator Format of the United States Government Publishing Office as
XML.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Kalvesmaki01.
[Kay 2023] Kay, Michael. Schema-Aware Conversion of XML to JSON.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Kay01.
[Kimber 2023] Kimber, Eliot. Turning a Battleship: Migrating ServiceNow Documentation to Use DITA Keys.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Kimber01.
[Nordström 2023] Nordström, Ari. The Dream of a CMS.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Nordstrom01.
[Ogbuji 2023] Ogbuji, Uche. Privately Automating Common, Uncommon, and Surprising Markup Tasks Using AI Large
Language Models.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Ogbuji01.
[Prescod et al. 2023] Prescod, Paul, Ben Feuer, Andrii Hladkyi, Sean Paulk and Arjun Prasad. Auto-Markup BenchMark: Towards an Industry-standard Benchmark for Evaluating Automatic
Document Markup.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Prescod01.
[Renear 2023] Renear, Allen H. The SGML/XML Approach to Document Processing: [an incomplete] History of Criticisms
and Challenges.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Renear01.
[Sperberg-McQueen 2023] Sperberg-McQueen, C. M. Keyboarding Frege’s Concept Writing: A Case Study in the Use of invisible XML.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Sperberg-McQueen01.
[Tovey-Walsh 2023] Tovey-Walsh, Norm. Ambiguity in iXML: And How to Control It.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Tovey-Walsh01.
[Usdin 2023] Usdin, B. Tommie. The Secret Garden.
Presented at Balisage: The Markup Conference 2023, Washington, DC, July 31 - August
4, 2023. In Proceedings of Balisage: The Markup Conference 2023. Balisage Series on Markup Technologies, vol. 28 (2023). doi:https://doi.org/10.4242/BalisageVol28.Usdin01.