How to cite this paper
St.Laurent, Simon. “Semantics and the Web: An Awkward History.” Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.StLaurent01.
Balisage: The Markup Conference 2021
August 2 - 6, 2021
Balisage Paper: Semantics and the Web
An Awkward History
Simon St.Laurent
Senior Content Manager
LinkedIn Learning
A troublemaker, Simon St.Laurent has been working with XML since the early drafts
of the specification. His first book on XML, XML: A Primer, went through three editions, each time teaching a new group of developers a variety
of bad ideas. The example using XML to manage lighting supposedly inspired several
protocols for excessively complicated control systems. His book Cookies may be partially responsible for the erosion of privacy. His other books have done
less damage because they haven't sold as well. However, he fears that Introducing Erlang and Introducing Elixir may prove to be contributing factors in the development of Skynet, while Programming Crystal covers a language obsessed with strong typing.
His more positive contributions include a partially-completed book on hand tool woodworking,
various writings on Quakerism, and two delightful children. He has lately become
obsessed with hospitality and craft, leading to binges of repentance for past (and
current) work. He still works on Web-related publishing projects, now in video, at
LinkedIn Learning.
Copyright ©2021 by the author. Used with permission.
Abstract
In the late 1990s, multiple groups had plans to transform the technology world, and
especially the World Wide Web, with semantic techniques. Over the last two decades,
however, semantics seem less and less eager to present themselves as markup.
Table of Contents
- Introduction
- A Note on Semantics
- New Magic
- Old Magic Defeats New Magic
- Is there hope for large-scale use of semantic markup?
Introduction
Markup is powerful stuff, able to solve problems at scales from the personal to the
multinational, across human languages, operating systems, programming languages, and
a wide variety of approaches. We gather at Balisage to celebrate the many things
it can do, to figure out new ways to apply it, and to figure out how best to make
it interoperate with competing (and complementing) approaches.
Unfortunately for the amazing people of Balisage, our markup appears to solve problems
that most people and organizations don't want to solve with markup. Markup enthusiasts
still exist in the world, but their share of the broader conversation keeps shrinking.
Some large organizations and some smaller ones still love our work, but broader public
interest keeps declining. The vast bulk of the markup that gets sent out into the
world over networks keeps getting simpler and simpler, using markup with fewer features
than was common in the 1980s or 1990s, with support from non-markup technologies that
provide meaning.
Are semantics receding from markup? Was this inevitable?
A Note on Semantics
Semantics is a word whose meaning wanders widely. The meaning of meaning and what
is meaning and how does meaning connect to a word meaning meaning... it's all circular,
at the best of times. If someone tells you that they have a neatly pinned-down definition
of semantics, they are trying to sell you something. However, for the purposes of
this paper, the exact meaning of semantics and the ways we attempt to convey it are
less important than where we attempt to do that.
This paper focuses first on moves toward putting semantic indicators into documents
and then on the later move toward using generic structure in documents. Semantics
have moved either into procedural programming languages that operate on markup
documents, into non-markup declarative formats, or both. While various
groups have proposed different approaches to semantics in documents, from named elements
and attributes to constructions built on URIs, semantics are, at least on the Web,
mostly retreating from markup overall.
New Magic
Markup appeared and was reinvented across multiple systems. Many of them used procedural
markup, embedded commands in the documents, fancy versions of the "turn on bold" and
"turn off bold" control characters I used to make the output of a C. Itoh Prowriter
dot matrix printer look more appealing. Others, like LaTeX, separated markup from
presentation, with macros that operated on the markup. Some developers, notably Ted
Nelson, found the very idea of adding markup to pristine content repulsive. (Nelson 1997)
Out of this early brew of possibilities, one family of markup approaches grew to dominate
the rest. There are still many people using LaTeX and even troff, though probably
not many operating Prowriters. The "angle brackets" family of markup technologies
sprang from the Generalized Markup Language, GML, though the brackets appeared later.
GML also provides our hero, the h1 element, defined first as an IBM formatting option and then in Annex E of ISO 8879,
SGML, and then in HTML, XHTML, and HTML again. GML was used at IBM as part of the
Document Composition Facility, and included a starter set with a core vocabulary that
got documentation writers started. Page 12 of their documentation introduced the h1 tag:
The H1 (head level 1) tag is used for chapters. You will use it a lot, as you will
the rest of the head-level tags.
The starter set always starts a new page when it finds an H1 tag.
The heading for this chapter, Chapter 3, "Paragraphs and Headings" on
page 11, was entered like this:
:h1 id=gs.Paragraphs and Headings
Again, ignore the ID attribute. If we didn't want to make a cross-reference to
that heading, we would have entered it like this:
:h1.Paragraphs and Headings
The starter set uses the text of headings 0 and 1 to print a running footing. If the
heading is too long to make a neat running footing, you can specify a shorter
version. The H0 and H1 tags have an attribute, called STITLE (for short title),
that allows you to specify the shorter version to be used for the running footing...
In its earliest days, H1 (or h1) provided structure to documents that formatting tools could use for presentation,
cross-referencing, and more. It wasn't intricate descriptive markup, but it didn't
require the services of a (marvelous) pizza chef (TEI Pizza Chef) to do its work.
The early GML documentation, particularly for the starter set, focused on formatting
documents rather than describing their structure. The two certainly blended, and
from this potent mix the next generation of markup would emerge.
As GML work shifted toward SGML work, the familiar angle brackets - < and > - appeared, and the scope and ambition of this family of markup grew. H1
remained, but now in Annex E of SGML, while most of the specification strove toward
more generalized and more powerful possibilities. The memory of its publishing foundations
remained strong, an early pointer toward greater things:
Indeed, there are publishing situations where an SGML application can be useful with
no processing specifications at all (not even application-specific ones), because
each user will specify unique processing in a unique system environment. The historical
explanation for this phenomenon is that in publishing (unlike, say, word processing
before the laser printer), the variety of potential processing is unlimited and should
not be constrained.
On the other hand, there is sufficient commonality in text processing that the idea
of common semantics has some appeal. Applications that followed the rules for such
common semantics could be run on any system that implemented the semantics, thereby
reducing the cost of application development and facilitating document interchange.
A set of such rules for text processing applications is called a "document architecture".
— Goldfarb 1990, page 130
Annex A.1 of the specification provided more detailed explanation of what this looked
like, called Generalized Markup and Descriptive Markup:
Generalized markup is based on two novel postulates:
a) Markup should describe a document's structure and other attributes rather than
specify processing to be performed on it, as descriptive markup need be done only
once and will suffice for all future processing.
b) Markup should be rigorous so that the techniques available for processing rigorously-defined
objects like programs and data bases can be used for processing documents as well.
— Goldfarb 1990, page 7-8
Over the following years, postulate (a) would lead markup into projects of broad data
interchange as well as document structure identification, while (b) would lead developers
toward creating tools mapping document structures to a variety of processing structures.
(In the Annex, the definition of 'rigorous markup' focuses on markup minimization
through the understanding of structural rules, something that continues in HTML5 but
is rarely described as 'rigorous' today.)
SGML transformed some corners of the world, though as late as 1997, Chet Ensign was
able to title a book $GML: The Billion-Dollar Secret (Ensign 1997) and have the "secret" part still be plausible. SGML's most widely used descendant,
however, grew from the specific markup vocabulary defined in its Annex E, "Application
Examples". The definition for the h1 element appears on page 532 of Goldfarb 1990. The AAP DTDs that built on this, eventually ISO 12083 (ISO 12083 DTD), influenced the CERN SGMLguid system that Tim Berners-Lee used and then simplified
for his World Wide Web.
The earliest HTML, seen here in this fragment from the original CERN web page (First Web), was built using pieces of that Annex E vocabulary, plus additional markup for hypertext:
<BODY>
<H1>World Wide Web</H1>The WorldWideWeb (W3) is a wide-area<A
NAME=0 HREF="WhatIs.html">
hypermedia</A> information retrieval
initiative aiming to give universal
access to a large universe of documents.<P>
Everything there is online about
W3 is linked directly or indirectly
BODY and A are new, but H1, P, and the later DL, DT, and DD are all capitalized descendants of that original Annex E markup.
HTML documents inherited from SGML's approach, and a burst of early browsers demonstrated
that the same documents could be presented by many independently-developed tools.
The vision of declarative markup wasn't just a "standards geek" thing at the beginning.
Technical books aimed at introducing HTML to a broad audience included claims like:
HTML was not designed to be the language of a What You See Is What You Get (WYSIWYG) word processor,
such as Word or WordPerfect. Instead, HTML requires that you construct documents with
sections of text marked as logical units, such as titles, paragraphs, or lists, and leave the interpretation of these
marked elements up to the browser displaying the document.
This model builds enormous flexibility into the system and allows browsers of different
abilities to view the same HTML documents. In fact, there are browsers for everything
from fancy UNIX graphics computers to plain-text terminals, such as VT-100s or old
8086-based DOS computers. As an example, in viewing the same document, a graphical
UNIX browser may present major headings with a large perhaps slanted and bold-faced
font (since elegant typesetting is possible with graphics displays), while a VT-100
browser may just center the title, using the single available font. Both presentations
will look different, but both will reproduce the logical organization that you built
in with the HTML tags.
— Graham 1995, pages 1-2
However, early HTML practice quickly moved toward using markup targeted exclusively at presentation.
Initially, the limited number of options and their simple structure meant that HTML
practice still focused on structuring documents. However, as HTML standardized its
own extensions beyond the initial set of tags, more and more of those were either
created for or repurposed to specifically presentational purposes. FONT, of course, is purely presentation. IMG
combines presentation and transclusion, but with the use of spacer GIFs, was often
far from semantic. Tables are, of course, the classic example of markup created for
a structural purpose which was then (ab)used for presentation purposes. (Siegel 1997) The A
tag shown above, of course, offered the most minimal hypertext possibility, ignoring
the more sophisticated possibilities HyTime had shown the SGML world. (DeRose 2018)
While HTML was spreading quickly, a group of SGML experts converged to create a different
kind of simplified markup. Rather than focusing on a single simplified vocabulary,
the group that created XML focused on shrinking SGML's syntax to a smaller set that
was easier for computers and humans to parse unambiguously. As two of the leaders
of that project described its purpose:
XML... lays down ground rules that clear away a layer of programming details so that
people with similar interests can concentrate on the hard part—agreeing on how they
want to represent the information they commonly exchange. This is not an easy problem
to solve, but it is not a new one, either.
Such agreements will be made, because the proliferation of incompatible computer systems
has imposed delays, costs and confusion on nearly every area of human activity. People
want to share ideas and do business without all having to use the same computers;
activity-specific interchange languages go a long way toward making that possible.
Indeed, a shower of new acronyms ending in "ML" testifies to the inventiveness unleashed
by XML in the sciences, in business, and in the scholarly disciplines.
— Bosak and Bray 1999, page 92.
Between the release of the XML 1.0 Recommendation (XML 1.0) in 1998 and the release of XML Schema 1.0 in 2001, the computing world seemed eager
to absorb as much XML goodness as it could. The W3C, founded to shepherd HTML, rapidly
expanded its XML work. The GCA was reinvigorated and became IDEAlliance. SGML Open
renamed itself OASIS Open and started work with UN/CEFACT to reinvent business communications
through ebXML and SOAP-based Web Services specifications (sometimes referred to as
WS-*).
At the center of these dreams, and most visibly, sat Tim Berners-Lee, now Director
of the World Wide Web Consortium (W3C). In 1999, he wrote:
Even though the computer markup languages for hypertext and graphics are designed
for presenting text and images to people, and data languages are designed to be processed
by machines, they share a need for a common, structured format. XML is it. (160)
The Semantic Web is the web of connections between different forms of data that allow
a machine to do something it wasn't able to do directly.
This may sound boring until it is scaled up to the entirety of the Web. Imagine what
computers can do when there is a vast tangle of interconnected terms and data that
can automatically be followed. The power we will have at our fingertips will be awesome.
Computers will 'understand' in the sense that they will have achieved a dramatic increase
in function by linking very many meanings.
To build understanding, we need to be able to link terms. This will be made possible
by inference languages, which work one level above the schema languages. (185) (Berners-Lee 1999)
The Semantic Web and its competitor (largely from the HyTime SGML world), Topic Maps,
have fueled many talks at this conference and its predecessor, Extreme Markup Languages,
but still seem off in the distance. We are still near the bottom of the "layer cake
diagram". (Semantic Web Architecture)
While those big dreams promised big things, the Web Standards Project (WASP 2013) was fighting smaller but crucial battles. The "Browser Wars" had left users with
incomplete and incompatible browsers supporting various parts of standards. Even
as that situation improved, thanks to time, pressure, and the slow end of Internet
Explorer's dominance, many HTML developers' tools were still using old techniques
like table-based layout. A huge education process strove to move developers to put
their formatting in stylesheets, in CSS for HTML, that were more easily maintained
and extended. Sites like CSS Zen Garden showed how to separate presentation from content, leaving clean accessible semantic
HTML in the markup documents, and putting presentation in the stylesheet. Utility
sites like Cleaner Site even automatically replaced table markup with div elements for easier styling. H1 elements could now take on different formatting than what had been built into the
browser.
The W3C attempted to clean up the HTML mess with XML as well, developing XHTML as
a way to combine the HTML vocabulary with XML structure. At this point, H1 became h1 again. At the same time, the W3C specified a Document Object Model (DOM), making
it possible to manipulate HTML, XML, and XHTML with JavaScript or Java. While XHTML 1.0 was mostly a syntax cleanup and documentation, XHTML 1.1 was itself extensible. Its crowning achievement was a set of DTDs that used entity
references to allow developers to add more elements and attributes to XHTML, complete
with support for namespace prefixes. (I believe that Murray Altheim presented on them
at Extreme Markup Languages, but can no longer find records of it.) In 2010 the W3C
added an XML Schema version as well.
At this point, a popular consortium was combining its early success with a particular
markup language, HTML, with a more general markup language, XML, pursuing a grand
vision of a Web shared by humans and computers. Companies and other standards bodies
were racing to build on top of these components. What could possibly go wrong?
Old Magic Defeats New Magic
While WS-* specifications kept sprouting for years, and REST began its rise, XHTML
hit walls early. Some of the issues resulted from years of celebrating markup as
text, and the expectation (dating back to CERN and SGML before that) that many tags
were optional. As browsers competed for the affections of developers and users, they
had become ever more forgiving, each in their own ways, of tagging errors and errors
within the tags. If the results looked right in a browser, they were right, and billions
of lines of legacy code (and even fresh code) were unlikely to change. The dot-com
bust also limited resources for cleaning up this "tag soup", even where companies
were interested.
While HTML validators had long been available, they had never bonded with HTML culture
the way they did with SGML or XML culture. A List Apart, a longtime bastion of web
best practices tied to the Web Standards Project, did offer an article (Koch 2005) on extending XHTML DTDs to add attributes. By 2008, however, even ALA could no
longer wholeheartedly support the HTML validation it had recommended in articles over
the years. (Marcotte 2008) XHTML syntax was only part of the problem.
XHTML standards also ran into roadblocks. While I have heard it reported in the halls
of XML conferences that the browser makers actively wanted XML's strict syntax so
they could reduce maintenance costs on their tag soup code and compatibility, when
it came to actually implementing it, they were... less eager. XML parsers did become
a normal component of browsers, but weren't regularly called upon. The XHTML 1.0 specification
(XHTML 1.0) had listed ways to make XML syntax acceptable to existing browser engines, and that
was about as far as common practice got. Even those were too much for some HTML tools.
I remember having to convert <br /> back to <br> in a Java-based documentation system to avoid slashes appearing all over the documents.
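The gap between XML's strictness and browser-style forgiveness can be sketched in a few lines. This is an illustration only, written in Python rather than against any browser engine. The standard library's html.parser is merely lenient and does not implement the WHATWG parsing algorithm, and the sample fragment and function names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: an unclosed <br>, routine in HTML, fatal in XML.
FRAGMENT = "<p>first line<br>second line</p>"


class TagCollector(HTMLParser):
    """Records start tags in the order a lenient HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(text):
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_start_tags(text):
    """Return the start tags a forgiving HTML parser finds in the text."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags
```

The XML parser rejects the fragment outright because the br element is never closed, while the lenient parser simply reports a p and a br and moves on. Rewriting the break as <br /> satisfies both.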
Opposition to XHTML arose inside and outside of the W3C, with browser vendors creating
the less formal WHATWG to propose other paths. Initially intent on creating JavaScript-based
alternatives to XForms, the WHATWG moved more boldly toward a full HTML5, explicitly
not XHTML. In a distant echo of SGML's "rigorous parsing", HTML5 included its own
parsing model (WHATWG parsing), offering more relaxed syntax options based on an understanding of the HTML vocabulary.
Beyond that, however, they largely kept changes to the HTML vocabulary small, allowing
for some extensibility through data-* attributes. (HTML5 Data Attributes) XHTML 2.0 was ended and its proposals effectively orphaned in 2010 (XHTML 2.0), when the XHTML Working Group was closed. (XHTML 2.0 WG)
At the same time that HTML5 was firing up, limitations in CSS were shifting the way
developers created markup. In the early days of HTML, tag choice was central, as
the tags determined the result. As formatting moved into separate stylesheets, the
markup itself could be far more generic. With the demise of table-based layouts and
the general failure of frames, both of which created major accessibility issues, float-based
CSS layouts took over. Even the best float-based layouts (Levine 2006) used div elements as generic containers, receiving formatting instructions from the stylesheet
without contributing to the content of the documents. Some layouts required multiple
levels of divs. "Div-itis" became a common diagnosis (McDermott 2011), but there was little developers could do to avoid it until the appearance of the
CSS Flexbox and Grid specifications a decade later. Even when developers used more
semantic elements, they often used CSS resets (Meyer 2011) that removed the formatting reminders of what those elements had been, simplifying
their work tremendously. (Meyer 2007) On the more semantics-friendly side, WAI-ARIA created descriptions to provide better
accessibility to increasingly generic markup. (WAI-ARIA)
CSS wasn't the only aspect of Web development fond of the div element. JavaScript had been on the Web since 1995, and had grown from a supplemental
scripting language to a powerful controller of content and interaction. Standardization
of the Document Object Model (DOM) gave JavaScript the ability to listen for activity
and respond to it by modifying documents. The object-oriented model of named generic
containers holding state as values was a good fit for div elements with id, data-*, and sometimes class attributes. Recent CSS frameworks like Bootstrap and Tailwind CSS are designed to
mesh well with this div-centric model.
Even within the XML community, developers were looking for other approaches. As early
as 1999, a group of developers on the xml-dev mailing list started searching for an
even simpler XML, SML. (La Quey 1999) Those efforts eventually bore long-lasting fruit in YAML, "YAML Ain't Markup Language"
(YAML), which is commonly used for configuration on large software infrastructure projects.
At the same time, Douglas Crockford was extracting a JavaScript data structure from
that language, and called it JavaScript Object Notation, or JSON. (JSON) By happy coincidence, JSON was easy to make into a clean subset of YAML, letting
XML's two primary text-based competitors work together.
XML got another burst of attention when Asynchronous JavaScript and XML (Ajax) appeared.
(Garrett 2005) Rather than emerging from a standards process, Ajax was a usage pattern that emerged
from complex projects. Instead of refreshing documents constantly in a formal client-server
conversation, a single document served as the frame for multiple changes of content
and structure inside of it. JavaScript programs could use the XMLHttpRequest
object to send multiple requests to servers, rather than having to do a complete
cycle of change in order to have a conversation. Unfortunately for XML, since binding JSON
to JavaScript objects was trivial, most of this traffic migrated to JSON
rapidly. At the same time, other APIs were also migrating away from XML to JSON.
(DuVander 2012)
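The difference in binding effort is easy to see in miniature. The sketch below uses Python as a stand-in for JavaScript, with json.loads playing roughly the role JSON.parse plays in the browser. The two payloads and the function names are hypothetical, invented only to carry the same record in both formats.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads carrying the same record in both formats.
JSON_PAYLOAD = '{"title": "Semantics and the Web", "year": 2021}'
XML_PAYLOAD = (
    "<paper><title>Semantics and the Web</title><year>2021</year></paper>"
)


def from_json(text):
    """One call yields native values, with the number already typed."""
    return json.loads(text)


def from_xml(text):
    """The XML version needs code that knows the vocabulary and its types."""
    root = ET.fromstring(text)
    return {
        "title": root.findtext("title"),
        "year": int(root.findtext("year")),  # type conversion is up to us
    }
```

Both functions return the same dictionary, but only the XML path requires the programmer to know the element names and decide which strings are really numbers.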
While none of these setbacks meant the end of XML, the spark had faded. XML had opened
the door, convincing people that text-based data and formal document interchange were
possible, but both the data world and the Web world were shifting away from XML approaches.
While most XML-focused working groups, with the notable exception of XHTML 2.0, were
able to complete their work and achieve Recommendation status for their projects,
the W3C's XML Core Working Group closed in 2016 (W3C 2016) and the XSLT/XQuery Working Group closed in 2018. (Jia 2018) The RDF Working Group, which had focused on semantic possibilities above the level
of markup, had previously closed in 2014. (W3C 2016)
XML had largely vanished from the browser-based Web at this point. Maybe it wasn't
needed? Existing standards used in web browsers provided multiple levels of support
for custom yet shareable semantics. (St. Laurent 2013) The W3C took another shot at specifying a coherent component model, now combining
semantics and behavior. Web Components suggested a path forward, adding features to
the DOM that allowed custom markup and its supporting code to have more private state,
making it easier to mix and match content and behavior specified in different places.
However, while component models have indeed taken off, they tend so far to mostly
operate in the contexts of specific frameworks (notably React) rather than as generic
containers that can migrate freely.
The markup used in those components, and in web pages and applications generally,
has continued its march toward generic <div> elements. The 2005 data from Google that justified many HTML5 priorities has unfortunately
disappeared from the Web, but a 2016 study showed that <div> was the most popular tag used in the body of websites, with <p> still used in 81.5% of sites. Our erstwhile hero <h1> was in 55.8% of sites, with <h2> close behind and <h3> down at 43.4%. List structures, commonly used for navigation menus, did okay around
74%, but most inline markup is much further down the list. (Except the generic <span>, at 75.6%.) However, looking at the usage of tags inside of sites, <div> is responsible for 55.8% of text content markup, and <p> for only 12.1%. (Rosu 2016, with an updated version at Advanced Web Ranking. Going forward, the Web Almanac (Meiert 2020) is probably the best source.)
More and more, those <div> elements get their content through JavaScript. While sometimes the (new and improved)
Server-Side Rendering approaches send the first version of a document or application
with its content already in place, many times a blank template is sent with JavaScript,
which retrieves the actual content on a separate channel. Sometimes that content
still arrives as HTML or XML, but often it arrives as JSON. The latest popular approach
to connecting the client to the server, GraphQL, uses JSON to specify both its requests
and its responses. Markup languages can be part of the conversation for specific
implementations, but are left out of the core specification. (GraphQL 2018)
Markup still has a strong core of supporters, but is falling into secondary or specialist
use. Keepers of the semantic markup flame in HTML still encourage learners to master
the core HTML vocabulary before leaping to CSS and JavaScript frameworks. XML still
gets endless use in sophisticated documentation projects, and "on the wire" in protocols
built before and sometimes after the rise of JSON. (And YAML is taking a share of
JSON's more complex projects.) As a share of visible projects, however, markup's
star has decidedly dimmed.
Is there hope for large-scale use of semantic markup?
Some people and projects still need shared semantics, and semantic tools remain available
on the Web, even in the lands of divs. In the late 1990s, XML Namespaces (Namespaces) seemed to win a crushing victory over architectural forms (XML Architectural Forms), at least within the W3C. URIs were to be the glue between markup vocabularies and
meaning, rather than mere attributes offering other semantics that might apply to
an element. In the 2020s, that victory seems to have been reversed. Namespaces are
rarely used for the free mixing of vocabularies they were supposed to support, while
millions of HTML developers use architectural forms without being aware of it. (The
concern many of us had about "what does a namespace URI mean?" (RDDL 2002) seems quaint today.)
HTML5's decision to base its parsing model on an approach that required knowledge
of the HTML vocabulary limited its extensibility. CSS and JavaScript do work with
other elements dropped into HTML5 documents, just slightly less predictably. Markup
hygiene matters more for element names that HTML5 doesn't already understand. Despite
claims like "the web ecosystem routed around the damage of XML's influence by making
HTML better suited for extensibility than ever before" (Denicola 2014), HTML's extensibility remains limited.
HTML5 does slightly expand the attribute-based extensibility that HTML had long provided.
The data-* attributes (HTML5 Data Attributes) build on the continuing existence of id and class, so that there is frequently a home for semantic information identifying the "real"
purpose of even a generic element. While it is possible to tell that class="headline" means the same thing that <h1> once did, its use is mostly limited to individual cases. It is chaotic, cloudy,
often duplicating, and case by case, but it still lingers, while namespace URIs have
vanished to specialist schema zones. Most of the meaning now is kept in the JavaScript
code used to process this markup, so the markup's use is possible but often limited.
It can be used like Architectural Forms, but isn't as general.
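A rough sketch shows what reading those attribute-based hints looks like from outside the page's own JavaScript. It uses Python's standard html.parser. The sample markup, the class names, and the data-section attribute are all hypothetical, and nothing here tells a program what "headline" actually means.

```python
from html.parser import HTMLParser

# Hypothetical markup: generic elements whose "real" purpose lives in attributes.
SAMPLE = (
    '<div class="headline" data-section="news">Markup Declines</div>'
    '<div class="byline">A. Writer</div>'
    "<div>Unlabeled text</div>"
)


class HintCollector(HTMLParser):
    """Gathers class names and data-* attributes from each start tag."""

    def __init__(self):
        super().__init__()
        self.hints = []

    def handle_starttag(self, tag, attrs):
        found = {}
        for name, value in attrs:
            if name == "class" and value:
                found["class"] = value.split()
            elif name.startswith("data-"):
                found[name] = value
        # Elements with no class or data-* attributes carry no hints at all.
        if found:
            self.hints.append((tag, found))


def semantic_hints(markup):
    """Return (tag, hints) pairs for elements that carry any hints."""
    collector = HintCollector()
    collector.feed(markup)
    collector.close()
    return collector.hints
```

The collector recovers the labels, but their interpretation is still a private agreement between this page and whatever code processes it, which is the case-by-case quality described above.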
While JavaScript is mostly available in the browser, it's not convenient to run massive
quantities of it across thousands of documents while indexing documents. (Though
Google indexing does run some JavaScript. (Google SEO 2021)) Sites indexing and sharing content often suggest additional markup for Web content
that simplifies the work of sharing it. Social media sites often suggest header meta
markup, like Twitter Cards (Twitter Cards) and Facebook Open Graph (Facebook Open Graph), to make it easier for their tools to present shared pages to users. Google supports
Structured Data (Google Structured 2021), built with RDF or more frequently JSON, for similar purposes. All of these are
included within documents, using minimal markup, and often duplicate information that
is already present in the document.
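As an illustration of how little markup these sharing conventions need, the following sketch collects header meta entries from a page. It is a simplification under stated assumptions: the sample head and its values are invented, the og:title, og:type, and twitter:card names follow the published conventions, and real consumers apply many more rules than this.

```python
from html.parser import HTMLParser

# Hypothetical page head; the values are invented for this example.
SAMPLE_HEAD = (
    "<head>"
    "<title>An Awkward History</title>"
    '<meta property="og:title" content="An Awkward History">'
    '<meta property="og:type" content="article">'
    '<meta name="twitter:card" content="summary">'
    "</head>"
)


class MetaCollector(HTMLParser):
    """Collects meta entries keyed by their property or name attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        key = fields.get("property") or fields.get("name")
        if key and fields.get("content") is not None:
            self.meta[key] = fields["content"]


def sharing_metadata(markup):
    """Return the meta key/value pairs found in the markup."""
    collector = MetaCollector()
    collector.feed(markup)
    collector.close()
    return collector.meta
```

Note that the og:title value repeats what the title element already says, a small instance of the duplication mentioned above.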
Developers who want to extend the HTML element vocabulary can use Web Components,
which support hyphenated element names. Though this URI-free approach to prefixes
has been described at Balisage as "the coming namespace winter" (Miłowski and Walsh 2014), it does allow developers to create custom elements. While it is more direct than the
attribute approach, it is trapped somewhat by the slow and often halting emergence
of Web Components specifications and support. (Web Components)
There may yet be hope for large-scale use of semantic markup, but it would require
a drastic turnaround from the trends of the last two decades. If it arrives, perhaps
we're better prepared for it this time.
References
[Facebook Open Graph] “A Guide to Sharing for Webmasters.” https://developers.facebook.com/docs/sharing/webmasters
[Advanced Web Ranking] Advanced Web Ranking. “The average web page from top twenty Google results.” https://www.advancedwebranking.com/html/
[Al-Awadai 2017] Al-Awadai, Zahra, Anne Brüggemann-Klein, Michael Conrads, Andreas Eichner and Marouane
Sayih. “XML Applications on the Web: Implementation Strategies for the Model Component
in a Model-View-Controller Architectural Style.” Presented at Balisage: The Markup
Conference 2017, Washington, DC, August 1 - 4, 2017. In Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies, vol. 19 (2017). doi:https://doi.org/10.4242/BalisageVol19.Bruggemann-Klein01. https://www.balisage.net/Proceedings/vol19/html/Bruggemann-Klein01/BalisageVol19-Bruggemann-Klein01.html
[XHTML 1.1] Altheim, Murray, and McCarron, Shane. XHTML 1.1 - Module Based XML. https://www.w3.org/TR/2001/REC-xhtml11-20010531/
[Beck 2011] Beck, Jeff. “The False Security of Closed XML Systems.” Presented at Balisage: The
Markup Conference 2011, Montréal, Canada, August 2 - 5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). doi:https://doi.org/10.4242/BalisageVol7.Beck01. https://www.balisage.net/Proceedings/vol7/html/Beck01/BalisageVol7-Beck01.html
[Beck 2018] Beck, Jeffrey. “Transcending structure: Applying shared markup vocabularies with your
friends and enemies.” Presented at Symposium on Markup Vocabulary Ecosystems, Washington,
DC, July 30, 2018. In Proceedings of the Symposium on Markup Vocabulary Ecosystems. Balisage Series on Markup Technologies, vol. 22 (2018). doi:https://doi.org/10.4242/BalisageVol22.Beck01. https://www.balisage.net/Proceedings/vol22/html/Beck01/BalisageVol22-Beck01.html
[Berjon 2014] Berjon, Robin. “Mending Fences and Saving Babies.” Presented at Symposium on HTML5
and XML, Washington, DC, August 4, 2014. In Proceedings of the Symposium on HTML5 and XML. Balisage Series on Markup Technologies, vol. 14 (2014). doi:https://doi.org/10.4242/BalisageVol14.Berjon01. https://www.balisage.net/Proceedings/vol14/html/Berjon01/BalisageVol14-Berjon01.html
[First Web] Berners-Lee, Tim. “World Wide Web.” http://info.cern.ch/hypertext/WWW/TheProject.html
[Berners-Lee 1999] Berners-Lee, Tim. Weaving the Web: The Original Design and Ultimate Destiny of the WORLD WIDE WEB by
Its Inventor. New York: Harper San Francisco, 1999.
[Semantic Web Architecture] Berners-Lee, Tim. “Semantic Web - XML 2000- slide 'Architecture'.” http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0.html
[Biezunski 2012] Biezunski, Michel. “Moving sands: Adventures in XML e-book-land.” Presented at Balisage:
The Markup Conference 2012, Montréal, Canada, August 7 - 10, 2012. In Proceedings of Balisage: The Markup Conference 2012. Balisage Series on Markup Technologies, vol. 8 (2012). doi:https://doi.org/10.4242/BalisageVol8.Biezunski01. https://www.balisage.net/Proceedings/vol8/html/Biezunski01/BalisageVol8-Biezunski01.html
(XML Islands discarded)
[RDDL 2002] Borden, Jonathan, and Bray, Tim. “Resource Directory Description Language (RDDL).”
http://rddl.org/
[Bosak and Bray 1999] Bosak, Jon, and Bray, Tim. “XML and the Second-Generation Web.” Scientific American, May 1999. Pages 89-93.
[XML 1.0] Bray, Tim, Paoli, Jean, and Sperberg-McQueen, Michael. “Extensible Markup Language
(XML) 1.0: W3C Recommendation 10-February-1998.” https://www.xml.com/axml/
[Brüggemann-Klein 2012] Brüggemann-Klein, Anne, Jose Tomas Robles Hahn and Marouane Sayih. “Leveraging XML
Technology for Web Applications.” Presented at Balisage: The Markup Conference 2012,
Montréal, Canada, August 7 - 10, 2012. In Proceedings of Balisage: The Markup Conference 2012. Balisage Series on Markup Technologies, vol. 8 (2012). doi:https://doi.org/10.4242/BalisageVol8.Bruggemann-Klein01. https://www.balisage.net/Proceedings/vol8/html/Bruggemann-Klein01/BalisageVol8-Bruggemann-Klein01.html
[TEI Pizza Chef] Burnard, Lou, and Sperberg-McQueen, C. Michael. “TEI Pizza Chef.” http://www.tei-c.org/Vault/P4/pizza.html
[Cargill 2011] Cargill, Carl. “Why Standardization Efforts Fail.” http://quod.lib.umich.edu/j/jep/3336451.0014.103/--why-standardization-efforts-fail?rgn=main;view=fulltext
[Carpenter 2016] Carpenter, Todd. “Moving toward common vocabularies and interoperable data.” Presented
at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Carpenter01. https://www.balisage.net/Proceedings/vol17/html/Carpenter01/BalisageVol17-Carpenter01.html (dull gray sea of featureless HTML5)
[CERN tags] CERN. “Tags used in HTML.” http://info.cern.ch/hypertext/WWW/MarkUp/Tags.html
[No Tags] Clark, Kendall Grant. “Look Ma, No Tags.” https://www.xml.com/pub/a/2002/07/24/yaml.html
[Coldewey 2021] Coldewey, Devin. “Docugami's new model for understanding documents cuts its teeth
on NASA archives.” https://techcrunch.com/2021/04/12/docugamis-new-model-for-understanding-documents-cuts-its-teeth-on-nasa-archives/
[Connolly 1997] Connolly, Dan, et al. “The Evolution of Web Documents: The Ascent of XML,” in XML: Principles, Tools, and Techniques. Sebastopol, CA: O'Reilly Media, 1997. http://www.xml.com/pub/a/w3j/s3.connolly.html
[Coombs 1987] Coombs, James H., Allen H. Renear and Steven J. DeRose. “Markup systems and the future
of scholarly text processing.” Communications of the ACM, 30(11):933–947, 1987. doi:https://doi.org/10.1145/32206.32209. http://www.fdi.ucm.es/profesor/jlsierra/e-learning/primera-sesion/MarkupSystems.pdf
[Web Components] Cooney, Dominic, and Glazkov, Dmitri. Introduction to Web Components. http://www.w3.org/TR/components-intro/
[JSON] Crockford, Douglas. “Introducing JSON.” http://www.json.org/
[CSS Zen Garden] CSS Zen Garden. http://www.csszengarden.com/
[Denicola 2014] Denicola, Domenic. “Non-Extensible Markup Language.” Presented at Symposium on HTML5
and XML, Washington, DC, August 4, 2014. In Proceedings of the Symposium on HTML5 and XML. Balisage Series on Markup Technologies, vol. 14 (2014). doi:https://doi.org/10.4242/BalisageVol14.Denicola01. http://www.balisage.net/Proceedings/vol14/html/Denicola01/BalisageVol14-Denicola01.html
[DeRose and Durand 1994] DeRose, Steven and Durand, David. Making Hypermedia Work: A User's Guide to HyTime. Boston: Kluwer Academic Publishers, 1994.
[DeRose 2018] DeRose, Steven J. “Dynamic Style: Implementing Hypertext through Embedding Javascript
in CSS.” Presented at Balisage: The Markup Conference 2018, Washington, DC, July 31
- August 3, 2018. In Proceedings of Balisage: The Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21 (2018). doi:https://doi.org/10.4242/BalisageVol21.DeRose01. https://www.balisage.net/Proceedings/vol21/html/DeRose01/BalisageVol21-DeRose01.html
[No Tags] Dodds, Leigh. “Doing It Simpler.” https://www.xml.com/pub/a/2001/08/01/simpler.html
[DCMI] Dublin Core Metadata Initiative. “DCMI Specifications.” http://dublincore.org/specifications/
[DuVander 2012] DuVander, Adam. “Leading APIs Say 'Bye XML' in New Versions.” https://www.programmableweb.com/news/leading-apis-say-bye-xml-new-versions/2012/12/17
[Ensign1997] Ensign, Chet. SGML: The Billion-Dollar Secret. Boston: Pearson, 1997.
[Manifesto 2013] Extensible Web Manifesto. http://extensiblewebmanifesto.org/
[Flynn 2009] Flynn, Peter. “Why writers don't use XML: The usability of editing software for structured
documents.” Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August
11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Flynn01. https://www.balisage.net/Proceedings/vol3/html/Flynn01/BalisageVol3-Flynn01.html
[Garrett 2005] Garrett, Jesse James. “Ajax: A New Approach to Web Applications.” https://web.archive.org/web/20150910072359/http://adaptivepath.org/ideas/ajax-new-approach-web-applications/
[GML Starter Kit] IBM. Document Composition Facility Generalized Markup Language Starter Set User's Guide
Release 3.2. https://ia601900.us.archive.org/35/items/bitsavers_ibm370DCFSmpositionFacilityGMLStarterSetUGRel3.2Oc_8587670/SH20-9186-06_Document_Composition_Facility_GML_Starter_Set_UG_Rel3.2_Oct89.pdf
[Goldfarb 1990] Goldfarb, Charles. The SGML Handbook. Oxford: Oxford University Press, 1990.
[Goodner 2021] Goodner, Marc. “Is JSON the worst format ever?” https://twitter.com/robotdad/status/1390375568017813505
[Graham 1995] Graham, Ian S. The HTML Sourcebook. New York: John Wiley & Sons, 1995.
[GraphQL 2018] GraphQL. https://spec.graphql.org/June2018/
[Hicks 1998] Hicks, Tony. “Should we be using ISO 12083.” Journal of Electronic Publishing, Volume 3, Issue 4. doi:https://doi.org/10.3998/3336451.0003.407.
[WASP 2013] “History of the Web Standards Project.” https://www.webstandards.org/about/history/index.html
[Hook 2021] Hook, Anselm. “Orbital Web Browser.” https://orbitalweb.github.io/
[Hopgood 2001] Hopgood, Bob. “History of the Web.” https://www.w3.org/2012/08/history-of-the-web/origins.htm
[Cleaner Site] HTML, CSS, & JS Cleaner. “Replace HTML Tables with <div>s.” https://html-cleaner.com/features/replace-html-table-tags-with-divs/
[ISO 12083 DTD] “ISO 12083 Article XML DTD.” http://xml.coverpages.org/iso12083xmlarticledtd19990125.html
[ISO/IEC 19757] ISO/IEC 19757 - DSDL. “Document Schema Definition Languages.” http://dsdl.org/
[Jellife 2012] Jelliffe, Rick. “XML's Dialect Problem: Diversity is not the problem; it is the requirement.”
https://web.archive.org/web/20130703024142/http://broadcast.oreilly.com/2012/03/xmls-dialect-problem.html
[Jia 2018] Jia, Xueyuan. “XSLT and XML Query Working Groups now closed.” https://lists.w3.org/Archives/Public/public-xsl-wg/2018Oct/0000.html
[JSON Alternative] “JSON: The Fat-Free Alternative to XML.” http://www.json.org/xml.html
[Katz 2013] Katz, Yehuda. “Extend the Web Forward.” http://yehudakatz.com/2013/05/21/extend-the-web-forward/
[Koch 2005] Koch, Peter Paul. “Validating a Custom DTD.” https://alistapart.com/article/customdtd/
[La Quey 1999] La Quey, Robert E. “SML: Simplifying XML.” https://www.xml.com/pub/a/1999/11/sml/
[Levine 2006] Levine, Matthew. “In Search of the Holy Grail.” https://alistapart.com/article/holygrail/
[Marcotte 2008] Marcotte, Ethan. “Where Our Standards Went Wrong.” https://alistapart.com/article/whereourstandardswentwrong/
[Mason 2019] Mason, James David. “Do we really want to see markup?” Presented at Balisage: The
Markup Conference 2019, Washington, DC, July 30 - August 2, 2019. In Proceedings of Balisage: The Markup Conference 2019. Balisage Series on Markup Technologies, vol. 23 (2019). doi:https://doi.org/10.4242/BalisageVol23.Mason01. https://www.balisage.net/Proceedings/vol23/html/Mason01/BalisageVol23-Mason01.html
(visible markup)
[McDermott 2011] McDermott, Megan. “Divitis: What it is and how to avoid it (Updated!)” http://www.apaddedcell.com/div-itis-what-it-and-how-avoid-it
[MCE] “Semantics of MCE.” https://www.assembla.com/spaces/IS29500/wiki/Semantics_of_MCE
[XML Architectural Forms] Megginson, David. “XML Architectural Forms.” http://www.megginson.com/XAF
[Meiert 2020] Meiert, Jens Oliver, Rosu, Catalin, and Devlin, Ian. “Markup.” https://almanac.httparchive.org/en/2020/markup
[Meyer 2007] Meyer, Eric. “Reset Reasoning.” http://meyerweb.com/eric/thoughts/2007/04/18/reset-reasoning/
[Meyer 2011] Meyer, Eric. “CSS Tools: Reset CSS.” https://meyerweb.com/eric/tools/css/reset/
[Miłowski 2009] Miłowski, R. Alexander. “XML in the Browser: the Next Decade.” Presented at Balisage:
The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Milowski01. https://www.balisage.net/Proceedings/vol3/html/Milowski01/BalisageVol3-Milowski01.html
[Miłowski and Walsh 2014] Miłowski, R. Alexander and Walsh, Norm. “How to survive the coming namespace winter.”
Presented at Balisage: The Markup Conference 2014, Washington, DC, August 5 - 8, 2014.
In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). doi:https://doi.org/10.4242/BalisageVol13.Milowski01. https://www.balisage.net/Proceedings/vol13/html/Milowski01/BalisageVol13-Milowski01.html
[Nelson 1997] Nelson, Ted. “Embedded Markup Considered Harmful,” in XML: Principles, Tools, and Techniques. Sebastopol, CA: O'Reilly Media, 1997. http://www.xml.com/pub/a/w3j/s3.connolly.html
[Patterson 2013] Patterson, Matt. “Where did all the markup kids go? Open-source, markup, and the casual
developer.” Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August
6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:https://doi.org/10.4242/BalisageVol10.Patterson01. https://www.balisage.net/Proceedings/vol10/html/Patterson01/BalisageVol10-Patterson01.html (open source shrinkage)
[Piez 2009] Piez, Wendell. “How to Play XML: Markup Technologies as Nomic Game.” Presented at
Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Piez01. https://www.balisage.net/Proceedings/vol3/html/Piez01/BalisageVol3-Piez01.html
[Quin 2012] Quin, Liam R. E. “Characterizing ill-formed XML on the web: An analysis of the Amsterdam
Corpus by document type.” Presented at Balisage: The Markup Conference 2012, Montréal,
Canada, August 7 - 10, 2012. In Proceedings of Balisage: The Markup Conference 2012. Balisage Series on Markup Technologies, vol. 8 (2012). doi:https://doi.org/10.4242/BalisageVol8.Quin01. https://www.balisage.net/Proceedings/vol8/html/Quin01/BalisageVol8-Quin01.html (Amsterdam corpus of XML)
[Roselli 2020] Roselli, Adrian. “Be Wary of doc-subtitle.” https://adrianroselli.com/2020/08/be-wary-of-doc-subtitle.html
[Rosu 2016] Rosu, Catalin. “The Average Web Page (Data from Analyzing 8 Million Websites).”
https://css-tricks.com/average-web-page-data-analyzing-8-million-websites/
[Ruby 2007] Ruby, Griff. “The Lost Tags of HTML.” http://www.the-pope.com/lostHTML.htm
[Siegel 1997] Siegel, David. “The Web is Ruined and I Ruined It,” in XML: Principles, Tools, and Techniques. Sebastopol, CA: O'Reilly Media, 1997. http://www.xml.com/pub/a/w3j/s3.connolly.html
[Smith 2013] Smith, Michael [tm]. “Getting agreements is hard (some thoughts on Matthew Butterick's
“The Bomb in the Garden” talk at TYPO San Francisco).” http://www.w3.org/QA/2013/04/getting_agreements_is_hard_som.html
[St. Laurent 2007] St. Laurent, Simon. “JSON on the Web, or: The Revenge of SML.” https://www.xml.com/pub/a/2006/07/05/json-on-the-web-or-the-revenge-of-sml.html
[St. Laurent 2013] St. Laurent, Simon. “Stop Standardizing HTML.” http://radar.oreilly.com/2013/04/stop-standardizing-html.html
[Google Structured 2021] “Understand How Structured Data Works.” https://developers.google.com/search/docs/advanced/structured-data/intro-structured-data
[Google SEO 2021] “Understand JavaScript SEO Basics.” https://developers.google.com/search/docs/advanced/javascript/javascript-seo-basics
[Usdin 2013] Usdin, B. Tommie. “The semantics of “semantic”.” Presented at Balisage: The Markup
Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:https://doi.org/10.4242/BalisageVol10.Usdin01. https://www.balisage.net/Proceedings/vol10/html/Usdin01/BalisageVol10-Usdin01.html
[Usdin 2019] Usdin, B. Tommie. “Explicit markup: a fool’s errand or the next big thing?” Presented
at Balisage: The Markup Conference 2019, Washington, DC, July 30 - August 2, 2019.
In Proceedings of Balisage: The Markup Conference 2019. Balisage Series on Markup Technologies, vol. 23 (2019). doi:https://doi.org/10.4242/BalisageVol23.Usdin01. https://www.balisage.net/Proceedings/vol23/html/Usdin01/BalisageVol23-Usdin01.html
[Walsh 2016] Walsh, Norman. “Marking up and marking down.” Presented at Balisage: The Markup Conference
2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). doi:https://doi.org/10.4242/BalisageVol17.Walsh01. https://www.balisage.net/Proceedings/vol17/html/Walsh01/BalisageVol17-Walsh01.html (ABSOLUTELY)
[WASP 2008] Web Standards Project. “Buzz Archive: Validation.” https://www.webstandards.org/buzz/validation/index.html
[WHATWG parsing] WHATWG. “Parsing HTML Documents.” https://html.spec.whatwg.org/multipage/parsing.html
[HTML5 classes] WHATWG. “Predefined Class Names.” https://web.archive.org/web/20070505134313/http://www.whatwg.org/specs/web-apps/current-work/multipage/section-global.html#predefined
[HTML5 Data Attributes] World Wide Web Consortium. “HTML5. A vocabulary and associated APIs for HTML and XHTML,
3.2.3.9 Embedding custom non-visible data with the data-* attributes.” http://www.w3.org/TR/html5/dom.html#embedding-custom-non-visible-data-with-the-data-*-attributes
[Namespaces] World Wide Web Consortium. “Namespaces in XML 1.0 (Third Edition).”
[W3C 2016] World Wide Web Consortium. “RDF Working Group Wiki.” https://www.w3.org/2011/rdf-wg/wiki/Main_Page
[W3C 1999] World Wide Web Consortium. “World Wide Web Consortium Releases First Working Drafts
of XML Schema Specification.” http://www.w3.org/1999/05/schema-1st-wd
[W3C 2001] World Wide Web Consortium. “World Wide Web Consortium Issues XML Schema as a W3C Recommendation.”
http://www.w3.org/2001/05/xml-schema-pressrelease.html.en
[XHTML 1.0] World Wide Web Consortium. “XHTML 1.0: The Extensible HyperText Markup Language (Second
Edition).” https://www.w3.org/TR/xhtml1/
[XHTML 2.0] World Wide Web Consortium. “XHTML 2.0.” https://www.w3.org/TR/xhtml2/
[XHTML 2.0 WG] World Wide Web Consortium. “XHTML2 Working Group Home Page.” https://www.w3.org/MarkUp/
[W3C 2016] World Wide Web Consortium. “XML Core Working Group Public Page.” https://www.w3.org/XML/Core/
[WAI-ARIA] World Wide Web Consortium. “WAI-ARIA Overview.” http://www.w3.org/WAI/intro/aria
[YAML] “YAML: YAML Ain't Markup Language.” http://www.yaml.org/