How to cite this paper

Durusau, Patrick. “I Know Why The Semantic Web Failed.” Presented at Balisage: The Markup Conference 2025, Washington, DC, August 4 - 8, 2025. In Proceedings of Balisage: The Markup Conference 2025. Balisage Series on Markup Technologies, vol. 30 (2025). https://doi.org/10.4242/BalisageVol30.Durusau01.

Balisage: The Markup Conference 2025
August 4 - 8, 2025

Balisage Paper: I Know Why The Semantic Web Failed

Patrick Durusau

Independent Consultant

Patrick Durusau is the Co-Chair of the OASIS Open Document Format for Office Applications (OpenDocument) TC and has been a member of that TC since its initial meeting on December 16, 2002. His employer/sponsor has changed several times over the years, and Patrick has been a co-editor/editor of the OpenDocument Format (ODF) for the majority of that time. Patrick is also the project editor for the ISO/IEC mirror of ODF as ISO/IEC 26300.

Patrick blogs at Another Word for It about topic maps (being one of the co-editors of ISO 13250-5), other semantic issues, and, of late, how irregular forces can leverage data for their causes.

CC-BY 4.0 Creative Commons Attribution 4.0 International

Abstract

Proposed: The Non-Oxford/Non-Webster Dictionary, Patrick Durusau, Editor, shall invent new means of identifying the meaning of previous words and a means for processing them. Despite computer science precedent for this approach, no known dictionary has ever used it. Much like the “Attention Is All You Need” linguists, dictionaries rely on existing data as a starting point.

As a practical matter, as well as a matter of human nature, users prefer names they already know for subjects; witness the persistence of non-standard names or terminology long after new lists have been invented. Character sets and Unicode code points are one example. The Semantic Web took that a step further and invited users to create, at their own expense, identifiers for subjects they knew by other identifiers.

An origin paper sums up the problem we face with identifiers this way:

It turns out that given any term, there are many possible subjects that it could denote (to a greater or lesser extent) and conversely, any particular subject of knowledge (whether broad or narrow) usually can be denoted by different terms.

What if, instead of adding to the sea of identifiers for subjects, we take inspiration from probabilistic databases and large language models to develop a data-driven approach to subject identity? Instead of a universal exactness of subject identity, the degree of certainty, or rather uncertainty, is acknowledged as a matter of design.
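To make the idea concrete, here is a minimal sketch in Python of what a data-driven, uncertainty-aware view of subject identity might look like. It is an illustration under stated assumptions, not the method developed in the paper: the observations, the subject labels, and the function names `subject_distribution` and `terms_for` are all hypothetical. It simply counts how often existing data pairs a term with a subject and reports the result as a distribution rather than a single exact answer.

```python
from collections import Counter, defaultdict

# Hypothetical observations of (term, subject) pairs, standing in for
# whatever existing data a real system would draw on.
OBSERVATIONS = [
    ("mercury", "planet"),
    ("mercury", "planet"),
    ("mercury", "chemical element"),
    ("quicksilver", "chemical element"),
    ("Hg", "chemical element"),
]


def subject_distribution(observations):
    """Estimate, for each term, the relative frequency of each subject."""
    counts = defaultdict(Counter)
    for term, subject in observations:
        counts[term][subject] += 1
    return {
        term: {s: n / sum(c.values()) for s, n in c.items()}
        for term, c in counts.items()
    }


def terms_for(observations, subject):
    """List every term observed denoting the given subject."""
    return sorted({t for t, s in observations if s == subject})


dist = subject_distribution(OBSERVATIONS)
print(dist["mercury"])
print(terms_for(OBSERVATIONS, "chemical element"))
```

In this toy data one term ("mercury") denotes two subjects with different degrees of certainty, and one subject ("chemical element") is denoted by three terms, which mirrors the many-to-many relationship described in the quotation above. No new identifiers are minted; the names users already know are the input.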