Balisage Paper: Markup Category Terminology over the Years: a First Look
Allen H. Renear
School of Information Sciences, University of Illinois at Urbana-Champaign
Allen Renear is a faculty member in the School of Information Sciences at the University
of Illinois Urbana-Champaign. He served as dean of the School from 2012 to 2019 and
is currently in the Office of the Provost as a Special Advisor for Strategic Initiatives.
Prior to coming to Illinois, he was Director of the Scholarly Technology Group at
Brown University. Renear was an Observer
at X3V1.TG8 during the finalization of ISO 8879 and has never recovered. He has served
on several early TEI committees, was the American Philosophical Association’s delegate
on the first TEI Advisory Board, was involved in various roles in the Brown University
(now Northeastern University) Women Writers Project, and was the first Chair of the
Open eBook Publication Structure Working Group (OEBPS, now EPUB/IDPF). He has been
coming to Balisage (and its predecessors) for longer than he can remember. His academic
specialties are data curation, scientific communication, and the conceptual foundations
of information systems.
Steven J. DeRose
Independent Consultant
Steve DeRose is a computational linguist who works mainly in structured document systems, NLP, and hypertext. He holds degrees in Computer Science and Linguistics and a Ph.D. in Computational Linguistics from Brown University.
He co-founded Electronic Book Technologies in 1989 to build the first SGML browser
and retrieval system, DynaText, and has been deeply involved in document standards
including XML, TEI, XPath, XPointer, EAD, Open eBook, OSIS, HyTime, and others. He has
served as adjunct faculty in Computer Science at Brown University and Calvin University,
and has written many papers and patents as well as two books.
and two books. He is presently Head of Linguistics at Docugami, a Seattle-based startup
solving business document problems using AI.
Abstract
We’ve been doing the “markup thing” for more than half a century, since the beginnings
of computerized text processing. In that time we’ve put a lot of adjectives in front of
“markup” that reflect how we think of and apply the markup. These qualifiers have tended
to fall into two categories: those that suggest what will happen, particularly in
presentation, as a result of applying markup to a string of data, and those that reflect
what we think about the data itself. Beginning with broad terms like “generic,” we have
made many attempts to elucidate what our markup is intended to accomplish: “conceptual,”
“declarative,” “logical,” “structural,” and “semantic” have all had their times in the
spotlight. What do the changing fashions in terminology say about our data and about
what we, the practitioners, think about our work?