Introduction
As with any document collection, a wiki's need for some form of organization grows in proportion to its size. It's not uncommon for wikis to have hundreds or even thousands of pages, and that's without considering the rather atypical Wikipedia (which as of July 2008 claims “approximately ten million articles in 253 languages”). Yet Wikipedia has no explicit classification scheme.
Category Links
— Gregory Bateson, Steps to an Ecology of Mind
[…] knowledge is all sort of knitted together, or woven, like cloth, and each piece of knowledge is only meaningful or useful because of the other pieces.
A common (some might say traditional) approach to categorization[1] on wikis relies on untyped web links, usually formed by prefixing a category name with the word "Category" in camel case (e.g., “CategoryHomePage”, “CategoryDiscussion”, etc.). Any wiki page containing such a link is considered categorized under that category. There are a number of problems with such a system. First, any wiki page containing a given link is categorized accordingly even if that was not the intention of the link author, as the mere presence of a link – not its traversal – establishes the categorization. Second, as the categorization of a given wiki page must be expressed on the page itself, there is no way to categorize a collection of pages from a different page, via a list or other external structure. The categorization naming scheme is also bound to the link names: there is no support for synonyms, homographs, or equivalent terms; no way of characterizing the relations between terms; no thesaurus-like hierarchy; indeed, no explicit structure at all. One can add or remove categories from a page, but renaming a category means renaming its page, which breaks existing links, and there is no facility for synthetic or combinatory categories apart from creating longer and more complicated wiki page names. Still, it is a simple system and works reasonably well for small document sets.
In library science, free language indexing permits the assignment of any word as an indexing term. One such type, natural language indexing, is often performed via an automated process, its terms derived from the natural language of a document rather than from the controlled vocabulary of a classification system. Like free language indexing, user-created wiki categorization suffers from mismatches between content and search terms and from a lack of terminology control. One of the main questions faced in the design of our own system was how to mitigate these problems.
Classification Systems
Despite being a Linux user, like most people buying a new laptop I had no choice but to purchase Microsoft Vista. One of the first things I did was remove all of the anti-virus software, since it was so horribly annoying; the consequence is that subsequently going online would be akin to putting my head in an electric oven. Given that my use of Windows is strictly offline, deciding to write this paper in Word became a matter of discipline. Admittedly, it does have rather beautiful typography.
Now, while scanning the shiny new Vista menus I came upon OneNote, a software application I didn't even know I owned. OneNote is touted as something I might be very interested in:
OneNote is an idea processor, a notebook, an information organizer — some even call it an "add-on pack for your brain." … a place for gathering, organizing, searching, and sharing notes, clippings, thoughts, reference materials, and other information. All your notes will be visible here — organized by notebooks, sections, and pages.
So how does it organize information? Typical of most software applications devoted to the organization of notes, it uses the staid metaphor of folders, which in the library world might be called an enumerated classification scheme: every subject gets a folder. In OneNote's case the central innovation is that you can put folders inside folders, and change the folder of an item at will. Even in their introductory documentation one can see a foreshadowing of the unfolding nightmare:
As you use OneNote and create more notes, you may want to organize your notes differently:
- If you find yourself creating a lot of pages for a topic, try dragging them into a new section
- If you find yourself creating many sections in one notebook, try putting some of them into a separate notebook
- Make navigation between notes more convenient by creating several notebooks at the top level, rather than putting everything inside one notebook
As any librarian will tell you, the problem with enumerated schemes is that they are inflexible, take no account of the fact that most resources don't fit neatly into one category (e.g., a book on 19th century French seascape paintings fits into at least four), and provide no ability to map context. OneNote users go through the same process as a librarian classifying a collection of books in the early 19th century: as a book arrives it gets put on a shelf next to books at that moment deemed similar; if there isn't a suitable shelf, a new shelf is created. If a suitable category isn't clear, or one doesn't have the time, or one has abandoned all hope, OneNote includes the obligatory category “Unfiled Notes”.
It's okay if this section gets big. You can drag the pages to other sections later, or just use search to find them in this section.
Do you get the feeling a lot of people store most of their notes in Unfiled Notes? The perennial question “where did I put that?” is once again answered by search. This is not an innovation.
Knowledge Representation
— John Steinbeck, Sweet Thursday
Suzy said, "You mean I'm fish?"
"You're fish," said Fauna.
“Knowledge representation” is a strange concept, when you think about it. What exactly is knowledge? How does one represent knowledge? A world of difficulty is opened up by the decomposition of even the simplest statements in natural language. While one might agree that the above sentences communicate information, do they also represent knowledge? Does the spoken word? How does the word “fish” relate to our conception of the word, its meaning in our minds? How is the concept of the word altered by its being composed within the context of the above statements, with their strange (perhaps erroneous) grammatical constructions? Does what we know about the author or our realization that the characters are fictional influence our understanding of the word? If we've not read the book, might we assume Suzy is a fish, instead of understanding that (in the “reality” or context of the book) Fauna is explaining Suzy's horoscope to her, that she was born under the sign of Pisces, the sign of the fish? Are the statements correct, and given any notion of judgment of such correctness, in what context can they be evaluated? And more obliquely, what does "being a Pisces" really mean?
Apart from astrology, these questions are properly in the realms of linguistics, semiotics, and phenomenology, the study of the relationship between directly-experienced reality and phenomena (i.e., our awareness of reality, as received via our perceptions) and how meaning is imparted to phenomena by culture.
Any system dealing with the interplay of knowledge representation and phenomenology is also complicated by the presence of ambiguity in natural language. While ambiguity is considered a favorable quality in poets and writers, it isn't appropriate for description within the sciences, even though the terminology found in the domains of knowledge representation and ontological engineering is rife with it. As Allen Newell noted in his influential address to the AI community, The Knowledge Level Newell, the use of the term “knowledge” within that community is “informal”. Despite people's common anthropomorphization of machines, “intelligent systems” are not actually intelligent in any real sense; “knowledge based systems” operate not on human knowledge but on a fixed program or set of rules; likewise “common sense reasoning” relies on mistaken notions of “common sense” and “reasoning”. The list goes on. The historical use of seductive marketing language as a means of obtaining grant funding made the introduction of “semantic” to the lexicon of the Web almost inevitable. Now everything is “semantic”.
Semantics, Shemantics
— Alan Kay
I don't know who invented water, but it wasn't a fish.
The extant literature on the "Semantic Web" — as promulgated by the World Wide Web Consortium (W3C) and the plethora of academics now beavering away under its auspices — suggests a world populated by engineers with little grasp of the last half-century's progress in either philosophy or library science: designing systems that promise abilities that cannot possibly be delivered, ignoring profound epistemological issues and linguistic and cultural chasms that have never been breached, while passing over the enormous successes within the field of library science in organizing and classifying information.
In what might be considered part of the ongoing tradition of 1920's visions of bubble cities on Mars and 1970's visions of chatty household robotic butlers, enlivened by “artificial intelligence” as they deliver our martinis[2], the “Semantic Web” often appears as merely the latest flight of future fantasy (italics courtesy of the author):
The Semantic Web is an evolving extension of the World Wide Web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content. It derives from W3C director Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange. WikipediaSW
Even the short excerpt above, taken from Wikipedia's SW page (and typical of the hype), is riddled with terminological problems. Decomposing any one of these terms raises issues that are never addressed in any except the most hand-waving fashion, not to mention the anthropomorphization of the Web itself. We are meant to understand that machines will “comprehend semantic documents and data” BernersLee2001, where “comprehend” is defined by the Random House dictionary as to understand the nature or meaning of; to grasp with the mind. This is surely the realm of science fiction.
Other elements of the SW are expressed in a large number of formal specifications; in their entirety they are intended to provide a formal meta-framework for description and use of concepts, terms, and relationships within a given knowledge domain.
If the language describing these designs is to be believed, the systems themselves must have solved some rather nagging epistemological problems, in particular the connection between informal language usage, computational linguistics, and formal logic. The reality of the SW hardly seems the pursuit of a unified vision; it is merely an ongoing collection of normative specifications for markup syntaxes, several dozen “vocabularies”, and roughly fifty technologies and assorted software projects that purport to be part of this grand vision. SWOver
While this profusion of activity might seem to indicate success, there is nothing in any of this work to indicate that it has broached the real problems of representing knowledge, if indeed that is actually possible. Robert Brandom suggests:
When we try to understand the thought or discourse of others, the task can be divided initially into two parts: understanding what they are thinking or talking about and understanding what they are thinking or saying about it. My primary aim here is to present a view of the relation between what is said or thought and what it is said or thought about. The former is the propositional dimension of thought and talk, and the latter is its representational dimension. The question I address is why any state or utterance that has propositional content also should be understood as having representational content. (For this to be so much as a question, it must be possible to characterize propositional content in nonrepresentational terms.)
The answer I defend is that the representational dimension of propositional contents should be understood in terms of their social articulation – how a propositionally contentful belief or claim can have a different significance from the perspective of the individual or claimer, on the one hand, than it does from the perspective of one who attributes that belief or claim to the individual, on the other. The context within which concern with what is thought and talked about arises is the assessment of how the judgments of one individual can serve as reasons for another. The representational content of claims and the beliefs they express reflect the social dimension of the game of giving and asking for reasons. Brandom
A key proposition here is that the entire enterprise of shared representation is a form of social agreement, an instance of human communication, not an expression of a universal truth, and that the value of a proposition is not its truth but its utility. This viewpoint is by no means shared unanimously among philosophers, but Brandom's argument does reflect a distinct break from analytic philosophy within the ranks of contemporary philosophers following the work of Ludwig Wittgenstein, whose Philosophical Investigations (Philosophische Untersuchungen), published in 1953, denounced the “dogma” in the earlier analytic philosophy of his Tractatus Logico-Philosophicus. More generally, this line of pragmatists and “neo-pragmatists” influenced by Wittgenstein includes John Dewey, Wilfrid Sellars, Richard Rorty, Jürgen Habermas, and Hilary Putnam, among others.
Informality
— Chuang Tzu, c. 320 B.C.E. (Burton Watson, trans.)
Words are not just wind. Words have something to say. But what if what they have to say is not fixed, then do they really say something? Or do they say nothing? People suppose that words are different from the peeps of baby birds, but is there a difference, or isn't there?
It should be noted that, given the issues described above, we could hardly design a system that relied on formal logic. The design is therefore distinctly informal, i.e., it is not backed by a formal model conforming to any foundationalist or otherwise representational logic. Its organizational structure is instead considered a form of human communication — a classification system with no a priori terms — where the meaning of a term is identified entirely with its usage, akin to Inferential Role Semantics IRS. In this regard the system should not be considered formal in any sense; it is by definition a system that supports an informal ontology.
That said, the use of terms that in other contexts might denote an expression of logic (e.g., “type”, “class”, “subclass”, etc.) should not be considered as bridging any logical chasm — they are just convenient natural language terms; no claims are made to Platonic, universal terms or self-justifying beliefs.
This is epistemologically a delicate balancing act. In what is known as deductive-nomological theory, the causality of declarative sentences is replaced by what are termed constant regularities: rather than say that A causes B, we only state that A is regularly followed by B. As Manuel DeLanda notes, it is only theory-obsessed philosophies that can afford to forget about causal connections and concentrate exclusively on logical relations. DeLanda

We need to be clear that this is not an attempt to subordinate what might otherwise be considered statements of causality or truth claims to linguistic statements, but rather to claim that this is all they entail – the interpretation (and thus error-checking) of the system is entirely in human hands.
It could be argued that all computer-based ontological systems such as the SW fall under this same set of constraints; a further discussion is out of scope for this paper. Suffice it to say that I consider the flaws in putatively formal systems due to epistemological issues as so profound as to demonstrably render them useless, or worse, useful (and therefore dangerous). It is much safer to design a system that is by design an informal toy and is delivered with a healthy disclaimer of having no truth conditional claims, again, as merely to facilitate expressions of human communication.
One of the central tenets of the philosophical investigation of the past century is an understanding of the failure of logic to describe the world[3], and that the role of philosophy is not to provide a framework (logical or otherwise) for understanding the world but, as someone like John Dewey or Jürgen Habermas might advocate, the therapeutic tools to improve it. This has had a profound effect on the nature of epistemological inquiry, leading some to suggest that it, like God, has died. But whether or not epistemology has died, it's perhaps time to pay a bit more attention to it, even as it wriggles on the ground.
Semantic Wikis
There exist a growing number of “structured” and “semantic” wiki projects, many basing their structure on category and/or typed links, others incorporating formal ontology (e.g., RDF or OWL) features directly into the wiki. Some, like Semantic MediaWiki SMW (an extension of the MediaWiki software that powers Wikipedia), use “semantic annotations” to characterize content according to an implicit or explicit vocabulary of terms.
In many cases these “semantic wikis” may provide a logical framework but often no classification scheme whatsoever, so we may have an IsSubclassOf relation but nothing to subclass. It's a wiki categorization scheme on steroids, i.e., this is categorization, not classification. There are a number of difficulties with this approach: the complexity of the augmented wiki syntax; weak or absent metadata; a lack of user-friendly schema documentation; the requirement of user expertise; lack of support for synonymous, homographic and polysemous (multireferential) terms; and other limitations in the available classification schemes (where they exist at all). Categorization terms or annotations are themselves often ungrounded or undefined, or, when they are defined, they force users into a predefined schema, often expressed in a formal logic not otherwise followed by, related to, or appropriate for the user-created structure, such as using set-theoretic language to describe genealogical relationships.
This is in addition to the more profound problems of: linguistic/grammatical ambiguity (e.g., homographs, polysemes, unconnected synonymous terms); lack of context (e.g., temporal, spatial, cultural, or individual use of language); ill-defined, undefined, or misspelled terms; ill-defined or undefined subject identity; the recursive requirement for a core of a priori terms based on closed-world, monotonic reasoning; and other epistemological and ontological errors. Many if not most of these problems are insurmountable unless one forces users to understand and follow prescriptive rules, lives in ignorance or denial, or throws one's hands up and declares defeat.
The Wrong Solution to the Wrong Problem
— Gregory Bateson, Pathologies of Epistemology, Ibid.
When you have an effective enough technology so that you can really act upon your epistemological errors and can create havoc in the world in which you live, then the error is lethal.
These ambitious projects are typically “knowledge modeling” tools, unsuitable for use by anyone except those whose expertise and expectations come from the world of artificial intelligence, knowledge representation, expert systems and the like: people who already understand and/or at least accept terms and technologies like RDFS, OWL, XSDT, SPARQL, and GRDDL. Even among those who putatively do, how many truly understand the ontological commitments they are making? Given that the entire RDF-based world of the SW is predicated on a principle of entailment, defined by Pat Hayes in RDF Semantics as
the key idea which connects model-theoretic semantics to real-world applications. […] Through the notions of satisfaction, entailment and validity, formal semantics gives a rigorous definition to a notion of "meaning" that can be related directly to computable methods of determining whether or not meaning is preserved by some transformation on a representation of knowledge. (RDFSemantics, §2)
Should we expect the layperson to understand and accept this? Pat is trying to be helpful when he writes:
Readers who are familiar with conventional logical semantics may find it useful to think of RDF as a version of existential binary relational logic in which relations are first-class entities in the universe of quantification. Such a logic can be obtained by encoding the relational atom R(a,b) into a conventional logical syntax, using a notional three-place relation Triple(a,R,b); the basic semantics described here can be reconstructed from this intuition by defining the extension of y as the set {<x,z> : Triple(x,y,z)} and noting that this would be precisely the denotation of R in the conventional Tarskian model theory of the original form R(a,b) of the relational atom. (RDFSemantics, §1.1)
If we build systems for use by the public with the knowledge that those systems seldom define or follow a formal model theory[4], and knowing that the average user will hardly pay attention to this, much less understand it, can we be said to be acting responsibly as designers and engineers?
Hayes, writing elsewhere on the perils of reasoning over an open Web, is more circumspect:

First, it's not the Web that is monotonic (whatever that would mean) but the reasoning from Web resources that must be monotonic. […] Nonmonotonic reasoning is therefore inherently unsafe on the Web. In fact, nonmonotonic reasoning is inherently unsafe anywhere, which is why all classical reasoning is monotonic; this isn't anything particularly new. But the open-ended assumption that seems to underlie much thinking about reasoning on the semantic web makes the issue a little more acute than it often is in many of the situations where logic has been put to use in computer science.
For example, if you are reasoning with a particular database of information, it is often assumed that the dbase is complete, in the sense that if some item is missing, then it is assumed to be false […]. But open-ended domains are not like this, and it is very dangerous to rely on this kind of reasoning when one has no license to assume that the world is closed in the appropriate way. If there were ever an open-ended domain it is surely the semantic web. […]
The global advantages of monotonicity should not be casually tossed aside, but at the same time the computational advantages of nonmonotonic reasoning modes is hard to deny, and they are widely used in the current state of the art. We need ways for them to co-exist smoothly. Hayes2001
John McCarthy and Pat Hayes were publishing papers on the situation calculus while Richard Nixon was president of the US McCHay69, and it's not that there hasn't been significant research in knowledge representation in the decades since. The frame problem and situation calculus accounts of time, concurrency, processes, uncertainty, and decision theory provide an “embarrassing richness” of proposed solutions (Reiter2001, 44-46; 149; 283; 335); there has been halting but steady progress in common sense ontologies, including the open source release of the Cyc ontology and associated tools as OpenCyc OpenCyc; and there is a lively and ongoing biennial conference on context Context07. After all, NASA has a couple of semi-autonomous robots very successfully roaming around Mars. Still, almost nothing has been done to breach the epistemological divide between knowledge representation theory and the real world of language and human culture, and contemporary research in related and relevant disciplines such as computational linguistics, narratology and philosophical inquiry hardly seems to be on SW researchers' radar. Much of the work has been focused on syntax and schema development.
One approach might be to use these sophisticated tools without fully understanding them; as Tim Berners-Lee writes, Semantic Web researchers “[…] accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility” BernersLee2001. The problem here is not so much the possibility of paradox as the premise that the Web and its users can be made to behave according to rules of entailment rigid enough to produce satisfactory results. If money, health, safety, or security are involved this hardly seems responsible.
Another approach might be to chill out, not take life (in particular, machine reasoning and computer-based ontologies) so seriously, or perhaps to stop using formal logic to describe the informal world and go elsewhere for a solution to the problem of organizing information.
Librarians are Sexy
In addition to paying little if any attention to epistemology, the SW folks at the W3C don't seem to have paid much attention to the demonstrable success of metadata and classification schemes within the field of library science. Granted, not all the problems the SW is trying to address are related to information organization (e.g., “intelligent” agents, military surveillance), but it does seem telling that library science has been so summarily ignored. It's perhaps not as sexy as AI, but it is demonstrably a lot more important in this age of infoglut: we need information organization a lot more than we need robotic butlers, even if we did have complete trust in their ability to make important decisions for us.
Despite the close ties between the W3C and the Online Computer Library Center (OCLC) and its Dublin Core metadata project, a recent search of the over 115,000 pages on the W3C Web site for the word “librarian” returns only 59 hits, and only 324 for “digital library”[5]. So perhaps it's not too surprising that a site search for an important text on information organization published by the MIT Press – MIT is the founding organization of the W3C – produces no hits. Not a one.
Those not acquainted with Elaine Svenonius' book The Intellectual Foundation of Information Organization Svenonius2000 might be surprised to find in it lucidly stated solutions to many of the thorny “semantic” problems of information systems, vocabulary and metadata design, as well as other issues germane to the SW. This is not simply theory but a description of demonstrably functional approaches to information organization that are current practice within the library world. We'll revisit this later; suffice it to say that Svenonius' book has been highly informative in the design of this project.
Background
This section describes the goals and history of the Ceryle Project, and the requirements that led to the design and implementation of its Assertion Framework.
The Ceryle Project
The original intent of the Ceryle Project – now in development for over seven years, and admittedly a bit of a kitchen sink – was to develop a software application to organize notes, planning documents, bibliographic references and other research materials: a document management system permitting user creation and management of its own classification system, with a bounded (but only partly-controlled) vocabulary of terms and relations between those terms, where each term must be defined and every relation typed, again by a defined term.
This was simply the next in a long line of attempts at organizing the matériel for a work of historical fiction that had overflowed dozens of paper notebooks since the early 1990s. Transcribing those notes to a computer hadn't solved the problem, as there were still neither any organizational principles nor tools; I found myself spending almost as much time in wading through the morass as I did in research and writing (and as a writer I'm an expert procrastinator).
After my involvement with the XML Topic Map 1.0 (XTM) specification, the project was also a means of exploring how XTM and other XML markup might be merged with various types of library classification systems in supporting such a software application.
The Ceryle Project began as a content management system (CMS) marrying a text editor, an embedded XML database, search engine, various categorization and classification features, and a graph visualization engine. Originally, Ceryle was a standalone UI application, but after a few years an embedded web server and wiki were added – largely to provide a local rich text editing environment as an alternative to the Java Swing UI. It now resembles a digital library application with a relatively sophisticated user interface as well as a web interface and nascent RESTful web services.
Māori Subject Headings & the Iwi Hapū List
The National Library of New Zealand has been a sponsor in the development of two projects that use a thesaurus database to organize Māori-related cultural materials:
Ngā Ūpoko Tukutuku / Māori Subject Headings (MSH): expresses a set of library subject headings in te reo Māori, with mappings to their English subject heading equivalents. Documentation (scope notes) and relations between terms are expressed in both languages.

Iwi Hapū Names List (IHL): expresses the structure of Māori whakapapa (genealogy) from waka (canoe) to whakaminenga (confederation) to iwi (tribe) to hapū (sub-tribe or family group)[6].
These databases use the standardized thesaurus structure Z3919 to represent existing relationships between records (see Domain Ontology below for details).
The source materials exhibited hierarchical, mereological, collection, and other relation types, which were shoehorned into the existing thesaurus relationships. For example, the inverse HasIwiOf and IsIwiOf relations are subclasses of the Z39.19 Broader Term and Narrower Term relations, respectively. The use of thesaurus software as a means of library classification system development is not uncommon (Turner, 111-117).
One benefit of the project is that it has permitted these two databases to be merged into one system while still maintaining their respective scopes. We were lucky in that of the thousands of records there were only two name clashes. Had there been a substantial number we may have had to resort to a Wikipedia-like name disambiguation strategy.
Requirements
In a nutshell, the basic requirements of the Assertion Framework were to provide a wiki environment that permitted expression of an underlying structure using a relatively simple wiki-like syntax. In more detail, the system should:
- presume no underlying model, so that the design is able to declare its semantics over clear and uncluttered ground
- avoid the problems of the simplistic wiki category links, but neither require users to learn a classification scheme, a set of rules for its use, nor any arcane, complicated syntax — the system was to be as "wiki-like" as possible. Any augmentation to wiki functionality should also be “wiki-like” and use a syntax that is easy to learn and understand: an informal system usable by untrained, lay people, not just “knowledge engineers”
- permit structuring of the wiki based on explicit typed links (relations) between pages (with page names as terms), expressing a graph structure implemented as a user-editable, REST-conforming Web service REST, where each URL can be used as a subject identifier
- permit the user-modifiable structure to be based on an underlying, unmodifiable endorsed structure, protecting the endorsed structure while permitting view differentiation of the endorsed from the user-edited structure. This is not to advocate a set of truths, but to permit the expression and differentiation of corporate and public opinion
- make substantial parts of the semantics (particularly domain-specific predicates) community-definable
- require that all terms in the vocabulary be defined (and that those definitions be readily accessible)
- expose the structure for improved user navigation
- expose the underlying ontology (structure) and make it shareable across multiple wikis, using a standardized notation (in this case XTM 1.0 was chosen, since the wiki structure is maintained as a Topic Map)
- be capable of being archived in such a way as to permit reconstruction of any given moment in its history (so as to be able to reconstruct the context of a given assertion); likewise users must be able to view snapshots of a set of wiki pages at any given point in the wiki's history
- be based on modifiable, popular (i.e., relatively stable) open source software
Design
This section describes the design of an Assertion Framework, a software library that provides capabilities for categorizing, classifying and organizing a repository or corpus of documents; how classification and ontology are defined and used in the project; the abstractions used in creating assertions and finally how these assertions fit together to create an informal ontology.
The following section then describes the implementation of this design.
Classification vs. Ontology
We have a readily available solution for an organizational structure that avoids the epistemological conundrums of computer-based ontologies by appealing to the functional (i.e., functionalist) approach of classification systems within library science.
Svenonius describes a subject language as an artificial language that is used to depict what a document is about. It is the language providing the terms used to compose a classification scheme. The chapter Subject Languages: Referential and Relational Semantics (Svenonius2000, 147-171) begins:
This chapter looks at the semantic treatment needed to transform a natural language into a subject language. […] a subject language is based on a natural language but differs from it primarily in the semantic structures it uses to normalize vocabulary by setting up a one-to-one relationship between terms and their referents. The referential semantics of a subject language deals with the generalized homonym problem. It consists of methods for restricting term referents so that any given term has one and only one meaning. The relational semantics of a subject language deals with the generalized synonym problem and consists of methods for linking terms within similar or related meanings.
If we consider each page on a wiki as the community's “canonical” information about a given subject, with its unique page title serving as a descriptor of that subject, the set of page titles taken as a whole comprises the subject language of the wiki. The wiki page name and its associated URL act as unambiguous subject identifiers; each use of a term is also a link to its reference documentation.
The terms in a subject language differ referentially from those in ontology or in natural language in that they do not refer to entities or concepts in the real world but to subjects. The extensional meaning of a term in a subject language does not refer to the class of all entities denoted by the term but to the class of documents that the term denotes. From an inferential role semantics point of view, the evolutionary development of a classification structure based on user design and behavior would (assuming active usage) tend towards validating those terms and relations between terms within the classification system that are frequently traversed, with infrequently or unused assertional links potentially deprecated or at least called into question. This warrants further investigation.
Homographs, Polysemes and Other Multireferential Words
One of the problems in classification schemes is due to the multireferentiality of words. Briefly, homographs are words that are spelled the same but have different meanings. Polysemes are words that are spelled the same, have different meanings, but their meanings are related. These constitute only two of a variety of what are called multireferential words. Even words that aren't usually considered multireferential may be interpreted with subtly different meanings when used in different contexts, domains of discourse, in different grammatical constructions, or by different people or groups of people (e.g., children vs. adults).
Library classification systems have dealt with the issue of multireferential terms in various ways. Svenonius notes that this can be accomplished either semantically, via domain specification, qualifiers and/or annotations, or syntactically (Svenonius2000, 148-155). The latter, called nonsemantic disambiguation, is a natural solution for wikis, where camel-cased combinatory terms as wiki page names are the norm (for example, the homograph “Mercury” might resolve into the page names “MercuryElement” and “MercuryPlanet”).
Over the years wiki communities have discussed at length, and less frequently experimented with, the use of namespaces, similar to the use of namespace-qualified (“colonized”) names in XML. There are a variety of problems with this, such as the incompatibility of many delimiter characters with URLs, but even with a functional syntax the biggest deterrent is likely the additional complexity for the user.
Faceted Classification
Classification methodologies used in modern libraries have been profoundly influenced by S.R. Ranganathan's Colon Classification system ISKOI, developed in the 1930's. While Ranganathan's system[7] did not itself survive, it was subsequently a profound influence on other classification schemes, including both Dewey Decimal and US Library of Congress (LoC), and led to the development of Faceted Classification (FC), where the predominant terms of a given domain of knowledge are sorted into homogeneous, mutually exclusive conceptual categories or facets, each derived from the parent domain by a single characteristic. FC differs from a traditional enumerated scheme in that “it does not assign fixed slots to subjects in sequence, using instead clearly defined, mutually exclusive, and collectively exhaustive aspects, properties or characteristics of a class or given subject.” Wynar.
As an example, Gregory Bateson's seminal work Steps to an Ecology of Mind contains the US Library of Congress Cataloging-in-Publication (CIP) data on its copyright page, used by libraries in cataloging the book. This includes:
1. Anthropology, Cultural – Collected works. 2. Knowledge, Theory of – Collected works. 3. Psychiatry – Collected works. 4. Evolution – Collected works. I. Title. II. Series. [DNLM: 1. Anthropology, Cultural – collected works. 2. Ecology – collected works. 3. Evolution – collected works. 4. Schizophrenic Psychology – collected works. 5. Thinking – collected works. GN 6 B329s 1972a].
Anyone interested in this book has a good chance of locating it by subject amongst the plenitude found in a library card catalog, even if they cannot remember the title or author. But it is rather unfortunate that the categories listed for Steps to an Ecology of Mind don't include cybernetics, cognitive science, and systems theory, though Bateson's ideas have been profoundly influential in the foundation of these sciences. We can perhaps forgive the librarians who classified a book that has broken so many boundaries. All classification schemes have their limitations and failings, including the very human problem of an inability to see into the future. The flexibility of allowing classifications to evolve without breaking is extremely important, especially during the early stages, where categories are still relatively unformed. Notably, this problem plagued expert systems implementations, where later changes to root level concepts were very difficult to accomplish (e.g., a change of logical foundations, such as from first order to modal logic).
Many of the refinements in the system have been in the approach to defining the interrelationships between terms. There are two fundamental Faceted Classification relationship types: semantic and syntactic.
A semantic relationship is independent of context and is by definition a permanent relationship, arising from the subjects involved. Semantic relationships are further divided into three types: equivalence relationships, where terms are synonyms (and would be cross-referenced under the same category); two types of hierarchical relationships, superclass-subclass and mereological (whole-part); and affinitive/associative relationships, which are domain-specific relationships such as cause and effect, coordination, sequence, genetic, etc.
A syntactic relationship is an ad hoc relationship denoting otherwise unrelated concepts brought together as a composite subject in a specific context, the equivalent of how terms appear together in a common sentence. It operates much as keyword searches do on the Web, e.g., “biblical archaeology”, “poetry by 12th century Japanese authors”, or “medieval weapon manufacture”. This approach is at the heart of the resource organization taken by most wikis in their use of camel-cased wiki page names. In the Assertion Framework we didn't set out to avoid the use of wiki page names as subject identifiers, but to supplant the clunky use of category names with typed assertions between pages.
A highly recommended text on library classification schemes is Essential Classification Broughton, whose author is an active researcher in Faceted Classification.
User Tagging or “Folksonomies”
A user tag permits the non-hierarchical assignment of a term or keyword to a resource. The syntax of user tagging in most social software sites (e.g., del.icio.us, Flickr) is a simple comma- or whitespace-delimited list of tokens. Whereas on these sites users are generally permitted to add tags to a resource at will (absent any validation or even spell-checking), in our case this would mean that many or most tags would remain undefined and outside the Assertion Framework.
Tags could be seen as keywords in a Dublin Core list of subjects, and would therefore be searchable in our implementation, but given their current popularity we wanted tags to play a larger role than that of metadata.
While Faceted Classification permits synthetic classification, its facets are still usually derived from a controlled vocabulary. If each tag were represented by a wiki page, where each definition is discoverable, tags could then be seen as facets. The decision was therefore made to require each tag to be represented by a wiki page, even if this may be seen by some users as onerous. We don't yet have feedback that would permit us to evaluate how this may affect the willingness of people to tag resources.
Assertions
The basic graph design centers around a simple construct called an Assertion, a form of simple declarative sentence using the Subject-Predicate-Object grammar common to natural language[8] as well as other abstract syntaxes (e.g., RDF, where it is called a triple).
In this model each of the three components of the sentence is a Term:
Subject Predicate Object
E.g., an Assertion stating that Gary Snyder is a poet might look like:
GarySnyder IsA Poet
The Terms available for use in Assertions are not composed from a controlled vocabulary in the library sense, but given that the validation of a new Assertion uses the set of Terms dynamically derived from the extant wiki pages, the vocabulary, while not controlled, is at least nominally defined. Users may create new wiki pages as necessary to make available the Terms needed for their Assertions.
The interpretation of an Assertion is similar (but not equivalent) to the definition in RDF Semantics:
The basic intuition of model-theoretic semantics is that asserting a sentence makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the sentence true. In other words, an assertion amounts to stating a constraint on the possible ways the world might be. (RDFSemantics, §1.3)
Throwing out the assumption of a model theory and discarding any notion of a universal truth, we may more usefully consider an assertion in a socio-linguistic sense: a statement of belief made by a person about the world. Given that these Assertions are being made by many people in the collaborative environment of a wiki, this is quite a reasonable stance. In other words, we may infer – based on the stated beliefs of the community – a statement of potentially equal believability.
The set of the wiki's extant Assertion triples forms a directed graph, where each pair of Subject and Object Terms forms graph nodes linked by their respective Predicate. This bounded corpus of Assertions, taken together, is considered a form of informal ontology[9].
Core Predicates
In simple terms, a taxonomy is a restricted form of ontology: a hierarchy in which all relations share a common ancestor type. Because our design uses hierarchical as well as other relation types it cannot be considered a proper taxonomy, but it does include a taxonomic “layer”.
Ontological hierarchies can be composed of a variety of formal relations, based on sets, classes, collections, etc. (depending on the underlying logic, derivation, and context of use). Because we are distinctly trying to avoid connoting formal (e.g., set theoretic) foundations as well as promote simple natural language terminology, the names of the predicates avoid formal titles but make reference to a formal, traditional relation type in their documentation.
There are three fundamental Predicates that compose the core of the ontology:
Kind Of: the superclass-subclass relation
Is A: the class-instance relation
Part Of: the mereological part-whole relation
The "Kind Of" and "Part Of" assertions together form a hybrid hierarchy, with instances or individuals related via "Is A" Assertions[10].
Upper Ontology
As with many ontological models, all graph nodes are descendants of a root Term called Thing (“everything is a kind of thing”), such that the system is composed of a single connected graph. All Predicates are descendant subclasses of an ur-Predicate called Relation. The core or base ontology is very simple, as shown in Figure 3.
Most Terms are related via a Kind Of (superclass-subclass) relation. The basic foundation of the ontology is primarily taxonomic, with some additions of mereological components.
Support for Faceted Classification is provided by the facet relation, Has Facet.
With the addition of various further relations, the “upper ontology” contains roughly 50 terms and 50 relations.
Domain Ontology
Domain ontologies using additional affinitive or associative relations are considered as “layers” on the Upper Ontology's taxonomical/mereological structure. Being domain-related, these are the most likely to be augmented and/or modified by users. Syntactic relationships are considered as the coming together of other relationship types, e.g., in constructing a character, a setting, an event, or a scene in a narrative ontology.
The project currently includes two domain ontologies:
Z39.19: includes the complete set of terms and relations as described by the Z39.19 standard, including complete role and relation templates for all Z39.19 predicates actually used by the system. All Z39.19 terms and relations are included as Terms even if currently unused.

Māori Subject Headings / Iwi Hapū Names List: includes the set of terms and relations used in the Māori Subject Headings and Iwi Hapū Names List, with mappings to equivalent relations in Z39.19.
Other domain ontologies developed for the Ceryle Project may be included as needed.
Relationship to Z39.19
The ANSI/NISO standard Z39.19 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies (Z39.19 2005) provides features for support of controlled vocabularies, thesaurus terminology and base relation types.
This is used in the context of the wiki using the following definitions (where Z39.19 definitions are in bold):
- each wiki page is a precoordinated term reifying a single concept or subject
- the wiki page name is considered the subject heading
- wiki Assertions, rather than untyped links between pages, are used to establish semantic relationships
- the inherent predicate of each Z39.19 semantic relation is reified as a wiki page
- the wiki is considered a controlled vocabulary as dynamically bounded by the gamut of wiki pages
- all domain-based predicates are likewise defined as wiki pages
Z39.19 defines three types of semantic (as opposed to syntactic) relationships used in controlled vocabularies: Equivalency (§8.2); Hierarchy (§8.3); and Association (§8.4).
Equivalency relations include synonymy, near synonymy, and lexical variants. This is notably not an equivalence relation under any logic: it relates equivalent terms, not equivalent concepts or entities. Our ontology maps these to Use and UsedFor. (These do not map to the logical connective Equivalent relation found in our middle ontology, which is used only for logical relations.)
Our project mirrors the Hierarchy relations, though sadly the Z39.19 standard reflects a common error in knowledge representation texts: conflating IsA with the superclass-subclass relation. This results in Z39.19 using the same IsA name for both the “Generic Relation” (ostensibly superclass-subclass) and the “Instance” (class-instance) relation. We provide the two Z39.19 generic relations as IsBroaderThan and IsNarrowerThan, with IsRelatedTo used when hierarchy is unknown or otherwise not implied. In our ontology we use IsA for class-instance and KindOf for superclass-subclass, where the latter may be seen as a synonym for IsNarrowerThan. The mereological whole-part relation is in our ontology named PartOf. Note that we do not differentiate set theoretic from collection-based semantics; this is not a formal system.
Z39.19 defines eleven Associative relations (e.g., Cause / Effect, Process / Agent), and while they are included in the ontology and instantiated as wiki pages, they are not used in our source materials hence are not described here further, but again, these are considered as relations between vocabulary terms, not logical relations.
Endorsement
The concept of an endorsed set means that some Terms and Predicates are protected from being removed or redefined by users, as they are considered part of the required semantics of the application. This was a requirement of the system: the set of terms and relations delivered to the public may be seen as a corporate endorsement of their definition or validity. Given that the public is to be permitted to modify portions of this information, there must be a way either to protect endorsed content or at least to differentiate the endorsed from the unendorsed. This is done via scoping (in the Topic Map sense).
User-created Assertions (i.e., those made via plugins on wiki pages) currently have no ability to express scope, so their scope in the resulting Topic Map is hardwired by the plugin itself.
If desired, administrators of the system may wish to develop a workflow-based endorsement process (policies and practices) whereby user-created terms and predicates can be migrated into the endorsed ontology.
Implementation
While the expression language and the functional implementation of the Assertion Framework are, in this instance, largely tied to their origins in the Ceryle Project and its current wiki implementation, in the abstract most elements of this design could be reused in non-wiki applications.
A Software “Framework”
One aspect of Ceryle that was clear early on was that in addition to supporting the stated functional requirements it also needed to serve as its own development laboratory. Over the years there have been a large number of experiments with features, software libraries, conversion tools, visualization capabilities, some of which have survived, though many only in backup archives. As a framework it has gradually matured and now includes user scripting and various extensibility features. The Assertion Framework described in this paper is a module added to the embedded wiki functioning within the Ceryle standalone application, but has also been installed as part of a generic JSPWiki distribution running on Apache Tomcat; in the latter case there are no dependencies on the standalone Ceryle application.
Choice of Wiki Software
Two primary requirements led to the addition of a wiki: the ability to provide simple formatting markup in a non-WYSIWYG text editor, and the ability to provide a collaborative editing environment that could function either locally (standalone) or online.
The availability of software libraries, compatibility with the Ceryle application as a development environment, and my own previous experience made Java the programming language of choice. An evaluation was made of the available web servers and wiki software, based on features, ease of embedding, extensibility, stability, and the health of their respective communities and/or sponsors. Jetty[11] was chosen as the embedded web server (there was little competition); no modifications were needed apart from some configuration and embedding code. Following an evaluation of available open source wiki software, JSPWiki[12] was chosen as the most versatile and extensible for our purposes. In the past year JSPWiki has been accepted by the Apache Foundation as an incubator project.
Data Conversion
The initial content for the present implementation was derived from the Māori Subject Headings and Iwi Hapū Names List thesaurus (Z39.19-compliant) databases and converted via Groovy scripts into interim XHTML documents, which were then converted into Linear Topic Map (LTM) source files to provide a backing structure for the wiki as well as generating an individual wiki page for each record.
Initially, the database export was only available as plain text. The MSH database software has recently been upgraded and now supports XML export, so the Groovy script was updated to process the new format. This could have been accomplished via XSLT, but the existing investment in, and rapid development of, the scripting solution was considered more important.
WikiEvents and the WikiEventManager
While JSPWiki was considered a good foundation, earlier versions of JSPWiki did not include any event handling code, so a general purpose WikiEvents API, comprising a number of WikiEvent classes and a WikiEventManager (see Figure 4), was developed for this project. This permitted application-level events to be fired as wiki pages were created, edited, renamed, deleted, and displayed. This general purpose wiki events API and implementation are now part of the Apache JSPWiki distribution.
Each wiki page editing session fires a number of associated WikiEvents. By filtering on event type we can capture specific life cycle events for a wiki page.
Wiki Plugins
Given that the basic idea was to permit users to make Assertions, we needed a means of expressing them using a simple wiki-like syntax. JSPWiki has a number of extensibility features, the WikiPlugin API being the most suitable for developing syntax-level functionality.
JSPWiki augments its formatting and linking syntax with a plugin syntax, where a reference to the plugin name is followed by a plugin body, which may contain one or more named parameters. For example, to transclude the wiki page “CopyrightStatement” using the InsertPage plugin one would include the following wiki text:
[{InsertPage page='CopyrightStatement'}]
When the WikiEngine encounters this plugin syntax it instantiates a plugin object, mapping the plugin's name to a Java class name. The plugin object's execute() method is then called, and the plugin performs its programmed tasks.
From the user's perspective, our entire system is implemented via a set of custom plugins, as described below. We've therefore augmented the default JSPWiki wiki text syntax with an additional assertion-related vocabulary, in keeping with the rest of JSPWiki's plugin syntax. Users already familiar with other JSPWiki plugins will find nothing unusual in the syntax of these new assertion features; in fact, some of the plugins simplify the JSPWiki plugin syntax by not requiring explicit parameters.
Assertion Plugins
The implementation provides both assertion expression and query plugins, including:
Plugin Name | Alias | Description |
---|---|---|
AssertionPlugin | Assert | assert a single Subject-Predicate-Object sentence |
AssertionFormPlugin | AssertForm | asserts properties based on a form template |
AssertedPlugin | Asserted | query for Assertions matching a given pattern |
AssertTagPlugin | Tag | assert one or more tags as properties of the Subject |
HasAssertedTagOfPlugin | HasTagOf | query for wiki pages matching one or more tags |
TopicMapPlugin | TopicMap | query the backing Topic Map (administrative use) |
The AssertionPlugin syntax is quite simple, just wrapping the abstract assertion syntax:
[{Assert [GarySnyder] IsA [Poet] }]
Assertion Templates[13] use the same syntax but include a flag parameter; the wiki links assert the roles played by the Subject and Object in Assertions of that type:
[{Assert template='true' [Instance] IsA [Class] }]
Likewise, the query syntax is similar but uses parameters:
[{Asserted predicate='IsA' object='Poet' logic='and' }]
This would return all Assertions having a Predicate of “IsA” and an Object of “Poet”. Regular expressions are also supported. The current wiki page can be referenced in the syntax by “[.]”.
In practice, almost all of the plugins developed for the project are domain-specific subclasses of either the AssertionPlugin or the AssertedPlugin, each providing a specific feature based on a shorthand wiki text syntax. Some plugins are aliased to provide simpler or alternative names, or to preset certain default values. For example, the Z39.19 predicates (e.g., “IsNarrowerThan”) were all included by presetting the current wiki page as the Subject and fixing the predicate name, so that rather than
[{Assert [Whare] IsNarrowerThan [WhareTūpuna] }]
the user need only type
[{IsNarrowerThan WhareTūpuna }]
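The AssertionPlugin source is not reproduced in this paper, so the following is a purely hypothetical sketch of how such a shorthand subclass might preset its defaults; the abstract base class shape, method names, and all overridable hooks shown are assumptions.

import com.ecyrd.jspwiki.WikiContext;

// Hypothetical sketch only: we assume an abstract base class with
// overridable defaults, roughly like this.
abstract class AssertionPluginBase {
    protected abstract String getDefaultPredicate();
    protected String getDefaultSubject(WikiContext context) {
        return context.getPage().getName();  // the current page, "[.]"
    }
    // ...parsing of the plugin body, event firing, etc. omitted
}

// A shorthand plugin presets the predicate and uses the current
// page as Subject, so [{IsNarrowerThan WhareTūpuna}] expands to
// [{Assert [.] IsNarrowerThan [WhareTūpuna]}].
public class IsNarrowerThanPlugin extends AssertionPluginBase {
    protected String getDefaultPredicate() {
        return "IsNarrowerThan";             // fixed predicate name
    }
}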
It is hoped that these syntax simplifications will keep users from getting bogged down in the details of the syntax. Even the use of the word “assert” might be seen as overly technical by many. As the terminology used is easily changed, we look forward to improving the system via user feedback.
User Tagging
A WikiTag feature was developed to provide the ability to tag a wiki page, query pages matching one or more tags, or display a tag cloud. This again uses wiki plugins:
Plugin Name | Alias | Description
---|---|---
TagPlugin | Tag | asserts one or more tags as properties of the Subject
HasTagOfPlugin | HasTagOf | queries for wiki pages matching one or more tags
TagCloudPlugin | TagCloud | displays a tag cloud based on a query
Users can add tags to a page using a simplified plugin syntax:
[{Tag Person Actor Honcho }]
This uses a TagManager that functions independently of the Assertion Framework, so that JSPWiki installations can be provided with a user tagging feature and augment the wiki at a later date with the Assertion Framework without any change in syntax. In that case, the only changes necessary are the addition of the Assertion Framework library (jar file) and the re-aliasing of the plugins, e.g., changing the “Tag” alias from the TagPlugin to the AssertTagPlugin and the “HasTagOf” alias from the HasTagOfPlugin to the HasAssertedTagOfPlugin. JSPWiki provides this configuration feature via an XML file.
User tags are instantiated within the system as facets, such that the actual relationship between a page and one of its tags is a HasFacet relation. While this is a form of typed property assignment, the system does not currently support property typing of tags, though this can be done via an assertion, e.g.,
[{Assert [Habermas2006] DC_Title = "The Divided West" }]
But this kind of thing might be seen as pushing the envelope on a wiki…
Assertion Framework
When an instance of an AssertionPlugin is parsed on a wiki page it fires a subclass of WikiEvent called an AssertionEvent. An event listener filtering for AssertionEvents passes it to a parser, which generates an Assertion object corresponding to the parameters of the plugin syntax. In addition to the Subject, Predicate, and Object, we capture the name of its origin wiki page and a creation timestamp[14].
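As a sketch, the Assertion object can be pictured as a simple immutable value class. The field and accessor names here are assumptions rather than the project's published API.

import java.util.Date;

// Illustrative sketch of the Assertion value object described in
// the text: a Subject-Predicate-Object triple plus its origin
// page and creation timestamp.
public final class Assertion {
    private final String subject;
    private final String predicate;
    private final String object;
    private final String originPage;  // page the plugin appeared on
    private final Date   created;     // creation timestamp

    public Assertion(String subject, String predicate, String object,
                     String originPage) {
        this.subject    = subject;
        this.predicate  = predicate;
        this.object     = object;
        this.originPage = originPage;
        this.created    = new Date();
    }

    public String getSubject()    { return subject; }
    public String getPredicate()  { return predicate; }
    public String getObject()     { return object; }
    public String getOriginPage() { return originPage; }
    public Date   getCreated()    { return created; }
}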
The system supporting the expression, generation, caching and querying of Assertion objects is called an Assertion Framework (see Figure 5), which can be thought of in both static and dynamic terms: static in the sense that at any point in time a snapshot of the system provides the set of extant Assertions; dynamic in the sense that the system is mutable, continually changing as people create and modify the wiki pages.
The workhorse of the system is the AssertionHandler, which handles all incoming Assertion objects and manages a cache containing the set of all Assertions on the wiki. The initial set of Assertions is generated by an AssertionCrawler scan of all pages.
The AssertionHandler also provides methods to query the set of Assertions.
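A hypothetical sketch of such a handler follows, using the Assertion sketch above; the real class's API is not reproduced in this paper. The query method supports regular expressions, as the Asserted plugin syntax allows.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of an AssertionHandler-style cache and query.
public class SimpleAssertionHandler {
    private final List<Assertion> cache = new ArrayList<Assertion>();

    public void handle(Assertion a) { cache.add(a); }  // incoming event

    // null pattern = wildcard; e.g., query(null, "IsA", "Poet")
    public List<Assertion> query(String subject, String predicate,
                                 String object) {
        List<Assertion> hits = new ArrayList<Assertion>();
        for (Assertion a : cache) {
            if (matches(subject, a.getSubject())
                    && matches(predicate, a.getPredicate())
                    && matches(object, a.getObject())) {
                hits.add(a);
            }
        }
        return hits;
    }

    private static boolean matches(String pattern, String value) {
        return pattern == null || Pattern.matches(pattern, value);
    }
}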
WikiMapManager
In the current implementation, the life of the AssertionHandler is very short. While the Assertion Framework with an AssertionHandler alone is a functional system, our requirements are better met by queries on a Topic Map.
Following initialization of the WikiEngine on startup, the AssertionCrawler crawls the wiki pages. Once this has finished, it fires a completion event that triggers creation of a WikiMapManager (see Figure 6), which also implements the AssertionHandler API.
The WikiMapManager obtains the set of Assertion objects from the existing AssertionHandler and converts them into Topics and Associations in an in-memory TopicMap object using the TM4J Topic Map engine.
Given our model of Assertions as simple binary relations, the translation to a Topic Map is straightforward. As described previously, the Assertion object contains references to the wiki page names for Subject, Predicate, and Object, the origin wiki page, and a creation timestamp. Each wiki page name reference is converted to a Topic whose base name matches the wiki page name, with the page URL as a subject identifier. The Predicate is further typed as some descendant class of Relation within the Topic Map's subsumption hierarchy for relations. The Assertion itself is converted to a Topic Map Association, typed by the Predicate's Topic and also reified as a Topic. The timestamp is added as a property of this Association. If any of the wiki page name references match an existing Topic name, the existing Topic is used. We are not currently scoping page names but will likely add scope to all names in the future, given that the current development project is multilingual.
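The following sketch outlines those conversion steps. The TopicMapOps interface is a hypothetical stand-in for the TM4J engine calls the project actually uses; none of these names are TM4J's real API.

import java.util.Date;

interface Topic {}
interface Association {}

// Stand-in for the Topic Map engine operations used in the mapping.
interface TopicMapOps {
    Topic findOrCreateTopic(String baseName);   // reuses a matching Topic
    void typeUnderRelation(Topic predicate);    // place under Relation hierarchy
    Association createAssociation(Topic type, Topic subject, Topic object);
    void addProperty(Association a, String name, Date value);
    void reifyAsTopic(Association a);           // Association also a Topic
}

public class AssertionMapper {
    private final TopicMapOps tm;

    public AssertionMapper(TopicMapOps tm) { this.tm = tm; }

    // subject/predicate/object are wiki page names; each becomes a
    // Topic whose base name is the page name (attaching the page URL
    // as subject identifier is assumed inside findOrCreateTopic)
    public void map(String subject, String predicate, String object,
                    Date created) {
        Topic subj = tm.findOrCreateTopic(subject);
        Topic obj  = tm.findOrCreateTopic(object);
        Topic pred = tm.findOrCreateTopic(predicate);
        tm.typeUnderRelation(pred);             // descendant of Relation
        Association assoc = tm.createAssociation(pred, subj, obj);
        tm.addProperty(assoc, "created", created);
        tm.reifyAsTopic(assoc);                 // reify the Association
    }
}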
Once this conversion is complete the WikiMapManager usurps the functioning of the existing AssertionHandler (which is then destroyed), so that incoming AssertionEvents are now processed as modifications and queries on the Topic Map.
The author of an Assertion relies upon the ability to read, at the moment of its expression, the wiki page documentation for each of the sentence components. Since the system is mutable, it must also be possible for others to view a snapshot of the system as it was at that moment in order to interpret the Assertion correctly.
Topic Map Interfaces
The TopicMapPlugin is currently used to perform diagnostic queries on the Topic Map from a wiki page. The result of a query via a wiki plugin can return an XHTML fragment, or content that could be processed via a Java applet, so this usage is limited to direct display or visualization features.
There have been a few Web service experiments that return Topic Map (XTM) fragments, e.g., Topic and Association serializations corresponding to a set of Assertion objects. Though an interesting feature, there is no specific application requirement for inter-wiki sharing of structures that can't currently be accomplished via the existing XML-RPC features, so this remains as yet undeveloped.
Note that “Inferencer” is shown as a small box in the diagram. This is to indicate that it is not some big fancy inference engine, just a single Java class that performs simple queries on the Topic Map, e.g., returning simple transitive inferences over relations whose predicates have been marked as transitive (such as ancestor and descendant tests), plus a few utility methods. Most return a boolean value so that they can be chained. As this functionality can easily be accommodated within the current use of wiki plugins, no additional interfaces have been developed.
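To give a sense of the scale involved, the sketch below implements one such chainable test: a transitive reachability check over a single relation type. All names and the backing data structure are illustrative, not the project's code.

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative "small box" inferencer: a transitive test over one
// relation type (e.g., IsNarrowerThan), returning a boolean so
// calls can be chained with ordinary logical operators.
public class Inferencer {

    // adjacency: term -> set of directly broader terms
    private final Map<String, Set<String>> broader =
            new HashMap<String, Set<String>>();

    public void addNarrowerThan(String narrow, String broad) {
        Set<String> set = broader.get(narrow);
        if (set == null) {
            set = new HashSet<String>();
            broader.put(narrow, set);
        }
        set.add(broad);
    }

    // true if 'broad' is reachable from 'narrow' through the
    // transitive closure of the IsNarrowerThan relation
    public boolean isNarrowerThan(String narrow, String broad) {
        Set<String> seen = new HashSet<String>();
        Deque<String> queue = new ArrayDeque<String>(broaderOf(narrow));
        seen.addAll(queue);
        while (!queue.isEmpty()) {
            String t = queue.remove();
            if (t.equals(broad)) return true;   // inference holds
            for (String b : broaderOf(t)) {
                if (seen.add(b)) queue.add(b);
            }
        }
        return false;
    }

    private Set<String> broaderOf(String term) {
        Set<String> s = broader.get(term);
        return (s == null) ? Collections.<String>emptySet() : s;
    }
}

Because each test returns a boolean, the calling code can chain them, e.g., inf.isNarrowerThan("Whare", "Building") && inf.isNarrowerThan("Whare", "Architecture").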
Topic Map Modules
One of the features of the system is the ability to provide run-time augmentation through drop-in LTM files acting as Topic Map modules. For example, if a user creates a wiki page called “Portugal” and asserts that Portugal IsA Country, we look up the page name in the ISO 3166 Country Codes for Topic Maps module and make any additional assertions about that Topic based on the Associations found there.
Modules available, in development, or planned include ontologies for languages and countries, places, place names, events, geocodes, measurement and datatypes, mappings to other library subject headings, and various other subject domains. One idea in the works incorporates a place-name-to-latitude/longitude database to permit inclusion of Google Maps: any wiki page asserted as a place would look up its wiki page name in the place names database, then use a wiki plugin to insert a Google Map for that location. This remains an area ripe for further exploration, and will likely be determined in part by user feedback.
Endorsement
Implementation-wise, all endorsed Topics and Associations within the backing Topic Map are scoped by an endorsing Topic. All user-created content within the Topic Map is either left unscoped, or scoped as "Unendorsed". In the case of the Māori Subject Headings and Iwi Hapū Names List, their Topics are scoped accordingly so that we may in the future extract Topic Maps corresponding to their respective domains.
Wiki Features
With all of this additional machinery behind the scenes, what effect does this have on the user experience? It must be said that, to reiterate one of the requirements, there is little desire to expose a great deal of complexity, particularly for readers. We likewise have on occasion decided not to implement a feature when it was seen to significantly add to the complexity of either the browsing or editing experience. Thus, most of the Assertion Framework features are experienced as links, structure displays or improved search or navigation. For example, the generated output from an AssertionPlugin shows the Subject, Predicate and Object as active wiki page links. A hideable navigational panel has been added to the page footer that displays any assertions or other information queried from the Topic Map for that wiki page.
We have not yet fully investigated the use of many of these features and look forward to doing so following public release of the service.
Next Steps
Some future features are planned and/or partially implemented:
- the project is currently being socialized amongst funding organizations
- further develop query and inferencing functionality
- provide term equivalence, related term, term aliasing, and term disambiguation features, both manual (explicit) and automated (implicit), to deal with both homonyms and polysemes
- provide enhanced search features based on assertions, tags, and collative terms
- protect endorsed terms and relations using scope
- permit scope-based filtering on queries
- further develop basic ontologies and improve existing documentation
- create forums for discussion of ontology development
- with a current reliance on RecentChanges to manually discover errors, develop features for structural analysis and validation
- develop features to assist users in classifying new wiki pages (terms)
- develop drop-in Topic Map-based ontologies
- wiki hive functionality (sharing classifications across multiple wikis)
- develop administrative features
- develop (likely off-line) visualization features for analysis of the wiki structure
- develop better internationalization/localization support
- develop a RESTful web service providing queries on the Topic Map
- performance improvement & scalability testing
Conclusion
The Assertion Framework is still in a prototype state so there are many questions as to the viability of both the approach and implementation. An upcoming National Library of New Zealand project related to the Māori Subject Headings has been proposed that would use two wikis to perform a cross-dialect comparison of terms between the predominant te reo Māori language and a Taranaki dialect. We look forward to testing the system in a semi-controlled environment.
This paper has put a substantial focus on epistemological and design issues rather than on implementation details. The reason for this is that the central innovation is not seen as the marrying of a Topic Map engine with a wiki – any programmer could accomplish that – but in the investigation into what the use of this means, i.e., does the expression of declarative statements incur an entailment or not? And if so, of what order? Surely there is no single answer to this question, but in the same way as people commonly anthropomorphize machines (hell, we anthropomorphize rocks), this paper is, if anything, meant to inject a note of caution into inferring that the systems we develop do more than they do, that our claims notwithstanding we are in large part still playing with toys.
References
[Z3919] “ANSI/NISO Z39.19-2005: Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies.” U.S.A. National Information Standards Organization. 25 July 2005. http://www.eric.ed.gov/ERICWebPortal/resources/html/help/Z39-19-2005.pdf
[BernersLee2001] Berners-Lee, Tim, James Hendler, and Ora Lassila. “The Semantic Web.” Scientific American. 2001. http://www.sciam.com/article.cfm?id=the-semantic-web
[Brandom] Brandom, Robert. “A Social Route from Reasoning to Representing.” In Articulating Reasons, by Robert Brandom, 157-183. Cambridge, Massachusetts: Harvard University Press, 2000.
[Broughton] Broughton, Vanda. Essential Classification. London: Neal-Schuman Publishers, 2004.
[ISKOI] “Colon classification.” International Society for Knowledge Organization. http://www.iskoi.org/doc/colon.htm
[Context07] International and Interdisciplinary Conferences on Modeling and Using Context (CONTEXT). http://mainesail.umcs.maine.edu/Context/context-conferences/
[DeLanda] DeLanda, Manuel. “Virtuality and the Laws of Physics.” In Intensive Science & Virtual Philosophy, by Manuel DeLanda, 117+. London: Continuum, 2002.
[REST] Fielding, Roy T., and Richard N. Taylor. “Principled Design of the Modern Web Architecture.” ACM Transactions on Internet Technology (TOIT) (Association for Computing Machinery) 2, no. 2 (2002): 115-150. doi:https://doi.org/10.1145/514183.514185.
[RDFSemantics] Hayes, Pat. “RDF Semantics, W3C Recommendation.” W3C. 10 February 2004. http://www.w3.org/TR/rdf-mt/
[Hayes2001] Hayes, Pat. “Re: Why must the web be monotonic ?” www-rdf-logic mailing list archives. July 2001. http://lists.w3.org/Archives/Public/www-rdf-logic/2001Jul/0067.html
[IRS] “Inferential Role Semantics.” Wikipedia. http://en.wikipedia.org/wiki/Inferential_role_semantics
[McCHay69] McCarthy, John, and Pat Hayes. “Some Philosophical Problems from the Standpoint of Artificial Intelligence.” Edited by B. Meltzer and D. Michie. Machine Intelligence (Edinburgh University Press) 4 (1969).
[Newell] Newell, Allen. “The Knowledge Level.” Artificial Intelligence 18 (1982): 87-127. doi:https://doi.org/10.1016/0004-3702(82)90012-1.
[OpenCyc] OpenCyc. http://www.opencyc.org/
[Reiter2001] Reiter, Raymond. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems. Cambridge, Massachusetts; London, England: The MIT Press, 2001.
[SWOver] “Semantic Web Overview.” semanticweb.org. 2008. http://semanticweb.org/wiki/Semantic_Web_overview
[SMW] Semantic MediaWiki. 2008. http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
[Svenonius2000] Svenonius, Elaine. The Intellectual Foundation of Information Organization. Cambridge, Massachusetts: The MIT Press, 2000.
[Turner] Turner, Christopher. Organizing Information: principles and practice. London: Clive Bingley, 1987.
[WikipediaSW] “Wikipedia:Semantic Web.” Wikipedia. 2008. http://en.wikipedia.org/wiki/Semantic_web
[Wynar] Wynar, Bohdan S. Introduction to Cataloguing and Classification. Boulder, Colorado: Libraries Unlimited, Inc., 1992.
[1] We define category as an ad hoc division within a subject domain absent an explicit classification scheme. Whereas classification may be said to be systematic, categorization need not be. Categorization is a preliminary process in the creation of a classification scheme but in informal or colloquial systems may be the only process. In library science a category is sometimes considered synonymous with a facet (from Faceted Classification).
[2] subsequently, “SW”
[3] To be fair to the SW folks, these charges are hardly new, and reflect the same issues present over three decades ago with the similar academic flourish of “artificial intelligence”, which assuredly after all these years still deserves quote marks.
[4] It seems that few researchers and/or developers understand RDF Semantics much less follow it in practice, given that the majority of projects I've reviewed bear little relation to the RDF model theory, either in definition or stated references. Indeed, many seem to be in direct conflict with one or more RDF axioms or have extended it in ways contradictory to the model theory (e.g., reference to non-FOL logics, non-monotonicity). I've long thought that the buzz over RDF is simply a misplaced enthusiasm for basic graph theory.
[5] accessed 21 April 2008
[6] It does not currently include the whānau (family unit), but in the future a public version of the site may permit individuals to create pages for their own whānau. ‘wh’ in te reo Māori is pronounced as an ‘f’, e.g., whānau is pronounced “faah′-na-u”.
[7] Ranganathan's scheme included five “fundamental” facets: Personality, Matter, Energy, Space and Time. Contemporary FC research tends to ignore this archaism but retains the methodology.
[8] This could also be localized for VSO or other grammars as necessary; the specific syntax is unimportant.
[9] We will subsequently be using the term “ontology” in this context, i.e., without reference to any formalized model theory.
[10] For brevity's sake, not all details are included here, e.g., the Terms defined for roles played in each relation type have not been included; other ontological structures have been simplified or omitted.
[13] i.e., a template for all Assertions using that Predicate.
[14] as described in the previous section, name-value property assertions are supported.
[15] Note: all Web references accessed as of 18 April 2008.