Hunting, Sam. “Topic maps in near-real time.” Presented at Balisage: The Markup Conference 2008, Montréal, Canada, August 12 - 15, 2008. In Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). https://doi.org/10.4242/BalisageVol1.Hunting01.
Balisage: The Markup Conference 2008 August 12 - 15, 2008
Balisage Paper: Topic maps in near-real time
Sam Hunting
Sam Hunting has been toiling in the vineyards of markup and topic mapping for many years, for
large companies and small. This time, thanks to the "arbor" of Drupal, the vintage
is bottled and ready to ship.
A new topic map module for the Drupal open-source content management system now supports
the publication of collaboratively written topic maps in near-real time, using a plug-in
architecture that can be extended to support specific information sets.
Writer’s block. When I found myself making a process flow diagram figure, I felt there
are a lot of people here who can do that better than I can. So why do it? Ditto for
topic map theory, where I’m sure I’ve committed at least one major howler. What I
do claim to have is a topic map application, complete with a colorable disclosure,
that can deliver unique value to users on a widely used platform, whose development
has been informed by all the work we’ve done together on topic maps over the years.
Why did you write a topic map application?
I believe that the news, like food, should be good, clean, fair--and local: This toxic waste dump, this voting machine debacle, this housing authority scandal, this corrupt official. Contemporary journalism is neither good, clean, fair, nor local
(see Bob Somerby). However, local news gatherers must also be able to connect their narratives to
other, similar narratives that contain subject matter of interest to them. (Toxic
Waste, Inc., knows all about Localities A and B, but unless Localities A and B know
about each other, any narrative they create about Toxic Waste, Inc. will necessarily
remain partial, and any local action based on that narrative could well lack critical
information.) Hence, there is a requirement for a distributed information system that
would enable localities to discover subjects of mutual interest, perhaps serendipitously,
intrinsic to content that they themselves have created.
Hence topic maps.
Why did you write your topic map application in Drupal?
Subjects are a function of content:
Equation (a)
S = f(c)
and Drupal is the content management system par excellence. Drupal is open source (GPL 2.0); Drupal has a vibrant community; Drupal is hot,
being blessed by Google ("Summer of Code"); Drupal has superior community building
and categorization tools; and I am intimately familiar with the platform, having built,
administered, and moderated a Technorati 5000 community site for several years. Drupal
also adds functionality by adding modules, so I determined to write a topic map module
for Drupal.
Proxy disclosure
Can you disclose your proxies?
Yes.
What are your proxies?
Here is the "proxy spider" diagram for my proxy; it works, at least, like the BigAssert assertion model devised
by Steve Newcomb. Keys that are not "map," "association caption," or "association type" are roles.
Values with role keys are players.
Player properties may be related to a notation processor. The presence of a processor
may, or may not, affect the subject identity of the proxy that contains the property.
When are two proxies the same?
Two proxies are the same when they are in the same map, their type properties are
the same, their role/player combinations are the same, and, if a player has a notation
that can impact subject identity, both notations are the same.
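The sameness rule can be sketched in a few lines. This is a minimal illustration in Python, not the module's PHP; the class names, the `IDENTITY_NOTATIONS` set, and the choice to ignore the order of role/player combinations are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumption: which notations affect subject identity is configuration;
# the single name listed here is purely illustrative.
IDENTITY_NOTATIONS = {"caselaw"}


@dataclass(frozen=True)
class Casting:
    role: str
    player: str
    notation: str = ""  # empty string means "no notation"


@dataclass(frozen=True)
class Proxy:
    map_name: str
    assoc_type: str
    castings: tuple  # tuple of Casting; duplicate roles are allowed


def identity_key(proxy):
    """Reduce a proxy to the parts that decide sameness."""
    parts = []
    for c in proxy.castings:
        # Keep a notation only when it can impact subject identity.
        notation = c.notation if c.notation in IDENTITY_NOTATIONS else ""
        parts.append((c.role, c.player, notation))
    # Assumption: the order of role/player combinations does not matter.
    return (proxy.map_name, proxy.assoc_type, tuple(sorted(parts)))


def same_proxy(a, b):
    """True when map, type, role/player combinations, and
    identity-affecting notations all agree."""
    return identity_key(a) == identity_key(b)
```

Under these assumptions, two proxies that differ only in a notation outside `IDENTITY_NOTATIONS` compare as the same, while a differing map, type, or identity-affecting notation makes them distinct.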
Where are the individual topics?
There are no individual topics, because no topic can exist in isolation. So, in my
disclosure, they’re properties. And I have to admit that when I looked at Barta's handy guide to the TMRM, I couldn’t make my diagram work any other way. And this may be the howler to which
my introduction alludes!
How can users navigate using your proxies?
As if properties were holes in a punch card:
Implementation
What were the main challenges you encountered during development?
Different layers of the LAMP stack needed different representations.
User input required a representation that would work in a text box (since JavaScript editors
are not ready, and would be WYSIWYG if they were, and XML editors are not available
either). However,
the user input representation is no more appropriate for processing proxies than processing angle brackets, instead of using the DOM, would be. Finally,
a relational representation is needed for storage.
What markup did you devise for user input?
A wiki-like markup language:
The workflow envisaged is the user integrating associations into actual text.
The markup is reasonably easy for the user to enter, and reasonably easy for the application
to parse.
As can be seen from the sample, [[ and ]] delimit the proxy. : delimits the properties of the proxy. Only the value of the player property is visible.
(However, hiding a player is such a ubiquitous use case that there's a special syntax
for it: a hat before the property delimiter hides the player, so ^:Bart hides "Bart".) = adds a notation to a player property, a la [[ ... text:Federalist 51=federalist ...]]. (Later, we will see how the plugin for the federalist notation would process the
player.)
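To make the delimiters concrete, here is a minimal sketch of parsing one role/player property. It is written in Python for illustration and is not the module's PHP parser. It covers only the three delimiters described above (`:`, `^:`, and `=`); the `Property` type and both function names are hypothetical, and extracting properties from a full `[[ ... ]]` proxy is deliberately left out.

```python
from typing import NamedTuple, Optional


class Property(NamedTuple):
    role: str
    player: str
    notation: Optional[str]
    hidden: bool


def parse_property(token: str) -> Property:
    """Split one 'role:player', 'role^:player' or 'role:player=notation' token."""
    role, sep, rest = token.partition(":")
    if not sep:
        raise ValueError(f"no ':' delimiter in {token!r}")
    # A hat immediately before the delimiter hides the player.
    hidden = role.endswith("^")
    if hidden:
        role = role[:-1]
    # Assumption: the last '=' introduces the notation, so a player
    # value may itself contain '=' characters.
    player, eq, notation = rest.rpartition("=")
    if not eq:
        player, notation = rest, None
    return Property(role.strip(), player.strip(),
                    notation.strip() if notation else None, hidden)


def visible_text(prop: Property) -> str:
    """Only the player value is shown to the reader, unless it is hidden."""
    return "" if prop.hidden else prop.player
```

For example, `parse_property("text:Federalist 51=federalist")` yields the role `text`, the player `Federalist 51`, and the notation `federalist`, while `visible_text(parse_property("boss^:Bart"))` is the empty string.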
Implementation uses Drupal’s nodeapi hook; when content is being processed, any function
of the form [drupal_module]_nodeapi is invoked, with the content as input and output
for the function. So, topicmap_nodeapi takes content marked up as proxies and transforms
it to the data structure used for processing the proxy, to which we now turn.
What data structure did you use for processing?
A tuple-like representation. Here is a sample:
Proxies need to be transformed from user input because they are processed in several ways: for
TOCs, for legends, for validation, for plugins, etc.
Because the wiki syntax isn’t suitable for PHP processing, we adopt the PHP mindset
and represent proxies as arrays, so we get to use PHP’s rich variety of array functions.
After a false start using a keyed array (a proxy can have duplicate keys), I adopted
the tuple representation shown. (Jack Park and I did a tuple representation of topic
maps long ago, on the theory, which I still believe to be correct, that a Linda-like
tuple space implementation would be a great way to federate topic maps, and this idea
was inspired by that work.)
This representation is efficient since the map value is always at position three,
roles are every fourth array element starting at 12, roles are every fourth array
element starting at 16, and so on.
It will also map cleanly to XTM and JSON output formats.
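The move from a keyed array to a flat tuple can be illustrated with a small sketch. It is in Python rather than PHP, and its layout (map, type, then alternating role and player) is a simplified assumption for the example, not the actual offsets used by the module. All names and sample values are hypothetical.

```python
import json

# Two players share the role "member", so role names are not unique keys.
pairs = [("member", "Bart"), ("member", "Lisa"), ("group", "Simpsons")]

# A keyed structure silently keeps only the last value per key.
keyed = dict(pairs)


def flatten(map_name, assoc_type, role_player_pairs):
    """Build a flat, fixed-stride list: [map, type, role, player, role, player, ...]."""
    flat = [map_name, assoc_type]
    for role, player in role_player_pairs:
        flat.extend([role, player])
    return flat


def roles(flat):
    """Roles sit at every second position starting at index 2."""
    return flat[2::2]


def players(flat):
    """Players sit at every second position starting at index 3."""
    return flat[3::2]


def to_json(flat):
    """A flat list of strings serializes directly to a JSON array."""
    return json.dumps(flat)
```

With this layout, `flatten("demo_map", "membership", pairs)` keeps both `member` entries, whereas `keyed` retains only one of them. Fixed positions mean roles and players can be read back with plain slicing.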
Implementation (still in topicmap_nodeapi) takes post content with wiki-like markup
embedded, parses it, rips out the properties for each proxy, validates each tuple
either generically or by type, and associates the tuple with an offset ("8090","8096")
back into the content. The tuple is then transformed into HTML by adding generic span,
div, and class markup, and notation processing (for example, the federalist notation
processor might transform its player data into an HTML A tag linking to an online
version of the Federalist Papers).
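The notation step can be pictured as a lookup from notation name to processor function. This is a hedged sketch in Python, not the module's PHP. The registry, the function names, the `span` attributes, and the `example.org` URL are placeholders invented for the illustration.

```python
from html import escape
from urllib.parse import quote


def federalist_processor(player):
    """Turn a player such as 'Federalist 51' into a link.

    Assumption: the paper number is the last word of the player value,
    and the URL below is a placeholder, not a real online edition.
    """
    number = player.rsplit(" ", 1)[-1]
    href = "https://example.org/federalist/" + quote(number)
    return f'<a href="{escape(href)}">{escape(player)}</a>'


# Hypothetical registry: notation name -> processor function.
NOTATION_PROCESSORS = {"federalist": federalist_processor}


def render_player(role, player, notation=None):
    """Wrap a player in generic span markup, applying a notation processor if one is registered."""
    processor = NOTATION_PROCESSORS.get(notation)
    inner = processor(player) if processor else escape(player)
    return f'<span class="player" data-role="{escape(role)}">{inner}</span>'
```

A player without a notation, or with a notation that has no registered processor, falls through to plain escaped text inside the generic `span`.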
What relational structure did you use for storage?
A table that permits result sets like this, which you will shortly see is convenient
for generating navigation tables, or TOCs:
(I apologize for not having a diagram; I couldn’t find a diagrammer that works with
Postgres on OS X.) Here is the basic idea.
In the text box, users enter text values ("Bart","Alberto Gonzales"); it wouldn’t
be sensible to have users do anything else. However, all processing on the database
side takes place through manipulating integers called (adopting Drupal jargon), "nids,"
or node IDs. Therefore:
There is a table of values
There is a mapping table of value to nid
There is a table of nids (All logging data (creator, date created) goes on the nid
table)
There is a "big table" with columns for each property. The columns are: [a]ssociation, [t]ype, [r]ole, [p]layer, [c]asting, [n]otation,
and ac (association caption). Alas, ac ("association caption") and notation are optional,
and so these columns can contain ugly NULLs.
The data in each column is a foreign key into the table of nids.
There are ancillary mapping tables of association to source (the Drupal post), nid
to its autogenerated Drupal page, and so on.
The implementation is almost certainly naïve; I never did figure out a way to cram
an association into a single row, because the role and player combinations vary in
number. In practice, that means that to grab a single association, you need to grab
all the rows that have the same values for a, t, r, and p (and are in the same map,
and have notations that either do not affect subject identity, or are the same) and
no others. Relational purists also take the view that auto-generating nids as relational
keys is pernicious; however, that’s how Drupal does things. Still, the implementation
is operating fast enough to deliver value to users at the required scale.
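The one-row-per-role/player shape can be sketched with an in-memory SQLite database. This is an illustration in Python under simplifying assumptions, not the module's Postgres schema. It collapses the value table, the value-to-nid mapping, and the nid table into one `nid` table, and omits the casting, caption, and map columns. The association name `assoc_7` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nid (nid INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE);
CREATE TABLE big (
    a INTEGER NOT NULL REFERENCES nid(nid),  -- association
    t INTEGER NOT NULL REFERENCES nid(nid),  -- type
    r INTEGER NOT NULL REFERENCES nid(nid),  -- role
    p INTEGER NOT NULL REFERENCES nid(nid),  -- player
    n INTEGER REFERENCES nid(nid)            -- notation, optional (NULL)
);
""")


def nid_for(value):
    """Return the integer id for a text value, creating it if needed."""
    conn.execute("INSERT OR IGNORE INTO nid (value) VALUES (?)", (value,))
    return conn.execute(
        "SELECT nid FROM nid WHERE value = ?", (value,)).fetchone()[0]


def store(assoc, assoc_type, castings):
    """Write one row per (role, player, notation) triple."""
    for role, player, notation in castings:
        conn.execute(
            "INSERT INTO big (a, t, r, p, n) VALUES (?, ?, ?, ?, ?)",
            (nid_for(assoc), nid_for(assoc_type), nid_for(role),
             nid_for(player), nid_for(notation) if notation else None))


def fetch(assoc):
    """Reassemble an association by gathering all of its rows."""
    return conn.execute("""
        SELECT rv.value, pv.value, nv.value
        FROM big
        JOIN nid av ON av.nid = big.a
        JOIN nid rv ON rv.nid = big.r
        JOIN nid pv ON pv.nid = big.p
        LEFT JOIN nid nv ON nv.nid = big.n
        WHERE av.value = ?
        ORDER BY big.rowid
    """, (assoc,)).fetchall()
```

Because the number of role/player combinations varies, reading back one association is always a multi-row query. The `LEFT JOIN` lets rows with a NULL notation survive.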
Plugin Architecture
What are the advantages of notation plugins?
Notation plugins allow the administrator to add data-driven functionality to a Drupal
site that uses the topic map module. For example, entering the following proxy:
[[test:test_type_7[role_7_2_1:player_7_2_1] and [role_7_2_2:364 U.S. 507=caselaw]]
uses the caselaw plugin and generates a sidebar with metadata about the case:
What notation plugins did you include?
There are notation plugins for aircraft tail numbers, email addresses, citations to
the Bible, the U.S. Constitution, and the Federalist Papers, and geocoded maps.
The geocoding/mapping plugin takes a physical address as input, geocodes it, and returns
a map, which illustrates the distinction between this topic map implementation, and
most other approaches to the semantic web: The plugins operate at the data level,
not the resource level, and integrate into content at a point of the author’s own
choosing. This is quite distinct from the model where resources are vertically organized
by site, and then "mashed up" into a new resource that is still not integrated into
content.
What are the advantages of association type plugins?
In a word, validation.
Validation parameters can be set by the user:
And enforced by the application:
The interface is crude, being CSS-driven and therefore not dynamic, but usable.
What type plugins did you include?
Plugins for the types required by ISO 13250 (class/instance and supertype/subtype),
as well as types for asserting that two properties are the same, and an "org" type,
for analyzing who reports to whom inside an organization (like, in the sample topic
map, the Bush administration).
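Type-driven validation might look something like the following sketch. It is written in Python for illustration and does not reproduce the module's plugin API. The rule table, the type names, and the role names are assumptions, as is the idea that a rule is simply a minimum count per role.

```python
from collections import Counter

# Hypothetical rules: association type -> minimum number of each role.
REQUIRED_ROLES = {
    "supertype-subtype": {"supertype": 1, "subtype": 1},
    "class-instance": {"class": 1, "instance": 1},
}


def validate(assoc_type, castings):
    """Return a list of error messages; an empty list means valid.

    castings is a list of (role, player) pairs.
    """
    errors = []
    # Generic check, applied to every type.
    if not castings:
        errors.append("association has no role/player pairs")
    # Type-specific check, applied only when a rule is registered.
    rule = REQUIRED_ROLES.get(assoc_type)
    if rule is None:
        return errors
    counts = Counter(role for role, _ in castings)
    for role, minimum in rule.items():
        if counts[role] < minimum:
            errors.append(
                f"type {assoc_type!r} needs at least {minimum} {role!r} role(s)")
    return errors
```

A type with no registered rule receives only the generic check, which mirrors the idea of validating each tuple either generically or by type.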
What are the advantages of search plugins?
What search plugins did you include?
The topic map paradigm is extremely rich, and provides an almost unlimited number
of ways to "connect the dots." However, since we can’t know the method appropriate to
navigating a corpus before actually knowing the corpus, it makes sense to enable developers
to create plugins, rather than decide in advance that I know better than they do.
For example, the "degrees" plugin:
This example shows that indeed the head bone is connected to the neck bone, the neck
bone is connected to the back bone, the back bone is connected to the hip bone, the
hip bone is connected to the thigh bone, and the thigh bone is connected to the knee
bone. Which may seem trivial, unless you want to know how many degrees separate Alberto
Gonzales from Monica Lewinsky, say. (Note that the implementation does not depend on the type of association in which the player participates, but solely on
building a linked list of associations with player overlap. And sometimes, indeed, "you
can't get there from here.")
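The idea behind such a search can be sketched as a breadth-first search over players that share an association. This Python version is an illustration of the general technique, not the plugin's actual code. The function name and the input shape (each association reduced to a tuple of its players) are assumptions.

```python
from collections import deque


def degrees(associations, start, goal):
    """Return the shortest chain of players from start to goal, or None.

    associations is an iterable of player tuples; association type and
    roles are ignored, only player overlap matters.
    """
    if start == goal:
        return [start]
    # Index each player to the players it shares an association with.
    neighbours = {}
    for players in associations:
        for p in players:
            neighbours.setdefault(p, set()).update(
                q for q in players if q != p)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        # sorted() only makes the traversal order deterministic.
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt in seen:
                continue
            if nxt == goal:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None  # "you can't get there from here"
```

The number of degrees is the length of the returned chain minus one. A return value of `None` covers the case where no chain of overlapping players exists.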
Interlude: The TOC
Why is your presentation of the topic map so resolutely un-Flashy?
Here is what the questioner means by "resolutely un-Flashy."
How the TOC works, from top left: Each numbered "stripe" is a single proxy, and each
column is a property of the proxy. The downward pointing blue triangles link down
into the occurrence of the proxy in the content. Further, each type plugin annotates
its proxies by adding a footnote to a proxy’s property, where appropriate. The footnote
shows metadata for the proxy via a "hover," and links to the TOC legend, also generated
by the plugin. For example, all instances of "cat" are footnoted with "2," in blue,
color-coding the note as applying to a supertype/subtype type of proxy. Clicking on
footnote "2" takes the user to note "2" in the legend, where the user may click on
the "cat" to go to a (dynamically created) page that shows all the proxies that use
the "cat" property, or the user may click on the outline icon in parenthesis, and
go to the "cat" node on the type page, which shows the type hierarchy. (Different
type plugins, as you see, have different colors, icons, and also organize their pages
differently.) Naturally, if a new plugin were added, it too would add its own notes
and legend, and have its own page, assuming it used the plugin API.
This is not sexy; most designers prefer a graph style presentation, with nodes and
arcs, but I think that’s "Visualization That Doesn’t Help You Vizualize." Such an
approach has a number of disadvantages: The graphs are generally a Flash presentation
or equivalent and so are not searchable via search engines, are static, and can’t
evolve with the community, and are not integrated into any content. In addition, they
take up an awful lot of space on the page, and consume a lot of bandwidth. (There’s
no reason to assume that net neutrality will continue, and so low bandwidth applications
may assume increasing importance.) The un-Flashy TOC presented here has none of those
disadvantages, and all of the functionality listed, none of which the Flash approach
offers.
Example
Do you have an example of a topic map that uses your application?
Yes: The Criminal Bush Regime. Based on stories from the Washington Post and Slate, it contrasts the prose and
parallel Flash-y approach to a topic map. Although it is not "local" (except inside
the Beltway), I like to think it's good, clean, and fair, and a community could add
value to it.
Conclusion
What would you say to anyone thinking about using topic maps?
Great paradigm, great people, great software.
Can I download your module?
Is the latest version of the topic map module for Drupal available for download?