Lists, Stacks, and Piles
If you could see my desk at home or my office, you'd probably find it a horrific disorganized disaster with post-it notes on the desktop and monitor, stacks of papers and other items, and a giant pile at one end. On the other hand, I see it as a disorganized ordered mess. I know where to find what I need and there is a system of lists, stacks, and piles.
The post-it notes contain lists of things I need to do, information from "recent queries", and reminders of one sort or another, and they are all ordered and carefully placed on my desk according to their importance. Alongside these lists, I have stacks of bills, papers, and other urgent items that need to be filed, where each stack has its purpose or category. Finally, I have a giant pile of things that were formerly in stacks. For some reason or another, they've expired but still need to be filed. Someday I'll get around to that filing and make that giant pile not so giant anymore.
It shouldn't surprise you to find that my computers are similarly organized. It is easy to recognize the stacks of documents shuffled into folders by their relationship to each other, or my stacks of photos or videos from my camera. My desktop has become a giant "pile of things downloaded" over time and the whole computer is somewhat of a giant pile of information I've retained.
I've come to realize that this isn't going to change. What I need is for software and applications to adjust themselves to my way of storing information. Not finding what I needed, and having discovered the usefulness of Atom feeds, I created the open-source atomojo [atomojo] project to serve my "disorganized ordered mess" nature and store my information.
What I discovered was that even though I'm a software developer and work with things that are often tightly controlled and highly structured, much of the information I produce or interact with--media, information bits, documents of all kinds--tends to be loosely organized into "stacks" rather than tightly controlled "lists". This is especially true when it comes to media coming from personal interactions, family, vacations, and other things not related to business.
In fact, I'd say that the immense popularity of web search and the introduction of desktop search fit well with the idea that people generate stacks and piles of information that are loosely grouped together. I'd go even further and say that a "list" is a rare thing and can be hard to develop.
Let's then face the fact that people don't produce lists as much as they produce stacks and piles. Producing organized and structured information takes time and so we just shove our media, documents, and other information into the nearest stack (or pile) that looks appropriate. We do that with a slightly dishonest hope that we'll come back to it later and "get organized."
In the end, this information should be the basis for the applications and communications that we share with others on the Internet. It could drive our websites, our picture galleries, and our commerce engines if we only had a way to get it organized. Thus, my messy desk led me to believe that if I got organized I could build better dynamic applications.
Getting Organized with Atom Feeds
Atom feeds are like stacks of objects. There is an order to the stack from the first entry to the last. You often have to sort through the entries from first to last to find what you want. But the feed has a purpose and so there is a loose relationship between each entry.
While certain information resources have processable content, many others are not so easily searched. That is, an image is difficult to search against without human interaction unless there are annotations. As a result, the metadata encoded in the feed entry is very important.
With the recent invention of the Atom Publishing Protocol (AtomPub) [apprfc], creation and manipulation of both entries and their associated metadata is relatively easy. As such, rather than shoving that picture from your last vacation into the nearest folder, you can tuck it away nicely into an Atom feed. The natural next step is then to author some metadata.
Entries have a rich vocabulary with which you can annotate your resources. Just by looking at the entry you can learn about the author, titles, summaries, and many other aspects. This enhances the ability to retrieve that object later.
Term Categorization in Atom
One of the interesting parts of the Atom vocabulary is the category element associated with both feeds and entries. This element has two important attributes called term and scheme. The scheme attribute is a URI value that qualifies or scopes the term attribute's value. The element itself can contain any content--text or elements--but none is defined by the Atom Syndication Format [atomrfc].
If you concatenate the scheme and term attribute values and assume a default for when the scheme attribute is omitted, the result is a URI. This value can be interpreted as a leaf term in some unnamed ontology that labels the entry or feed with that term. As the category element may contain content, a value can be associated with the term.
This interpretation means that for each category element you get an RDF [rdf] triple. This triple is constructed such that the subject is the entry or feed, the predicate is the term URI, and the object is the value of the element. When the category element is empty, the value defaults to rdf:nil.
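The mapping from category elements to triples can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not atomojo's implementation: the default scheme URI is assumed to be http://www.atomojo.org/O/keywords/ (inferred from the term URIs used in this paper), the sample entry and its id are made up, and only text content of the category element is treated as the value.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
RDF_NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"
# Assumption: the default applied when a category has no scheme attribute.
DEFAULT_SCHEME = "http://www.atomojo.org/O/keywords/"


def category_triples(element):
    """Yield (subject, predicate, object) for each category of a feed or entry."""
    subject = element.findtext(ATOM + "id")
    for category in element.findall(ATOM + "category"):
        scheme = category.get("scheme", DEFAULT_SCHEME)
        # Concatenating scheme and term gives the term URI (the predicate).
        predicate = scheme + category.get("term")
        # Only text content is considered here; element content is ignored.
        value = (category.text or "").strip()
        yield (subject, predicate, value or RDF_NIL)


# Hypothetical entry used only for illustration.
ENTRY_XML = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:entry-1</id>
  <title>A hypothetical pet entry</title>
  <category term="pets"/>
  <category scheme="http://www.atomojo.org/O/keywords/" term="name">hudson</category>
</entry>
""".strip()

triples = list(category_triples(ET.fromstring(ENTRY_XML)))
for triple in triples:
    print(triple)
```

The empty pets category yields rdf:nil as its object, while the name category carries its text content as the value.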
Graphical Models for Terms
Each feed generates a simple graphical model using the unique id element values to represent the subjects in the triples. For example, given the feed in Figure 1, the graph in Figure 2 is generated using edge labels:
- E: http://www.atomojo.org/O/type/entry
- pets: http://www.atomojo.org/O/keywords/pets
- kind: http://www.atomojo.org/O/keywords/kind
- breed: http://www.atomojo.org/O/keywords/breed
- color: http://www.atomojo.org/O/keywords/color
- name: http://www.atomojo.org/O/keywords/name
Querying to Build Lists
Being able to query your feeds becomes very important after enough content has been collected. That is, your feeds start to look a lot like an unorganized pile. Past a certain number, keeping track of just where something can be found gets to be very difficult.
While the Atom vocabulary provides many interesting elements (e.g. title) by which you might search, the entry and feed categorization provides a fine-grained set of information on which a query can be performed. Simple queries can retrieve entries from feeds just by pulling those entries that have certain terms.
We'd like a query that can:
- provide a set of terms for the basis of the query,
- allow comparison of any values associated with those terms,
- return the query result as a "reconstituted" feed.
Fortunately, SPARQL [sparql] has recently become a W3C recommendation and we can use this to query our pile of feeds.
For example, if we want to retrieve the feed with the keyword pets, we could query on the term http://www.atomojo.org/O/keywords/pets as shown in Figure 3.
Similarly, we could restrict the query to return all pet entries as in Figure 4.
If we want to search by value, we could find the pet named 'hudson' as in Figure 5.
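As a rough sketch of what such queries might look like, the Python below holds one hypothetical SPARQL query as a string and then evaluates the same two triple patterns with a tiny in-memory matcher. The matcher merely stands in for a real SPARQL engine, and the subject URIs, the second pet's name, and the representation of values as plain literals are all assumptions made for illustration.

```python
KEYWORDS = "http://www.atomojo.org/O/keywords/"
RDF_NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"

# Hypothetical SPARQL text: pet entries whose name value is "hudson".
FIND_HUDSON = """
SELECT ?entry WHERE {
  ?entry <http://www.atomojo.org/O/keywords/pets> ?any .
  ?entry <http://www.atomojo.org/O/keywords/name> "hudson" .
}
"""

# Made-up triples of the kind generated from category elements.
TRIPLES = [
    ("urn:example:feed", KEYWORDS + "pets", RDF_NIL),
    ("urn:example:entry-1", KEYWORDS + "pets", RDF_NIL),
    ("urn:example:entry-1", KEYWORDS + "name", "hudson"),
    ("urn:example:entry-2", KEYWORDS + "pets", RDF_NIL),
    ("urn:example:entry-2", KEYWORDS + "name", "max"),
]


def subjects_with(triples, predicate, value=None):
    """Return subjects carrying the predicate, optionally with an exact value."""
    return {
        s for s, p, o in triples
        if p == predicate and (value is None or o == value)
    }


# Query by term only: everything labeled with the pets term.
pets = subjects_with(TRIPLES, KEYWORDS + "pets")

# Query by term and value: intersect the two patterns, as in FIND_HUDSON.
hudson = pets & subjects_with(TRIPLES, KEYWORDS + "name", "hudson")

print(sorted(pets))
print(sorted(hudson))
```

Intersecting the two pattern results mirrors how the shared ?entry variable joins the two triple patterns in the SPARQL text.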
As a query language, SPARQL is sufficiently powerful to express many types of queries. If the graphical model of the feeds is extended to include properties from the Atom feed vocabulary, queries can be formulated that mix categorization with the feed structure (e.g. find all pet entries whose title contains a certain word).
Since we are interested in feeds and entries, the real use of the query is to reconstitute the result into a feed. This can be done by a simple process using these rules:
- For each matching feed, return an entry that summarizes the feed, its categorization, and a single link of relation related that points to the feed's resource URI.
- For each matching entry, return the entry with the xml:base attribute set such that the link relations are preserved.
As a result of this process, the query result is just another feed that can be consumed by any Atom-enabled client.
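The two rules above can be sketched with Python's standard ElementTree module. This is a minimal illustration, not atomojo's code: the function names, the choice of which feed elements to copy into the summary entry, and the example.org URIs are assumptions, and the resulting feed omits elements a fully valid Atom feed would require.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ET.register_namespace("", ATOM_NS)


def q(name):
    """Qualify a local name with the Atom namespace."""
    return "{%s}%s" % (ATOM_NS, name)


def summarize_feed(feed, feed_uri):
    """Rule 1: an entry summarizing a matching feed, with one 'related' link."""
    entry = ET.Element(q("entry"))
    for name in ("id", "title", "updated"):
        child = feed.find(q(name))
        if child is not None:
            entry.append(copy.deepcopy(child))
    for category in feed.findall(q("category")):
        entry.append(copy.deepcopy(category))
    ET.SubElement(entry, q("link"), {"rel": "related", "href": feed_uri})
    return entry


def rebase_entry(entry, base_uri):
    """Rule 2: copy a matching entry and set xml:base so relative links resolve."""
    result = copy.deepcopy(entry)
    result.set(XML_BASE, base_uri)
    return result


def reconstitute(title, feed_matches, entry_matches):
    """Build the result feed from (feed, uri) and (entry, base) pairs."""
    feed = ET.Element(q("feed"))
    ET.SubElement(feed, q("title")).text = title
    for source, uri in feed_matches:
        feed.append(summarize_feed(source, uri))
    for entry, base in entry_matches:
        feed.append(rebase_entry(entry, base))
    return feed


# Hypothetical source feed with one entry that uses a relative link.
SOURCE = ET.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<id>urn:example:pets</id><title>Pets</title>"
    '<category term="pets"/>'
    "<entry><id>urn:example:hudson</id><title>Hudson</title>"
    '<link rel="edit" href="hudson.atom"/></entry>'
    "</feed>"
)
BASE = "http://example.org/feeds/pets/"
result = reconstitute(
    "Query result", [(SOURCE, BASE)], [(SOURCE.find(q("entry")), BASE)]
)
print(ET.tostring(result, encoding="unicode"))
```

Copying the entry before setting xml:base leaves the source feed untouched, so the same entry can appear in several query results with different bases.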
Building Applications from Atom Feeds
Architecture Overview
With our ability to annotate and query arbitrary content as feed entries, we can now describe a dynamic application architecture, shown in Figure 6, that is based on atomojo's AtomPub and SPARQL abilities. On the far left is the application author who is responsible for loading content and configuring the application. On the far right is the end user of the application who is using a browser-based application presented by the system. In the middle are the web application server and atomojo server instances.
The author's responsibility is not only to provide the content from the pile and appropriate annotations, but also to provide instructions through the same means. These instructions provide content layouts, rules for content organization, web content such as JavaScript code or CSS stylesheets, and even service components. While some of this content is the domain of a web programmer, much of it is just simple entries with categorizations as to how to build resources on the web application server.
The web application server performs several tasks:
- provides content proxies from the atomojo server to deliver content to the browser,
- loads and configures layouts for web pages that are to be built from Atom feeds,
- loads and configures resources that are SPARQL queries to the atomojo server,
- loads custom application components for specialized services.
The configuration information is retrieved from the atomojo server by a series of SPARQL queries. These queries allow the configuration information to be stored anywhere the author prefers.
An Example
When the web application server periodically updates itself, it performs a set of queries against the atomojo server to retrieve configuration information. For example, to find all the layouts, the server uses the query shown in Figure 7. This query returns a set of entries that are the instructions for how each feed is processed to produce a web page.
A typical layout entry is a media resource entry that contains a script (e.g. XSLT) for transforming the feed on the server into appropriate web content. While the transformation as shown in Figure 8 produces HTML, nothing precludes generation of images or other non-XML media types.
The layouts are used by internal or custom components where the choice of layout is based on a number of matching criteria. These layout rules can be restricted to match by resource path or can require that the feed being rendered have certain terms (i.e. category elements). This allows the author to select the layout based on categorization rather than location in the atomojo server.
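One way to picture this kind of rule matching is the Python sketch below. It is purely hypothetical: the LayoutRule structure, the rule names, the paths, and the tie-breaking policy (prefer the rule with more required terms, then the longer path prefix) are assumptions for illustration and are not taken from atomojo.

```python
from dataclasses import dataclass

KEYWORDS = "http://www.atomojo.org/O/keywords/"


@dataclass
class LayoutRule:
    name: str
    path_prefix: str = ""                    # "" matches any resource path
    required_terms: frozenset = frozenset()  # term URIs the feed must carry

    def matches(self, path, feed_terms):
        """True when the path and the feed's terms both satisfy the rule."""
        return (path.startswith(self.path_prefix)
                and self.required_terms <= set(feed_terms))


def choose_layout(rules, path, feed_terms):
    """Return the name of the most specific matching rule, or None."""
    candidates = [r for r in rules if r.matches(path, feed_terms)]
    if not candidates:
        return None
    # Assumed policy: more required terms wins, then the longer path prefix.
    best = max(candidates,
               key=lambda r: (len(r.required_terms), len(r.path_prefix)))
    return best.name


RULES = [
    LayoutRule("default"),
    LayoutRule("software-index",
               required_terms=frozenset({KEYWORDS + "software"})),
    LayoutRule("blog", path_prefix="/blog/"),
]

by_term = choose_layout(RULES, "/projects/", {KEYWORDS + "software"})
by_path = choose_layout(RULES, "/blog/2008/", set())
fallback = choose_layout(RULES, "/about/", set())
print(by_term, by_path, fallback)
```

Because the software-index rule has no path restriction, a feed annotated with the software term picks up that layout wherever it is stored.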
This approach has been used to configure the proxies, layouts, queries to the server, and application components as shown in Figure 9.
In Figure 9, the first entry is the layout, the second entry is a proxy for content, and the third is an application component that comes packaged with atomojo. This component is used to create index pages from feeds using the layouts and is typically mapped to index resources (e.g. ends with a forward slash). These mappings are shown in the match terms in the entries.
The last two entries are slightly different. Their purpose is to use the query facilities of the atomojo server to find all software projects on the web site and then present a listing. They do this by associating a query to a resource that is then called by another index component.
The last entry defines a query against the atomojo server for retrieving all those feeds that have a keyword of software. That resource is used by the index component defined by the preceding entry. These two entries together configure an index page at /software/ on the web site.
Conclusion
In Figure 9 in the previous section, the last two entries found resources in our pile of information. An author can now put information about software in any feed they choose and, as long as it is annotated with a category element with a term value of software, it will show up on that software index page.
This is not remarkable. Plenty of software systems exist that allow authored keywords to produce index information and then allow people to browse that information. What is interesting here is that we're using categorization and terms.
Any categorization--both formal and informal--can now be used to annotate information stored in the feeds. The annotations are not limited to keywords. Also, the combination of different terms and values can be used to create a very specific set of information.
Similarly, the queries are not limited to simple retrieval exercises. The SPARQL queries can perform complex union and intersection operations as well as filtering on term values. As such, very specific data sets can be retrieved from the atomojo server.
As time goes by, queries can be developed to use whatever categorization evolves from the authors. These queries can be used to re-purpose that original content without much, if any, change to the feed metadata. The resulting feeds can then be associated with a web resource independent of how the author chose to organize the original entries and feeds. That is, I can create a disorganized pile of information and keep my website organized.
References
[atomojo] Milowski, R., "Atomojo", 2008, http://code.google.com/p/atomojo/
[atomrfc] Nottingham, M. and Sayre, R., "RFC 4287: The Atom Syndication Format", December 2005, http://www.ietf.org/rfc/rfc4287.txt
[apprfc] Gregorio, J. and de hOra, B., "RFC 5023: The Atom Publishing Protocol", October 2007, http://www.ietf.org/rfc/rfc5023.txt
[sparql] Prud'hommeaux, E. and Seaborne, A., "SPARQL Query Language for RDF", January 2008, http://www.w3.org/TR/rdf-sparql-query/
[rdf] Hayes, P., "RDF Semantics", February 2004, http://www.w3.org/TR/rdf-mt/