Using Atom Categorization to Build Dynamic Applications

R. Alexander Milowski

Lists, Stacks, and Piles

If you could see my desk at home or my office, you'd probably find it a horrific disorganized disaster with post-it notes on the desktop and monitor, stacks of papers and other items, and a giant pile at one end. On the other hand, I see it as a disorganized ordered mess. I know where to find what I need and there is a system of lists, stacks, and piles.

The post-it notes contain lists of things I need to do, information from "recent queries", and reminders of one sort or another, and they are all ordered and carefully placed on my desk according to their importance. In relation to these lists, I have stacks of bills, papers, and other urgent items that need to be filed, where each stack has its purpose or category. Finally, I have a giant pile of things that were formerly in stacks. For some reason or another, they've expired but still need to be filed. Someday I'll get around to that filing and make that giant pile not so giant anymore.

It shouldn't surprise you to find that my computers are similarly organized. It is easy to recognize the stacks of documents shuttered into folders by their relationship to each other or my stacks of photos or videos from my camera. My desktop has become a giant "pile of things downloaded" over time and the whole computer is somewhat of a giant pile of information I've retained.

I've come to realize that this isn't going to change. What I need is for software and applications to adjust themselves to my way of storing information. Not finding what I needed and discovering the usefulness of Atom feeds, I created the open-source atomojo [atomojo] project to serve my "disorganized ordered mess" nature and store my information.

What I discovered was that even though I'm a software developer and work with things that are often tightly controlled and highly structured, much of the information I produce or interact with--media, information bits, documents of all kinds--tends to be loosely organized into "stacks" rather than tightly controlled "lists". This is especially true when it comes to media coming from personal interactions, family, vacations and other things not related to business.

In fact, I'd say that the immense popularity of web search and the introduction of desktop search are consistent with the idea that people generate stacks and piles of information that are loosely grouped together. I'd go even further to say that a "list" is a rare thing and can be hard to develop.

Let's then face the fact that people don't produce lists as much as they produce stacks and piles. Producing organized and structured information takes time and so we just shove our media, documents, and other information into the nearest stack (or pile) that looks appropriate. We do that with a slightly dishonest hope that we'll come back to it later and "get organized."

In the end, this information should be the basis for the applications and communications that we share with others on the Internet. It could drive our websites, our picture galleries, and our commerce engines if we only had a way to get it organized. Thus, my messy desk led me to believe that if I got organized I could build better dynamic applications.

Getting Organized with Atom Feeds

Atom feeds are like stacks of objects. There is an order to the stack from the first entry to the last. You often have to sort through the entries from first to last to find what you want. But the feed has a purpose and so there is a loose relationship between each entry.

While certain information resources have processable content, many others are not so easily searched. For example, an image is difficult to search against without human interaction unless there are annotations. As a result, the metadata encoded in the feed entry is very important.

With the recent invention of the Atom Publishing Protocol (AtomPub) [apprfc], creation and manipulation of both entries and their associated metadata is relatively easy. As such, rather than shoving that picture from your last vacation into the nearest folder, you can tuck it away nicely into an Atom feed. As a consequence, the natural next step is to author some metadata.

Entries have a rich vocabulary with which you can annotate your resources. Just by looking at the entry you can learn about the author, titles, summaries, and many other aspects. This enhances the ability to retrieve that object later.

Term Categorization in Atom

One of the interesting parts of the Atom vocabulary is the category element associated with both feeds and entries. This element has two important attributes called scheme and term. The scheme attribute is a URI value that qualifies or scopes the term attribute's value. The element itself can contain any content--text or elements--but none is defined by the Atom Syndication Format [atomrfc].

If you concatenate the scheme and term attribute values and assume a default for when the scheme attribute is omitted, the result is a URI. This value can be interpreted as a leaf term in some unnamed ontology that labels the entry or feed with that term. As the category element may contain content, a value can be associated with the term.

This interpretation means that for each category element you get an RDF [rdf] triple. This triple is constructed such that the subject is the entry or feed, the predicate is the term URI, and the object is the value of the element. When the category element is empty, the value defaults to rdf:nil.
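The mapping from category elements to triples can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not atomojo's implementation: the function name `category_triples` is hypothetical, and the default scheme used when the scheme attribute is omitted is assumed here to be http://www.atomojo.org/O/keyword/. The sketch covers only the category-derived triples and omits any triple that types the subject as an entry or feed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
RDF_NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"
# Assumption: the default scheme applied when a category has no scheme attribute.
DEFAULT_SCHEME = "http://www.atomojo.org/O/keyword/"


def category_triples(subject_element):
    """Yield (subject, predicate, object) for each direct category child."""
    subject = subject_element.findtext(ATOM + "id")
    for category in subject_element.findall(ATOM + "category"):
        scheme = category.get("scheme", DEFAULT_SCHEME)
        predicate = scheme + category.get("term")  # scheme and term concatenated
        value = (category.text or "").strip()
        # An empty category element defaults to rdf:nil.
        yield (subject, predicate, value if value else RDF_NIL)


ENTRY = """
<entry xmlns='http://www.w3.org/2005/Atom'>
  <id>id:entry:1</id>
  <category term='pet'/>
  <category term='kind'>cat</category>
</entry>
"""

triples = list(category_triples(ET.fromstring(ENTRY)))
for triple in triples:
    print(triple)
```

Each printed tuple has the entry's id as subject, the concatenated scheme and term as predicate, and either the element's text or the rdf:nil URI as object.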

Graphical Models for Terms

Each feed generates a simple graphical model using the unique id element values to represent the subjects in the triples. For example, given the feed in Figure 1, the graph in Figure 2 is generated using edge labels:

E: http://www.atomojo.org/O/type/entry
pets: http://www.atomojo.org/O/keywords/pets
kind: http://www.atomojo.org/O/keywords/kind
breed: http://www.atomojo.org/O/keywords/breed
color: http://www.atomojo.org/O/keywords/color
name: http://www.atomojo.org/O/keywords/name

Figure 1: Example Feed

<feed xmlns='http://www.w3.org/2005/Atom'>
<title>Pets</title>
<id>id:feed:1</id>
<updated>2008-07-10T15:45:26-07:00</updated>
<category term='pets'/>

<entry>
<title>My Cat</title>
<id>id:entry:1</id>
<updated>2008-07-10T15:45:26-07:00</updated>
<category term='pet'/>
<category term='kind'>cat</category>
<category term='color'>tan/tiger</category>
<category term='name'>bruno</category>
<content type='image/png' src='bruno.png'/>
</entry>

<entry>
<title>My Dog</title>
<id>id:entry:2</id>
<updated>2008-07-10T15:40:17-07:00</updated>
<category term='pet'/>
<category term='kind'>dog</category>
<category term='breed'>siberian husky</category>
<category term='color'>white</category>
<category term='name'>hudson</category>
<content type='image/png' src='hudson.png'/>
</entry>

</feed>

Querying to Build Lists

Being able to query your feeds becomes very important after enough content has been collected. That is, your feeds start to look a lot like an unorganized pile. Past a certain number, keeping track of just where something can be found gets to be very difficult.

While the Atom vocabulary provides many interesting elements (e.g. title) by which you might search, the entry and feed categorization provide a fine-grained set of information on which a query can be performed. Simple queries can retrieve entries from feeds by pulling those entries that have certain terms.

We'd like a query that can:

provide a set of terms for the basis of the query,
allow comparison of any values associated with those terms,
return the query result as a "reconstituted" feed.

Fortunately, SPARQL [sparql] has recently become a W3C recommendation and we can use this to query our pile of feeds.

For example, if we want to retrieve the feed with the keyword pets, we could query on the term http://www.atomojo.org/O/keywords/pets as shown in Figure 3.

Figure 3: Retrieve the pets feed

PREFIX k: <http://www.atomojo.org/O/keyword/>
SELECT ?e WHERE { ?e k:pets () . }

Similarly, we could restrict the query to return all pet entries as in Figure 4.

Figure 4: Retrieve all pets entries

PREFIX k: <http://www.atomojo.org/O/keyword/>
PREFIX t: <http://www.atomojo.org/O/type/>
SELECT ?e WHERE { ?e t:entry (); k:pet () . }

If we want to search by value, we could find the pet named 'hudson' as in Figure 5.

Figure 5: Find Hudson

PREFIX k: <http://www.atomojo.org/O/keyword/>
SELECT ?e WHERE { ?e k:pet () ; k:name "hudson" .}
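To make the matching behind Figures 4 and 5 concrete, the following Python sketch selects subjects from a small list of triples. It is a hand-rolled stand-in for the simplest kind of graph pattern (several predicate/object pairs sharing one subject), not a SPARQL engine, and the function name `select_subjects` is hypothetical. The triples are written out by hand, with the rdf:nil URI standing in for the `()` objects in the queries.

```python
RDF_NIL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"
K = "http://www.atomojo.org/O/keyword/"

# Hand-written triples for the two entries of the example feed (abridged).
TRIPLES = [
    ("id:entry:1", K + "pet", RDF_NIL),
    ("id:entry:1", K + "name", "bruno"),
    ("id:entry:2", K + "pet", RDF_NIL),
    ("id:entry:2", K + "name", "hudson"),
]


def select_subjects(triples, constraints):
    """Return subjects, in first-seen order, that satisfy every (predicate, object) pair."""
    facts = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, set()).add((predicate, obj))
    wanted = set(constraints)
    return [subject for subject, pairs in facts.items() if wanted <= pairs]


# Roughly the pattern { ?e k:pet () . }
all_pets = select_subjects(TRIPLES, [(K + "pet", RDF_NIL)])
# Roughly the pattern { ?e k:pet () ; k:name "hudson" . }
hudson = select_subjects(TRIPLES, [(K + "pet", RDF_NIL), (K + "name", "hudson")])
print(all_pets)
print(hudson)
```

A real deployment would hand the query text to a SPARQL processor; the sketch only shows why both entries match the first pattern while only the second entry matches once the name value is added.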

As a query language, SPARQL is sufficiently powerful to express many types of queries. If the graphical model of the feeds is extended to include properties from the Atom feed vocabulary, queries can be formulated that mix categorization with the feed structure (e.g. find all pet entries whose title contains a certain word).

Since we are interested in feeds and entries, the real use of the query is to reconstitute the result into a feed. This can be done by a simple process using these rules:

For each matching feed, return an entry that summarizes the feed, its categorization, and a single link with the relation "related" that points to the feed's resource URI.
For each matching entry, return the entry with the xml:base attribute set such that the link relations are preserved.

As a result of this process, the query result is just another feed that can be consumed by any Atom-enabled client.
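The second rule can be illustrated with a short Python sketch that wraps matching entries in a new feed and sets xml:base on each one. This is an assumption-laden illustration rather than atomojo's code: the function name `reconstitute`, the result feed's title and id, and the base URI http://example.org/pets/ are all hypothetical, and the first rule (summarizing matching feeds as entries) is left out.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM = "{%s}" % ATOM_NS
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def reconstitute(matches, feed_id, updated):
    """Build a result feed from (entry_element, base_uri) pairs."""
    feed = ET.Element(ATOM + "feed")
    ET.SubElement(feed, ATOM + "title").text = "Query Result"
    ET.SubElement(feed, ATOM + "id").text = feed_id
    ET.SubElement(feed, ATOM + "updated").text = updated
    for entry, base_uri in matches:
        result_entry = copy.deepcopy(entry)  # leave the stored entry untouched
        # xml:base keeps relative hrefs and src values resolvable in the new feed.
        result_entry.set(XML_BASE, base_uri)
        feed.append(result_entry)
    return feed


ENTRY = ET.fromstring(
    "<entry xmlns='http://www.w3.org/2005/Atom'>"
    "<title>My Dog</title><id>id:entry:2</id>"
    "<updated>2008-07-10T15:40:17-07:00</updated>"
    "<content type='image/png' src='hudson.png'/>"
    "</entry>"
)

result = reconstitute(
    [(ENTRY, "http://example.org/pets/")],  # hypothetical original feed location
    "id:query:1",
    "2008-07-10T15:45:26-07:00",
)
print(ET.tostring(result, encoding="unicode"))
```

Because the entry carries its own xml:base, a client resolving the relative src value hudson.png still arrives at the original media resource even though the entry now lives in a different feed.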

Building Applications from Atom Feeds

Architecture Overview

With our ability to annotate and query arbitrary content as feed entries, we can now describe a dynamic application architecture, shown in Figure 6, that is based on atomojo's AtomPub and SPARQL abilities. On the far left is the application author who is responsible for loading content and configuring the application. On the far right is the end user of the application who is using a browser-based application presented by the system. In the middle are the web application server and atomojo server instances.

The author's responsibility is not only to provide the content from the pile and appropriate annotations, but also to provide instructions through the same means. These instructions provide content layouts, rules for content organization, web content such as JavaScript code or CSS stylesheets, and even service components. While some of this content is the domain of a web programmer, much of it is just simple entries with categorizations that describe how to build resources on the web application server.

The web application server performs several tasks:

provides content proxies from the atomojo server to deliver content to the browser,
loads and configures layouts for web pages that are to be built from Atom feeds,
loads and configures resources that are SPARQL queries to the atomojo server,
loads custom application components for specialized services.

The configuration information is retrieved from the atomojo server by a series of SPARQL queries. These queries allow the configuration information to be stored anywhere the author prefers.

An Example

When the web application server periodically updates itself, it performs a set of queries against the atomojo server to retrieve configuration information. For example, to find all the layouts, the server uses the query shown in Figure 7. This query returns a set of entries that are the instructions for how each feed is processed to produce a web page.

Figure 7: Query for All Layouts

PREFIX www: <http://www.atomojo.org/O/www/configuration/>
SELECT ?e WHERE { ?e www:layout () . }

A typical layout entry is a media resource entry that contains a script (e.g. XSLT) for transforming the feed on the server into appropriate web content. While the transformation as shown in Figure 8 produces HTML, nothing precludes generation of images or other non-XML media types.

Figure 8: A Layout Entry

<entry xmlns='http://www.w3.org/2005/Atom'>
<title>Default Layout</title>
<id>urn:uuid:18e4d087-ee46-4ff5-8895-bdd0ee0fb226</id>
<published>2008-07-18T11:22:06-07:00</published>
<updated>2008-07-18T11:42:06-07:00</updated>
<category scheme='http://www.atomojo.org/O/www/configuration/'
          term='layout'/>
<category scheme='http://www.atomojo.org/O/www/configuration/layout/'
          term='media-type'>text/html</category>
<content src='default-layout.xsl' type='application/xslt+xml'/>
</entry>

The layouts are used by internal or custom components where the choice of layout is based on a number of matching criteria. These layout rules can be restricted to match by resource path or can require that the feed being rendered have certain terms (i.e. category elements). This allows the author to select the layout based on categorization rather than location in the atomojo server.
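One way to picture this kind of selection is the Python sketch below. It is purely illustrative: the paper does not spell out the rule vocabulary, so the dictionary keys `path_prefix` and `required_terms`, the function name `choose_layout`, the first-match-wins policy, and the layout name software-layout.xsl are all assumptions made for the example.

```python
K = "http://www.atomojo.org/O/keyword/"

# Hypothetical, already-parsed layout rules; more specific layouts come first.
LAYOUTS = [
    {"name": "software-layout.xsl", "required_terms": {K + "software"}},
    {"name": "default-layout.xsl"},  # no restrictions, so it matches anything
]


def choose_layout(layouts, path, feed_terms):
    """Return the first layout whose path and term restrictions all hold."""
    for layout in layouts:
        prefix = layout.get("path_prefix")
        required = layout.get("required_terms", set())
        if prefix is not None and not path.startswith(prefix):
            continue  # restricted to a resource path that does not apply
        if not required <= feed_terms:
            continue  # the feed lacks one of the required terms
        return layout
    return None


by_term = choose_layout(LAYOUTS, "/anywhere/", {K + "software"})
fallback = choose_layout(LAYOUTS, "/anywhere/", set())
print(by_term["name"])
print(fallback["name"])
```

The point of the sketch is that the feed's terms, not its location, decide which layout applies: the same path yields different layouts depending on how the feed is categorized.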

This approach has been used to configure the proxies, layouts, queries to the server, and application components as shown in Figure 9.

Figure 9: A Configuration Feed

<feed xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2005/Atom" xml:base="./">
<id>urn:uuid:9a1ecf4e-2495-4a42-9b53-37b25dff731e</id>
<updated>2008-07-17T14:38:20-07:00</updated>
<title>Configuration</title>
<author><name>Alex Milowski</name></author>
<link href="" rel="edit"/>
<category scheme="http://www.atomojo.org/O/type/feed/" term="hidden"/>
<link href="" rel="self"/>

<entry>
<title type="text">default-layout.xsl</title>
<id>urn:uuid:18e4d087-ee46-4ff5-8895-bdd0ee0fb226</id>
<published>2008-07-18T11:22:09-07:00</published>
<updated>2008-07-18T11:22:09-07:00</updated>
<app:edited>2008-07-18T12:01:59-07:00</app:edited>
<link href="./_/18e4d087-ee46-4ff5-8895-bdd0ee0fb226" rel="edit"/>
<content type="application/xslt+xml" src="default-layout.xsl"/>
<category scheme="http://www.atomojo.org/O/www/configuration/"
          term="layout"/>
<category scheme="http://www.atomojo.org/O/www/configuration/layout/"
          term="media-type">text/html</category>
<link href="default-layout.xsl" rel="edit-media"/>
</entry>

<entry>
<published>2008-07-17T14:43:17-07:00</published>
<app:edited>2008-07-17T17:44:55-07:00</app:edited>
<updated>2008-07-17T14:43:17-07:00</updated>
<id>urn:uuid:f080d0ee-a1b2-4012-a916-b0d915ea2cc4</id>
<title type="text">Proxy</title>
<author><name>Alex Milowski</name></author>
<link href="./_/f080d0ee-a1b2-4012-a916-b0d915ea2cc4" rel="edit"/>
<category scheme="http://www.atomojo.org/O/www/configuration/"
          term="application"/>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="match"/>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="proxy">resources</category>
</entry>

<entry>
<published>2008-07-17T17:38:11-07:00</published>
<app:edited>2008-07-17T17:44:24-07:00</app:edited>
<updated>2008-07-17T17:38:11-07:00</updated>
<id>urn:uuid:115b3132-4a9f-432d-bf7d-e366803d286e</id>
<title type="text">Index</title>
<author><name>Alex Milowski</name></author>
<link href="./_/115b3132-4a9f-432d-bf7d-e366803d286e" rel="edit"/>
<category scheme="http://www.atomojo.org/O/www/configuration/"
          term="application"/>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="class">org.atomojo.www.util.script.IndexApplication</category>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="match">/{path}/</category>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="match">/</category>
<category scheme="http://www.atomojo.org/O/www/configuration/application/match/"
          term="mode">exact</category>
</entry>

<entry>
<published>2008-07-18T13:28:51-07:00</published>
<app:edited>2008-07-18T13:28:51-07:00</app:edited>
<updated>2008-07-18T13:28:51-07:00</updated>
<id>urn:uuid:01ababd0-9c16-4f0f-a4a9-ffecfdecdaf3</id>
<title type="text">Software Index</title>
<author><name>Alex Milowski</name></author>
<category scheme="http://www.atomojo.org/O/www/configuration/"
          term="application"/>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="class">org.atomojo.www.util.script.IndexApplication</category>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="match">/software/</category>
<category scheme="http://www.atomojo.org/O/www/configuration/application/match/"
          term="mode">exact</category>
<category scheme="http://www.atomojo.org/O/www/configuration/application/"
          term="resource">software</category>
<link href="./_/01ababd0-9c16-4f0f-a4a9-ffecfdecdaf3" rel="edit"/>
</entry>

<entry>
<published>2008-07-18T13:26:49-07:00</published>
<app:edited>2008-07-18T13:26:49-07:00</app:edited>
<updated>2008-07-18T13:26:49-07:00</updated>
<id>urn:uuid:3b5319ba-2e8b-4b66-94bc-ecf01bac77f8</id>
<title type="text">Software Resource</title>
<author><name>Alex Milowski</name></author>
<link href="./_/3b5319ba-2e8b-4b66-94bc-ecf01bac77f8" rel="edit"/>
<category scheme="http://www.atomojo.org/O/www/configuration/"
          term="resource"/>
<category scheme="http://www.atomojo.org/O/www/configuration/resource/"
          term="name">software</category>
<category scheme="http://www.atomojo.org/O/www/configuration/resource/"
          term="relation">terms</category>
<category scheme="http://www.atomojo.org/O/www/configuration/resource/"
          term="media-type">application/sparql-query</category>
<category scheme="http://www.atomojo.org/O/www/configuration/resource/"
          term="query">
PREFIX k: &lt;http://www.atomojo.org/O/keyword/&gt;
PREFIX t: &lt;http://www.atomojo.org/O/type/&gt;
SELECT ?e WHERE { ?e t:feed (); k:software () }
</category>
</entry>
</feed>

In Figure 9, the first entry is the layout, the second entry is a proxy for content, and the third is an application component that comes packaged with atomojo. This component is used to create index pages from feeds using the layouts and is typically mapped to index resources (e.g. paths that end with a forward slash). These mappings are shown in the match terms in the entries.

The last two entries are slightly different. Their purpose is to use the query facilities of the atomojo server to find all software projects on the web site and then present a listing. They do this by associating a query with a resource that is then called by another index component.

The last entry defines a query against the atomojo server for retrieving all those feeds that have a keyword of software. That resource is used by the index component defined by the preceding entry. These two entries together configure an index page at /software/ on the web site.
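The way these two entries fit together can be sketched in Python. The sketch below is not how atomojo resolves its configuration; it is a minimal illustration with hypothetical function names (`settings`, `query_for_path`), an abridged copy of the two entries, exact path matching only, and no handling of match templates such as /{path}/.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
CONFIG = "http://www.atomojo.org/O/www/configuration/"

# Abridged copy of the last two configuration entries (PREFIX lines omitted).
FEED = """
<feed xmlns='http://www.w3.org/2005/Atom'>
 <entry>
  <title>Software Index</title>
  <category scheme='http://www.atomojo.org/O/www/configuration/' term='application'/>
  <category scheme='http://www.atomojo.org/O/www/configuration/application/' term='match'>/software/</category>
  <category scheme='http://www.atomojo.org/O/www/configuration/application/' term='resource'>software</category>
 </entry>
 <entry>
  <title>Software Resource</title>
  <category scheme='http://www.atomojo.org/O/www/configuration/' term='resource'/>
  <category scheme='http://www.atomojo.org/O/www/configuration/resource/' term='name'>software</category>
  <category scheme='http://www.atomojo.org/O/www/configuration/resource/' term='query'>SELECT ?e WHERE { ?e t:feed (); k:software () }</category>
 </entry>
</feed>
"""


def settings(entry):
    """Map each category's scheme + term to the list of its text values."""
    result = {}
    for category in entry.findall(ATOM + "category"):
        key = category.get("scheme", "") + category.get("term")
        result.setdefault(key, []).append((category.text or "").strip())
    return result


def query_for_path(feed, path):
    """Find the application matching path, then the query of its named resource."""
    entries = [settings(entry) for entry in feed.findall(ATOM + "entry")]
    for app in entries:
        if path not in app.get(CONFIG + "application/match", []):
            continue
        for name in app.get(CONFIG + "application/resource", []):
            for resource in entries:
                if name in resource.get(CONFIG + "resource/name", []):
                    return resource.get(CONFIG + "resource/query", [None])[0]
    return None


feed = ET.fromstring(FEED)
print(query_for_path(feed, "/software/"))
```

The index entry names the resource software, and the resource entry with that name supplies the query, so a request for /software/ ends up running the query that finds every feed carrying the software keyword.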

Conclusion

In Figure 9 in the last section, the last two entries accomplish the task of finding resources in our pile of information. An author can now put information about software in any feed they choose and, as long as it is annotated with a category element with a term value of software, it will show up on that software index page.

This is not remarkable. Plenty of software systems exist that allow authored keywords to produce index information and then allow people to browse that information. What is interesting here is that we're using categorization and terms.

Any categorization--both formal and informal--can now be used to annotate information stored in the feeds. The annotations are not limited to keywords. Also, the combination of different terms and values can be used to create a very specific set of information.

Similarly, the queries are not limited to simple retrieval exercises. The SPARQL queries can perform complex union and intersection operations as well as filtering on term values. As such, very specific data sets can be retrieved from the atomojo server.

As time goes by, queries can be developed to use whatever categorization evolves from the authors. These queries can be used to re-purpose that original content without much, if any, change to the feed metadata. The resulting feeds can then be associated with a web resource independent of how the author chose to organize the original entries and feeds. That is, I can create a disorganized pile of information and keep my website organized.

Balisage: The Markup Conference

R. Alexander Milowski

`<alex@milowski.com>`