A Short Intro
It’s been my long-standing dream to develop my own XML content management system. I could argue that an important reason is to do something better than any of the many such systems I’ve been part of developing in the past, but mainly, I just think it’s a cool idea and something I want to do.
My 2023 Balisage paper [The Dream of a CMS] discussed some of this at length. It focussed on a native XML portal application built on top of eXist-db and discussed my cunning plan to base a proper CMS application on it, but stopped short of discussing the CMS specifics in any detail. In particular, it left out the discussion on implementing version and workflow handling, specifically as I describe the two in my Balisage 2014 paper on multilevel versioning [Multilevel Versioning].
This paper intends to close that gap.
What I Mean by CMS
But let me very briefly remind you of what I mean by CMS:
A CMS is used for authoring, publishing, and managing XML documents and anything they link to. The CMS should keep track of all versions of those; every XML file and everything that specific XML links to should be uniquely identifiable, per exact version but also per language and country. If a version 223 of an XML file links to a version 11 of a PNG, we should always be able to reproduce that exact configuration.
Also, the Context
This paper happened in a context. I wasn’t simply implementing version and workflow management as a theoretical exercise; I did it to add features to my open-source CMS, JACMS (Just Another CMS). JACMS is not only about version handling, of course (there is everything from authoring to publishing), but that’s a subject for another paper.
Basics
Before we jump into the implementation, let’s briefly look at my views on identification, change, and the like.
Identifying Resources Using URNs
I’ve long advocated an approach that centers on identifying resources using URNs (Uniform Resource Names). The approach starts with a base identifier that identifies the resource in its purest and most abstract form:
urn:x-example:manuals:123456
This won’t give us a specific point in time (a version), nor will it give us a rendition (a specific locale, that is, language and country). For that, we need two or more additional URN components:
urn:x-example:manuals:123456:<version>:<xmllang>
A version 3 in English for the US of an XML document might look like this:
urn:x-example:manuals:123456:3:en-US
The Finnish translation for Finland of that version would then be:
urn:x-example:manuals:123456:3:fi-FI
Importantly, we declare that the English and Finnish translations[1] are the same, in terms of content, but they are different renditions[2] of version 3.
Similarly, a change in version depicts change to the content, regardless of localisation.
A more fine-grained version 3.0.1 might be presented like so:
urn:x-example:manuals:123456:3.0.1:en-US
The version might be split into multiple URN components:
urn:x-example:manuals:123456:3:0:1:en-US
But my view is that all resources should be identified and managed using this same URN scheme. An image should be referenced from the XML using the URN rather than a URL:
<figure>
<title>An Illustration</title>
<img href="urn:x-example:images:987654:1:en-US"/>
</figure>
The URNs might be made more flexible using business rules such as “use the appropriate version and rendition depending on context”:
<img href="urn:x-example:images:987654:*:*"/>
Here, the wildcard * means “any version” and “any locale”, respectively. An alternative is to leave out the version and locale altogether, but this bothers me, as a URN without the two should be interpreted as a base URN, the abstract resource, rather than any physical resource.
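To make the scheme concrete, here is a minimal sketch in Python (chosen for brevity; it is not how JACMS does it). It assumes that a base URN always has exactly four components, as in the examples above, and that a colon-separated version such as 3:0:1 is equivalent to 3.0.1. The names ResourceUrn, parse_urn and matches are hypothetical.

```python
from typing import NamedTuple, Optional


class ResourceUrn(NamedTuple):
    base: str                # e.g. "urn:x-example:manuals:123456"
    version: Optional[str]   # "3", "3.0.1", "*" (any), or None for a base URN
    locale: Optional[str]    # "en-US", "*" (any), or None for a base URN


def parse_urn(urn: str) -> ResourceUrn:
    """Split a URN into base, version and locale.

    Assumption: the base is always urn:<namespace>:<type>:<sequence>.
    """
    parts = urn.split(":")
    if len(parts) < 4 or parts[0] != "urn":
        raise ValueError(f"not a resource URN: {urn!r}")
    base, rest = ":".join(parts[:4]), parts[4:]
    if not rest:
        return ResourceUrn(base, None, None)   # the abstract resource
    if len(rest) < 2:
        raise ValueError(f"version without locale: {urn!r}")
    *version_parts, locale = rest
    # 3:0:1 and 3.0.1 are treated as the same version
    return ResourceUrn(base, ".".join(version_parts), locale)


def matches(pattern: str, candidate: str) -> bool:
    """True if candidate satisfies pattern, where * matches anything."""
    p, c = parse_urn(pattern), parse_urn(candidate)
    if p.version is None or c.version is None:
        return False   # a base URN never names a physical resource
    return (p.base == c.base
            and p.version in ("*", c.version)
            and p.locale in ("*", c.locale))
```

A real business rule would go further and pick one version and locale from the candidates that match, depending on context.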
Link Trees
Importantly, the URNs allow us to reproduce a specific document configuration down to the last version. Imagine an XML document that links to images and other XML documents to form a tree structure, and where all links are in the form of URNs:
Figure 1: Linked XML documents and images

Where URN1 might look like this:
<doc>
<title>My document</title>
<xi:include href="URN2"/>
<img href="URN3"/>
<img href="URN4"/>
</doc>
You get the idea.
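Reproducing a configuration then amounts to walking the link tree. The following Python sketch illustrates the idea under some loud assumptions: the repository is a plain dictionary mapping URNs to XML source (a stand-in for the CMS), anything missing from it is treated as a non-XML leaf such as an image, and collect_links is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def collect_links(urn, repository, seen=None):
    """Return every URN reachable from urn, depth first, in document order."""
    if seen is None:
        seen = []
    if urn in seen:
        return seen              # already visited; also guards against cycles
    seen.append(urn)
    source = repository.get(urn)
    if source is None:
        return seen              # not XML (e.g. an image): a leaf
    for element in ET.fromstring(source).iter():
        if element.tag in (XI_INCLUDE, "img"):
            href = element.get("href")
            if href:
                collect_links(href, repository, seen)
    return seen
```

Because every URN in the result carries its exact version and locale, the returned list is the complete configuration of the document.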
Change
What constitutes a change? What drives a version bump? For example, do we bump the version if we fix a spelling mistake? Add a paragraph? Update a link? Remove or add one?
It depends.
My 2014 Balisage paper on the subject [Multilevel Versioning] discusses this in more detail, but essentially, there are a couple of things to consider regarding change:
- Not every change is meaningful. For example, if every save operation creates a new version, very few of them will be useful.
- Some changes are small fixes. Others add new features. Yet others are “backwards-incompatible”. Etc. These are different levels of change; you can add as many as you like, as long as you define the business rules.
- Assuming well-defined version levels where each new version is semantically meaningful, we also need an additional level that isn’t. On that level, we are only concerned with change meaningful to the author.
Some of the latter identify a workflow stage, i.e. “this is a first complete draft” or “this is now ready for review”. They could mean that the content is ready for a meaningful new version, perhaps one that is “one or more levels up”, but they could also result in additional changes on the same level. In other words, a workflow change is NOT the same as a (meaningful) version change.
For a three-level set of versions X.Y.Z, we will therefore need a fourth level, only used for in-work changes.
Adding a translation happens to a specific version. A Finnish translation of the English 1.0.0 document results in a Finnish 1.0.0. Etc.[3]
Remember that a translation is a rendition, not an exact copy. The original and the translation are declared to be equivalent. They don’t need to, for example, have the same node count; a single paragraph in one language may better express the sentiment presented in two paragraphs in another. Images frequently need to be localised, not only in terms of text but also cultural differences. Etc.
Version Markup Language
Version Markup Language (VML) is an approach to versioning, and the RELAX NG Compact Syntax schema that resulted from it (for more, see my Balisage 2014 paper [Multilevel Versioning] and the VML GitHub repository [vml-github]). A VML instance is an XML document describing the version history of one or more resources stored by a CMS. It is based on the URN principles as described above (see section “Identifying Resources Using URNs”), and it is perhaps best described using an example:
<vml xmlns="http://www.sgmlguru.org/ns/vml">
<resources>
<resource>
<base>urn:x-vml-exist:r1:doc:000001</base>
<!-- Integer -->
<version>
<rev>0</rev>
<!-- No physical document saved here -->
<file>
<metadata><!-- Level 0 --></metadata>
<url></url>
</file>
<!-- Decimal -->
<version>
<metadata><!-- Level 1 --></metadata>
<rev>0</rev>
<!-- No physical document saved here either -->
<file>
<url></url>
</file>
<!-- Centesimal - this is the first edit -->
<version status="c-o">
<metadata><!-- Level 2 --></metadata>
<rev>1</rev>
<file>
<metadata><!-- Level 2 (saved document v 0.0.1) --></metadata>
<url xml:lang="en-GB">/path/to/first/save</url>
</file>
</version>
<!-- Second edit -->
<version status="c-o">
<rev>2</rev>
<file>
<metadata><!-- Level 2 (saved document v 0.0.2) --></metadata>
<url xml:lang="en-GB">/path/to/second/save</url>
</file>
</version>
</version>
<version status="c-i">
<rev>1</rev>
<file>
<metadata><!-- Create version 0.1, first meaningful version --></metadata>
<url xml:lang="en-GB">/path/to/first/checked/in</url>
</file>
<!-- Check out to level 2 or check in to integer level -->
</version>
</version>
</resource>
</resources>
</vml>
Let’s unravel this. A VML XML document is essentially a series of resources, each of which contains a nested version structure, like so:
Figure 2: Nested Version Elements

The base element provides the base URN, before versions (and their
renditions) are applied.[4] This means that the complete URN is what the base and the
nested version/rev elements provide. A version element looks
like this, regardless of level:
<version>
<rev>X</rev>
<file>
<url xml:lang="YY-ZZ">/path/to/file</url>
</file>
</version>
<rev>X</rev> provides a revision counter on the current level. <url xml:lang="YY-ZZ">/path/to/file</url> links the actual file with the file’s URL and the localisation. A basic three-level example XML file resource.xml resolving to a URN urn:x-example:manual:000001:0.0.1 would look like this:[5]
<resource>
<base>urn:x-example:manual:000001</base>
<version>
<rev>0</rev>
<version>
<rev>0</rev>
<version>
<rev>1</rev>
<file>
<url xml:lang="en-GB">/path/to/resource.xml</url>
</file>
</version>
</version>
</version>
</resource>
Note: The VML schema is still in flux as I write this.
Checking out the resource, editing it, and then checking it in again would result in a new patch-level (level 3) version 0.0.2:
<resource>
<base>urn:x-example:manual:000001</base>
<version>
<rev>0</rev>
<version>
<rev>0</rev>
<version>
<rev>1</rev>
<file>
<url xml:lang="en-GB">/path/to/resource.xml</url>
</file>
</version>
<version>
<rev>2</rev>
<file>
<url xml:lang="en-GB">/path/to/resource.xml.10</url>
</file>
</version>
</version>
</version>
</resource>
The new version might then be reviewed and approved, bumping it to a minor update (level 2), in which case the new version, 0.1.0,[6] would be registered like so:
<resource>
<base>urn:x-example:manual:000001</base>
<version>
<rev>0</rev>
<version>
<rev>0</rev>
<version>
<rev>1</rev>
<file>
<url xml:lang="en-GB">/path/to/resource.xml</url>
</file>
</version>
<version>
<rev>2</rev>
<file>
<url xml:lang="en-GB">/path/to/resource.xml.10</url>
</file>
</version>
</version>
<version>
<rev>1</rev>
<file>
<url xml:lang="en-GB">/path/to/resource.xml.10</url>
</file>
</version>
</version>
</resource>
The keen-eyed reader will note that versions 0.0.2 and 0.1.0 both point at the same file, /path/to/resource.xml.10, as no changes were required.
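How the nested version and rev elements combine into version strings can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the resource element is in no namespace (as in the snippets above, unlike the full VML example), missing lower levels are padded with 0, an empty url is a placeholder with no saved file, and list_versions is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def list_versions(resource, levels=3):
    """Yield (version, locale, url) for every saved file in a VML resource."""

    def walk(version, prefix):
        revs = prefix + [version.findtext("rev")]
        for url in version.findall("file/url"):
            if url.text:   # an empty <url/> marks a placeholder level
                padded = revs + ["0"] * (levels - len(revs))
                yield ".".join(padded), url.get(XML_LANG), url.text
        for child in version.findall("version"):
            yield from walk(child, revs)

    for top in resource.findall("version"):
        yield from walk(top, [])
```

Applied to the last resource above, this yields 0.0.1, 0.0.2 and 0.1.0, with the latter two sharing /path/to/resource.xml.10.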
Storing Resource Versions
In the above example, the first stored version is
/path/to/resource.xml and the second
/path/to/resource.xml.10. What is actually stored here?
VML is a versioning abstraction, a way to identify meaningful change. It does not dictate how or where to store the physical resources, only that the two are different because they are identified using different version elements in the CMS instance. A VML implementation might use a naming convention to keep the saved resources unique in the file system,[7] but it is better to implement it on top of a source control system such as Git or, as in JACMS, eXist-db’s Versioning Module [github-versioning-module]. Many or even most of these systems save a diff rather than the entire file, which has many advantages.
For a resource /db/test/doc/test.xml, eXist-db’s Versioning Module
records its version history like so:
<v:history xmlns:v="http://exist-db.org/versioning">
<v:document>/db/test/doc/test.xml</v:document>
<v:revisions>
<v:revision rev="5">
<v:date>2017-04-22T20:57:45.189+02:00</v:date>
<v:user>admin</v:user>
</v:revision>
<v:revision rev="9">
<v:date>2017-04-22T20:59:51.378+02:00</v:date>
<v:user>admin</v:user>
</v:revision>
<v:revision rev="17">
<v:date>2017-04-22T21:55:50.796+02:00</v:date>
<v:user>admin</v:user>
</v:revision>
<v:revision rev="25">
<v:date>2017-04-22T22:50:02.034+02:00</v:date>
<v:user>admin</v:user>
</v:revision>
</v:revisions>
</v:history>
A function v:doc() can be used to retrieve a specific version. For example, v:doc($url, 17) will return version 17, above.
Resources in JACMS
JACMS is built on top of eXist-db, a native XML database. We can use built-in functions to produce a list of resources stored in a given collection and its descendants in eXist. For example, given this collection hierarchy:
Figure 3: Files in eXist-db

We can output this as XML using built-in functions:
<exist:collection
name="/db/cms/content/dita-examples/04"
created="2025-04-01T16:17:37.371+02:00"
owner="admin"
group="dba"
permissions="rwxr-xr-x"
uri="/db/cms/content/dita-examples/04">
<exist:collection
name="/db/cms/content/dita-examples/04/sub"
created="2025-04-01T16:17:37.382+02:00"
owner="admin"
group="dba"
permissions="rwxr-xr-x"
uri="/db/cms/content/dita-examples/04/sub">
<exist:resource
name="relpath-map-linking-11.ditamap"
created="2025-04-01T16:17:37.385+02:00"
last-modified="2025-04-01T16:17:37.385+02:00"
owner="admin"
group="dba"
permissions="rw-r--r--"
uri="/db/cms/content/dita-examples/04/sub/relpath-map-linking-11.xml"/>
</exist:collection>
<exist:resource
name="topic11.dita"
created="2025-04-01T16:17:37.372+02:00"
last-modified="2025-04-01T16:17:37.372+02:00"
owner="admin"
group="dba"
permissions="rw-r--r--"
uri="/db/cms/content/dita-examples/04/topic11.xml"/>
</exist:collection>
This is just an initial list of resources stored in eXist-db. These are not necessarily managed by JACMS. When a new XML file is created, edited, and stored in eXist-db, it needs to be added to the resources list, using a manual trigger; you can store and save XML in eXist-db without allowing it to be managed by JACMS.
The resources list can be displayed as a UI using XForms:[8]
Figure 4: Resources List XForm

The resources that are added to JACMS are enriched with an @id attribute and a generated base URN (@base-urn) that identifies the resource, to which separate @version and @xml:lang attributes are added at the time of display:[9]
<exist:resource
name="new-content.xml"
created="2025-04-01T16:17:37.122+02:00"
last-modified="2025-04-01T16:17:37.122+02:00"
owner="admin"
group="dba"
permissions="rw-r--r--"
uri="/db/cms/content/dita-examples/new-content.xml"
id="id-123456"
base-urn="urn:x-example:manual:000001"
version="1.2.3"
xml:lang="en-GB"/>
The initial list of resources is updated rather than regenerated whenever additional resources are added to JACMS. Most XML resources are created in the CMS, but non-XML resources such as images are created and edited elsewhere.
Generating URNs
The URNs used by JACMS take the following form:
urn:<namespace>:<type>:<sequence>:<version>:<locale>
- <namespace> is the URN namespace; allowed values are defined in a config file
- <type> is the type of resource (manual, illustration, 3D graphic, etc.); allowed values are defined in a config file
- <sequence> is the first available sequence number of the resource type
- <version> is the version string, e.g. 0.0.1; the number of levels is defined in a config file
- <locale> is the combined language and country of the resource; allowed values are listed in a config file
The URN generator is a set of functions triggered when adding a resource to JACMS. The exact process differs depending on the type of resource. Ideally, for example, an XML document should have an @xml:lang attribute that is used to populate the URN locale of the new XML. For a non-XML file, different approaches, from file naming conventions to manually adding the information, are used.
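As a rough illustration of the generator, here is a Python sketch (JACMS itself uses XQuery). Several things in it are assumptions rather than JACMS behaviour: the config is a plain dictionary with the keys shown in the usage note, “first available” is read as the lowest unused positive number, sequence numbers are zero-padded to six digits as in the examples, and next_sequence and make_urn are hypothetical names.

```python
def next_sequence(base_urns, namespace, resource_type):
    """Return the lowest positive sequence number not yet used for this type."""
    prefix = f"urn:{namespace}:{resource_type}:"
    used = {int(urn[len(prefix):]) for urn in base_urns if urn.startswith(prefix)}
    sequence = 1
    while sequence in used:
        sequence += 1
    return sequence


def make_urn(namespace, resource_type, sequence, version, locale, config):
    """Compose a full URN after checking each component against the config."""
    if namespace not in config["namespaces"]:
        raise ValueError(f"unknown namespace: {namespace}")
    if resource_type not in config["types"]:
        raise ValueError(f"unknown resource type: {resource_type}")
    if locale not in config["locales"]:
        raise ValueError(f"unknown locale: {locale}")
    if len(version.split(".")) != config["levels"]:
        raise ValueError(f"expected {config['levels']} version levels: {version}")
    return f"urn:{namespace}:{resource_type}:{sequence:06d}:{version}:{locale}"
```

With a config of {"namespaces": {"x-example"}, "types": {"manual"}, "locales": {"en-GB"}, "levels": 3} and no manuals registered yet, the first manual added in British English would get urn:x-example:manual:000001:0.0.1:en-GB.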
Adding VML
The resources list (see section “Resources in JACMS”) is intended to list resources to be displayed. Usually these will be the latest versions; the default view in JACMS is just that. The version history of the resource is accessible in a separate VML instance, the two associated with each other using the resource ID. Consider this resource:
<exist:resource
name="new-content.xml"
created="2025-04-01T16:17:37.122+02:00"
last-modified="2025-04-01T16:17:37.122+02:00"
owner="admin"
group="dba"
permissions="rw-r--r--"
uri="/db/cms/content/dita-examples/new-content.xml"
id="id-123456"
base-urn="urn:x-example:manual:000001"
version="1.2.3"
xml:lang="en-GB"/>
The matching VML instance looks like this:
<resource id="id-123456">
<base>urn:x-example:manual:000001</base>
...
<version>
<rev>1</rev>
...
<version>
<rev>2</rev>
<version>
<rev>1</rev>
<file>
<url xml:lang="en-GB">/db/cms/content/dita-examples/new-content.xml.155</url>
</file>
</version>
<version>
<rev>2</rev>
<file>
<url xml:lang="en-GB">/db/cms/content/dita-examples/new-content.xml.187</url>
</file>
</version>
<version>
<rev>3</rev>
<file>
<url xml:lang="en-GB">/db/cms/content/dita-examples/new-content.xml.193</url>
</file>
</version>
</version>
</version>
</resource>
Note the id="id-123456" attribute, which matches the @id of the resource in the resources list.
The VML XML is first generated when the resource is added to JACMS. It is then extended whenever a new version is created, which happens when checking in an edited XML document or uploading a new non-XML resource. How the VML is updated depends on the type of check-in. A new version is author-controlled if the purpose is simply to check in your work. All that happens is the creation of a new version structure on the same level.
A bump to a higher version level will generally require additional workflow steps since its purpose is to produce a new, approved version. Note that for a three-level VML setup, the editing happens on level four; when a level four version is approved, it may result in a new version on any level, depending on the business rules in place.[10]
In both cases, the check-in and the addition(s) to the VML are handled by XQuery and some XSLT. Note that there is a third case, for the initial creation of the VML instance. The processes are similar and rather simple: a check-in is essentially a “save” operation where we first generate a new version element with a rev label (essentially a count of the preceding siblings plus the new element), and then insert the new eXist-db version URL (see section “Storing Resource Versions”) into a child url element.
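The same-level check-in described above can be sketched as follows. This is a Python illustration only, not the XQuery and XSLT that JACMS uses; it assumes un-namespaced VML elements and that the sibling version elements on the level in question are numbered from 1, and check_in is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_in(parent, url, locale):
    """Append a new <version> to parent and return it.

    parent is the <version> element one level up; rev is the number of
    existing sibling versions plus one for the new element.
    """
    rev = len(parent.findall("version")) + 1
    version = ET.SubElement(parent, "version")
    ET.SubElement(version, "rev").text = str(rev)
    file_element = ET.SubElement(version, "file")
    url_element = ET.SubElement(file_element, "url")
    url_element.set(XML_LANG, locale)
    url_element.text = url       # the stored revision's URL
    return version
```

The initial creation of a VML instance and a workflow-driven bump would need their own variants, since they add structure on more than one level.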
Additional Metadata
A CMS is all about reuse. You may be familiar with the “topic-based paradigm”, that is, writing small topics, each one about a single subject, and then assembling the topics into full documents, with each document reusing existing topics as needed. The topics are Lego blocks; the documents are what you do with them. This is what XML vocabularies such as DITA and S1000D advocate.
But, as everyone with experience from topic-based authoring will know, whilst most topics can be reused if properly authored, there will always be content that needs tweaking. For example, your topic describes an engine, but the engine variants are all slightly different from each other, requiring differing assembly or disassembly steps, different illustrations, part numbers, etc.
This is where profiling becomes essential.[11] Profiling is simply about marking content as applicable to only certain configurations or properties. You have something like this:
<para product="A">This is for product A.</para>
<para product="B">This is for product B.</para>
<para>This is for all products.</para>
Variations of this basic theme are everywhere, albeit slightly different from one vocabulary to the next.
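To show what applying such a profile means in practice, here is a small Python sketch. It assumes a single product attribute holding a single value, which is simpler than what real vocabularies allow, and apply_profile is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def apply_profile(root, product):
    """Remove every element profiled for a product other than the given one."""
    for parent in list(root.iter()):        # snapshot, since we mutate the tree
        for child in list(parent):
            value = child.get("product")
            if value is not None and value != product:
                parent.remove(child)        # unprofiled content is always kept
    return root
```

Filtering the three paragraphs above for product A keeps the first and the third.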
There is plenty of other content-specific metadata to consider, from a topic’s title to, say, vehicle VIN numbers or ATA codes.
Obviously, it would be useful to include some or all of the metadata in the listed resources (see Figure 4, where we include profiling metadata on the right-hand side). We probably don’t want to fetch that information from the content at runtime, however, so it needs to be attached to the resource item when generating or updating the resources list. There are a few options:
- Add the metadata as attributes (or elements, for that matter) to the resources list. This is far from ideal, as it means that the metadata applies to ALL resource versions, which probably won’t be the case.
- Add the metadata to the VML document, either on the resource element (meaning that the metadata applies to ALL versions) or on specific version elements (meaning that the metadata only applies to that version). The downside of this approach is that the VML schema would have to accommodate any number of metadata approaches to handle the various schemas we want JACMS to support.
- Add the metadata to a third, separate XML file, with one file per resource or even per specific version; or we could use a single file for all resources. This is the cool approach; for more, see section “Coupling a Resource with Its Version History”.
Currently, JACMS uses option two. VML adds a metadata element to both resource and version. metadata, in turn, contains a foreign element that allows literally any markup,[12] except elements in the VML namespace:
<resource>
<metadata>
<foreign>
<other:foo/>
</foreign>
</metadata>
<base>urn:x-vml-test:mydoc:000001</base>
<version>
<metadata>
<foreign>
<other:bar/>
</foreign>
</metadata>
<rev>0</rev>
...
This, of course, requires additional validation and processing, and I’m not convinced that it’s the best approach. It does work, though.
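One piece of that additional validation, checking that nothing in the VML namespace has crept into a foreign element, might look like the following Python sketch. It assumes the VML namespace URI from the example earlier in the paper; foreign_violations is a hypothetical name.

```python
import xml.etree.ElementTree as ET

VML_NS = "http://www.sgmlguru.org/ns/vml"


def foreign_violations(root):
    """Return the tags of VML-namespace elements found inside any <foreign>."""
    violations = []
    for foreign in root.iter(f"{{{VML_NS}}}foreign"):
        for element in foreign.iter():
            if element is foreign or not isinstance(element.tag, str):
                continue                    # skip the container itself
            if element.tag.startswith(f"{{{VML_NS}}}"):
                violations.append(element.tag)
    return violations
```

An empty list means the foreign content is acceptable as far as this one rule goes; validating the foreign markup against its own schema is a separate step.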
Coupling a Resource with Its Version History
Whilst it would be a reasonable approach to add the links to the VML directly in the resources list, a more robust approach is to add a third XML document type, an extended XLink linkbase,[13] to define those relations. Here’s a brief example:
<linkbase xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
<title xlink:type="title">Connecting a resource with its linkbase</title>
<group>
<title xlink:type="title">A resource, its VML and the matchmaker</title>
<locator xlink:type="locator" xlink:href="/path/to/resources-list.xml#id-123456" xlink:label="resources-list"/>
<locator xlink:type="locator" xlink:href="/path/to/vml.xml#id-123456" xlink:label="vml"/>
<arc xlink:type="arc" xlink:from="resources-list" xlink:to="vml"/>
<arc xlink:type="arc" xlink:from="vml" xlink:to="resources-list"/>
</group>
</linkbase>
This has several advantages. First, the relations are defined out-of-line, independently from either document type, and can be kept unchanged if the resources list is modified or even regenerated. Second, we can define the direction of the relation; notice how there are two arc elements defining the relationship between the resources list and the VML, one in each direction. And third, the arc elements define relations between labels representing the resources rather than the resource URLs themselves, allowing us to create classes of relationships so we don’t have to define each one separately.
Finally, even though the current implementation places the metadata of each resource and version inline in the VML instances, the linkbase allows us to move that information to an out-of-line metadata XML document. It can also be argued that a combined approach, with some metadata inline and some out-of-line, fits better when adding XML vocabularies such as S1000D, where the so-called BREX data modules that define the business rules of the setup are separate documents.
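Following an arc in such a linkbase can be sketched in a few lines of Python. The sketch assumes the structure of the example above, with un-namespaced group, locator and arc elements carrying XLink attributes; traverse is a hypothetical name and not part of JACMS.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"


def xlink(name):
    """Qualified attribute name in ElementTree's {namespace}local notation."""
    return f"{{{XLINK_NS}}}{name}"


def traverse(linkbase, start_href):
    """Return the hrefs reachable from start_href by following one arc."""
    results = []
    for group in linkbase.iter("group"):
        locators = [e for e in group if e.get(xlink("type")) == "locator"]
        # Labels carried by locators that point at the starting resource
        labels = {loc.get(xlink("label")) for loc in locators
                  if loc.get(xlink("href")) == start_href}
        for arc in group:
            if arc.get(xlink("type")) == "arc" and arc.get(xlink("from")) in labels:
                target = arc.get(xlink("to"))
                results += [loc.get(xlink("href")) for loc in locators
                            if loc.get(xlink("label")) == target]
    return results
```

Because arcs connect labels rather than URLs, a single arc covers every locator that shares a label, which is what makes classes of relationships possible.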
The Process
It occurs to me that the intended processes may not be obvious to the casual reader, so this section attempts to fill in the gaps. Firstly, adding a new document to JACMS control is a manually controlled process:
- Create and edit an XML document in oXygen.[14] Save it in eXist-db.
- Add the saved document to JACMS. This is currently a script that triggers manually, generating a URN, additional resources list attributes, and a VML instance. We also add the necessary linkbase locators and arcs.
Non-XML files are uploaded to eXist-db and then added to JACMS using that same script. Currently, the script needs to be manually configured to add the resource type (manual, image, etc) but the idea is to add this to the XForm listing all available eXist-db resources.
Resources already controlled by JACMS get new versions as follows:
- An XML document is checked out (there is a check-out/in flag, a @status attribute, on the version element) and edited. For non-XML resources, the editing happens outside JACMS. Once done, the resource is uploaded and checked in.[15]
- When the XML is checked in again, a function is triggered to generate a new version element. By default, this essentially runs an XSLT that adds the new version as a following sibling in the VML; this is still a work in progress.
A checked-in XML may be bumped up to a higher-level version, from a work in progress (e.g. 1.2.3.4) to the next approved version (e.g. 1.2.4). This is a workflow process, not a simple check-in by the author.
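The arithmetic of such a bump is simple enough to sketch in Python. The sketch assumes that the approved version increments the counter on the target level and that all lower levels restart at 0; which level to target is a business rule and is simply passed in. The name promote is hypothetical.

```python
def promote(in_work, level, levels=3):
    """Return the approved version an in-work version is bumped to.

    in_work is the in-work version string, e.g. "1.2.3.4".
    level is the target level of the approved version (1 is the highest).
    levels is the number of meaningful levels in the setup.
    """
    if not 1 <= level <= levels:
        raise ValueError(f"level must be between 1 and {levels}")
    parts = [int(part) for part in in_work.split(".")]
    head = parts[:level]
    head += [0] * (level - len(head))   # pad if the input has fewer levels
    head[-1] += 1                       # next version on the target level
    return ".".join(str(n) for n in head + [0] * (levels - level))
```

With this, promote("1.2.3.4", 3) gives 1.2.4, and promoting the same in-work version to level 2 gives 1.3.0.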
In Closing
JACMS, as mentioned, is very much a work in progress, so if the paper sounds a bit vague or even contradictory in places, it probably is. The final version of this paper will hopefully not have those particular problems.
Here are the next few steps (some of which may be done by the time you read this):
- Finish the functions and transforms that manipulate version progression. This is on-going as I write this, but getting along nicely.
- Add functions that register a physical file to a new version.
- Add workflow functionality to the VML.
- Finish the various XForms required to display resources and their versions; what I have at the time of this writing are tests and drafts.
- Add check-out/check-in functionality.
- Etc.
But also, make the JACMS repository public.
Lastly, I’d like to thank my friends Charaf Eddine, Joe Crowther, Adam Retter, and Geert Bormans, all of whom have contributed ideas, support, and encouragement.
References
[Multilevel Versioning] Multilevel Versioning [online, fetched on 2 April 2025]. doi:https://doi.org/10.4242/BalisageVol13.Nordstrom01
[The Dream of a CMS] The Dream of a CMS [online, fetched on 2 April 2025]. doi:https://doi.org/10.4242/BalisageVol28.Nordstrom01
[vml-github] Version Markup Language (VML) [GitHub repository fetched on 6 April 2025]. https://github.com/sgmlguru/vml
[github-versioning-module] Versioning Module for eXist-db [GitHub repository fetched on 9 April 2025]. https://github.com/eXist-db/xquery-versioning-module
[1] The “original” could be any one of them, or some other rendition of version 3.
[2] Think of an image saved in PNG and JPG formats. The methods or renditions differ but the content remains the same.
[3] An argument can be made that the Finnish translation could have a number of separate, independent fourth-level versions meaningful to the translation, but in my view, this is not consistent with a single version and localisation strategy.
[4] Importantly, a base URN is what I call an “abstract resource”, something that does not concern itself with a specific version (how the resource changes over time) or rendition (locale). The base URN cannot be an actual physical resource, but it’s a useful abstraction to provide a unique identifier for a resource.
The very first version of a new resource will always have a version (for a four-level version this will be “0.0.0.1”) and a locale. The base URN is an abstract thing; it intends to cover everything but is unable to point to a specific version or rendition.
[5] Assuming a single version component in the URN.
[6] With the level 3 “0” implied.
[7] Or store them in different folders, or any combination thereof, an approach suggested in my original Balisage paper.
[8] The UI presented here is a test and will change.
[9] The list is used in multiple contexts, some of which require listing older versions and other locales.
[10] Here, a business rule is simply the practice that dictates what level the new version will get depending on the content change, as employed by the organisation.
[11] It is important to note that profiles must be part of the content, as stored in the CMS, just as a title or paragraph would be. JACMS is not about storing “any possible profile”, just the ones that are actually expressed by the markup.
If you come from a DITA world, the available profiles are part of the content handled by the CMS. The possible outputs, as defined by DITAVAL filters, may not be, unless the DITAVAL filters, themselves XML documents, are handled by the CMS. DITAVAL documents are usually applied when publishing a DITA map and therefore frequently seen as transient. How they are managed is not something I should decide; if a JACMS user wants to store specific DITAVAL filters, it should work. If not, it should work, too.
[12] This could be my first-ever ANY content model. Certainly one of very, very few.
[13] Defined by a modest RNC schema.
[14] oXygen has an out-of-the-box integration with eXist-db.
[15] There needs to be a UI control in the XForm to trigger the upload from the right resource in JACMS, but no such thing exists at the time of this writing.