How to cite this paper
La Fontaine, Robin. “Representing Overlapping Hierarchy as Change in XML.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). https://doi.org/10.4242/BalisageVol17.LaFontaine01.
Balisage: The Markup Conference 2016
August 2 - 5, 2016
Balisage Paper: Representing Overlapping Hierarchy as Change in XML
Robin La Fontaine
Robin is the founder and CEO of DeltaXML. He holds an Engineering Science
degree from Oxford University and an MSc in Computer Science. His background
includes computer aided design software and he has been addressing the
challenges and opportunities associated with information change for many
years.
Copyright © DeltaXML Limited 2016. All rights reserved.
Abstract
Changes in an XML document may effect not only element and attribute content but,
more problematically, the markup hierarchy. Markup for tracking structural changes
must represent multiple, often overlapping, structures in the same document. Thus
the perennial problem of overlap becomes a subset of the problem of managing change
to structured documents, such as versions of documents amended over time. Our work
started with a delta format for two or more documents, which easily represents
inline changes, but handles hierarchy change by duplicating content. In order to
avoid duplication, we introduce a distinction between the name of the element (its
tag) and the element content, so that assertions can be made separately. We then
introduce @dx (change) and @dxTag (change tag) attributes to mark changes. This
representation allows us to define overlapping hierarchies in a completely XML way
without declaring a dominant hierarchy and while keeping element fragmentation to
a
minimum. While this solution probably will not scale for large numbers of variants,
it shows promise for many classes of documents.
Table of Contents
- Introduction and Background
- How Content Duplication Represents Any Change
- Representing Structural Change without Content Duplication
- Dominant Hierarchy
- Attributes
- Conclusions
Introduction and Background
Some discoveries, including quite important discoveries such as penicillin, are made
by accident. It can be the case that when looking for a solution to one problem, we
stumble upon a solution to another problem. This is one of those cases. Our objective
was to find a way to represent change to structure, and this turned out to provide
a
useful representation for overlapping hierarchy, but with the advantage that other
changes could also be represented.
Jeni Tennison says in one of her excellent blogs, "Overlap is arguably the main
remaining problem area for markup technologists." [1]. She
points out that this is not only an issue for academics looking at poetry and historical
documents, but is also an issue in managing change to structured documents. The example
she cites is legislation which is amended over time where the authors are not concerned
about changes to structure, their primary interest is in the textual changes.
There are a number of different approaches to this problem, and some excellent reviews
of the advantages and disadvantages of the approaches [2]
[3] [4] [5].
Our own goal is to represent changes to documents, such as versions of documents over
a
period of time as they are amended, and to represent them in a way that is easy to
process. This reflects the classic advantage of XML, where content can be re-purposed
to
meet different needs. If the document can be re-purposed, then we need to be able
to
re-purpose changes also, and this means changes need to be represented in way that
is
easy to process.
Ignoring for the moment changes to attributes, most changes can be represented by
the
addition and deletion of elements and their content. Additionally, we need to be able
to
mark segments of text that are either added or deleted. This approach allows us to
represent any change, although not always in an optimal way. For example, in the
extreme, the deletion of the 'old' document and addition of the 'new' document correctly
represents the changes, but not in a very useful way. This leads to the observation
that
by duplicating content it is always possible to represent a change in a structured
document. The problem is that we do not wish to duplicate content because this appears
to the user as a change to the content, whereas in practice the only change may be
to
the structural markup around that content. This leads to the need to represent the
addition, deletion, and overlapping of structural elements representing
hierarchy.
The TEI format [6] has powerful, though complex, ways of
representing different hierarchies, and also variants of text within a document. The
goal is to provide rich semantic information about the document, representing all
of
this information in a single place. Using this semantically rich representation, it
would be possible to generate all the different variants of the document, including
variants of the text and variants of the hierarchy. When we are considering change,
it
is essentially all these different variants that we use as a starting point. Therfore
in
this respect our goal is very similar to, but not quite the same as, the goal of the
TEI
format. As our starting point is a set of document variants, it is natural that we
clearly identify each of these source variants in the single merged document. We
therefore always make a very precise differentiation between two overlapping structures,
because these are considered to have come from different source documents.
The inherent model that we adopt here, i.e. one that addresses the representation
of
variants of the whole document, is important because it does differ from a model where
the desire is to represent variants in structure within a document. The latter model
can
lead to a very large number of whole document variants, and our model is not well
suited
to a large number of variants because the attribute values representing the variants
become long and therefore difficult to manage. Our model addresses primarily overlap
in
the context of change to a document and is not intended as a solution to all overlap
representation problems.
Although TEI has these mechanisms, most XML document formats, such as DITA[7] or DocBook[8], do not and would therefore
benefit from a way of representing overlap. In these formats, overlap representation
is
needed in order to better represent change. There is a clear advantage to having a
standard way to enhance an existing schema with change and overlap representation
because structured document editing applications then need to understand only one
way of
handling this. Schmidt [9] suggests that a good way to manage
documents that have overlapping hierarchy is to split them into separate documents
and
merge them as needed, though this idea does not seem to have gained a significant
following.
There is another distinguishing feature of this solution. In other solutions for
representing overlap, identifier attributes (which may or may not be strictly of type
xml:id) are often used to indicate which fragments are part of the same element, but
with this solution there is no such use of identifier attributes. The problem with
using
identifier attribtues is that it is difficult to denote a fragment that is part of
two
separate hierarchies because only one identifier attribute can be present on each
element. The identifier attribute could contain a list of identifiers but this does
lead
to make it more difficult to process.
The representation described here is pure XML. As such, standard XML processing tools
such as XSLT and XQuery can be used to process it. Each of the original document
variants can be extracted: this was our primary goal and is an important feature.
We
have verified that it is quite simple in XSLT to extract a single version, and it
is
simple to determine the ancestors of a particular element or piece of text. We are
currently researching alternative types of processing. One XSLT approach shows
particular promise for processing n-way comparison results. This uses a template that
employs sibling recursion and XSLT 3.0 maps, the maps keep track of the state of each
tree using an extension to the principle of a common stack.
There are validation rules, which we express in Schematron, for this representation.
Validation against the original schema of the source documents would need to be done
by
extracting each version and validating it. In other words, we can assert that the
representation is correct if the Schematron rules are passed and if we can extract
each
of the original documents correctly, i.e. the extracted document is deep equal to
the
original.
How Content Duplication Represents Any Change
Our starting point was an existing solution (a delta format) for representing change
to elements, attributes and text in XML documents. Any change could be represented, but
changes to structure required some duplication of content. For example, two paragraphs
(denoted A and B) might
be:
<p>The quick brown fox.</p>
and
<p>The <s>quick</s> brown fox.</p>
This is a change only to the XML tag structure, the textual content is unchanged.
However, we can represent the change by deleting the word ‘quick’ and adding the element
<s>quick</s>
This is a perfectly valid
representation of the change, but it implies that there has been deletion and addition
and thus that the text has changed. This is shown below. The dx attribute indicates
the
documents in which the element and its content were present. The deltaxml:textGroup
and
deltaxml:text elements are wrappers introduced to delineate the word that has been
deleted. We need the wrapper as a container for the dx attribute that applies to the
text. The reason for the double wrapper here is that there may be more than one variant
of the text, so more than one deltaxml:text element, and it is then useful to have
these
grouped in the outer deltaxml:textGroup for easier
processing.
<p dx="A,B">The
<deltaxml:textGroup dx="A">
<deltaxml:text dx="A">quick</deltaxml:text>
</deltaxml:textGroup>
<s dx="B">quick</s>
brown fox.
</p>
It would be preferable if we could represent this change without implying change to
the content. This is discussed in the next section.
Representing Structural Change without Content Duplication
In order to avoid duplication of content, we need to distinguish between the element
tag and its content so that we can make assertions about the tag and content separately
and independently.
As a starting point, we can add an attribute to an element to indicate whether or
not
this element was present in a particular variant of the document. If the element was
present, then the implication is that both the tag and the contents were present.
In the
above situation, we want to indicate that the content, i.e. the word 'quick', was
present in two versions, but the tag, i.e. the <s>, was only present in one version.
We can take a simple approach to this and add an additional attribute with this
information.
<p dx=”A,B” dxTag=”A,B”>The
<s dx=”A,B” dxTag=”B”>quick</s>
brown fox.</p>
Here, the dx attributes tells us the documents in which the element (and its content)
were present, as described above. But now the dxTag attribute tells us a bit more:
whether or not the tag itself was present. So where the document identifiers are the
same in both the dx attribute and the dxTag attribute, the element and its content
were
present. Where we see dx='A,B' and dxTag='B' we can deduce that the tag was present
only
in B. This means that A contained ‘quick’ and B contained ‘<s>quick</s>’.
We can optimize this a little by omiting the dxTag attribute if its value is the same
as the dx value. Therefore we
get:
<p dx=”A,B”>The
<s dx=”A,B” dxTag=”B”>quick</s>
brown fox.</p>
This is a simple representation of a simple change. We can make an adjustment to this
to represent, for example, a change from <i> in document A to <s> in document B as
follows:
<p dx=”A,B”>The
<i dx=”A,B” dxTag=”A”><s dx=”A,B” dxTag=”B”>quick</s></i>
brown fox.</p>
We can now introduce some overlap and see how the principles above are extended. When
overlap occurs, in order to avoid duplicating content, we need to split some of the
elements into fragments - this is the approach that Jeni Tennison calls 'fragmentation'.
When we fragment an element, then clearly one original element becomes two or more
fragments. The dxTag attribute refers to the whole tag, so we need to extend this
to
represent the start and the end. To achieve this we have dxTagStart and dxTagEnd so
that
we clearly distinguish between the start fragment and the end fragment. In more complex
situations where an element is split into more than two fragments, we also introduce
dxTagMiddle for any fragement betwen the start and end fragments.
This is an example of simple
overlap:
<p>The quick brown fox. It jumped over the lazy dog.</p>
<p>The quick brown fox.</p><p> It jumped over the lazy dog.</p>
This is represented
as:
<p dxTagStart="A" dxTag="B" dx="A,B">The quick brown fox.</p>
<p dxTagEnd="A" dxTag="B" dx="A,B"> It jumped over the lazy dog.</p>
This shows two <p> elements, and for the B document each of these represents a
complete element, denoted by dxTag="B". For the A document, the two <p> elements are
fragements and so the first is identified by dxTagStart="A" and the second one by
dxTagEnd="A". This is an unambiguous representation that requires no duplication of
textual content. The astute observer may comment that the leading space in the second
paragraph of the B document would probably have been deleted. Proper handling of
whitespace is a consumer of considerable time and effort in XML document processing.
This type of change could be represented but it complicates the story so is ignored
for
this example.
We can now consider an example of double overlap, where text is moved from one
paragraph to
another:
<p>The quick brown fox. It jumped over the lazy dog.</p><p> Yes!</p>
<p>The quick brown fox.</p><p> It jumped over the lazy dog. Yes!</p>
This is represented
as:
<p dxTagStart="A" dxTag="B" dx="A,B">The quick brown fox.</p>
<p dxTagEnd="A" dxTagStart="B" dx="A,B"> It jumped over the lazy dog.</p>
<p dxTag="A" dxTagEnd="B" dx="A,B"> Yes!</p>
This shows three <p> elements, all of which are fragments in at least one document.
In the B document the first of these represents a complete element, denoted by
dxTag="B". The last two <p> elements are fragments and so the first is identified
by
dxTagStart="B" and the second one by dxTagEnd="B". This mechanism will scale to any
level of complexity, for example three or more overlapping hierarchies. As overlap
increases, so does the fragmentation and therefore the complexity of the result.
Although there is not time to explore this more fully in this paper, it would
certainly be interesting to determine how easy it is to perform queries on this
structure such as, "find all the paragraphs containing both the word 'fox' and the
word
'dog'" and have this return just the A document because in the B document these words
are in different paragraphs.
We can now look at a larger example including a change. We will for the example ignore
white space changes. The A document
is:
<book>
<p>
<seg>Scorn not the sonnet;</seg>
<seg>critic, you have frowned, Mindless of its just honours;</seg>
<seg>with this key SHAKESPEARE unlocked his heart;</seg>
<seg>the melody Of this small lute gave ease to Petrarch's wound.</seg>
</p>
</book>
And the second, B, document is as
follows:
<book>
<l>Scorn not the sonnet; critic, you have frowned,</l>
<l>Mindless of its just honours; with this key</l>
<l>Shakespeare unlocked his heart; the melody</l>
<l>Of this small lute gave ease to Petrarch's wound.</l>
</book>
There are different representations that we can generate for this depending on how
we
decide to nest the fragments. For example, if we generally nest the <seg> elements
inside the <l> elements, we get this
result:
<book dx="A,B">
<p dx="A,B" dxTag="A">
<l dx="A,B" dxTag="B">
<seg dx="A,B" dxTag="A">Scorn not the sonnet; </seg>
<seg dx="A,B" dxTagStart="A">critic, you have frowned,</seg>
</l>
<l dx="A,B" dxTag="B">
<seg dx="A,B" dxTagEnd="A">Mindless of its just honours; </seg>
<seg dx="A,B" dxTagStart="A">with this key</seg>
</l>
<l dx="A,B" dxTag="B">
<seg dx="A,B" dxTagEnd="A">
<deltaxml:textGroup dx="A,B">
<deltaxml:text dx="A">SHAKESPEARE</deltaxml:text>
<deltaxml:text dx="B">Shakespeare</deltaxml:text>
</deltaxml:textGroup> unlocked his heart;</seg>
<seg dx="A,B" dxTagStart="A">the melody</seg>
</l>
<l dx="A,B" dxTag="B">
<seg dx="A,B" dxTagEnd="A">Of this small lute gave ease to Petrarch's
wound.</seg>
</l>
</p>
</book>
It is instructive to visualize this structure as shown below. Here we are looking
at
it primarily as document A, so the tags and text that belong only to B have been greyed
out. This is to visualize more clearly the A structure. Some of the <seg> elements
are still split so these would need to be merged in order to get back to the original
A
document, although the basic original structure of A is apparent.
This visualization illustrates the very simple nature of this approach. The attributes
we are adding provide information about an element, specifically for each variant
the
attributes tell us which of the following is true:
-
The tag and its content are present in this variant and the element is not
fragmented
-
The tag and its content are present in this variant and the element is
fragmented, so this is the start, the end or a middle fragment
-
The content is present in this variant but not the tag
-
The tag and its content are not present in this variant
Therefore it is very simple to extract any one variant from the whole document or
any
part of it. It is also very simple to work out, for a given piece of content, the
list
of ancestors in any variant. An important characteristic of this representation is
that
as the overlap reduces to zero so the representation reduces to the original
structure.
Dominant Hierarchy
Methods for representing overlapping hierarchy often need to know the dominant
hierarchy in order to know which tree structure 'overrides' the others. In this proposed
representation, there is no need for a concept of a dominant hierarchy. We are at
liberty to create a hierarchy that reduces the fragmentation as far as possible.
Therefore it is possible to adopt various different algorithms to generate different
results. The format describes how to represent overlapping hierarchy, it does not
dictate what the overlap should be. Therefore another valid representation of the
example above would be as
follows:
<book xmlns:dx="xx" dx="A,B">
<p dx="A,B" dxTag="A">
<seg dx="A,B" dxTag="A">
<l dx="A,B" dxTagStart="B">Scorn not the sonnet;</l>
</seg>
<seg dx="A,B" dxTagStart="A">
<l dx="A,B" dxTagEnd="B">critic, you have frowned, </l>
</seg>
<seg dx="A,B" dxTagEnd="A">
<l dx="A,B" dxTagStart="B">Mindless of its just honours;</l>
</seg>
<seg dx="A,B" dxTag="A">
<l dx="A,B" dxTagEnd="B">with this key </l>
<l dx="A,B" dxTagStart="B">
<dx:textGroup dx="A,B">
<dx:text dx="A">SHAKESPEARE</dx:text>
<dx:text dx="B">Shakespeare</dx:text>
</dx:textGroup>
unlocked his heart;</l>
</seg>
<seg dx="A,B" dxTag="A">
<l dx="A,B" dxTagEnd="B">the melody </l>
<l dx="A,B" dxTag="B">Of this small lute gave ease to Petrarch's
wound.</l>
</seg>
</p>
</book>
We can also take this a step further, and look at the representation for what might
be
called full fragmentation, i.e. each piece of text that has a different set of ancestors
is put into a single fragment. It would also be possible to treat the paragraph element
in the same way, but ideally this can be kept as a single element around all of the
text, providing a clearer and simpler representation.
<book dx="A,B">
<p dx="A,B" dxTag="A">
<seg dx="A,B" dxTag="A">
<l dx="A,B" dxTagStart="B">Scorn not the sonnet;</l>
</seg>
<seg dx="A,B" dxTagStart="A">
<l dx="A,B" dxTagEnd="B">critic, you have frowned, </l>
</seg>
<seg dx="A,B" dxTagEnd="A">
<l dx="A,B" dxTagStart="B">Mindless of its just honours;</l>
</seg>
<seg dx="A,B" dxTagStart="A">
<l dx="A,B" dxTagEnd="B">with this key </l>
</seg>
<seg dx="A,B" dxTagEnd="A">
<l dx="A,B" dxTagStart="B">
<dx:textGroup dx="A,B">
<dx:text dx="A">SHAKESPEARE</dx:text>
<dx:text dx="B">Shakespeare</dx:text>
</dx:textGroup>
unlocked his heart;</l>
</seg>
<seg dx="A,B" dxTagStart="A">
<l dx="A,B" dxTagEnd="B">the melody </l>
</seg>
<seg dx="A,B" dxTagEnd="A">
<l dx="A,B" dxTag="B">Of this small lute gave ease to Petrarch's wound.</l>
</seg>
</p>
</book>
The actual hierarchy of the overlapping elements can be determined based on any
criteria. One criterion might be to minimise the fragmentation. The results of an
automated generation of the above by comparing the two documents and aligning them
according to their text content is shown below. In this example the attribute names
are
shown in full, e.g. dx attribute is shown as deltaxml:deltaV2 and its content indicates
whether the two documents are equal, i.e. "A=B" or not equal, i.e. "A!=B". The hierarchy
is reconstructed to reduce
fragmentation.
<book xmlns:deltaxml="http://www.deltaxml.com/ns/well-formed-delta-v1"
deltaxml:deltaV2="A!=B"
deltaxml:version="2.1" deltaxml:content-type="full-context">
<p deltaxml:deltaV2="A!=B" deltaxml:deltaTag="A">
<seg deltaxml:deltaV2="A!=B" deltaxml:deltaTag="A">
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagStart="B"
>Scorn not the sonnet;</l>
</seg>
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagMiddle="B"> </l>
<seg deltaxml:deltaV2="A!=B" deltaxml:deltaTag="A">
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagEnd="B">critic, you have frowned,</l>
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagStart="B">Mindless of its just honours;</l>
</seg>
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagMiddle="B"> </l>
<seg deltaxml:deltaV2="A!=B" deltaxml:deltaTag="A">
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagEnd="B">with this key</l>
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagStart="B">
<deltaxml:textGroup deltaxml:deltaV2="A!=B">
<deltaxml:text deltaxml:deltaV2="A"
>SHAKESPEARE</deltaxml:text>
<deltaxml:text deltaxml:deltaV2="B"
>Shakespeare</deltaxml:text>
</deltaxml:textGroup>
unlocked his heart;</l>
</seg>
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagMiddle="B"> </l>
<seg deltaxml:deltaV2="A!=B" deltaxml:deltaTag="A">
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTagEnd="B">the melody</l>
<l deltaxml:deltaV2="A!=B" deltaxml:deltaTag="B">Of this
small lute gave ease to Petrarch's wound.</l>
</seg>
</p>
</book>
In addition there are several elements that contain only white space, e.g. the second
<l> element. This is because the A document contained a space between the two
<seg>
elements:
<seg>Scorn not the sonnet;</seg> <seg>critic, you have frowned, Mindless of its just honours;</seg>
The
B document had this space within the <l>
element:
<l>Scorn not the sonnet; critic, you have frowned,</l>
As mentioned earlier, correct handling of white space is often very complicated
because a careful distinction needs to be made between white space that can be ignored
and white space that is part of the content. Element boundaries are not always word
separators, for example elements that represent formatting are not considered word
separators whereas a new line would be considered a word separator. This is often
not
clearly specified or represented in the XML schema.
The overlapping hierarchy representation described here is therefore suited to a
number of different situations.
Attributes
Attributes are an important part of the XML structure, and have not yet been
mentioned. Where an element appears in a particular document variant, and is not
fragmented, it is simple to add the attributes onto that element as part of the start
tag. When an element has been fragmented, then the attributes for that element will
appear in the start tag, i.e. the element with the dxTagStart attribute. This means
that
any attributes that appear on a middle tag or end tag would not be relevant to a
particular document variant.
This is an example of simple overlap including some attribute
data:
<p>The quick brown fox. It jumped over the lazy dog.</p>
<p>The quick brown fox.</p><p class="B"> It jumped over the lazy dog.</p>
This is represented
as:
<p dxTagStart="A" dxTag="B" dx="A,B">The quick brown fox.</p>
<p dxTagEnd="A" dxTag="B" dx="A,B" class="B"> It jumped over the lazy dog.</p>
This shows the class attribute but an attribute applies only to those variants where
the tag is a dxTag or dxTagStart. Therefore class="B" applies only to the B document
because for A this <p> is an end tag.
Changes to attributes can also be represented. This is done by converting the
attribute into markup as part of a new first child of the element. Although
theoretically possible to represent changes to attributes within attributes, this
leads
to some dedicated syntactic conventions within the attribute string, which is not
easy
to process. Therefore separating change attributes out into XML markup makes processing,
particularly using XSLT, much easier.
This is an example of simple overlap, including some changed attribute
data:
<p class="B" align="left">The quick brown fox. It jumped over the lazy dog.</p>
<p class="B" align="right">The quick brown fox.</p><p> It jumped over the lazy dog.</p>
This is represented
as:
<p dxTagStart="A" dxTag="B" dx="A,B" class="B">
<deltaxml:attributes>
<dxa:align dx="A,B">
<deltaxml:attributeValue dx="A">left</deltaxml:attributeValue>
<deltaxml:attributeValue dx="B">right</deltaxml:attributeValue>
</dxa:align>
</deltaxml:attributes>
The quick brown fox.</p>
<p dxTagEnd="A" dxTag="B" dx="A,B"> It jumped over the lazy dog.</p>
This shows that the unchanged attribute, class="B", remains as an attribute, but the
changed align attribute is represented as markup to show the two values. This is a
simplified representation and full details can be found in the documentation of the
DeltaXML DeltaV2.1 format [11].
The delta representation also allows an alternative representation because the <p>
tag in the A document can be wrapped around the two <p> tags in the B document, as
shown below:
<p dxTag="A" dx="A,B" class="B" align="left">
<p dxTag="B" dx="A,B" class="B" align="right">The quick brown fox.</p>
<p dxTag="B" dx="A,B"> It jumped over the lazy dog.</p>
</p>
This is, in this case, a shorter representation though it has in effect used
duplication of the (unchanged) attributes and tags to show the change. However, this
may
be a preferred representation for some formatting elements, for example if the class
attribute in a <span> is changed then it may be more useful to represent this as a
different <span>. Both representations conform to the delta format.
Conclusions
This paper has described a new representation for overlapping hierarchy which is also
capable of representing changes to text and attributes. This makes it suitable for
some
important use cases for overlapping hierarchy, particularly the representation of
change
between two or more variants of a document.
A significant advantage over some previous representations is that it is pure XML,
and
therefore can be processed using standard XML tools. The dominance of one hierarchy
over
another does not need to be fixed and this means that the actual hierarchy of the
overlapping structures can be determined for other reasons and indeed varied throughout
the document. This flexibility allows fragmentation of elements to be kept to a
minimum.
The underlying data model is based on document variants and therefore is better suited
to situations where the number of variants is small. Although it does scale to any
number of variants, its complexity increases as the number of variants increases,
e.g.
each new variant has an identifier in the dx attribute so this will become longer
and
more difficult to interpret.
Overlapping hierarchy is a powerful tool to use in certain markup situations, though
its use can lead to complex situations and any solution is also likely to look
complicated. This paper is intended to contribute to the discussion as the XML community
continues to strive for a simple, generic and universal solution to this problem.
References
[1] Overlap, Containment and Dominance, URN:http://www.jenitennison.com/2008/12/06/overlap-containment-and-dominance.html
[2] Modeling overlapping structures, Yves
Marcoux, Michael Sperberg-McQueen, Claus Huitfeldt, http://www.balisage.net/Proceedings/vol10/html/Marcoux01/BalisageVol10-Marcoux01.html. doi:https://doi.org/10.4242/BalisageVol10.Marcoux01
[3] Markup Overlap: A Review and a Horse, Steven DeRose, http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html
[4] Multiple hierarchies: new aspects of an old solution,
Andreas Witt, http://conferences.idealliance.org/extreme/html/2004/Witt01/EML2004Witt01.html
[5] Representation of overlapping structures,
Michael Sperberg-McQueen, http://conferences.idealliance.org/extreme/html/2007/SperbergMcQueen01/EML2007SperbergMcQueen01.html
[6] TEI: Text Encoding Initiative, http://www.tei-c.org/index.xml
[7] OASIS Darwin Information Typing Architecture (DITA)
TC, https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=dita
[8] OASIS DocBook TC, https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=docbook
[9] Schmidt, Desmond. “Merging Multi-Version Texts:
a Generic Solution to the Overlap Problem.” Balisage Series on Markup Technologies,
vol.
3 (2009), http://www.balisage.net/Proceedings/vol3/html/Schmidt01/BalisageVol3-Schmidt01.html. doi:https://doi.org/10.4242/BalisageVol3.Schmidt01
[10] Overlapping Hierarchies in DeltaV2 Format, http://www.deltaxml.com/support/documents/deltav21
[11] Two and Three Document DeltaV2 Format
(patent pending), http://www.deltaxml.com/support/documents/deltav2