How to cite this paper
Vion-Dury, Jean-Yves. “Stand-alone Encoding of Document History (or One Step Beyond XML Diff).” Presented at Balisage: The Markup Conference 2010, Montréal, Canada, August 3 - 6, 2010. In Proceedings of Balisage: The Markup Conference 2010. Balisage Series on Markup Technologies, vol. 5 (2010). doi:10.4242/BalisageVol5.Vion-Dury01.
Stand-alone Encoding of Document History
(or One Step Beyond XML Diff)
Balisage: The Markup Conference 2010
August 3 - 6, 2010
Introduction
XML being recognized today as the lingua franca of computerized information, some key
basic functionalities become of universal interest and will predictably gain momentum in
the upcoming XML related economy.
One of these functionalities is the computation of differences between two XML
documents, and more generally, the management of the history of a given document. In
particular, this functions as a cornerstone of any system aiming at offering long term
preservation.
Today, such functionality is achieved through content management systems or databases.
Models and operations are therefore hidden inside a black box, and up to the author's
knowledge, no standard mechanism makes this information explicit and accessible to human
users neither to external algorithms and processors.
Our proposal addresses this issue by associating a target document instance together
with its history inside a standalone and consistent document, thus gaining strong
potential for current or future interoperability.
The notion of history, when applied to a document,
calls for a deeper understanding of the global context, intents and social processes
underpinning the creation, evolution and maintenance of documents (see pédauque03). Hence, the relevant structure of a meaningful document history highly depends on the document typology and
usages.
In addition, the criterion that founds any document history is the permanence of some
key deep invariants (for a general, ontological, reflection around this line, see pédauque05). Those invariants define the identity of the document itself
and their loss inevitably ends the versioning process and calls for a new creation
cycle.
The method we describe in this paper takes into consideration the key points above,
especially through a flexible calculus of difference descriptors, and through offering
high level transformation operations that preserve the consistency of both the document
instance and its whole history.
Overview
The target document (also called body hereafter),
i.e. any XML document using any namespace, is encapsulated inside a dedicated packaging
XML document. The body always consistently relates to a given state of its history (not
necessarily the most recent one) thanks to a dedicated attribute that refers to a
version label and makes this relation explicit.
The history itself is encoded as interrelated nodes (XML elements) and has the
structure of a directed acyclic graph (DAG); each node of this graph models a versioning
point, i.e. a particular significant state that the document reached during its lifetime
and for which a version has been recorded. The meaning of any versioning point with
respect to the document life cycle is application dependent, as discussed above, and the
proposed approach abstracts over application semantics, just considering that a new
version is built when some actor in the document life cycle decided it makes sense to do
so. Arcs that connect the nodes are decorated with deltas, and these arcs model the
transformation that allows changing the document from one versioning point to the other.
Thus arcs are oriented. Deltas descriptors are combinations of low level operations on
the document tree, mainly based on deletion and insertion of subtrees.
Thanks to a set of transformations qualitatively described in this paper, it is
possible to navigate inside the history of the document and to consistently extract any
version of the document captured in the history graph. It is also possible to create new
versioning points, new branches or to merge existing branches, all these operations
producing novel consistent encapsulations of the target document.
Modelling Abstract Differencing Operations
Document model
The encapsulating XML document can be represented (in an abstract way) as a tree
like this:
where d represent the target document and
vi a versioning point.
The full syntax of our encapsulation is available through a RelaxNG Schema (see
appendix). The examples Figure 26, Figure 27, Figure 28 and Figure 29 illustrate concretely the way we
propose to encode the history graph (DAG) in XML, and the way delta operations are
attached to transition arcs.
Diff Engine
We assume that a diff engine is able to operate as a black box function. Its
abstract signature could be for instance:
with config being some input information used to configure the processor (e.g.
filter to apply, mode commutative/non-commutative,…). The notation Δ represents a
set of basic delta operations, formally described hereafter.
Abstracting over Delta Operations
The delta operation use paths, noted p, to
designate the tree location where the modification should be applied. Those paths
are describe by the following grammar (i is a
positive integer, nm an attribute name, pp denotes "pure paths"):
These paths are interpreted as relative to the root of the encapsulated document,
and can be easily translated as XPath expressions. For example, 1/2/1 is translated
into *[1]/*[2]/*[1], and 1/3/@id into *[1]/*[3]/@id .
Each delta belonging to a snapshot must comply with a structural constraint that
ensures orthogonality (thus making the snapshot indeed commutative). In particular,
it is enough to verify that two paths in a snapshot do not designate sibling trees,
and one path does not designate a sibling tree of the parent node designated by the
other path. We assume that this constraint is part of the well-formedness condition
assured for all definitions .
The semantics of delta operations expresses changes that occur on the operand (an
XML tree) ; we note this transformation of d into
d’ by applying Δ as follows:
More precisely, the previous notation is a logical assertion saying that a
well-formed document d is changed into a
well-formed document d’ after application of the
well-formed Δ operation.
Formally, for all subtree A, path p, documents d and
d’, deltas Δi and δi, the transformations verify the following
abstract properties:
Note that the definition of insert operator makes
use of the function get, which extracts the subtree
rooted at a given location p, as well as a path
addition function ⊕ and a so called fingerprint extraction ζ . The invar operator
deals with locality of the transformation (see Invariance Property Fig Figure 11 and associated explanations). Moreover, ≈ (the
equivalence of trees) is defined extensionally using quantification of paths, and is
insensitive to attribute order, according to path definition.
The path addition is defined over pure paths (paths designating element nodes
only, noted pp), can deal with operands of various
depth and is commutative:
Also fingerprints capture some structural information, and more precisely, the
depth level of a path:
Thus ζ (1/2/3) = 0/0/1 and 1/2/3 ⊕ ζ(1/2/3) = 1/2/4 and 1/2/3/2 ⊕ ζ(1/2/3) =
1/2/4/2
The property (a-snap) holds for all permutations
over indexes. It simply means that the set of deltas must be commutative with
respect to their sequential application. In other words, all basic deltas of a
snapshot are pairwise orthogonal.
The property (a-del) involves a minus operator
over paths. This one is easily defined as inverting all integers found in the path.
The invar property used in (a-ins), (a-ins-@) and
(a-del) expresses that other parts of the tree are not modified by the operation.
This is defined as follows:
The relation ≪ is a strict total order over (pure) paths, which is sound with
respect to the standard total order over document nodes as defined by the XML
document model.
The intuitive notion of path prefix relation captures the idea of tree embedding
We now define the notion of path orthogonality as a binary boolean operator that
captures the notion that a path is not a subpath of another (or equivalently, a
subtree is not included into another subtree)
Extension of basic vocabulary
In order to ease understanding and increase the generality of descriptions, we
propose now to extend the vocabulary through definitions based on the
fundamental operations described above: move,
copy and replace
.
Note that move and copy operations are only defined for orthogonal paths.
Inversion of basic operations and snapshots
Inverting a delta (i.e. computing the changes that will exactly bring the
operand in the previous state) requires knowing the original operand on which
changes will be applied.
The inversion function is inductively defined as follows:
Delta inversion is characterized by the following soundness property:
Proof : See appendix Appendix A
The inversion of deltas is an important functionality which allows navigating
in the history graph. Moreover, it provides a more compact representation of
changes, especially when successive versions represent documents whose content
tends to increase incrementally (corresponds to the construction phase inside a
standard document life-cycle).
Indeed, in such cases, subgraphs of the form
| vi |
→insert(p,A) |
vj |
→insert(p',B) |
[vk]
|
can be rewritten using delta inversion as:
| vi |
←delete(p) |
vj |
←delete(p') |
[vk]
|
This transformation is beneficial because the subtrees A and B are redundantly
stored: once inside the history and once inside the encapsulated instance
itself. In case the focus is set to a non-terminal versioning point (e.g.
vj ), we also may have
| vi |
←delete(p) |
[vj]
|
→insert(p',B) |
vk |
which is still quite meaningful as the subtree B is only stored once inside
the delta transition (indeed, the encapsulated document, consistent with focused
version vj, does not comprise the B subtree (see examples
3 et 4 below to grasp more concretely the point).
Note that inverting the operands of a diff operation should also lead to
reversed deltas :
A weaker version of this property (see Figure 19 below)
requires defining an equivalence relation over deltas. This relation is based on
the effect of delta application rather than its syntactic structure.
In the general case, it is much more interesting (from the performance point
of view) to compute inversion on deltas rather than to perform a reversed
diff.
Fundamental Operations over Encapsulated Documents
Creation of a history from an initial XML document
This operation has the following abstract signature:
This reflects that the document is encapsulated, and that an initial versioning
point v0 is created inside
the history. The link that relates the embedded document with the consistent
versioning point is inserted in the x-body
subtree.
Versioning
The operation requires two operands: the encapsulated document and another variant
of the document which is to be considered as the novel versioning point. What is
returned is a novel encapsulated document including a new versioning point, a
(consistent) link inside the body part, and a transition from the previous
versioning point to the new one. This transition expresses the delta operations
abstracted from the diff engine outputs.
Extraction
Unfolds the embedded document. This is useful to work on the target document,
update or change it.
Focusing
This operation allows modifying the embedded document d in order to be compliant to a given version stored in the history.
This requires first building the path that connects the current versioning point
vi to the novel
vj and second, applying
all deltas to obtain the new embedded document d'
Note that in the connection path, some vertices could have the reversed form
vn ←Δn
vm requiring the computation of inverse delta (indeed,
the connection graph is a DAG admitting branching). Note also that we assume that
the algorithm is able to choose a path when several possibilities are present in the
graph (it may decide which is the optimal path with respect to performance criteria
based on standard algorithm based on simple metrics).
Merging branches
This operation requires two operands: the embedded document and another existing
versioning point. An algorithm creates a novel versioning point and two transitions
that relate original versioning points to the new one. The idea is to perform a
merge with maximal preservation (no deletion operation in the respective deltas).
However, conflicts may arise, and can be signaled thanks to a dedicated annotation
inside the target document (this annotation is based on foreign namespace which
cannot conflict with namespaces used by target document)
In the above formula, Δa represents the concatenation of
(possibly reversed) deltas that allows to reach the new (merged) version starting
from version vi, and similarly for Δa
and vj .
Encoding Graphs of Deltas in XML
Each versioning point is captured using a dedicated element (e.g. version) associated with an id attribute that uniquely
identifies it. Our convention uses a naming scheme of the form v0…
v3 for instance. Names for diverging branches use a dot, e.g. v1.1, v1.2.
Each version element contains a sequence of delta
elements which capture the transition from this versioning point to another versioning
point. This information is conveyed by an attribute (e.g. fwd for forward links and bwd for backward
links). Moreover, a delta contains a non order-significant sequence of delta operations,
which captures the so-called snapshot. Several delta
elements are interpreted as sequences of snapshots (corresponds to the
Δ1 ; Δ2 syntactic form, where
Δi= {…}).
Each delta operation is described using a dedicated element name according to its
semantics (insert, insert-attr …). The path information
may be concisely encoded, e.g. through an “ipath” attribute. Copies of subtrees may be
expressed through a “copy” attribute attached to the delta element. When such an
attribute is not defined, an extensive definition of the subtree is expected as the
content of the operation.
The example 1 below is a target document we want to track
After applying the create-history operation, the document of figure Figure 27 below is created.
In the following example, we assume three distinct versioning points have been
created.
The following example illustrate how the same information can be encoded differently,
featuring different properties.
We propose a particular optimisation that could be applied in linear mode: redundant
trees are eliminated tanks to an explicit copy instruction using an XPath expression to
designate the subtree (thus the XPath selection refers to the focused version).
<delta fwd="v1">
<insert ipath=”1/2/2”>
<p class="verse" number="2">
Saying, Where is he that is born King of the Jews? For we have seen his star
in the east, and are come to worship him.
</p>
</insert >
</delta>
In the case of figure
Figure 29, the node "insert", child of the forward
delta to v1, is suppressed from the history and its parent is changed into the
expression below:
<delta fwd="v1">
<insert ipath=”1/2/2” copy=”/html[1]/body[1]/p[2]”/>
</delta>
This optimization applies only to versions belonging to past
history, but makes a lot of sense when the evolution of document
instances is best captured through using linear mode and comprises many incremental
additions.
State of the Art
XML diff operations are basically used to realize two fundamental operations on tree
structured documents: comparison and merging. Therefore tree based diff algorithms are
central to solve various important problems related to XML document management and
editing processes (Chawathe96), among which:
-
checking modifications with respect to a reference document, typically, the
last version stored inside a repository or a file system (version management
Chien00)
-
synchronizing variants, that is, detection of changes that occurred
concurrently on two copies of a reference document, detecting potential
conflicts (LaFontaine02
-
merging variants, which is about fusing two variants into a unique document
integrating modifications of both instances (this may imply solving potential
conflicts when the underlying document management system applies a weak control
policy)
-
recovering anterior state(s) of a document
An abundant literature exists on this topic, including recent and excellent synthetic
work on an old topic, linear differencing (Khanna07). Most work covers
:
-
Algorithmic complexity of the differencing operation,
either for ordered or unordered tree models.
The problem was first approached as a particular case of string oriented diff
computation (see Khanna07 for a point on most recent
algorithms in this area), implying computing an adequate linearization of the
XML trees. Zhang and Shasha described a fast algorithm in 1989 for computing
tree edit distance Zha89 improving previous work conducted in
1979. Still the runtime and space complexity was higher than quadratic on
sequential architectures, but could drop to quadratic levels on parallel
architectures. In Wang03 an algorithm for unordered tree was
presented with a polynomial complexity for optimal differences, whereas previous
works established a NP-complete complexity for general cases. A survey (still of
interest 14 years after) and a comparative study on the topic can be found in
Bille95 and Cobe02. Most recent
algorithms for ordered diff computation are described in Lind06, Rönnau08, Rönnau09.
-
Optimality of editing scripts (also called
deltas)
The very notion of optimality is somehow controversial, because bound to
execution models one could consider either as over simplistic or too specific.
Of course execution models are needed to justify underlying cost functions
associated with deltas application. However, such an approach intrinsically
restricts the application field or the area of proposed results. In addition,
usages and document types deeply impact the structure of delta and thus the
runtime performance of algorithms. As a consequence, diff algorithms should be
chosen and parameterized on the basis of document type and related
activities.
-
Delta models and use of diff operations inside storage
architectures (typically databases).
In Martinez02, a versioning graph is proposed based on XLink
designation mechanism. But in this proposal, the graph relates two separate
documents that may exist in independent storage infrastructure. Hence there is
no notion of encapsulation and explicit consistency. In Arevalo09, an application is proposed that offers online
versioning services for XML documents. However, there is no notion of history
encapsulation, and the history itself cannot be exported or organized into an
explicit form (see Arevalo09b for an online demo).
A recent paper Rönnau08 proposes reversible deltas using a
straightforward mechanism as deletion operation conveys explicitly the deleted subtree
(this could quickly lead to enormous overhead, especially in the perspective of storing
the whole history). The authors focus on algorithmic and speed performance issues of a
novel diff algorithm using a bottom-up approach (and also precomputed hash codes on a
sliding window of adjacent nodes.)
Most work conducted on this area focused on related algorithm performance, time and
space complexity of the diff operation and also optimality of generated deltas, but
little attention has been paid to tools and models to make use of these in efficient
ways. From the practical point of view, no work considered the necessity and applicative
interest of abstracting over deltas and related operation models.
Recently, the XQuery Update working group from W3C (see XQueryUp)
published a candidate recommendation proposing to enrich XQuery with mechanisms intended
to modify XML documents. This impressive work largely focused on expressive power and
consistent integration of the new constructs to the existing language. The patching
operations rely on "insert", "delete", "rename" and "replace" , using XPath to designate
the locations in the XML tree ; an innovative "transform" operator allows for combining
copy with on-the-fly modifications. One major difficulty was to deal with the extra
power raised by the selection semantic model of XPath, i.e sequences of nodes. As a
consequence, many runtime error may occur depending on the evaluation of selectors and
the semantics of the delta operation. Additionally the insert operator of the language
is particularly powerful since it can make explicit positional constraints ("first/last
position", "before/after A") whereas when implicit, positional constraints are
interpreted so that global ordering constraint are optimally satisfied (this remain a
bit unclear how far this can be hard).
The transformation model of XQUpdate relies on a kind of transactional framework,
where changes are stored in a list of "pending modifications" only applied after calling
a dedicated primitive upd:applyUpdates. This is a pragmatic solution to the problem of
path relocation (and variable scope validity!), but is a nightmare regarding static
analysis. Therefore it will be much more realistic to use the XQUpdate interpreter as a
patch engine (i.e generating XQUpdate code from our descriptions), than considering an
XQUpdate program as a description of differences.
Conclusion
The main contribution of this work is to propose a universal XML data structure and
related transformations which allow abstracting from underlying storage systems and from
any execution model associated with the computation of XML versioning information.
This is in favor of long term preservation of XML documents, infrastructure and vendor
independence, and open the way to interoperable processing of XML versioning
information.
The method targets generic XML documents, and makes use of a particular namespace to
embed any XML document (whatever vocabulary it may use) without any change in its
content and tag/attribute set. The history is encoded using a specific vocabulary that
captures change operations in a formal and universal way, so that any XML diff enabled
processor can generate suitable deltas. Based on the properties of this change
description language, we propose a set of high level operators which allows exploiting
the historical information in order to navigate inside the history, compute new version
in the current branch, create new branches or suppress intermediate versioning points.
Various information may be attached to the versioning points, such as user meta
information, universal date/time of the creation of the versioning point, additional
data required to optimize or ameliorate the computation (hash code, digest number,...),
which are not central to this contribution but could substantially enhance any of its
application.
The problem with existing diff related approaches is that the description model and
the transformation model are identical. In other words, they use the same objects both
to describe operationally and declaratively the differences between trees. We outline in
our proposal a calculus that formally allows making a distinction between modification
descriptors and modification operations. This calculus allows modeling "standard"
processing of deltas and allows transforming descriptions in order to gain compactness
or efficiency. We are not aware of any equivalent proposal.
The two transformation structures we described can lead to two broad classes of delta
interpretations:
-
Each modification step transforms the tree, so that the paths used inside
subsequent deltas should be defined on the modified tree. This model is used
inside database systems and DOM-like in memory tree processors (especially XUpdate, XQueryUp).
-
All modification steps are defined on the same reference tree, and all
modifications are realized in one global step. Thus the paths always refer to
the same tree operand and consequently, delta sequences are more easily
understood by a human reader, because they do not have to maintain the cognitive
load of mentally propagating tree modifications to paths. From a computational
point of view, this model is well adapted to stream based processing pipes in
which each operation can be performed concurrently (e.g. on a multi core
processor). As this approach doesn't need building a tree in memory, it is
particularly indicated for large and very large document processing.
The delta model we propose as foundational for history encoding integrates those two
transformation models, in the sense that both can be expressed explicitly. The first one
is captured by sequences of singleton snapshots whereas the second may involve snapshots
including several deltas.
We identified 4 key issues when addressing the challenge of encapsulating document
history into a stand alone XML instance: mastering the potentially big data volume,
usability of the history model (human vs machine), flexibility of the history
representation and abstracting over diff operations (i.e. not being dependent of a
particular diff algorithm/engine behavior). The method described in this paper addresses
these four issues at various levels. We believe our approach therefore constitutes a
significant contribution having a realistic applicability level, though the data volume
issue would still require large scale experiments.
Acknowledgment
This work is supported by the European project SHAMAN (FP7- ICT-216736,
http://shaman-ip.eu/shaman/ ). The author also would like to thanks
Jean-Pierre Chanod for his continuous support and helpful comments
Appendix A. Proof of proposition Figure 17
Global structure:
We do a structural induction over Δ. Each sub-case is associated with a corresponding
well-formedness hypothesis; we use these together with the definitions of inversion. We
show here the case for the Δ1;Δ2
composition:
-
Goal :
| d › Δ1;Δ2 › d’
|
⇒ |
d’ › ∘(
Δ1;Δ2 ) › d
|
We introduce the left hand as a new hypothesis H and we get the new goal
By applying the definition of inversion (thanks to H), we get
The we apply the property (a-seq)
| ∃ dd , d’ › ∘Δ2
› dd ⋀ dd › ∘Δ1
› d
|
We apply again the (a-seq) property to hypothesis H, and get a
particularisation for intermediate tree d''. Thus we choose d'' as an evidence
for the existence of dd, and get
| d’ › ∘Δ2 › d''
⋀ d'' › ∘Δ1 ›
d
|
Now we split the goal in two subgoals and apply the induction hypothesis to
both.
Appendix B. An RelaxNG schema (compact syntax ) capturing our model
default namespace xversion = "XRCE::DS::X-Version"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
start = x-version
anyElement = element * - xversion:* { anyAttribute*, anyElement*} | text
anyAttribute = attribute * { text}
x-version = element x-version {
element x-version-body {
attribute ref {V-REF},
attribute xml:space {"preserve" | "ignore"}?,
SUBTREE
}+,
element x-version-history {
attribute first {list {V-REF+}},
attribute last {list {V-REF+}},
# the version history encodes a DAG (Directed Acyclic Graph)
version*
}
}
version = element version {
attribute id { V-REF},
# allows zero or more
attribute conflicts {xsd:nonNegativeInteger}?,
a-time?,
delta*
}
a-target=
# this attribute defines a "one-step" next version
attribute to {V-REF}
delta = element delta { a-target, (rename | insert | append | move | copy | delete | replace)* }
a-here=
# the "here" attribute designates the insertion point (existing stuff is to be shift to right after insertion)
attribute here {IPath}
insert=
# "insert" only applies to elements, text, comment and PI
element insert {
a-here,
# the subtree to be inserted is here (but skipped thanks to a NVDL rule)
SUBTREE
}
append=
# "append" applies to everything including attributes
# for elements, text, comments, PI, the item is appended at the end of the sequence of children
element append {
a-here,
# the "attribute" attribute is used to specify attribute insertion; it must have the "QName=Value" syntactic structure
((attribute attribute {A-DEF}?, empty) | SUBTREE)
}
copy= element copy {
a-what, a-here,
attribute append {flag}?
}
move= element move {
a-what, a-here,
attribute append {flag}?
}
delete= element delete { a-what, SUBTREE }
replace= element replace { a-what, SUBTREE }
rename=
# works for elements, attributes, and PI as well
# the "as" attribute is a qualified name (that is, may refer to a given namespace through a declared prefix
element rename { a-what, attribute as {token}}
SUBTREE=anyElement | conflict
conflict=element conflict { item+ }
item = element item { attribute ref {V-REF}, anyElement }
attribute-conflict =element attribute-conflict { attribute ref {V-REF}, A-DEF}
tree-conflict = element tree-conflict { attribute ref {V-REF}, A-DEF}
V-REF= xsd:string { pattern="v\d+(\.\d+)*" }
A-DEF=string
flag = "yes" | "no"
a-what = attribute what {XPath}
a-time = attribute time {xsd:dateTime}
XPath = xsd:string
IPath = xsd:string { pattern="1(/\d+)*(/@([\c-[:]]+:)?[\c-[:]]+)?" }
Appendix C. an ISO Schematron schema Schematron
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="en" queryBinding="xslt2">
<sch:ns uri="XRCE::DS::X-Version" prefix="xv"/>
<sch:pattern >
<sch:let name="versions" value="/*[1]/*[2]/xv:version"/>
<sch:let name="versions-id" value="for $i in $versions/@id return normalize-space($i)"/>
<sch:let name="Nversions-id" value="distinct-values($versions-id)"/>
<sch:let name="history" value="/*[1]/xv:x-version-history[1]"/>
<sch:let name="starting-vector" value="tokenize(normalize-space($history/@first),'\s')"/>
<sch:let name="ending-vector" value="tokenize(normalize-space($history/@last),'\s')"/>
<sch:let name="Nstarting-vector" value="distinct-values($starting-vector)"/>
<sch:let name="Nending-vector" value="distinct-values($ending-vector)"/>
<sch:rule context="/*[1]/xv:x-version-history[1]">
<sch:assert test="count($versions-id) ge count($Nversions-id)">Every version element must have an unique id attribute</sch:assert>
<sch:assert test="count($starting-vector) eq count($Nstarting-vector)">"first" attribute should refer to each relevant version only once</sch:assert>
<sch:assert test="count($ending-vector) eq count($Nending-vector)">"last" attribute should refer to each relevant version only once</sch:assert>
<sch:assert test="every $i in $Nstarting-vector satisfies index-of($Nversions-id,$i) gt 0 ">
the "first" attribute should point to existing version(s) (cf
"<sch:value-of select="string-join($Nstarting-vector,'/')"/>" versus "<sch:value-of select="string-join($Nversions-id,'/')"/>"
)
</sch:assert>
<sch:assert test="every $i in $Nending-vector satisfies index-of($Nversions-id,$i) gt 0 ">
the "last" attribute should point to existing version(s) (cf
"<sch:value-of select="string-join($Nending-vector,'/')"/>" versus "<sch:value-of select="string-join($Nversions-id,'/')"/>"
)
</sch:assert>
</sch:rule>
<sch:rule context="xv:version">
<sch:report test="count(index-of($versions-id,normalize-space(@id))) gt 1">
The "id" attribute must be unique ("<sch:value-of select="@id"/>")
</sch:report>
<sch:let name="my-id" value="@id"/>
<sch:assert test="(count(../xv:version/xv:delta[@to=$my-id]) gt 0) or (index-of($Nstarting-vector,$my-id) gt 0)">
missing back link to anterior version (version "<sch:value-of select="@id"/>")
</sch:assert>
<sch:assert test="(count(xv:delta[@to]) gt 0) or (index-of($Nending-vector,$my-id) gt 0)">
missing link to posterior version (version "<sch:value-of select="@id"/>")
</sch:assert>
</sch:rule>
<sch:rule context="xv:delta">
<sch:assert test="index-of($Nversions-id,@to) gt 0">
The link to version "<sch:value-of select="@to"/>" is dangling (no corresponding version found in the whole history)
</sch:assert>
<sch:let name="my-dest" value="@to"/>
<sch:report test="count(preceding-sibling::xv:delta[@to=$my-dest]) gt 0" >
Delta must be unique for one target version ("<sch:value-of select="@to"/>")
</sch:report>
</sch:rule>
<sch:rule context="xv:conflict">
<sch:let name="all-item-refs" value="for $i in xv:item/@ref return normalize-space($i)"/>
<sch:assert test="count(distinct-values($all-item-refs)) eq count($all-item-refs)">
Conflicting items must be uniquely defined for a given version
</sch:assert>
</sch:rule>
<sch:rule context="xv:item">
<sch:assert test="index-of($Nversions-id,@ref) gt 0">
The reference to version "<sch:value-of select="@ref"/>" is dangling (no corresponding version found in the whole history)
</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
References
[Zha89]
Simple Fast Algorithms for the Editing Distance between Trees and
Related Problems, Kaizhong Zhang; Dennis Shasha, SIAM J. Comput., 1989
[Bille95]
A survey on tree edit distance and related problems,
Philip Bille, Theoretical Computer Science , Volume 337 Issue 1-3 June 2005.
[Chawathe96]
Change detection in hierarchically structured
information, Sudarshan S. Chawathe; Anand Rajaraman;Hector Garcia-Molina ;
Jennifer Widom, SIGMOD '96: Proceedings of the 1996 ACM SIGMOD international conference
on Management of data, June 1996
[Chien00]
A Comparative Study of Version Management Schemes for XML
Documents, Shu-Yao Chien ; Vassilis J. Tsotras ; Carlo Zaniolo, 2000
[Cobe02]
A comparative study for XML change detection, Grégory
Cobéna; Talel Abdessalem; Yassine Hinnach, Research Report, INRIA Rocquencourt, France,
2002 PDF
[LaFontaine02] Merging XML files: a new approach
providing intelligent merge of XML data sets, Robin La Fontaine, 2002
[Martinez02]
A method for the dynamic generation of virtual versions of
evolving documents, Mercedes Martinez, Jean-Claude Derniame, Pablo de la
Fuente, SAC 2002, Madrid, Spain, 2002
[pédauque03]
Document: Form, Sign and Medium, as Reformulated for Electronic
Documents, Roger T. Pédauque, collective writing, STIC-CNRS, 12 September
2003. PDF
[Wang03]
X-Diff: A Fast Change Detection Algorithm for
XMLDocuments. Yuan Wang , David J. DeWitt , Jin-Yi Cai in Proceedings of the
International Conference on Data Engineering (ICDE'03) PDF
[pédauque05]
Le texte en jeu (Permanence et transformations du
document), Roger T. Pédauque, collective writing, STIC-CNRS, 7 April 2005.
PDF
[Lind06]
Fast and Simple XML Tree Differencing by Sequence
Alignment, Tancred Lindholm; Jaakko Kangasharju; Sasu Tarkoma, DocEng '06:
Proceedings of the 2006 ACM symposium on Document engineering October 2006
[Khanna07]
A Formal Investigation of diff3, Sanjeev Khanna; Keshav
Kunal ; Benjamin C. Pierce, In Arvind and Prasad, editors, Foundations of Software
Technology and Theoretical Computer Science (FSTTCS), December 2007 PDF
[Tata07] Tree automata techniques and
applications, Hubert Comon; Max Dauchet; Florent Jacquemard; Denis Lugiez;
Sophie Tison; Marc Tommasi , 2007 PDF
[Rönnau08] Merging Changes in XML Documents Using
Reliable Context Fingerprints, Sebastian Rönnau; Christian Pauli; Uwe M.
Borghoff, ACM Symposium on Document Engineering, September 2008
[Rönnau09] Efficient Change Control of XML
Documents, Sebastian Rönnau; Christian Pauli; Uwe M. Borghoff, ACM
Symposium on Document Engineering, September 2009
[Arevalo09] A Web-Based Version Editor for XML
Documents, Luis Arévalo Rosado, Antonio Polo Márquez and Miryam Salas
Sánchez, ACM Symposium on Document Engineering, September 2009.
[Arevalo09b] A Web-Based Version Editor for XML
Documents, online demonstration version :
http://picaro.unex.es:8180/vEditor/
[XQueryUp]
XQuery Update Facility 1.0, W3C, Specification, June
2009 http://www.w3.org/TR/xquery-update-10/
[XUpdate]
http://xmldb-org.sourceforge.net/xupdate/xupdate-wd.html ; http://en.wikipedia.org/wiki/XUpdate
[Saxonica]
Saxonica, XSLT and XQuery processing Michael Kay, http://www.saxonica.com/
[Schematron]
ISO Schematron, a language for making assertions about patterns
found in XML documents, Topologi , web
site