Main results
Before we get to our main results, we need to establish preliminary
results.
Lemma 1: Let G be a noDAG to which
some set of ranges corresponds. Then, no node in G is both directly and
indirectly dominated by some other node.
Proof. Immediate by the
definitions of correspondence and of direct containment.
QED (Lemma 1)
Corollary 1 to Lemma 1: Let G be a
noDAG to which some set of ranges S corresponds. Then, for all b, c ∈
NG, if (b <G c), then b
⇏G c.
Proof. By the definition of
<G, if (b <G c), then
either b and c are both roots, or they are siblings. If c is a root, then,
clearly b ⇏G c. If b and c are siblings, then (b
⇒G c) would contradict the lemma.
QED (Corollary 1 to Lemma 1)
Corollary 2 to Lemma 1: Let G be a
noDAG to which corresponds some set of ranges with distinct boundaries S
through g. Then, for all b, c ∈ NG, if (b
<G c), then end(b') < end(c').
Proof. By Corollary 1, b
⇏G c. By the definitions of a noDAG and of
correspondence, we know that start(b') < start(c'). But end(c') < end(b')
would imply b ⇒G c. So, we must conclude that
end(b') ≤ end(c'). Since S has distinct boundaries, we conclude that
end(b') < end(c').
QED (Corollary 2 to Lemma 1)
Lemma 2: Let G be a noDAG to which
corresponds some set of ranges with distinct boundaries S through g. Then, for
all r, s ∈ NG, the following two conditions
hold:
1- if (r, s) ∈ SB(G), then start(r') < start(s')
2- if (r, s) ∈ EA(G), then end(r') > end(s')
Proof. We will prove the lemma
only for SB(G); the proof for EA(G) is entirely similar (but makes use of
Corollary 2 to Lemma 1).
By the definition of starts-before-completion, we know that:
SB(G) = transitive-closure(AG ∪
<G ∪ CSB(G)).
We will prove that (1) above holds for each arc in
(AG ∪ <G ∪
CSB(G)). Then, by transitivity of < on integers, the lemma will
follow.
For all arcs (r, s) ∈ (AG ∪
<G), then clearly, by the appropriate definitions,
start(r') < start(s'). There remains to treat the case where (r, s) ∈
CSB(G). We prove this by contradiction.
Suppose that (r, s) ∈ CSB(G) and start(s') ≤
start(r'). By the definition of CSB(G), there must exist d such that
(d ⇒ r) & (d <G s) & s ⇏
r.
From (d <G s), by Corollary 1 to Lemma 1,
we conclude that d ⇏ s.
Since, by hypothesis, s start(s') ≤ start(r'), and because s
⇏ r, we conclude that end(s') ≤ end(r'). Because d ⇒ r, we
know that end(r') ≤ end(d'). Thus, end(s') ≤ end(d'). But d
<G s implies that start(d') < start(s'). Thus, we
conclude that d ⇒ s, a contradiction.
We must thus reject the hypothesis that start(s') ≤
start(r'), and conclude that, for all arcs (r, s) ∈ CSB(G), start(r')
< start(s').
QED (Lemma 2)
Lemma 3: Let G be a finite noDAG.
Then, for all distinct b, c ∈ NG, both of the
following hold:
a) At least one of the pairs (b, c) and (c, b) is in SB(G).
b) At least one of the pairs (b, c) and (c, b) is in EA(G).
Proof. Here again, we will prove
the lemma only for SB(G), the proof for EA(G) being entirely similar.
Let G be a noDAG, and S = SB(G). The proof is simpler if G has only
one root, and it generalizes easily (though somewhat tediously) to the cases
where G has more than one root. So, we will treat only the case where G has a
unique root.
Let distinct b, c ∈ NG. If either b
⇒G c or c ⇒G b,
then, by definition, one of (b, c) and (c, b) ∈ S. There remains to
treat the case where b and c are
(⇒G)-incomparable. Suppose they are, and let m be
a (⇒G)-minimal-upper-bound (mub) of b and
c.
Suppose, without loss of generality, that neither b nor c is a
child of m (if either is, similar—and actually simpler—arguments
lead to the same conclusions). Then, by the definition of mub, and because b
and c are (⇒G)-incomparable, there must exist e,
f ∈ NG such that (m → e ⇒ b) &
(m → f ⇒ c) & (f ⇏ b) & (e ⇏ c). Since e
and f are siblings, they are <G-comparable. If, on
the one hand, e <G f, then, we have:
(e ⇒ b) & (e <G f) & (f
⇏ b)
and, thus, by construction of S, (b, f) ∈ S, and, by
transitivity, (b, c) ∈ S. If, on the other hand, f
<G e, then, we have:
(f ⇒ c) & (f <G e) & (e
⇏ c)
and, thus, by construction of S, (c, e) ∈ S, and, by
transitivity, (c, b) ∈ S.
QED (Lemma 3)
Corollary to Lemma 3: Both
completions of a finite completion-acyclic noDAG G are strict total orders on
NG.
Proof. By the definition, both
completions of a completion-acyclic noDAG are cycle-free. Thus, it is immediate
that they are antireflexive and antisymmetric. Since they are obtained by
transitive closure, they are transitive. It follows from the lemma that they
are total.
QED (Corollary to Lemma 3)
Lemma 4: For all
completion-acyclic noDAG G, SB(G) ∩ EA(G) =
(⇒G).
Proof. Let G be a
completion-acyclic noDAG. By the definitions of the respective completions, it
suffices to show that no pair (b, c) can be both in CSB(G) and CEA(G).
We prove this by contradiction: suppose there exists a pair (b, c)
that is a member of both sets. Since (b, c) is in CSB(G), then it is also in
SB(G), by definition. Because it is in CEA(G), we know that c ⇏ b, and
that there exists d such that c <G d and d
⇒G b. From this and the definition of SB(G), we
conclude that the pair (c, b) is in SB(G). So, both (b, c) and (c, b) are in
SB(G). Hence, SB(G) contains a cycle, which contradicts the hypothesis that G
is completion-acyclic.
QED (Lemma 4)
Lemma 5: Let G be a finite
completion-acyclic noDAG. For all (b, c) ∈ NG, if
(b, c) ∈ SB(G) but b ⇏ c, then any node dominated by b and not by
c precedes, in SB(G)-order, c and any node dominated by c. Conversely, if (b,
c) ∈ EA(G) but b ⇏ c, then any node dominated by b and not by c
precedes, in EA(G)-order, c and any node dominated by c.
Proof. Let G be a finite
completion-acyclic noDAG. First observe that, by Corollary to Lemma 3 and
Lemma 4, for all distinct, ⇒-incomparable b, c ∈
NG, if (b, c) ∈ SB(G), then (c, b) ∈
EA(G), and vice-versa.
Again, we prove the lemma only for SB, the proof for EA being
entirely similar. Let b and c be as in the statement of the lemma, and suppose
d and e are such that b ⇒ d, c ⇏ d, c ⇒ e. Note that d
⇏ c. Suppose, for the sake of contradiction, that (c, d) ∈ SB(G).
Then, by the preceding observation, we have (d, c) ∈ EA(G), (c, b)
∈ EA(G), and, because b ⇒ d, (b, d) ∈ EA(G). Thus, there
is a cycle in EA(G), contradicting the fact that G is completion-acyclic. So,
by Corollary to Lemma 3, we must conclude that (d, c) ∈ SB(G), and thus,
by transitivity, that (d, e) ∈ SB(G).
QED (Lemma 5)
Theorem 1: Let (G,
RG) be a noDAG. Then, there exists a well-formed
overlap-only TexMECS document corresponding to G iff all of the following
hold:
1- NG is finite and non-empty.
2- No node of G is both directly and indirectly reachable from
some other node.
3- G is completion-acyclic.
4- RG is a subset of SB(G).
5- No two leaves are consecutive in both SB(G)-order and reverse
EA(G)-order.
6- Neither the first nor the last root of G (in
<G order) is a leaf.
Note that point (4) asserts that the ordering among nodes specified
by RG − (<G) can only
confirm what is already implicit in <G. Points (5)
and (6) may seem mysterious at first, but simply translate idiosyncratic
morphological contingencies of well-formed overlap-only TexMECS. An alternate
formulation of point (5) is that no two consecutive siblings are leaves with
the same set of parents.
Proof.
(⇒) Let (G, RG) be a noDAG to which
corresponds a well-formed overlap-only TexMECS document D. We show in
succession that G satisfies conditions (1) to (6) in the statement of the
theorem.
(1): Follows immediately from the various definitions.
(2): Follows immediately from Lemma 1.
(3): Follows from Lemma 2 (which, by Observation 1, we know is
applicable) because, if either SB(G) or EA(G) had a cycle, it would imply that,
for some r ∈ ranges(D), start(r) < start(r) or end(r) <
end(r).
(4): By (1) and (3), G is finite and completion-acyclic. Thus,
Corollary to Lemma 3 applies, and we conclude that SB(G) is a strict total
order on NG. Since SB(G) is total, any pair (r, s) that
is not in it can only contradict it. Suppose,
then, there exists a pair (r, s) ∈ RG −
SB(G). By the preceding argument, we conclude that (s, r) ∈ SB(G). So,
by Lemma 2, start(s') < start(r'). However, by the definition of
correspondence, (r, s) ∈ RG implies that
start(r') < start(s'), a contradiction.
(5): Proof sketch. Suppose, for
the sake of contradiction, that there exist two leaves v and w in G consecutive
in both SB(G)-order and reverse EA(G)-order. Suppose, without loss of
generality, that v and w are not roots. Then, it is not hard to see that v' and
w' must be both directly contained (wrt ranges(D)) in some other range, without
any range starting and/or ending between them. But, by construction of
ranges(D), this is impossible.
(6): We treat only the case of the first root; the proof for the
last root is entirely similar. It is easy to show that, in a completion-acyclic
noDAG, the first node in SB-order is always the first root. So, if r is the
first root of G, in <G order, then r is the first
node in SB(G)-order. By Lemma 2 and the construction of ranges(D), then,
start(r') = 1. Because D is well-formed, it must start with a tag; thus, r is
not a leaf.
(⇐) Proof sketch. Let (G,
RG) be a noDAG satisfying conditions (1) to (6) in the
statement of the theorem. We construct a well-formed overlap-only TexMECS
document D and show it corresponds to G.
We sketch informally the construction of D.
We start by assigning an arbitrary, distinct, generic identifier to
each internal node of G. Then, on each side (left and right) of the node, we
stick TexMECS tags derived from the generic identifier: a start-tag on the
left, and an end-tag on the right; for example “<QC|” on the
left and “|QC>” on the right.
Then, we let those tags slide down from their node, following the
arrow linking the node to its first child (for a start-tag) and to its last
child (for an end-tag). When a tag arrives to a new internal node, it continues
sliding down, using the same arrow position (first or last) as initially, all
the way down, until it ends up on a leaf. When it does, it goes to the left of
the leaf (if it is a start-tag), or to its right (if it is an end-tag), and
stays there.
After both tags of all internal nodes have slid down to a leaf, we
order the start-tags on the left of each leaf in SB(G) order
of the node they come from. Then, we order the
end-tags on the right of each leaf in reversed
EA(G) order of the node they come from.
Then, for each leaf, we create a character string consisting of the
concatenation of all the start-tags to its left, in order, then, the character
“A”, then all the end-tags to its right, again in order.
Finally, we pick-up and concatenate the strings created for the
leaves, in SB(G) order. That is our document D.
The mapping g between ranges(D) and NG is
the one that maps to each internal node the range stretching from one to the
other of (and including) the two tags that originate from it, and to each leaf
the character “A” that it contributed to D.
To see that D is a well-formed overlap-only TexMECS document,
observe that: (1) there are equal numbers of start- and end-tags; (2) all
generic identifiers are distinct, and thus, the fact (following from Lemma 5)
that any start-tag occurs to the left of the matching end-tag suffices to
guarantee that no tag has depth 0; and (3) the document starts and ends with a
tag.
We now sketch a proof that the set ranges(D) corresponds to G
through g, i.e., that the following statements both hold:
i. (for all b, c ∈ NG) [(b
→G c) iff (b' directly contains c') wrt
ranges(D)]
ii. (for all b, c ∈ NG) [if (b
RG c), then start(b') < start(c')]
To prove (i), we will show that proper containment on ranges(D)
corresponds to reachability on G. Since, by hypothesis, G contains only direct
reachability relationships, (i) will follow.
Suppose first that b ⇒G c. Then, by
construction of D, Lemma 5, and the fact that b precedes c in SB(G)-order,
start(b') < start(c'). Likewise, end(c') < end(b'). Thus, b' properly
contains c'.
Now suppose for some r and s in ranges(D), r properly contains s.
By construction of D, and Lemma 5, it can be show that
r' ⇒G s'.
To prove (ii), we will show that for all (b, c) ∈ SB(G),
start(b') < start(c'). Since, by hypothesis, RG is a
subset of SB(G), (ii) will follow. By transitivity of < on integers, it
suffices to show that for all (b, c) ∈ (AG
∪ <G ∪ CSB(G)), start(b') <
start(c'). For each of the three sets constituent of the union, the desired
result follows from Lemma 5.
QED (Theorem 1)
Theorem 2: Every well-formed
overlap-only TexMECS document corresponds to some noDAG satisfying all the
conditions in the statement of Theorem 1.
Proof sketch. From D, any
well-formed overlap-only TexMECS document, first compute ranges(D), then the
direct containment relationships between the members of ranges(D). Construct a
noDAG (G, RG) as follows:
1. NG = ranges(D)
2. For all r, s ∈ N, r → s iff r directly contains
s
3. For all r, s ∈ N, r RG s iff
start(r) < start(s)
It is easily seen that (G, RG) corresponds
to D through the identity function. Note that (G, RG)
necessarily satisfies all the conditions in the statement of Theorem 1, since
it corresponds to a well-formed, overlap-only TexMECS document.
QED (Theorem 2)