Authors: | Pierre-Édouard Portier & Sylvie Calabretto |
---|---|
Date: | 2009-08-13 |
Multiple uses of the same document
Multiplication of documentary structures
SGML, XML: overlapping hierarchies
"The Structure Of Appearance"
The Appearance Of Structure
Aporia:
A document has no structure
=> The "String-In-A-Role" strategy
Working environment
Institut Jean-Toussaint Desanti
Philosopher
Main work: "Les Idéalités Mathématiques"
Development of the theory of functions of real variables
ZI: meaningful fragment of textual content spanning two pages
S1: pages
S2: "regions of interest"
E1, E2: equations
2 or 3 structures ...
4 categories
5 dimensions
CONCUR
MuLaX: adaptation of SGML CONCUR to XML
Ad-Hoc solutions
TexMECS and LMNL
Annotation graphs
Developed to model linguistic phenomena.
RDF graphs: annotation graphs in a well-known formalism
EARMARK: Extreme Annotational RDF Markup
GODDAG
MCT (Multi-Colored Trees)
MSXD (Multi-Structured XML Documents)
Delay Nodes
MonetDB/XQuery
MSDM: a document is a graph D composed of:
First Structure:
<s1>
  <page>Autrement dit la distinction signe-signifie ... Remarque, </page>
  <page>ce discours, ... par ex le discours 3+2=0-1 est-il un texte ? ... </page>
</s1>
Second Structure:
<s2>
  <p>Autrement dit la distinction signe-signifie...</p>
  <p>Remarque, ce discours, ...</p>
  <p>par ex le discours <eq>3 + 2 = 0 - 1</eq> est-il un texte ? ...</p>
</s2>
Base Structure:
<seg xml:id="F1">Autrement dit la distinction signe-signifie ...</seg>
<seg xml:id="F2">Remarque, </seg>
<seg xml:id="F3">ce discours, ...</seg>
<seg xml:id="F4">par ex le discours </seg>
<seg xml:id="F5">3 + 2 = 0 - 1</seg>
<seg xml:id="F6"> est-il un texte ? ...</seg>
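The base structure can be read as the text cut at every boundary used by either structure. A minimal Python sketch of that idea, assuming each structure is given as a list of character intervals; the function name `base_fragments` and the sample text are hypothetical and not taken from the authors' system:

```python
def base_fragments(text, *structures):
    """Cut `text` at every boundary used by any structure.

    Each structure is a list of (start, end) character intervals,
    with `end` exclusive. Returns the (start, end) base fragments.
    """
    cuts = {0, len(text)}
    for intervals in structures:
        for start, end in intervals:
            cuts.update((start, end))
    ordered = sorted(cuts)
    # Consecutive cut points delimit one fragment each.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if a < b]


# Hypothetical sample: two pages, and three regions, one of which
# spans the page break at offset 4.
text = "abcdefghij"
pages = [(0, 4), (4, 10)]
regions = [(0, 2), (2, 7), (7, 10)]
fragments = base_fragments(text, pages, regions)
```

Every element of either structure is then a run of consecutive fragments, which is what lets both structures point into the same base file.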
First Structure:
<s1>
  <page>
    <xi:include href="b.xml" xpointer="element(F1/1)"/>
    <xi:include href="b.xml" xpointer="element(F2/1)"/>
  </page>
  <page>
    <xi:include href="b.xml" xpointer="element(F3/1)"/>
    <xi:include href="b.xml" xpointer="element(F4/1)"/>
    <xi:include href="b.xml" xpointer="element(F5/1)"/>
    <xi:include href="b.xml" xpointer="element(F6/1)"/>
  </page>
</s1>
Second Structure:
<s2>
  <p>
    <xi:include href="b.xml" xpointer="element(F1/1)"/>
  </p>
  <p>
    <xi:include href="b.xml" xpointer="element(F2/1)"/>
    <xi:include href="b.xml" xpointer="element(F3/1)"/>
  </p>
  <p>
    <xi:include href="b.xml" xpointer="element(F4/1)"/>
    <eq>
      <xi:include href="b.xml" xpointer="element(F5/1)"/>
    </eq>
    <xi:include href="b.xml" xpointer="element(F6/1)"/>
  </p>
</s2>
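To illustrate how a structure that only contains references can be turned back into text, here is a simplified Python sketch. It is not a conforming XInclude processor: it only reads the identifier that precedes the slash in each `xpointer` and looks it up in the base segments. The `<base>` root element, the shortened sample strings and the helper names are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, shortened stand-ins for b.xml and one structure file.
BASE = (
    '<base>'
    '<seg xml:id="F1">one </seg>'
    '<seg xml:id="F2">two </seg>'
    '<seg xml:id="F3">three</seg>'
    '</base>'
)
S1 = (
    '<s1 xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<page><xi:include href="b.xml" xpointer="element(F1/1)"/>'
    '<xi:include href="b.xml" xpointer="element(F2/1)"/></page>'
    '<page><xi:include href="b.xml" xpointer="element(F3/1)"/></page>'
    '</s1>'
)


def fragment_ids(element):
    """List the base fragment ids referenced under `element`, in order."""
    ids = []
    for include in element.iter(XI_INCLUDE):
        match = re.match(r"element\(([^/)]+)", include.get("xpointer", ""))
        if match:
            ids.append(match.group(1))
    return ids


def text_of(element, segments):
    """Concatenate the text of every fragment referenced under `element`."""
    return "".join(segments[i] for i in fragment_ids(element))


base = ET.fromstring(BASE)
segments = {seg.get(XML_ID): seg.text for seg in base.iter("seg")}
s1 = ET.fromstring(S1)
page_texts = [text_of(page, segments) for page in s1.iter("page")]
```

The same `fragment_ids` helper applied to a `<p>` element of the second structure would give the fragment list needed to compare paragraphs with pages.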
XQuery functions:
let $physique := doc("physique.xml")
let $logique := doc("logique.xml")
for $page in $physique//page,
    $para in $logique//p
where multix:share-fragments($page, $para)
  and not(multix:include-content-of($page, $para))
return $para
Finds the regions of interest that overlap two pages.
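The logic of the query can be sketched in Python over lists of fragment identifiers. The semantics given here to the two predicates (set intersection and set inclusion) are an assumption about what `multix:share-fragments` and `multix:include-content-of` do, not their actual implementation; the fragment lists follow the two structures shown earlier.

```python
def share_fragments(a, b):
    """True if the two elements reference at least one common base fragment."""
    return bool(set(a) & set(b))


def include_content_of(a, b):
    """True if every fragment referenced by `b` is also referenced by `a`."""
    return set(b) <= set(a)


# Fragment ids per element, following the F1..F6 example.
pages = {"page1": ["F1", "F2"], "page2": ["F3", "F4", "F5", "F6"]}
paragraphs = {"p1": ["F1"], "p2": ["F2", "F3"], "p3": ["F4", "F5", "F6"]}

# A paragraph overlaps a page break when it shares fragments with a page
# without being entirely contained in that page.
overlapping = sorted({
    para
    for page_frags in pages.values()
    for para, para_frags in paragraphs.items()
    if share_fragments(page_frags, para_frags)
    and not include_content_of(page_frags, para_frags)
})
```

With these inputs only the second paragraph qualifies, since it is the one built from a fragment of each page.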
Three categories of methods:
Restructuring stage
Pages and regions of interest are tagged until a region overlaps two pages.
Automatic restructuring
Creating a new structure is a purely formal operation: a graph is transformed into two trees.
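One way to picture this formal step is a greedy split of tagged intervals into hierarchies that contain no partial overlap. This is only an illustrative sketch under that assumption, not the restructuring algorithm of the presented system; the function names and the sample intervals are hypothetical.

```python
def overlaps(a, b):
    """True if intervals a and b partially overlap (neither nested nor disjoint)."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def split_into_hierarchies(intervals):
    """Place each interval in the first structure where it causes no overlap.

    A new structure is opened only when an interval conflicts with
    every existing one, so each resulting structure can form a tree.
    """
    structures = []
    for interval in intervals:
        for structure in structures:
            if not any(overlaps(interval, other) for other in structure):
                structure.append(interval)
                break
        else:
            structures.append([interval])
    return structures


# Two pages, then three regions; the region (2, 7) crosses the page break.
tagged = [(0, 4), (4, 10), (0, 2), (2, 7), (7, 10)]
hierarchies = split_into_hierarchies(tagged)
```

In this sketch the regions that nest cleanly inside pages stay in the first structure, which is exactly the kind of modeling decision a user may want to revisit.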
Integration of the user
Automatic restructuring is a good opportunity for a user to make modeling choices.
Recommendation system for document authors
Two users editing specific documents are considered close if the tag trees implied by their structures are close.
Users 1, 2 | User 3 |
---|---|
theorem, lemma, cocycle | proposition, cohomology cocycle |
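The slides do not say how closeness between tag trees is measured. As one possible reading, the sketch below uses the Jaccard index over parent-child tag pairs; both the measure and the tree shapes are assumptions made for illustration.

```python
def jaccard(edges_a, edges_b):
    """Jaccard index of two sets of (parent_tag, child_tag) pairs."""
    union = edges_a | edges_b
    if not union:
        return 1.0  # two empty trees are treated as identical
    return len(edges_a & edges_b) / len(union)


# Hypothetical tag trees, encoded as sets of (parent, child) edges.
user_1 = {("theorem", "lemma"), ("lemma", "cocycle")}
user_2 = {("theorem", "lemma"), ("lemma", "cocycle")}
user_3 = {("proposition", "cocycle")}

closeness_1_2 = jaccard(user_1, user_2)
closeness_1_3 = jaccard(user_1, user_3)
```

A recommender built on such a score could suggest to one user the tags employed by the users whose trees are closest to theirs.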
REST interface
Example: POST to http://desanti.org/collections/148/structures/math/taggees with content:
<taggee>
  <tag name="equation" />
  <interval start="14" end="26" />
</taggee>
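A client call for this example could be prepared as follows with the Python standard library. The helper name `build_taggee_request` and the `Content-Type` header are assumptions, since the slides only give the URL and the body; the request is built but deliberately not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_taggee_request(url, tag_name, start, end):
    """Build (without sending) a POST request carrying a <taggee> body."""
    taggee = ET.Element("taggee")
    ET.SubElement(taggee, "tag", name=tag_name)
    ET.SubElement(taggee, "interval", start=str(start), end=str(end))
    body = ET.tostring(taggee, encoding="utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumed media type; the slides do not specify one.
        headers={"Content-Type": "application/xml"},
    )


request = build_taggee_request(
    "http://desanti.org/collections/148/structures/math/taggees",
    "equation",
    14,
    26,
)
# urllib.request.urlopen(request) would actually send it.
```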
JavaScript user interface
Multi-Structured documents: from their representation to their construction
The enforcement of tree structures, long considered the crux of the multi-structured document problem, triggers the creation of new structures that must be validated by the user.
We propose an open methodology that can be used for the incremental and collective emergence of documentary structures.