Eric van der Vlist
Balisage 2012
So long as XDM, the XQuery and XPath Data Model, was concerned only with traditional XML documents, it was relatively tidy. Version 3.0, however, proposes new features to support such things as JSON maps and could be extended to support RDF triples. How can we support such things that do not map simply into conventional XML? Several possible approaches are examined, along with methods for validation and processing, to extend the XML ecosystem for the future.
XPath 1.0 data model = Seven types of nodes (root, elements, text, attributes, namespaces, processing instructions & comments).
XSLT 1.0 data model ≅ XPath 1.0 data model (relaxed constraints on root nodes, additional base URI and unparsed entities properties).
XSLT 1.0 inadvertently mentions 4 basic XPath data-types (string, number, boolean, node-set) to explicitly add a 5th one: result tree fragments.
= XDM 1.0 + XSLT 1.0 - node sets - RTF - W3C XML Schema 1.0 PSVI properties and data types + item kinds + sequences.
Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)
<?xml version="1.0" encoding="UTF-8"?> <root> <foo>5</foo> <foo>5</foo> <bar foo="5"> <foo>5</foo> </bar> </root>
Each <foo>5</foo> element has its own identity.
Each "5" text node has its own identity.
The atomic value "5" in the 3 <foo/> elements (and in the @foo attribute) has no identity.
The atomic value "5" doesn't know where it belongs.
An important characteristic of the data model is that there is no distinction between an item (a node or an atomic value) and a singleton sequence containing that item. An item is equivalent to a singleton sequence containing that item and vice versa. A sequence may contain nodes, atomic values, or any mixture of nodes and atomic values. When a node is added to a sequence its identity remains the same. Consequently a node may occur in more than one sequence and a sequence may contain duplicate items.
Appolonius' ship is a beautiful ship. Over the years it has been repaired so many times that there is not a single piece of the original materials remaining. The question is, therefore, is it really still Appolonius' ship?
--ObjectIdentity on c2.com
Each node has a unique identity. Every node in an instance of the data model is unique: identical to itself, and not identical to any other node. (Atomic values do not have identity; every instance of the value “5” as an integer is identical to every other instance of the value “5” as an integer.)
The three <foo>5</foo> elements may have the same properties only because XDM doesn't consider "document order" or "previous sibling" to be node properties (they have no accessors).
They can still be told apart with the ancestor:: and preceding-sibling:: axes!
<?xml version="1.0" encoding="UTF-8"?> <root> <foo>5</foo> <foo>5</foo> <bar foo="5"> <foo>5</foo> </bar> </root>
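The identity argument can be sketched outside XDM. The following Python sketch is only an analogy: it assumes that Python object identity (`is`) can stand in for XDM node identity, and the variable names are hypothetical. It shows that the three `foo` elements share their name and string value, that their atomic values are indistinguishable, and that only their position in the tree separates them.

```python
import xml.etree.ElementTree as ET

DOC = '<root><foo>5</foo><foo>5</foo><bar foo="5"><foo>5</foo></bar></root>'

root = ET.fromstring(DOC)
foos = list(root.iter("foo"))  # the three <foo> elements, in document order

# All three elements expose the same name and the same string value...
same_properties = all((f.tag, f.text) == ("foo", "5") for f in foos)

# ...yet each one is a distinct object (analogy for XDM node identity).
distinct_nodes = all(a is not b for i, a in enumerate(foos) for b in foos[i + 1:])

# Atomic values have no identity: the four integers 5 cannot be told apart.
atomic_values = [int(f.text) for f in foos] + [int(root.find("bar").get("foo"))]
all_equal = len(set(atomic_values)) == 1

# ElementTree keeps no parent pointer, so build a child-to-parent table
# to recover the positional information that distinguishes the nodes.
parent_of = {child: parent for parent in root.iter() for child in parent}
parents = [parent_of[f].tag for f in foos]

print(same_properties, distinct_nodes, all_equal, parents)
```

The `parent_of` table plays the role of the ancestor axis here: position is the only thing that separates the first two elements from the third.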
{ "accounting" : [ { "firstName" : "John", "lastName" : "Doe", "age" : 23 }, { "firstName" : "Mary", "lastName" : "Smith", "age" : 32 } ], "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 }, { "firstName" : "Jim", "lastName" : "Galley", "age" : 41 } ] }
--XSLT 3.0, W3C Working Draft 10 July 2012
<?xml version="1.0" encoding="UTF-8"?> <company> <department name="sales"> <employee> <firstName>Sally</firstName> <lastName>Green</lastName> <age>27</age> </employee> <employee> <firstName>Jim</firstName> <lastName>Galley</lastName> <age>41</age> </employee> </department> <department name="accounting"> <employee> <firstName>John</firstName> <lastName>Doe</lastName> <age>23</age> </employee> <employee> <firstName>Mary</firstName> <lastName>Smith</lastName> <age>32</age> </employee> </department> </company>
More or less equivalent...
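To make "more or less equivalent" concrete, here is a minimal Python sketch that rebuilds the XML shape from the JSON map above. The function name `company_from_json` and the mapping rules (top-level keys become `department/@name`, array items become `employee` elements) are assumptions read off the two examples, not a rule stated in the talk.

```python
import json
import xml.etree.ElementTree as ET

JSON_TEXT = """
{ "accounting" : [ { "firstName" : "John", "lastName" : "Doe", "age" : 23 },
                   { "firstName" : "Mary", "lastName" : "Smith", "age" : 32 } ],
  "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 },
              { "firstName" : "Jim", "lastName" : "Galley", "age" : 41 } ] }
"""

def company_from_json(text):
    """Hypothetical mapping: map keys -> department names, arrays -> employees."""
    data = json.loads(text)
    company = ET.Element("company")
    for dept_name, employees in data.items():
        dept = ET.SubElement(company, "department", name=dept_name)
        for emp in employees:
            employee = ET.SubElement(dept, "employee")
            for field, value in emp.items():
                # The number/string distinction of JSON is lost at this point.
                ET.SubElement(employee, field).text = str(value)
    return company

company = company_from_json(JSON_TEXT)
print(ET.tostring(company, encoding="unicode"))
```

Note what the conversion has to invent (the names `company`, `department`, `employee`) and what it loses (the type of `age`), which is why the equivalence is only approximate.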
Data model
XPath: map:keys(), map:contains() & map:get().
XSLT: map:new(), map:entry(), ...
Conclusion
See W3C XSLT 3.0 Bug 16118, "Maps should be first class citizens"
{ "accounting" : [ { "firstName" : "John", "lastName" : "Doe", "age" : 23 }, { "firstName" : "Mary", "lastName" : "Smith", "age" : 32 } ], "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 }, { "firstName" : "Jim", "lastName" : "Galley", "age" : 41 } ] }
--XSLT 3.0, W3C Working Draft 10 July 2012
<?xml version="1.0" encoding="UTF-8"?> <χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:map> <χ:entry key="sales" keyType="string"> <χ:map> <χ:entry key="1" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Green</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">27</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Sally</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> <χ:entry key="2" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Galley</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">41</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Jim</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> </χ:map> </χ:entry> <χ:entry key="accounting" keyType="string"> <χ:map> <χ:entry key="1" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Doe</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">23</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">John</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> <χ:entry key="2" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Smith</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">32</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Mary</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> </χ:map> </χ:entry> </χ:map> </χ:data-model>
Yes, it's more verbose...
Zorba's XDM serialization
XDML (Rennau, Hans-Jürgen, and David A. Lee, Balisage 2011)
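The serialization shown above can be approximated in a few lines. This Python sketch is not the stylesheet used in the talk: the helper names `q` and `to_chimeral` are hypothetical, arrays are assumed to become maps keyed by 1-based numbers (as in the example output), and only string and number values are handled.

```python
import xml.etree.ElementTree as ET

NS = "http://χίμαιραλ.com#"

def q(local):
    """Qualified name in ElementTree's {uri}local notation."""
    return f"{{{NS}}}{local}"

def to_chimeral(value, parent):
    """Append the serialization of `value` (dict, list, str or number) to `parent`."""
    if isinstance(value, dict):
        map_el = ET.SubElement(parent, q("map"))
        for key, item in value.items():
            key_type = "number" if isinstance(key, int) else "string"
            entry = ET.SubElement(map_el, q("entry"), key=str(key), keyType=key_type)
            to_chimeral(item, entry)
    elif isinstance(value, list):
        # Assumption: an array is a map whose keys are the positions 1..n.
        to_chimeral({i: item for i, item in enumerate(value, start=1)}, parent)
    else:
        kind = "number" if isinstance(value, (int, float)) else "string"
        ET.SubElement(parent, q("atomic-value"), type=kind).text = str(value)

model = ET.Element(q("data-model"))
to_chimeral({"sales": [{"firstName": "Sally", "age": 27}]}, model)
```

Nodes as map values are left out of this sketch; they are the hard part, as the next examples show.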
map{1:= 'foo'}
=>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:map> <χ:entry key="1" keyType="number"> <χ:atomic-value type="string">foo</χ:atomic-value> </χ:entry> </χ:map> </χ:data-model>
map{1:= ('foo', 'bar')}
=>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:map> <χ:entry key="1" keyType="number"> <χ:atomic-value type="string">foo</χ:atomic-value> <χ:atomic-value type="string">bar</χ:atomic-value> </χ:entry> </χ:map> </χ:data-model>
<xsl:variable name="a-node"> <foo/> </xsl:variable> <xsl:variable name="map" select="map{'a-node':= $a-node}"/>
=>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:instance id="d4" kind="document"> <foo/> </χ:instance> <χ:map> <χ:entry key="a-node" keyType="string"> <χ:node kind="document" instance="d4" path="/"/> </χ:entry> </χ:map> </χ:data-model>
<xsl:variable name="a-node" as="node()"> <foo/> </xsl:variable> <xsl:variable name="map" select="map{'a-node':= $a-node}"/>
=>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:instance id="d4e0" kind="fragment"> <foo/> </χ:instance> <χ:map> <χ:entry key="a-node" keyType="string"> <χ:node kind="element" instance="d4e0" path="root()" name="foo"/> </χ:entry> </χ:map> </χ:data-model>
<xsl:variable name="a-node" as="node()*"> <foo/> <bar/> </xsl:variable> <xsl:variable name="map" select="map{'a-node':= $a-node}"/>
=>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:instance id="d4e0" kind="fragment"> <foo/> </χ:instance> <χ:instance id="d4e3" kind="fragment"> <bar/> </χ:instance> <χ:map> <χ:entry key="a-node" keyType="string"> <χ:node kind="element" instance="d4e0" path="root()" name="foo"/> <χ:node kind="element" instance="d4e3" path="root()" name="bar"/> </χ:entry> </χ:map> </χ:data-model>
<xsl:variable name="doc"> <department name="sales"> <employee> <firstName>Sally</firstName> <lastName>Green</lastName> <age>27</age> </employee> <employee> <firstName>Jim</firstName> <lastName>Galley</lastName> <age>41</age> </employee> </department> <department name="accounting"> <employee> <firstName>John</firstName> <lastName>Doe</lastName> <age>23</age> </employee> <employee> <firstName>Mary</firstName> <lastName>Smith</lastName> <age>32</age> </employee> </department> </xsl:variable> <xsl:variable name="map" select="map{ 'sales' := $doc/department[@name='sales'], 'Sally' := $doc//employee[firstName = 'Sally'], 'kids' := $doc//employee[age &lt; 30], 'dep-names-attributes' := $doc/department/@name, 'dep-names' := for $name in $doc/department/@name return string($name) }"/>
<χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:instance id="d4" kind="document"> <department name="sales"> <employee> <firstName>Sally</firstName> <lastName>Green</lastName> <age>27</age> </employee> <employee> <firstName>Jim</firstName> <lastName>Galley</lastName> <age>41</age> </employee> </department> <department name="accounting"> <employee> <firstName>John</firstName> <lastName>Doe</lastName> <age>23</age> </employee> <employee> <firstName>Mary</firstName> <lastName>Smith</lastName> <age>32</age> </employee> </department> </χ:instance> <χ:map> <χ:entry key="sales" keyType="string"> <χ:node kind="element" instance="d4" path="/&quot;&quot;:department[1]" name="department"/> </χ:entry> <χ:entry key="Sally" keyType="string"> <χ:node kind="element" instance="d4" path="/&quot;&quot;:department[1]/&quot;&quot;:employee[1]" name="employee"/> </χ:entry> <χ:entry key="kids" keyType="string"> <χ:node kind="element" instance="d4" path="/&quot;&quot;:department[1]/&quot;&quot;:employee[1]" name="employee"/> <χ:node kind="element" instance="d4" path="/&quot;&quot;:department[2]/&quot;&quot;:employee[1]" name="employee"/> </χ:entry> <χ:entry key="dep-names-attributes" keyType="string"> <χ:node kind="attribute" instance="d4" path="/&quot;&quot;:department[1]/@name" name="name">sales</χ:node> <χ:node kind="attribute" instance="d4" path="/&quot;&quot;:department[2]/@name" name="name">accounting</χ:node> </χ:entry> <χ:entry key="dep-names" keyType="string"> <χ:atomic-value type="string">sales</χ:atomic-value> <χ:atomic-value type="string">accounting</χ:atomic-value> </χ:entry> </χ:map> </χ:data-model>
<xsl:variable name="attribute" as="node()"> <xsl:attribute name="foo">bar</xsl:attribute> </xsl:variable> <xsl:variable name="map" select="map{ 'attribute' := $attribute }"/>
map { xs:QName('rdf:subject') := xs:anyURI('http://www.example.org/index.html'), xs:QName('rdf:predicate') := xs:anyURI('http://purl.org/dc/elements/1.1/creator'), xs:QName('rdf:object') := xs:anyURI('http://www.example.org/staffid/85740') }
=>
<χ:data-model xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:χ="http://χίμαιραλ.com#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <χ:map> <χ:entry key="rdf:object" keyType="xs:QName"> <χ:atomic-value type="xs:anyURI">http://www.example.org/staffid/85740</χ:atomic-value> </χ:entry> <χ:entry key="rdf:predicate" keyType="xs:QName"> <χ:atomic-value type="xs:anyURI">http://purl.org/dc/elements/1.1/creator</χ:atomic-value> </χ:entry> <χ:entry key="rdf:subject" keyType="xs:QName"> <χ:atomic-value type="xs:anyURI">http://www.example.org/index.html</χ:atomic-value> </χ:entry> </χ:map> </χ:data-model>
Jonathan Robie presenting "the syntactical web" at XML 2001.
<χ:notation mediatype="application/json"><![CDATA[ { "accounting" : [ { "firstName" : "John", "lastName" : "Doe", "age" : 23 }, { "firstName" : "Mary", "lastName" : "Smith", "age" : 32 } ], "sales" : [ { "firstName" : "Sally", "lastName" : "Green", "age" : 27 }, { "firstName" : "Jim", "lastName" : "Galley", "age" : 41 } ] } ]]></χ:notation>
< and </ in tags.
<χ:data-model xmlns:χ="http://χίμαιραλ.com#"> <χ:map> <χ:entry key="sales" keyType="string"> <χ:map> <χ:entry key="1" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Green</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">27</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Sally</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> <χ:entry key="2" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Galley</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">41</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Jim</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> </χ:map> </χ:entry> <χ:entry key="accounting" keyType="string"> <χ:map> <χ:entry key="1" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Doe</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">23</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">John</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> <χ:entry key="2" keyType="number"> <χ:map> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Smith</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">32</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Mary</χ:atomic-value> </χ:entry> </χ:map> </χ:entry> </χ:map> </χ:entry> </χ:map> </χ:data-model>
Replace <χ:map> by <{>
<χ:data-model> <{> <χ:entry key="sales" keyType="string"> <{> <χ:entry key="1" keyType="number"> <{> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Green</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">27</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Sally</χ:atomic-value> </χ:entry> </}> </χ:entry> <χ:entry key="2" keyType="number"> <{> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Galley</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">41</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Jim</χ:atomic-value> </χ:entry> </}> </χ:entry> </}> </χ:entry> <χ:entry key="accounting" keyType="string"> <{> <χ:entry key="1" keyType="number"> <{> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Doe</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">23</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">John</χ:atomic-value> </χ:entry> </}> </χ:entry> <χ:entry key="2" keyType="number"> <{> <χ:entry key="lastName" keyType="string"> <χ:atomic-value type="string">Smith</χ:atomic-value> </χ:entry> <χ:entry key="age" keyType="string"> <χ:atomic-value type="number">32</χ:atomic-value> </χ:entry> <χ:entry key="firstName" keyType="string"> <χ:atomic-value type="string">Mary</χ:atomic-value> </χ:entry> </}> </χ:entry> </}> </χ:entry> </}> </χ:data-model>
Replace <χ:entry> by <@> (map entries are kind of similar to XML attributes)
<χ:data-model> <{> <@"sales" keyType="string"> <{> <@"1" keyType="number"> <{> <@"lastName" keyType="string"> <χ:atomic-value type="string">Green</χ:atomic-value> </@"lastName"> <@"age" keyType="string"> <χ:atomic-value type="number">27</χ:atomic-value> </@"age"> <@"firstName" keyType="string"> <χ:atomic-value type="string">Sally</χ:atomic-value> </@"firstName"> </}> </@"1"> <@"2" keyType="number"> <{> <@"lastName" keyType="string"> <χ:atomic-value type="string">Galley</χ:atomic-value> </@"lastName"> <@"age" keyType="string"> <χ:atomic-value type="number">41</χ:atomic-value> </@"age"> <@"firstName" keyType="string"> <χ:atomic-value type="string">Jim</χ:atomic-value> </@"firstName"> </}> </@"2"> </}> </@"sales"> <@"accounting" keyType="string"> <{> <@"1" keyType="number"> <{> <@"lastName" keyType="string"> <χ:atomic-value type="string">Doe</χ:atomic-value> </@"lastName"> <@"age" keyType="string"> <χ:atomic-value type="number">23</χ:atomic-value> </@"age"> <@"firstName" keyType="string"> <χ:atomic-value type="string">John</χ:atomic-value> </@"firstName"> </}> </@"1"> <@"2" keyType="number"> <{> <@"lastName" keyType="string"> <χ:atomic-value type="string">Smith</χ:atomic-value> </@"lastName"> <@"age" keyType="string"> <χ:atomic-value type="number">32</χ:atomic-value> </@"age"> <@"firstName" keyType="string"> <χ:atomic-value type="string">Mary</χ:atomic-value> </@"firstName"> </}> </@"2"> </}> </@"accounting"> </}> </χ:data-model>
JSON map items may have non-simple-type values?
JSON map items may have other types of keys?
JSON maps don't have names, whereas elements have a mandatory name.
Elements would then be a superset of JSON maps.
Make quotes around key values optional when they do not contain spaces or other weird characters
<χ:data-model> <{> <@sales> <{> <@1 keyType="number"> <{> <@lastName> <χ:atomic-value type="string">Green</χ:atomic-value> </@lastName> <@age> <χ:atomic-value type="number">27</χ:atomic-value> </@age> <@firstName> <χ:atomic-value type="string">Sally</χ:atomic-value> </@firstName> </}> </@1> <@2 keyType="number"> <{> <@lastName> <χ:atomic-value type="string">Galley</χ:atomic-value> </@lastName> <@age> <χ:atomic-value type="number">41</χ:atomic-value> </@age> <@firstName> <χ:atomic-value type="string">Jim</χ:atomic-value> </@firstName> </}> </@2> </}> </@sales> <@accounting> <{> <@1 keyType="number"> <{> <@lastName> <χ:atomic-value type="string">Doe</χ:atomic-value> </@lastName> <@age> <χ:atomic-value type="number">23</χ:atomic-value> </@age> <@firstName> <χ:atomic-value type="string">John</χ:atomic-value> </@firstName> </}> </@1> <@2 keyType="number"> <{> <@lastName> <χ:atomic-value type="string">Smith</χ:atomic-value> </@lastName> <@age> <χ:atomic-value type="number">32</χ:atomic-value> </@age> <@firstName> <χ:atomic-value type="string">Mary</χ:atomic-value> </@firstName> </}> </@2> </}> </@accounting> </}> </χ:data-model>
Replace <χ:atomic-value> by <=> and decide that the default type is string
<χ:data-model> <{> <@sales> <{> <@1 keyType="number"> <{> <@lastName> <=>Green</=> </@lastName> <@age> <= type="number">27</=> </@age> <@firstName> <=>Sally</=> </@firstName> </}> </@1> <@2 keyType="number"> <{> <@lastName> <=>Galley</=> </@lastName> <@age> <= type="number">41</=> </@age> <@firstName> <=>Jim</=> </@firstName> </}> </@2> </}> </@sales> <@accounting> <{> <@1 keyType="number"> <{> <@lastName> <=>Doe</=> </@lastName> <@age> <= type="number">23</=> </@age> <@firstName> <=>John</=> </@firstName> </}> </@1> <@2 keyType="number"> <{> <@lastName> <=>Smith</=> </@lastName> <@age> <= type="number">32</=> </@age> <@firstName> <=>Mary</=> </@firstName> </}> </@2> </}> </@accounting> </}> </χ:data-model>
<=> is useful around atomic values when the entry's value is a sequence. Decide that when the entry's value is a single atomic value, a shortcut is to place its type attribute within the enclosing entry
<χ:data-model> <{> <@sales> <{> <@1 keyType="number"> <{> <@lastName>Green</@lastName> <@age type="number">27</@age> <@firstName>Sally</@firstName> </}> </@1> <@2 keyType="number"> <{> <@lastName>Galley</@lastName> <@age type="number">41</@age> <@firstName>Jim</@firstName> </}> </@2> </}> </@sales> <@accounting> <{> <@1 keyType="number"> <{> <@lastName>Doe</@lastName> <@age type="number">23</@age> <@firstName>John</@firstName> </}> </@1> <@2 keyType="number"> <{> <@lastName>Smith</@lastName> <@age type="number">32</@age> <@firstName>Mary</@firstName> </}> </@2> </}> </@accounting> </}> </χ:data-model>
χ:map/χ:entry/χ:map/χ:entry/χ:map[χ:entry[@key='age'][χ:atomic-value < 30]]
or, if you're feeling lucky:
//χ:map[χ:entry[@key='age'][χ:atomic-value < 30]]
map()/@*/map()/@*/map()[@age < 30]
or
//map()[@age < 30] ???
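For comparison, the first pair of queries can be emulated against the χίμαιραλ serialization with a short Python sketch. It assumes a cut-down document with two employees, binds the namespace to the ASCII prefix `c` for convenience, and filters in Python because ElementTree's limited XPath subset has no numeric comparisons; the function name `maps_with_age_below` is hypothetical.

```python
import xml.etree.ElementTree as ET

URI = "http://χίμαιραλ.com#"
NS = {"c": URI}

DOC = """
<c:data-model xmlns:c="http://χίμαιραλ.com#">
  <c:map>
    <c:entry key="sales" keyType="string">
      <c:map>
        <c:entry key="1" keyType="number">
          <c:map>
            <c:entry key="age" keyType="string"><c:atomic-value type="number">27</c:atomic-value></c:entry>
            <c:entry key="firstName" keyType="string"><c:atomic-value type="string">Sally</c:atomic-value></c:entry>
          </c:map>
        </c:entry>
        <c:entry key="2" keyType="number">
          <c:map>
            <c:entry key="age" keyType="string"><c:atomic-value type="number">41</c:atomic-value></c:entry>
            <c:entry key="firstName" keyType="string"><c:atomic-value type="string">Jim</c:atomic-value></c:entry>
          </c:map>
        </c:entry>
      </c:map>
    </c:entry>
  </c:map>
</c:data-model>
"""

def maps_with_age_below(root, limit):
    """Rough equivalent of //χ:map[χ:entry[@key='age'][χ:atomic-value < limit]]."""
    hits = []
    for map_el in root.iter(f"{{{URI}}}map"):
        for entry in map_el.findall("c:entry[@key='age']", NS):
            value = entry.find("c:atomic-value", NS)
            if value is not None and float(value.text) < limit:
                hits.append(map_el)
    return hits

def first_name(map_el):
    return map_el.find("c:entry[@key='firstName']/c:atomic-value", NS).text

kids = [first_name(m) for m in maps_with_age_below(ET.fromstring(DOC), 30)]
print(kids)
```

The amount of plumbing needed for a one-line question is the point: the serialization is queryable, but not pleasantly so.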
We can validate χίμαιραλ with this kind of schema (note the impact of the restriction on interleave):
namespace χ = "http://χίμαιραλ.com#"

start = element χ:data-model { top-level-map }

# Top level map: departments
top-level-map =
  element χ:map {
    element χ:entry {
      attribute key { xsd:NMTOKEN },
      attribute keyType { "string" },
      emp-array
    }*
  }

# List of employees
emp-array =
  element χ:map {
    element χ:entry {
      attribute key { xsd:positiveInteger },
      attribute keyType { "number" },
      emp-map
    }*
  }

# Description of an employee
emp-map = element χ:map { (age | firstName | lastName)+ }

age =
  element χ:entry {
    attribute key { "age" },
    attribute keyType { "string" },
    element χ:atomic-value { attribute type { "number" }, xsd:positiveInteger }
  }

firstName =
  element χ:entry {
    attribute key { "firstName" },
    attribute keyType { "string" },
    element χ:atomic-value { attribute type { "string" }, xsd:token }
  }

lastName =
  element χ:entry {
    attribute key { "lastName" },
    attribute keyType { "string" },
    element χ:atomic-value { attribute type { "string" }, xsd:token }
  }
Wouldn't the following be much better?
namespace χ = "http://χίμαιραλ.com#"

start = element χ:data-model { top-level-map }

# Top level map: departments
top-level-map = map { entry xsd:NMTOKEN { emp-array }* }

# List of employees
emp-array = map { entry xsd:positiveInteger { emp-map }* }

# Description of an employee
emp-map = map { age, firstName, lastName }

age = entry age { xsd:positiveInteger }
firstName = entry firstName { xsd:token }
lastName = entry lastName { xsd:token }
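Since the `map { ... }` pattern above is wishful syntax, the constraints it expresses can at least be restated as plain checks on the parsed JSON. This Python sketch is an illustration under assumptions: the function name `validate` is hypothetical, the `NMTOKEN` regular expression is only an approximation of `xsd:NMTOKEN`, and `xsd:token` is reduced to "is a string".

```python
import re

NMTOKEN = re.compile(r"[\w.:-]+")  # rough approximation of xsd:NMTOKEN
REQUIRED = {"age", "firstName", "lastName"}

def validate(data):
    """Return a list of error messages; an empty list means the data is valid."""
    if not isinstance(data, dict):
        return ["top level must be a map"]
    errors = []
    for dept, employees in data.items():
        if not NMTOKEN.fullmatch(dept):
            errors.append(f"department key {dept!r} is not an NMTOKEN")
        if not isinstance(employees, list):
            errors.append(f"{dept}: expected an array of employees")
            continue
        for position, emp in enumerate(employees, start=1):
            where = f"{dept}[{position}]"
            if not isinstance(emp, dict) or set(emp) != REQUIRED:
                errors.append(f"{where}: expected exactly age, firstName and lastName")
                continue
            age = emp["age"]
            if isinstance(age, bool) or not isinstance(age, int) or age <= 0:
                errors.append(f"{where}: age must be a positive integer")
            for name in ("firstName", "lastName"):
                if not isinstance(emp[name], str):
                    errors.append(f"{where}: {name} must be a string")
    return errors

good = {"sales": [{"firstName": "Sally", "lastName": "Green", "age": 27}]}
print(validate(good))
```

Unlike the element-based schema, nothing here has to work around the restriction on interleave: each entry is required exactly once, in any order.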
The bad news is that the map proposal is an ugly chimera :( !
The good news is that we'll find a workaround :) !