The dream of one universal markup language is now past. JSON is clearly here to stay, and it is becoming the format of choice for data interchange.
udl:key) used in pseudo-attributes (constructs which look like an attribute but do not represent a node) and pseudo-tags (which look like an element but do not represent a node).
udl:null, udl:value, udl:array and udl:map).
Like any element name in XML, these names have no
built-in semantics: they do not signal that the element has
been constructed from a JSON value, and they do not imply specific
values of any node properties.
After an update, or if the node tree is constructed in any other way,
the elements representing null values, simple values, arrays and objects
may have any valid node name.
udl:key pseudo-attribute, see next section), XML names cannot
be represented in JSON markup at all. Lossless information mapping in
both directions is nevertheless enabled by arbitrarily defining
JSON markup to represent nodes with default names which are
implied by other node properties.
udl:model
) is introduced which indicates
the value of the udl:defaultModel
,
in which case the default is specified by the nearest ancestor
with a udl:defaultModel
pseudo-attribute.
udl:defaultModel) is introduced which
sets the default value of [model] for the element itself and its
descendants. The default value applies to the element itself and
to its descendant elements unless the element in question has
simple content (in which case [model] is always "sequence"),
or has a [model] pseudo-attribute (which overrides the default)
or has a nearer ancestor with a udl:defaultModel pseudo-attribute
(which shadows any outer default values).
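The resolution rule for [model] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the namespace URI `http://example.org/udl` is hypothetical, "simple content" is taken to mean non-whitespace text without element children, and the outermost fallback value "sequence" is assumed because the text above does not name one.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real UDL namespace is not given here.
UDL_NS = "http://example.org/udl"
MODEL = f"{{{UDL_NS}}}model"
DEFAULT_MODEL = f"{{{UDL_NS}}}defaultModel"


def has_simple_content(elem):
    """Assumption: simple content = non-whitespace text, no element children."""
    return len(elem) == 0 and bool((elem.text or "").strip())


def resolve_models(elem, inherited="sequence", out=None):
    """Return {element: model} for elem and all of its descendants.

    `inherited` is the default set by the nearest ancestor carrying
    udl:defaultModel; the initial value "sequence" is an assumption.
    """
    if out is None:
        out = {}
    # udl:defaultModel applies to the element itself and shadows any
    # outer default for its descendants.
    default = elem.get(DEFAULT_MODEL, inherited)
    if has_simple_content(elem):
        model = "sequence"                 # simple content: always "sequence"
    else:
        model = elem.get(MODEL, default)   # explicit udl:model overrides
    out[elem] = model
    for child in elem:
        resolve_models(child, default, out)
    return out


doc = ET.fromstring(
    f'<books xmlns:udl="{UDL_NS}" udl:defaultModel="map">'
    '<book><title>UDL</title>'
    '<authors udl:model="sequence"><a/><a/></authors></book>'
    '</books>'
)
models = {e.tag: m for e, m in resolve_models(doc).items()}
```

In this example the default "map" reaches `book` and the empty `a` elements, while `title` (simple content) and `authors` (explicit pseudo-attribute) resolve to "sequence".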
udl:key
) is introduced which indicates
the value of the udl:key
is child of an
element whose [model] is "map", the [key] defaults to the local
name of the element. Example:
udl:key
pseudo-attribute.
Example: if in the following markup udl:model were changed to
"sequence", the markup would cease to be well-formed.
xml
,
json
,
telem
,
a slightly simplified version of XML
using JSON-like constructs for simple elements meeting
certain constraints. See
udl:markup
, which specifies the markup
language used to represent the content of an element.
If the value is not xml, the child nodes of the element
are the nodes constructed from the markup found in the
text content.
Only element tags and the pseudo-tag udl:markupSection (see below) may have this
pseudo-attribute. Possible values are: xml, json, telem; default value is xml. Example:
udl:markupSection, which delimits a
markup section, a section of the document text which
uses a particular markup language.
When constructing the node tree, the pseudo-tag and
its contents represent the nodes constructed from
the contained markup. The markup language is identified
by the udl:markup pseudo-attribute
contained by the pseudo-tag. In the following
example, the pseudo-tag represents five nodes
which are constructed from the JSON markup:
markup. Possible values are: xml, json, html; default is xml. Depending on the value,
the text following the XML declaration will be
interpreted as XML markup, JSON markup or HTML markup.
Example:
xml, serialization produces conventional XML markup,
augmented by the pseudo-attributes udl:key, udl:model and udl:defaultModel
where appropriate.
xml, the serialization
may nevertheless insert non-XML markup into the document text,
depending on serialization parameters. The non-XML markup is
constrained to represent element contents – that is, every
chunk of non-XML markup is scoped to represent the content
of an element whose start and end tag delimit the chunk.
xml, additional
serialization parameters control the use of alternative markup
within selected elements. Parameter json-content-elements
contains a list of expanded
QNames, identifying the elements whose content shall be
represented as JSON markup. In a similar way, parameter
telem-content-elements identifies the elements
to be rendered using the telem style.
(For details see
method is extended by the value json. This value lets the
complete document be serialized as JSON markup.
info-loss specifies how to handle
information loss implied by the serialization.
Special values relate to
situations where JSON markup should be produced but a
node to be serialized contains information which cannot
be expressed by a JSON representation. (There are three cases:
(i) mixed content,
(ii) the use of attributes,
(iii) the use of non-standard element names.)
Three parameter values are supported: json.strict, json.ignore-names, and json.projection.
In case of json.strict the serialization must be aborted; the value
json.projection mandates a projection
which simply ignores any information which cannot
be represented;
and the value json.ignore-names means that the
QNames of XML elements are ignored, but any other
incompatibility with the JSON model
(e.g. the use of attributes)
produces an unrecoverable error. (For details see
& and < must always be escaped.
Examples of path steps containing a key test:
fn:node-key returns
the [key] of a given node, or the empty sequence
if the node has no [key]:
#a/#b/#c
.
fn:node-model returns
the [model] of a given node, or the empty sequence
if the node is not an element node. The [model] is
represented as a string
which is either "sequence" or "map":
fn:deep-equal is modified as follows: (a) if the arguments are element
nodes with different [key]s or with different
[model]s, the function returns "false"; (b) if both
arguments are element nodes with [model] equal to "map",
the comparison ignores non-element children and
ignores the order of element children.
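The two modifications can be illustrated with a minimal Python sketch. The `Elem` class is a hypothetical stand-in for an element node (text nodes are plain strings); attributes, comments and the other aspects of the standard fn:deep-equal are deliberately left out, so this is not a full implementation of the function.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Elem:
    """Hypothetical, simplified element node; a str child is a text node."""
    name: str
    key: Optional[str] = None
    model: str = "sequence"
    children: List[Union["Elem", str]] = field(default_factory=list)


def deep_equal(a, b):
    # Text nodes (or a text node against an element) compare by value.
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    # (a) different names, [key]s or [model]s: not equal.
    if a.name != b.name or a.key != b.key or a.model != b.model:
        return False
    if a.model == "map":
        # (b) ignore non-element children and the order of element children.
        left = [c for c in a.children if isinstance(c, Elem)]
        remaining = [c for c in b.children if isinstance(c, Elem)]
        if len(left) != len(remaining):
            return False
        for c in left:
            for i, d in enumerate(remaining):
                if deep_equal(c, d):
                    del remaining[i]   # each right-hand child matches once
                    break
            else:
                return False
        return True
    # [model] "sequence": children are compared pairwise, in order.
    if len(a.children) != len(b.children):
        return False
    return all(deep_equal(x, y) for x, y in zip(a.children, b.children))


x = Elem("p", key="x", children=["1"])
y = Elem("p", key="y", children=["2"])
map_a = Elem("udl:map", model="map", children=[x, y])
map_b = Elem("udl:map", model="map", children=["\n", y, x])
seq_a = Elem("udl:array", children=[x, y])
seq_b = Elem("udl:array", children=[y, x])
```

Here `map_a` and `map_b` compare as equal despite the reordering and the whitespace text node, whereas `seq_a` and `seq_b` do not.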
udl:key and udl:model
are used in the same way as they are
used in XML markup. In order to reduce verbosity,
however, several abbreviated variants of element constructors
are introduced.
udl:map and [model] equal to "map". Syntax:
udl:map element are obtained by (a) evaluating the
content expression to an item sequence, (b) replacing
in this sequence any document node by its document element,
(c) replacing in the resulting sequence any element
without a key by a copy which has a key equal to
its local name. An error is raised if the result
sequence contains atomic or text node items, or if
it contains two elements with the same key. Otherwise,
the expression value is guaranteed to be an element
which can be serialized to JSON without information loss.
udl:array element are obtained by (a) evaluating
the content expression to an item sequence,
(b) replacing in this sequence any document nodes by
their element children,
(c) replacing in the resulting sequence any element
with a key by a copy which does not have a key,
(d) replacing in the resulting sequence any atomic values
by a udl:value element containing the value as text.
The expression value is guaranteed to be an element which can
be serialized to a JSON array without information loss.
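The content rules of the two abbreviated constructors can be sketched as follows. The classes `Elem` and `Doc` and the functions `make_map` and `make_array` are hypothetical names chosen for this illustration; the optional `key` argument for the constructed element is likewise an assumption, and copies are shallow.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Elem:
    """Hypothetical element node; `name` stands for the local name."""
    name: str
    key: Optional[str] = None
    model: str = "sequence"
    children: list = field(default_factory=list)


@dataclass
class Doc:
    """Hypothetical document node."""
    children: list = field(default_factory=list)


def make_map(items, key=None):
    content, seen = [], set()
    for item in items:
        if isinstance(item, Doc):
            # (b) a document node is replaced by its document element
            item = next(c for c in item.children if isinstance(c, Elem))
        if not isinstance(item, Elem):
            raise ValueError("atomic or text items are not allowed in a map")
        if item.key is None:
            # (c) key-less elements get a key equal to their local name
            item = replace(item, key=item.name)
        if item.key in seen:
            raise ValueError(f"duplicate key: {item.key}")
        seen.add(item.key)
        content.append(item)
    return Elem("udl:map", key=key, model="map", children=content)


def make_array(items, key=None):
    content = []
    for item in items:
        # (b) a document node is replaced by its element children
        if isinstance(item, Doc):
            expanded = [c for c in item.children if isinstance(c, Elem)]
        else:
            expanded = [item]
        for it in expanded:
            if isinstance(it, Elem):
                if it.key is not None:
                    it = replace(it, key=None)   # (c) drop the key
            else:
                # (d) atomic values are wrapped in a udl:value element
                it = Elem("udl:value", children=[str(it)])
            content.append(it)
    return Elem("udl:array", key=key, model="sequence", children=content)


m = make_map([Elem("title"), Elem("cost", key="price")])
a = make_array([Elem("a", key="k"), 42, Doc([Elem("x"), Elem("y")])])
```

Raising on duplicate keys at construction time is what allows the result to be serialized as a JSON object later without further checks.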
udl:null, a [key] property equal to S and a
[nilled] property equal to true.
udl:value, a [key] property equal to S and a single
text node child whose string value is
the string value of R. (Special case:
empty content if the string value of R
is a zero-length string.)
The resulting element has a type annotation
which depends on the type of R. If R
has a number type, the type annotation is
one of xs:double, xs:decimal or xs:integer,
whichever is closest to the type of R.
If R has a boolean type, the type annotation is xs:boolean.
If R is a zero-length string, the type annotation is xs:untypedAtomic.
Otherwise, the default type annotation is used (xs:untyped).
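As a rough illustration, the choice of type annotation can be written as a small Python function. The function name is hypothetical, and the correspondence between Python types and XSD number types (`int`, `Decimal`, `float`) is an assumption made only for this sketch.

```python
from decimal import Decimal


def value_type_annotation(r):
    """Return the type annotation for a udl:value built from atomic value r."""
    if isinstance(r, bool):        # test bool first: bool is a subclass of int
        return "xs:boolean"
    if isinstance(r, int):
        return "xs:integer"
    if isinstance(r, Decimal):
        return "xs:decimal"
    if isinstance(r, float):
        return "xs:double"
    if str(r) == "":               # zero-length string
        return "xs:untypedAtomic"
    return "xs:untyped"            # default type annotation
```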
task | expression | result |
---|---|---|
count books | count(/*/*) | 3 |
maximum price | max(//#price/xs:decimal(.)) | |
first book title | /*/*[1]/#title/string() | JSON |
all publication years | distinct-values(//#year/string()) | 2011 2012 |
books about UDL | //#title[contains(., 'UDL')]/string() | UDL |
books above 30$ | //#title[../#price/xs:decimal(.) gt 30]/string() | |
books with a single author | //#title[count(../#author/*) eq 1] | |
books without signature | //#title[empty(../#sigs/*)] | |
books written by Legoux | /*/*[.//#last = 'Legoux']/#title/string() | |
coauthors of Legoux | distinct-values(//#last[. eq 'Legoux']/../../*/#last[. ne 'Legoux']) | |
duplicate signatures | for $s in distinct-values(//#sigs/*) where count(//#sigs[* = $s]) gt 1 return $s | |
<udl:map udl:key="...">
The situation can be amended by resorting to the
abbreviated constructors for maps and arrays along with
the key-oriented constructors
(see udl:key
, udl:model
) or
a default value of a property
(udl:defaultModel
), or they identify the markup language
used locally (udl:markup
).
udl:markupSection
) which delimits a section
of non-XML markup.
json.strict
mode. In this case a node
name which is different from the default name expected
(according to the node properties)
is considered information that would be lost during
serialization (see
Name | Usage category | Meaning |
---|---|---|
udl:null | element name | a standard name available for nilled elements with an unspecific name |
udl:value | element name | a standard name available for a simple content element with an unspecific name |
udl:array | element name | a standard name available for a complex element with [model] equal "sequence" |
udl:map | element name | a standard name available for a complex element with [model] equal "map" |
udl:markupSection | pseudo tag | delimits a markup section containing markup which may be non-XML; the section represents the nodes resulting from parsing the contained markup text |
udl:markup | pseudo attribute | indicates the markup language used within element content, or within a markup section |
udl:model | pseudo attribute | represents the [model] property value |
udl:defaultModel | pseudo attribute | sets a default value for the [model] property |
udl:key | pseudo attribute | represents the [key] property value |
udl:markup
pseudo-attribute and the
udl:markupSection
pseudo-tag (see
telem
(text notation for
simple elements).
telem
telem
markup style.
udl:value
xs:integer
,
xs:decimal
,
xs:double
,
xs:boolean
,
xs:untypedAtomic
,
xs:untyped
telem style, these representations are
separated by a comma. If the value is not put in quotes, it
must be a number or one of the constants true, false or null,
which will be interpreted as implicit type
information, following the JSON rules. Example: the following fragment
xs:integer
,
xs:decimal
,
xs:double
,
xs:boolean
).
A string which has non-zero length is translated into
a simple element with [schema-type] xs:untyped.
A zero-length string is translated into an
empty element node with [schema-type] xs:untypedAtomic,
so as to make it distinguishable from a node
constructed from an empty array or object.
JSON item | node-name | model | children | remarks |
---|---|---|---|---|
name/value pair | see below | see below | see below | |
null | udl:null | sequence | none | element is nilled |
object | udl:map | map | elements, one for each name/value | all child elements have a [key] |
array | udl:array | sequence | elements, one for each member | all child elements without a [key] |
string (non-empty) | udl:value | sequence | text node | |
zero-length string | udl:value | sequence | none | [schema-type] is xs:untypedAtomic |
number | udl:value | sequence | text node | [schema-type] is one of: xs:integer, xs:decimal, xs:double |
true or false | udl:value | sequence | text node | [schema-type] is xs:boolean |
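The table can be read as a recipe for building a node tree from a JSON text. The following Python sketch follows it row by row; `Elem`, `to_node` and `parse_json` are hypothetical names, and the rule used to pick the number type from the lexical form (exponent gives xs:double, decimal point gives xs:decimal, otherwise xs:integer) is an assumption of this sketch.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Elem:
    """Hypothetical element node; a str child stands for a text node."""
    name: str
    model: str = "sequence"
    key: Optional[str] = None
    nilled: bool = False
    schema_type: str = "xs:untyped"
    children: list = field(default_factory=list)


class _Num(str):
    """Keeps the lexical form of a JSON number."""


def number_type(lexical):
    if "e" in lexical.lower():
        return "xs:double"
    return "xs:decimal" if "." in lexical else "xs:integer"


def to_node(value, key=None):
    if value is None:
        return Elem("udl:null", key=key, nilled=True)
    if isinstance(value, dict):
        # one child per name/value pair, each carrying a [key]
        kids = [to_node(v, k) for k, v in value.items()]
        return Elem("udl:map", model="map", key=key, children=kids)
    if isinstance(value, list):
        # one key-less child per array member
        return Elem("udl:array", key=key, children=[to_node(v) for v in value])
    if isinstance(value, bool):
        text = "true" if value else "false"
        return Elem("udl:value", key=key, schema_type="xs:boolean",
                    children=[text])
    if isinstance(value, _Num):
        return Elem("udl:value", key=key, schema_type=number_type(value),
                    children=[str(value)])
    if value == "":
        # zero-length string: empty element, distinguishable by its type
        return Elem("udl:value", key=key, schema_type="xs:untypedAtomic")
    return Elem("udl:value", key=key, children=[value])


def parse_json(text):
    # parse_int/parse_float receive the number's lexical form as a string
    return to_node(json.loads(text, parse_int=_Num, parse_float=_Num))


root = parse_json('{"a": null, "b": [1, 2.5, 1e3, "", "x", true]}')
```

Keeping the lexical form of numbers avoids the rounding that an intermediate floating-point value would introduce.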
info-loss is json.strict.
In this case, the actual node name is compared with the
default node name associated with the given element content
and properties, and an unrecoverable error is raised if
actual node name and expected node name are not the same.
info-loss
.
children | model | nilled | schema-type | JSON item |
---|---|---|---|---|
empty | sequence | false | xs:untyped or CT | array (empty) |
empty | sequence | false | xs:untypedAtomic or ST | string (zero-length) |
empty | map | false | any | object (empty) |
empty | sequence | true | any | null |
element children | sequence | false | any | array |
element children | map | false | any | object |
text node | sequence | false | xs:double | number |
text node | sequence | false | xs:decimal | number |
text node | sequence | false | xs:integer | number |
text node | sequence | false | xs:boolean | true or false |
text node | sequence | false | xs:untyped or ST | string |
info-loss. Presently the parameter
is only relevant when serializing to JSON. Three values are
defined:
json.strict – any information loss causes an unrecoverable error
json.ignore-names – element names are ignored, but any other information loss causes an unrecoverable error
json.projection – any information that JSON cannot represent is simply ignored
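How the three values might steer a serializer is sketched below in Python. The names `Elem`, `InfoLossError`, `default_name` and `to_json_value` are hypothetical, type derivation is ignored (only the built-in type names are tested), and xs:decimal values are converted to floating point for brevity. Treating mixed content as an error in every mode except json.projection is an interpretation, not something the text states explicitly.

```python
from dataclasses import dataclass, field
from typing import Optional


class InfoLossError(Exception):
    """Raised when serialization would lose information (hypothetical)."""


@dataclass
class Elem:
    name: str
    model: str = "sequence"
    key: Optional[str] = None
    nilled: bool = False
    schema_type: str = "xs:untyped"
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)   # Elem or str (text node)


def default_name(e):
    """Default node name implied by the other node properties."""
    if e.nilled:
        return "udl:null"
    if e.model == "map":
        return "udl:map"
    if any(isinstance(c, Elem) for c in e.children):
        return "udl:array"
    if e.children or e.schema_type == "xs:untypedAtomic":
        return "udl:value"
    return "udl:array"


def to_json_value(e, info_loss="json.strict"):
    projection = info_loss == "json.projection"
    if e.attributes and not projection:
        raise InfoLossError("attributes cannot be represented in JSON")
    if info_loss == "json.strict" and e.name != default_name(e):
        raise InfoLossError(f"non-default element name: {e.name}")
    if e.nilled:
        return None
    kids = [c for c in e.children if isinstance(c, Elem)]
    text = "".join(c for c in e.children if isinstance(c, str))
    if (kids or e.model == "map") and text.strip() and not projection:
        raise InfoLossError("mixed content cannot be represented in JSON")
    if e.model == "map":
        # the [key] defaults to the local name for children of a map
        return {(c.key if c.key is not None else c.name):
                to_json_value(c, info_loss) for c in kids}
    if kids:
        return [to_json_value(c, info_loss) for c in kids]
    if not e.children:
        return "" if e.schema_type == "xs:untypedAtomic" else []
    if e.schema_type == "xs:integer":
        return int(text)
    if e.schema_type in ("xs:decimal", "xs:double"):
        return float(text)          # simplification: precision may be lost
    if e.schema_type == "xs:boolean":
        return text.strip() == "true"
    return text


book = Elem("book", model="map", children=[
    Elem("title", key="title", children=["UDL"]),
    Elem("price", key="price", schema_type="xs:decimal", children=["29.5"]),
])
relaxed = to_json_value(book, "json.ignore-names")
```

With `json.strict` the same call raises `InfoLossError`, because `book` is not the default name `udl:map`; the result of `to_json_value` can be passed to `json.dumps` to obtain the JSON text.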
info-loss
equal
json.projection
means:
source | elements | attributes |
---|---|---|
 | json, value | arrays, booleans, nulls, numbers, objects, type |
 | exml:anonymous | exml:fullname, exml:maxOccurs |
 | json, item | boolean, type |
 | json | name, starts, type |
type
attribute, for example, used in
[xsi:type
for certain values, yet nevertheless
had to be introduced as an additional attribute, because
the value range includes the values array and object,
with ad hoc semantics dictated by the mapping
task. Such attributes reveal the fact that the
current XML node model does not support a bidirectional
mapping into JSON markup. To enable such a mapping,
the node tree must contain special items with
serialization semantics.
This is at odds with the basic principle that serialization
is a process controlled solely by serialization
parameters, without any need to interfere with
the information content of the node tree.
xml:id attribute more than an element name.
It can be compared to a locally scoped xml:id attribute
(uniqueness among all element children of an element).
For these reasons UDL distinguishes
the concepts of names and keys. It thus enables
native relationships between nodes and
XML markup on the one hand and JSON markup on
the other hand. As a result it becomes possible
to regard JSON markup and XML markup as alternative
representations of an information content
which is defined in terms of nodes and their
properties.
Remembering Plato,
one kind of “thing” is inferred from, or may cast,
two different “shadows”.
#foo
)
which is interpreted as a string which is
udl:null
xs:untyped
udl:map
xs:untyped
udl:array
xs:untyped
udl:value
xs:integer/xs:decimal/xs:double
–
depending on the lexical form
true
or false
:
xs:boolean
xs:untypedAtomic
xs:untyped
xs:untypedAtomic
. The type annotation
makes the node distinguishable from empty elements
corresponding to empty arrays or objects.
info-loss: if the value is json.projection, the attributes are
ignored; otherwise, a non-recoverable error is raised.
UDL element node | JSON item |
---|---|
[nilled] is true | If info-loss is json.strict and the element name is not udl:null, a non-recoverable error is raised. Otherwise the element is serialized as a JSON null value. |
 | If info-loss is json.strict and the element name is not udl:map, a non-recoverable error is raised. Otherwise the element is serialized as a JSON object. The contained name/value pairs are obtained by serializing the element children. An error is raised if the element has a text node child with non-whitespace content. |
 | If info-loss is json.strict and the element name is not udl:array, a non-recoverable error is raised. Otherwise the element is serialized as a JSON array. The array members are obtained by serializing the element children. An error is raised if the element has a text node child with non-whitespace content. |
 | If info-loss is json.strict and the element name is not udl:value, a non-recoverable error is raised. Otherwise, the element is serialized as a JSON simple value. The string values of the text nodes are concatenated and the result is used to construct a simple JSON value whose type depends on the node's [schema-type]: number (if [schema-type] is equal to or derived from xs:double or xs:decimal), Boolean (if [schema-type] is equal to or derived from xs:boolean) or a string (otherwise). |
 | If info-loss is json.strict and the element name is not udl:value, a non-recoverable error is raised. Otherwise, the element is serialized as a JSON string value of zero length. |
 | If info-loss is json.strict and the element name is not udl:array, a non-recoverable error is raised. Otherwise, the element is serialized as an empty JSON array. |
info-loss
with a value
json.ignore-names
will yield
the same document as the strict JSON-serialization of the
counterpart which sticks to unspecific names. Let us further
introduce the notion of
json.ignore-names
(the “JSON”)
json.ignore-names
;
and they can also be serialized to a well-readable XML
representation of that JSON document.
When dealing with nJSON documents, nnJSON can
be used as a normalization of information which enables unified
processing code: code that is used regardless of whether the input is
JSON or XML and whether the output is JSON or XML. This
unified code consumes an nnJSON tree and produces an
nnJSON tree. The UDL extensions discussed so far ensure that
the nnJSON output can be serialized either as readable
XML or as nJSON. The extensions do not, however, enable parsing
both nJSON text (JSON) and nnJSON text (XML) into an nnJSON
tree. After all, the information content of nJSON and nnJSON
is different, and parsing by definition does not change the
information content: parsing alone will always produce one
kind of tree or the other. The processing pattern just
sketched (“read and write nnJSON”) therefore has to rely
on a translation of an nJSON text or node tree into an nnJSON
node tree.
doc("foo.json")
)
it can easily be transformed into an nnJSON document, e.g. with
a simple stylesheet. nJSON documents are, however, so important
that they warrant built-in support from the UDL
extensions. Therefore the present proposal adds a
special-purpose function which combines JSON parsing
with the transformation into an equivalent nnJSON document:
json.ignore-names. Note
the use of an arbitrary namespace and the choice of
intuitive element names for key-less elements. A
second signature of the nnjson
function enables control of these customizations:
patternsAndNames parameter expects an alternating
sequence of XSLT patterns and element names; when
renaming a key-less element, the first matching pattern is
located and the name is taken from the item following the
pattern item. Our example could be produced by the
following call:
nnjson function is a convenience function which
combines the parsing of an nJSON document with a
transformation of particular interest. The transformation
is defined in such a way that the changes of information
content do not interfere with a subsequent JSON-serialization
(using json.ignore-names). This curious mixture
of parsing and transformation is regarded as a first-class
operation deserving a built-in XPath function because of
a well-defined relationship between the resulting
XML document and the original JSON document.