About this Paper
Firstly: the status of the work. The QT4 community group has been busy for about three years developing enhancements to the XSLT, XPath, and XQuery languages, with a view to publishing 4.0 versions of these specifications. The work is ongoing and subject to revision and change as the work proceeds. This paper describes one recent change proposal: the group has spent a fair bit of time reviewing it and revising it. At the time of writing, the proposal has not yet been accepted into the status quo draft, but with luck that will change by the time of the conference.
The paper is in three parts, an end, a middle, and a beginning, in that order.
- The first section, called The End, is about what the proposal delivers to users: what capability does it offer.
- The second section, called The Middle, is about how it works: what are the concepts and data model structures that enable the syntax and semantics to function.
- The final section, called The Beginning, is about how we got to where we are: it reviews previous work and tries to explain how the ideas came together, and some of the cul-de-sacs that we went down along the journey.
A lot of the thinking behind the proposal was motivated by use cases involving XSLT transformation of JSON, but in fact XSLT is hardly mentioned in the paper. That's because the JNodes proposal described in this paper is designed to provide the foundations for recursive-descent rule-based transformation in the XSLT style, but the actual XSLT machinery to be built on top of these foundations is not yet fully defined or agreed. Watch this space; meanwhile we focus on navigational queries.
The End
This section is designed to illustrate what becomes possible with the new syntax: we'll take some sample JSON and show queries against it.
The sample is the JSON document that appears in RFC 9535, the specification of JSONPath, a query language inspired by XPath 1.0 but designed exclusively for JSON. The example is frankly bonkers, but by choosing it (a) we can compare the XPath 4.0 and JSONPath solutions to the same queries, and (b) we can't be accused of bias by choosing an example that played to our strengths.
Here's the example JSON: a store catalog for a shop that sells four books and a bicycle.
{
  "store": {
    "book": [
      { "category": "reference",
        "author": "Nigel Rees",
        "title": "Sayings of the Century",
        "price": 8.95
      },
      { "category": "fiction",
        "author": "Evelyn Waugh",
        "title": "Sword of Honour",
        "price": 12.99
      },
      { "category": "fiction",
        "author": "Herman Melville",
        "title": "Moby Dick",
        "isbn": "0-553-21311-3",
        "price": 8.99
      },
      { "category": "fiction",
        "author": "J. R. R. Tolkien",
        "title": "The Lord of the Rings",
        "isbn": "0-395-19395-8",
        "price": 22.99
      }
    ],
    "bicycle": {
      "color": "red",
      "price": 399
    }
  }
}

We'll start by presenting the example queries given in the JSONPath specification, and showing their XPath 4.0 equivalents.
Table I
JSONPath queries and their XPath 4.0 equivalents
| Query | JSONPath | XPath 4.0 |
|---|---|---|
| The authors of all books in the store | `$.store.book[*].author` | `/store/book/*/author` |
| All authors | `$..author` | `//author` |
| All things in the store, which are some books and a red bicycle | `$.store.*` | `/store/*` |
| The prices of everything in the store | `$.store..price` | `/store//price` |
| The third book | `$..book[2]` | `//book/*[3]` |
| The third book's author | `$..book[2].author` | `//book/*[3]/author` |
| Empty result: the third book does not have a "publisher" member | `$..book[2].publisher` | `//book/*[3]/publisher` |
| The last book in order | `$..book[-1]` | `//book/*[last()]` |
| The first two books | `$..book[0,1]` | `//book/*[1,2]` |
| All books with an ISBN number | `$..book[?@.isbn]` | `//book/*[isbn]` |
| All books cheaper than 10 | `$..book[?@.price < 10]` | `//book/*[price lt 10]` |
| All member values and array elements contained in the input value | `$..*` | `//*` |
Now, the first thing you have probably noticed is that the syntax of all the examples above is pure XPath 3.1 (in fact, it's almost pure XPath 1.0). Essentially what the JNodes proposal does is to bend the data model for maps and arrays so that it is sufficiently close to the data model for XML that the same syntax can be used, with only minor adjustments to the semantics.
With XPath 4.0 and XQuery 4.0 we can go well beyond these simple examples, both using features that are familiar to XPath 3.1 and XQuery 3.1 users, and using new features in 4.0 (some of which, but not all, are related to the use of JNodes).
Here are some more examples (the fourth is XQuery only, the others work in both XPath and XQuery):
Table II
More XPath and XQuery 4.0 examples
| Query | XPath 4.0 / XQuery 4.0 |
|---|---|
| All distinct authors, in sorted order | `distinct-values(//author) => sort()` |
| All authors of fiction | `//author[../category = 'fiction']` |
| The book before "Moby Dick" | `//book/*[title='Moby Dick']/preceding-sibling::*[1]` |
| The number of books in each category | `map:merge(for $b in //book/* group by $cat := $b/category return map:entry($cat, count($b)))` |
| The categories containing products priced at over 100 | `/store/*[.//price gt 100] => jnode-selector()` |
Now, frankly, the problem with this JSON sample is that it is too simple. The lookup operator in XPath 3.1 is sufficiently powerful to tackle most of these examples. So it's worth a reminder of why improvements were needed:
- The existing expression `$array?*` flattens the array, returning a sequence of items. For example, applied to `[(1 to 3), (), 10]` it returns `(1, 2, 3, 10)`, losing the information that there were actually three array members.
- A lookup expression such as `$book?price` is fine for getting the value of a particular entry in a map, but when you have a tree of maps and arrays, you don't just want to know the value, you want to know where it was found so that you can relate it to other information. (The JNodes proposal replaces an operator in the current 4.0 draft, the deep lookup operator `??`. The problem with this operator is that it's very hard to add any context: you don't just want to know the prices of everything, you typically want to know what products the prices relate to.)

  This is illustrated by the last example above, where we want to search for a qualifying value and then return the corresponding key. This is achieved by calling the `jnode-selector` function, which is rather similar to the `name` function for XNodes: it returns the key corresponding to the selected value.
- In many more complex queries, especially those involving deep or recursive structures, there's a need to navigate upwards in the tree to get properties of parent or ancestor objects. The only alternative is to pass this information downwards by adding extra arguments to function calls that perform downwards navigation, and this quickly gets very clumsy. (In XSLT, tunnel parameters can help, but they can't be referenced in match patterns, unfortunately.)
The Middle
How does it work?
Reusing the XPath path expression syntax to navigate maps and arrays depends on having a data model whose structure is sufficiently similar to the tree structure used to model XML that the same syntax can be interpreted with only small changes to the semantics of particular constructs. In particular, the syntax of steps (the construct appearing on the right-hand side of the path operator `/`) can't depend on what type of item is returned by the left-hand operand, because we don't have enough static type information to know whether we are dealing with XML trees or JSON trees: in fact, at the time we parse the query, we have no type information at all.
This means we need to iron out some of the differences between the XML model and the JSON model — or more accurately, we need to find abstractions that work for both cases. (And note, I'm using the term "JSON" loosely here: maps and arrays in XDM are actually a lot richer than those in JSON. For example the keys in a map aren't necessarily strings, and the associated values can be any XDM value, including for example a sequence of maps rather than a single map, or even an XML node.)
What are the important differences?
- In XML, the key constructs (elements and attributes) have names, and these names are the primary navigation aid to finding our way around the structure. The key constructs in JSON (maps and arrays) don't have names. The nearest equivalent are the keys (in a map) and the integer index positions (in an array), but these aren't properties of the construct itself; rather, they establish the role or relationship of the construct within its parent in the tree. This leads to an important realisation: we shouldn't be thinking of a tree of maps and arrays, we should be thinking of a tree of entries within maps and arrays; these entries have a selector (a key in the case of maps, an index position in the case of arrays) which plays a closely analogous role to that of element and attribute names in an XML tree.
- Traditionally, trees representing XML have parent pointers, while trees representing JSON don't. There's nothing intrinsic to XML or JSON that means this has to be the case, but there are upsides and downsides to both designs, and it makes sense to respect the tradition. Certainly, having access to parents and ancestors is mighty convenient, but the cost is high: it means that subtrees can't be shared, which makes it very expensive to make a small change to a large tree. The key insight to solving this conundrum is that the parent pointers can be transient. We can never reach a leaf node in a tree other than by navigating downwards from the root, and it is possible while doing this downwards navigation to remember the path we have trodden so that we can retrace our steps. This idea is referred to in the functional programming literature as a zipper data structure.

  Closely associated with this is the concept of node identity. The requirement that nodes in XML trees have a persistent identity is another factor that makes physical sharing of subtrees difficult and therefore makes small changes expensive. (There are others as well, such as namespaces, but life is too short to talk about namespaces.)

  Suffice it to say that we want to make parents and ancestors accessible without preventing the use of functional data structures (also called immutable or persistent data structures) that allow non-destructive modification without bulk copying.
- Elements and attributes are restricted in what kind of value they can contain; array members and map entries can hold anything. This means there is a different relationship between "the value of the node" and "the children of the node".
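The sharing argument above can be made concrete with a small sketch. The following Python is purely illustrative (none of these names come from the proposal): it performs a non-destructive "path copying" update on a tree of plain maps, copying only the maps along the updated path so that every untouched subtree is shared with the original.

```python
def with_entry(tree, path, value):
    """Return a new tree with `value` stored at `path`, without
    mutating the original. Only the maps along the path are copied
    (path copying); all other subtrees are shared by reference."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    new = dict(tree)  # shallow copy: one map's worth of work per level
    new[head] = with_entry(tree.get(head, {}), rest, value)
    return new

store = {"store": {"bicycle": {"color": "red", "price": 399},
                   "book": [{"title": "Moby Dick", "price": 8.99}]}}

updated = with_entry(store, ["store", "bicycle", "price"], 349)

# The original is untouched, and the subtree that was not on the
# updated path is shared rather than copied:
assert store["store"]["bicycle"]["price"] == 399
assert updated["store"]["bicycle"]["price"] == 349
assert updated["store"]["book"] is store["store"]["book"]
```

Persistent parent pointers would defeat exactly this sharing: the shared `book` list would need to point at two different parents at once. That is why transient parent pointers, built during downward navigation, are attractive.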
The solution adopted is to define the concept of JNodes as an analog of the existing XML-based nodes, now called XNodes for differentiation. The abstract superclass is called GNode (for generalized node).
A root JNode is a transient wrapper around a map or array. The children of this root wrap the map entries or array members. It is expected that these wrappers will be created lazily on demand, and discarded when they are no longer needed. A non-root JNode has the following properties:
- `parent`: a reference to the parent JNode in the tree.
- `content`: the value of the array member or map entry.
- `selector`: the map key or array index that distinguishes this child from its siblings.
- `position`: needed to handle the edge case where an array member or map entry holds a sequence of values that might include multiple maps and arrays. This gets complicated and is rarely needed in practice (and is never needed when processing JSON), so I won't mention it again.
Now, if we ignore attributes and namespaces, the existing XPath axes can all be defined in terms of (a) the parent and child relationships, and (b) document order (meaning that the children of a node are ordered). If we can define the child axis for JNodes, then we have essentially defined all the XPath axes. For a value that is a single map or array, defining the child axis is easy (it contains one JNode for each map entry or array member). If the value is atomic, or a sequence of atomic items, then the child axis is empty. For a value that might contain multiple maps or arrays, it gets a bit more complicated, which is where the position property comes into play, but we'll ignore that case.
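As a concrete (and purely illustrative) sketch of these definitions, here is one way the transient wrappers and the child axis might look in Python. The class and property names follow the description above but are not taken from the spec:

```python
from dataclasses import dataclass
from typing import Any, Optional, Union

@dataclass(frozen=True)
class JNode:
    """Transient wrapper over a value in a tree of maps and arrays.

    content  -- the wrapped value (dict, list, or an atomic value)
    selector -- map key (str) or 1-based array index (int); None at the root
    parent   -- the enclosing JNode; None at the root
    """
    content: Any
    selector: Optional[Union[str, int]] = None
    parent: Optional["JNode"] = None

    def children(self):
        """The child axis: one JNode per map entry or array member.
        Wrappers are created on demand during downward navigation, so
        the underlying dicts and lists can still be shared freely."""
        if isinstance(self.content, dict):
            return [JNode(v, k, self) for k, v in self.content.items()]
        if isinstance(self.content, list):
            return [JNode(v, i + 1, self) for i, v in enumerate(self.content)]
        return []  # atomic content: empty child axis

    def descendants(self):
        """Descendant axis in document order (depth-first)."""
        for child in self.children():
            yield child
            yield from child.descendants()

# //author over a cut-down version of the store example:
store = {"store": {"book": [{"author": "Nigel Rees"},
                            {"author": "Evelyn Waugh"}]}}
root = JNode(store)
authors = [n.content for n in root.descendants() if n.selector == "author"]

# Because each wrapper carries a parent pointer built during the
# descent, upward navigation comes for free:
first = next(n for n in root.descendants() if n.selector == "author")
# first.parent wraps the first book map (selector 1 in the array);
# first.parent.parent wraps the book array (selector "book").
```

Note that the wrappers hold no state beyond the path taken to reach them, which is what allows them to be created lazily and discarded when no longer needed.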
As for document order, this is intuitive in the case of arrays, and for maps, we had already decided that in 4.0, the entries in a map should be ordered. The reason for this is primarily to make serialized JSON human-readable. XML's distinction between attributes (small) and child elements (often large) gives a natural and predictable output structure even if attribute order is not retained; with JSON, outputting deeply-nested properties and simple string-valued properties in random order makes it quite impossible for the human reader to find anything. So with both maps and arrays, the concept of sibling order is well defined.
So the JNode model gives us all the XPath axes, and the only other thing we need to make path expressions work is a suitable set of selectors - the equivalent of the name tests and node tests used in the XML model.
Existing XPath gives us three ways of selecting nodes on an axis (plus the wildcard `node()`, which selects everything). We can select nodes using a (partial or total) match on the name, or a match on the type, or by a general predicate. The use of predicates needs no adaptation to work with JNodes, and the other two mechanisms can be generalized:
- Name tests can be generalized to selector tests. Taking the existing capabilities of the `?` lookup operator, we can interpret `$a/code` as selecting within a map using the string `"code"` as a key value. For other key values (including strings containing spaces) we introduce the syntax `$a/get(Expr)` where `Expr` is an expression that computes the required key values: so `$a/get("date of birth")` selects using a key value with spaces. To select within an array, we can use `$a/get(1)`, though in practice the predicate-based syntax `$a/*[1]` works just as well.

  To ensure that the `get()` syntax also works for XNodes, we can make `$a/get(Expr)` select elements whose names are determined dynamically, essentially an alternative to `$a/*[node-name() = Expr]`.
- Type tests (which currently allow node types such as `text()`, `comment()`, or `schema-element(S)`) can be generalized to specify an arbitrary sequence type. For example, `$array/type(xs:integer+)` selects all members of an array whose value matches `xs:integer+` (a sequence of one or more integers). For commonly used types where there is no syntactic ambiguity, the syntax `/type(T)` can be abbreviated to `/T`: this not only retains existing options such as `$a/text()`, but for selection from JNodes it also allows the very useful syntax `$a/record(longitude, latitude)`, which selects all children that are maps having entries with the keys "longitude" and "latitude". In the absence of element names, this is often the best way to identify the constructs of interest.

  The type syntax is extended regardless of whether we are selecting from JNodes or XNodes; specifying a type such as `map(*)` when selecting from an element node will simply select nothing, because elements never have maps among their children. That's exactly the same as requesting `$element/document-node()` today.
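To make the two generalized selector mechanisms concrete, here is an illustrative Python sketch (the helper names are mine, not the spec's) of how a `get(Expr)` selector and a `record(...)` test might be evaluated over plain maps and arrays:

```python
def get(value, key):
    """Sketch of the get(Expr) selector: select by a computed map key,
    or by a 1-based index into an array. Returns a (possibly empty) list."""
    if isinstance(value, dict):
        return [value[key]] if key in value else []
    if isinstance(value, list) and isinstance(key, int):
        return [value[key - 1]] if 1 <= key <= len(value) else []
    return []

def select_records(value, *keys):
    """Sketch of the record(k1, k2, ...) test applied to the child axis:
    select children that are maps having entries with all the given keys."""
    children = list(value.values()) if isinstance(value, dict) else list(value)
    return [c for c in children
            if isinstance(c, dict) and all(k in c for k in keys)]

# $a/get("date of birth"): a key value containing spaces
person = {"name": "Ada", "date of birth": "1815-12-10"}
dob = get(person, "date of birth")

# $a/record(longitude, latitude): identifying constructs by shape, not name
places = [{"longitude": -0.1, "latitude": 51.5}, {"note": "no position"}]
points = select_records(places, "longitude", "latitude")
```

The second helper shows why shape-based selection matters: in the absence of element names, the set of keys a map carries is often the only reliable signature of the construct of interest.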
The Beginning
This final section examines the background to the JNodes proposal: it looks at some of the formative steps and insights that led up to it, and at some of the wrong turnings that we took along the road.[1]
The initial introduction of maps into the data model was motivated by the work on streaming in XSLT 3.0. When streaming, you can't visit a node in the source document more than once, which means that when you encounter a node, you may need to remember selected information until it is needed later in the process, and maps seemed a good way of retaining and accumulating that information. We then realised that maps were also useful for processing JSON, and that led in turn to the addition of arrays to the data model. (Arrays were a bit of an afterthought in XSLT 3.0; they were only fully developed in XPath 3.1.)
As early as 2016, the year before XSLT 3.0 was finalized, it was clear that there were limitations in the capabilities for transforming JSON structures. At XML Prague that year I presented a paper [Kay 2016] that examined a couple of fairly simple transformation tasks, and came to the rather unwelcome conclusion that the best way of transforming the JSON was to convert it to XML, perform an XML-to-XML transformation, and then convert the result back to JSON. Ever since then I have been trying to find a better way of doing such transformations natively. There were two main limitations identified: the inability to access properties of ancestors in the tree, and the difficulty of writing selective patterns to match constructs in the JSON tree.
Two years later, at XML Prague 2018, I took a look at the other side of the coin: how to find a data structure for representing XML trees that achieved some of the benefits of the JSON model, in terms of reducing the overhead of making small changes to large trees [Kay 2018a]. In this paper I "invented" the KL-tree, a representation of XML in which there were no persistent parent pointers, but where instead the act of navigating downwards in the tree retained enough context to enable the ancestors to be found on request. After I presented the talk, a member of the audience pointed out very politely that I had reinvented the zipper structure. Applying such a structure to XML trees remains unfinished business: I got it working, but never got it performing well enough to put into production, largely because the performance couldn't compete with the Saxon TinyTree, which relies on node adjacency in memory to achieve its speed. Use of contiguous memory locations for adjacent nodes gives great CPU caching, but of course makes sharing of subtrees impossible.
Also in 2018, this time at Markup UK, I presented a case study on writing an XSD schema validator in XSLT [Kay 2018b]. This is another project that never made it into production, but I learned a lot from the exercise of writing a significant piece of XSLT code that made heavy use of maps and arrays for its working data. In particular, this work led to the introduction of record types (initially called tuple types) which play a significant role in XPath 4.0. But the maps and arrays used here were essentially single-level and the work didn't make any noticeable contribution to solving the JSON transformation problem.
Fast forward to 2020, again in Prague, where I put forward my ideas for an XSLT 4.0 [Kay 2020]. Many of the ideas in that paper have found their way into the current drafts. The paper identified rule-based transformation of JSON structures as an important area for improvement, but had little to offer in the way of concrete suggestions for how to achieve it.
I tried to fill this gap with my paper at Balisage 2022 [Kay 2022]. There were many good ideas in that paper, and many of them found their way into the 4.0 draft specifications and into Saxon, but the section on rule-based transformation is again rather weak. There are ideas that made a difference, but it's clear I was still struggling with the problem.
In retrospect, I think the reason for this was a reluctance to introduce data model changes. I was aware that introducing data model changes in 3.0, to add maps and arrays, was the major reason that development of 3.0 took so long, and I was keen to finish 4.0 in less than a decade. So I self-imposed a rule, no data model changes. But I think what becomes clear from the JNodes proposal in this paper is that if you get the data model right, everything else becomes easy.
My XML Prague paper in 2024 [Kay 2024] shows some recognition of this. In that paper I introduced the idea of adding semi-hidden labels to the results of a lookup operation, where the labels could retain information about the selection path used to retrieve data from the JSON tree. These labels would be accessible to internal operations, to allow matching of nodes in XSLT and upwards navigation. These labelled values can be seen as prototypes of what we now know as JNodes. We adopted labelled values in the draft spec, but they didn't solve all the problems, and alongside them we introduced another capability with overlapping functionality: a construct that allowed the lookup operator to return key/value pairs rather than plain values, using the syntax `$map?pair::KEY`. It was clear that the two features overlapped, and didn't play well with each other.
Earlier this year I embarked on a case study to examine just how far the changes we had made in 4.0 (specifically in the area of rule-based transformation) had taken us towards meeting the requirements. I reported on this case study at Markup UK in June 2025 [Kay 2025]. The case study took an existing XML-based application of significant size and complexity (the Java-to-C# transpiler that Saxonica uses for porting the Java Saxon product to .NET) and examined how feasible it would be to make it work with JSON rather than XML. This exercise generated lots of ideas. JNodes as such do not appear in the write-up, but it was the exercise of conducting this case study and analysing its conclusions that led directly to the formulation of the JNodes proposal a few days later. Indeed, throughout this ten-year journey, all the significant ideas have come from using XSLT to solve challenging but realistic problems and thinking about how it could be made better.
The journey is not quite at its end. I'm pretty confident that the JNode data model provides the right foundation, and we now have very powerful query facilities as a result. There are still two loose ends to be tied up: one is defining match patterns for matching JNodes in XSLT, and the other is a "deep update" capability (probably using higher-order functions) that makes it easy to define simple changes to trees of maps and arrays (and possibly trees of XNodes as well) without explicit navigation of the entire structure.
Appendix A. The JNodes Proposal
Pointers to the current status quo drafts of the 4.0 specifications are maintained at https://qt4cg.org/. Change proposals are managed on GitHub at https://github.com/qt4cg/qtspecs. The proposal in question is pull request (PR) 2083.
While a PR is active, a version of the specifications with difference markings is available via a dashboard at https://qt4cg.org/dashboard/. Sadly, this generally disappears once the PR is accepted and integrated into the baseline spec, meaning that the only persistent record of the PR is the actual commit history, which is not designed for human consumption.
The major changes, however (unless of course they are amended beyond all recognition by subsequent work) will be found in the following places:
- Section §8.4 of the Data Model specification, titled JNodes.
- Section §4.6 of the XPath specification, titled Path Expressions.
- Section §20 of the Functions and Operators specification, titled Processing JNodes.
References
[Kay 2016] Michael Kay. Transforming JSON using XSLT 3.0. XML Prague, 2016. http://archive.xmlprague.cz/2016/files/xmlprague-2016-proceedings.pdf.

[Kay 2018a] Michael Kay. XML Tree Models for Efficient Copy Operations. XML Prague, 2018. http://archive.xmlprague.cz/2018/files/xmlprague-2018-proceedings.pdf.

[Kay 2018b] Michael Kay. An XSD 1.1 Schema Validator Written in XSLT 3.0. Markup UK, 2018. http://markupuk.org/2018/Markup-UK-2018-proceedings.pdf.

[Kay 2020] Michael Kay. A Proposal for XSLT 4.0. XML Prague, 2020. http://archive.xmlprague.cz/2020/files/xmlprague-2020-proceedings.pdf.

[Kay 2022] Michael Kay. XSLT Extensions for JSON Processing. Presented at Balisage: The Markup Conference 2022, Washington, DC, August 1-5, 2022. In Proceedings of Balisage: The Markup Conference 2022. Balisage Series on Markup Technologies, vol. 27 (2022). doi:https://doi.org/10.4242/BalisageVol27.Kay01.

[Kay 2024] Michael Kay. Navigating and Updating Trees of Maps and Arrays. XML Prague, 2024. https://archive.xmlprague.cz/2024/files/xmlprague-2024-proceedings.pdf.

[Kay 2025] Michael Kay. Processing JSON with Template Rules. Markup UK, 2025. https://markupuk.org/webhelp/index.html.
[1] I guess some people might be interested in knowing which individuals can claim the credit for good insights or take the blame for bad judgements. I'm happy to do both. However, although I have personally been responsible for both good and bad ideas along the way, over a period of at least ten years, the work would not have happened without the comments, criticisms, and suggestions of many other people who helped me along the road: in the end, it's a team effort.