How to cite this paper
Lenz, Evan. “Carrot: An appetizing hybrid of XQuery and XSLT.” Presented at Balisage: The Markup Conference 2011, Montréal, Canada, August 2 - 5, 2011. In Proceedings of Balisage: The Markup Conference 2011. Balisage Series on Markup Technologies, vol. 7 (2011). https://doi.org/10.4242/BalisageVol7.Lenz01.
Balisage: The Markup Conference 2011
August 2 - 5, 2011
Balisage Paper: Carrot
An appetizing hybrid of XQuery and XSLT
Evan Lenz
Software Developer, Community
MarkLogic Corporation
Evan Lenz has been a specialist in XML technologies since 1999, having served on the
W3C XSL Working Group, written XML-related books and articles, and spoken at numerous
conferences. He is currently working for MarkLogic Corporation.
Copyright © 2011 MarkLogic Corporation
Abstract
On the surface, XQuery and XSLT are very different languages. Users tend to prefer
one language or the other. XSLT users are loath to give up the power of template rules;
on the other hand, users of XQuery prefer its concise, composable syntax and perhaps
wouldn't dare writing code in XML. There are good historical reasons why they are
not the same language. For one thing, XSLT came first, and XQuery was designed more
with SQL users in mind. However, the two languages share the same data model and a
large syntactic subset (XPath 2.0), which raises the question: Is there a way to yield
the unique benefits of both languages without having to continually decide between
the two? The answer is yes. Carrot, an appetizing hybrid of XQuery and XSLT, lets
you have your cake and eat it too.
Table of Contents
- Introduction
- Background and
influences
- Introduction by example
- Carrot definitions
-
- Global variables
- Functions
- Rules
- Carrot expressions
-
- Ruleset invocations
- Shallow copy
constructors
- Text node literals
- Expression semantics
- What about xsl:for-each,
xsl:for-each-group, etc.?
- Implementation strategy
- Future directions
-
- Simple mapping operator
- Mode merging
- Underlying language
development
Introduction
Carrot combines the best that XQuery and XSLT have to offer:
-
the friendly
syntax and composability of XQuery expressions, plus
-
the power and
flexibility of template rules in XSLT.
Carrot can also be (loosely) thought of as an alternative, more
composable syntax for XSLT.
Background and
influences
Carrot is not the first XSLT-inspired project to provide a shorter
syntax than XSLT itself. Syntax shorthands have included Paul
Tchistopolskii's XSLScript, Sam
Wilmott's RXSLT, and another
project called XSLTXT. Although
none of these projects provided direct inspiration for Carrot, they
all address one of the same desires that Carrot addresses: being
able to program in XSLT more concisely. However, unlike these
projects, Carrot addresses more than XSLT's verbosity. It also
addresses XSLT's limited composability. For example, in XSLT you
can't include an element constructor in a path expression (like you
can in XQuery and Carrot) or apply templates inside a path
expression (which you can uniquely do in Carrot).
A
more direct inspiration was James Clark's proposal for Unifying XSLT and XQuery
element construction. Written during the early days of
the W3C activity on XQuery, that proposal suggested that XQuery and
XSLT language constructs could be used interchangeably if XQuery
used an XML-based syntax (via a simple document element wrapper).
As we now know, things didn't turn out that way. Carrot takes
essentially the opposite approach. Rather than make XQuery use an
XML-based syntax like XSLT's, make XSLT (Carrot, actually) use a
non-XML-based syntax like XQuery's.
Carrot is also inspired by Haskell's syntax, which defines
functions using pattern-matching and an equation-like syntax.
Introduction by example
Carrot is best understood by example. Here's an example of XSLT's
syntax for a template rule (henceforth "rule"):
<xsl:template match="para">
<p>
<xsl:apply-templates/>
<p>
</xsl:template>
In Carrot, you'd write the above rule like this:
^(para) := <p>{^()}</p>;
There are a few things to note
about the above. To define a rule in Carrot, you use the same
operator that XQuery uses for binding variables
(:=).
Everything on the right-hand side up to the semi-colon is an
expression in Carrot. An expression in Carrot is simply an XQuery
expression, plus some extensions. In this case, the expression is
using the extended syntax for invoking rules:
^()
which is short for:
^(node())
just as:
<xsl:apply-templates/>
is short for:
<xsl:apply-templates select="node()"/>
All rules belong to a ruleset (equivalent to a "mode" in
XSLT). The above examples use the unnamed ruleset (there's just one
of these). Here's an example that belongs to a ruleset named
"toc":
^toc(section) := <li>{ ^toc() }</li>;
The above is short for:
<xsl:template match="section" mode="toc">
<li>
<xsl:apply-templates mode="toc"/>
</li>
</xsl:template>
Here's the identity transform in Carrot:
^(@*|node()) := copy{ ^(@*|node()) };
This recursively copies the input to the output, one node at a
time.
Here's a Carrot script that
creates an HTML document with dynamic content for its title and
body, converting <para> elements
in the input to <p> elements
in the output:
^(/) :=
<html>
<head>
{ /doc/title }
</head>
<body>
{ ^(/doc/para) }
</body>
</html>;
^(para) := <p>{ ^() }</p>;
As a comparison, here's what you'd have to write if you were using
regular XSLT:
<xsl:transform version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<head>
<xsl:copy-of select="/doc/title"/>
</head>
<body>
<xsl:apply-templates select="/doc/para"/>
</body>
</html>
</xsl:template>
<xsl:template match="para">
<p>
<xsl:apply-templates/>
</p>
</xsl:template>
</xsl:stylesheet>
Just
as in XSLT, rules in Carrot can be associated with more than one
mode. In XSLT, this template rule belongs to two modes:
<xsl:template mode="foo bar" match="bang"/>
Here's the equivalent rule in
Carrot, belonging to two rulesets:
^foo|bar(bang) := ();
Carrot definitions
A
Carrot module consists of a set of unordered definitions. Unlike XQuery, there is
no distinction between main modules and library modules. Likewise, a Carrot module
has no "body." Instead, there are only definitions.
Carrot is more like XSLT in this regard. Also unlike XQuery, Carrot modules need not
be associated with a namespace.
There are three kinds of
definitions in Carrot:
-
global variables,
-
functions, and
-
rules.
Global variables
A global
variable definition is very similar to a variable declaration in
XQuery, except that you don't need the "declare variable" verbiage.
Whereas in XQuery you would write:
declare variable $foo := "a string value";
In
Carrot you would instead write:
$foo := "a string value";
Functions
A
function definition is just like a function declaration in XQuery
except that you don't need the "declare function" verbiage and,
instead of curly braces, you use the same binding operator (:=) as
a variable definition. For example, whereas in XQuery, you would
declare functions like this:
declare function my:foo() { "return value" };
declare function my:bar($str as xs:string) as xs:string { upper-case($str) };
In
Carrot, you would instead write:
my:foo() := "return value";
my:bar($str as xs:string) as xs:string := upper-case($str);
Why
not just use the regular XQuery syntax? Two reasons: conciseness
(lower signal-to-noise ratio) and consistency (with the other two
types of definitions).
Rules
The third type of definition is a rule. This
corresponds to a template rule in XSLT. For example, this rule
matches any element node (*):
^foo(*) := "return value";
Unlike a function definition, the "argument" of a rule definition ("*" in the above
case)
is not an (optional) formal
parameter list; instead it is a required pattern (as XSLT defines a
pattern). Thus, it's illegal to have an empty set of parentheses in
a rule definition:
^foo() := "return value"; (: NOT LEGAL :)
Note
the asymmetry with ruleset invocations, where it is legal to call ^foo()
, which is short
for ^foo(node())
.
Of
course, rules can also have parameters (just as template rules can
have parameters in XSLT). The syntax for declaring these is very similar to an XQuery
function parameter list, except that it comes after the pattern and is
separated from the pattern by a semicolon:
^foo(* ; $str as xs:string) := concat($str, .);
Carrot also supports tunnel parameters, as in XSLT. To indicate a tunnel parameter,
you add the keyword "tunnel" before the parameter:
^foo(* ; tunnel $str as xs:string) := concat($str, .);
Unlike XQuery functions, parameters in a rule are identified by name, not position.
Thus the syntax for passing them looks very similar to how they are declared, and
the order of parameters is insignificant. The following expression applies the "foo"
ruleset to the context node, passing the tunnel parameter $str with the value "Hello":
^foo(. ; tunnel $str := "Hello")
What about conflict resolution among multiple matching rules? Carrot behaves the same
as XSLT: rules with higher import precedence win, followed by rules with higher priority.
Default priority is based on the syntax of the pattern, just as in XSLT. You can also
specify the priority explicitly (right before the binding operator :=), as in the
first rule of this example, which explicitly sets the priority to 1:
^author-listing( author[1] ) 1 := ^();
^author-listing( author ) := ", " , ^();
^author-listing( author[last()] ) := " and " , ^();
Carrot expressions
The
right-hand side of a Carrot definition, whether it be a variable,
function, or rule, is a Carrot expression. The context for the
expression evaluation is the same as it is for sequence
constructors within a template rule in XSLT. For example, the
context node is the node matched by the rule's pattern.
A
Carrot expression is an XQuery expression with some extensions:
-
ruleset
invocations — ^mode(nodes)
-
shallow copy{…}
constructors
-
text node
literals — `my text node`
Let's
look at each of these extensions in turn and the rationale behind
each one.
Ruleset invocations
Ruleset invocations (i.e., "apply-templates" in XSLT) are largely
Carrot's raison d'etre. They are not possible in XQuery; thus, the
extension is required. Not only that, but XSLT can't invoke rules
(apply templates) in an expression either. In Carrot, all
definitions are bound to an expression, so the only way to "do"
anything is to write an expression. (Unlike XSLT, Carrot does not
make a distinction between "instructions" and "expressions";
everything is an expression.)
Shallow copy
constructors
Shallow copy constructors are possible in XSLT but not XQuery. The
difference between a copy constructor and using an XQuery element
constructor is that, in the latter case, the namespace context
comes from the query rather than the source document. XQuery allows
you to perform deep element copies from the source document, but
not shallow copies. Without this ability, modified identity
transforms are impractical in XQuery. The semantics of Carrot's
copy constructor are essentially the same as XSLT's
<xsl:copy> instruction. For example, when the context node is
not an element node, it behaves the same as if a deep copy were
being performed.
Note
XSLT 2.1/3.0 promises to add a "select" attribute to <xsl:copy> to make it convenient
to perform a shallow copy of a node other than the context node. This is largely unnecessary
in Carrot, since copy constructors can be easily composed within an expression, making
it convenient to write, for example, foo/copy{…}
.
Text node literals
Carrot also adds text node literals, using the back-tick (`) for
the delimiter. This extension may at first seem to be of minimal
value, since XQuery already allows you to construct text nodes
using text{…}
, and strings using quotes (or apostrophes).
However, in practice, text node literals will often be the
preferred syntax, as the following examples should make clear.
Consider the following template rules in XSLT:
<xsl:template mode="file-name" match="doc">doc</xsl:template>
<xsl:template mode="file-ext" match="doc">.xml</xsl:template>
<xsl:template match="/doc">
<result>
<xsl:apply-templates mode="file-name" select="."/>
<xsl:apply-templates mode="file-ext" select="."/>
</result>
</xsl:template>
In
Carrot, you might naturally rewrite the above as follows:
^file-name(doc) := "doc";
^file-ext (doc) := ".xml";
^(/doc) := <result>{ ^file-name(.), ^file-ext(.) }</result>
The
problem is that this will produce an undesired result:
<result>doc .xml</result>
The
extra space results because of the way in which sequences of atomic
values are combined to make a text node in XQuery. Contiguous sequences of text nodes,
on
the other hand, are merged together without any intervening spaces,
so you could fix things by using explicit text node
constructors:
^file-name(doc) := text{"doc"};
^file-ext (doc) := text{".xml"};
The
problem here is that it may be an edge case with a large syntactic
cost if you want to cover your bases (six extra characters for
every text node). If in 90% of cases, using a string will result in
the exact same behavior as if you had used a text node, you will be
strongly tempted as a user to use quotes instead of text{…}
everywhere. However, you will get bugs in the remaining 10% of your
code because of the way sequences of strings are concatenated to
make a text node in XQuery.
Whereas it's more verbose in XQuery to construct a text node (using
text{…}
) than it is to return a string (using quotes), it's more verbose in
XSLT to return a string (using <xsl:sequence>) than it is to
return a text node (using a literal text node in the stylesheet). Text node literals
in Carrot address this imbalance by
making it equally convenient to create text nodes and strings.
Thus, we naturally rewrite our Carrot definitions to get the
desired result, without having to think about whether this is an
edge case or not:
^file-name(doc) := `doc`;
^file-ext (doc) := `.xml`;
The existence of text
node literals makes it easy to follow a simple rule: use text node
literals when you are constructing part of a result document; use
string literals when you know you want to return a string.
Expression semantics
Expressions in Carrot, unless otherwise noted here, are assumed to
have the same semantics as in XQuery. Carrot operates on exactly
the same data model as XQuery 1.0 and XPath 2.0.
One
exception is that namespace attribute declarations on element
constructors in Carrot do not affect the default element namespace
for XPath expressions. Carrot is more like XSLT in this regard, in
that it makes a distinction between the default namespace for input
documents and the default namespace for output documents
("xpath-default-namespace" in XSLT), thereby correcting what is
arguably a design bug in XQuery.
What about xsl:for-each,
xsl:for-each-group, etc.?
Given
that XQuery expressions do not include everything that it's
possible to do in an XSLT template rule, that begs the question:
What do all the XSLT instructions get mapped to in Carrot? In many
cases, Carrot simply does not have an analogue. In some cases,
that's because XQuery already provides a different way to achieve
the same use case. For example, <xsl:for-each> does not have
a direct analogue in Carrot. For iteration over a sequence, you can
use "for" expressions, or even just "/" when applicable. The
following Carrot (and XQuery) expression constructs a new
<bar> element for each <foo> element, rendering
<xsl:for-each> unnecessary for this case: foo/<bar/>
. Similarly,
Carrot does not support <xsl:sort>. For sorting sequences in
Carrot, you would instead use "order by", as in XQuery. Local
variables are defined using "let" expressions. Etc.
The
biggest area not currently addressed by Carrot—and which remains
an open question—is how to perform grouping. There are a few
answers to this question, not all mutually exclusive:
-
Extend Carrot to support grouping.
-
Import an XSLT
2.0 stylesheet when you need grouping.
-
Wait for
grouping to be added to XQuery 3.0 expressions and use
those.
At
this stage, the operative answers to this question are #2 and
#3.
Designing support for multiple output documents (corresponding to
<xsl:result-document> in XSLT) and how it interacts with
document{} node constructors is on my TODO list. (If you have ideas, I'd be happy
to hear them.)
Implementation strategy
Carrot is being implemented by compilation to XSLT 2.0. Several
things are worth noting about this:
-
Each Carrot
module compiles to an XSLT 2.0 module.
-
Carrot can
include and import other Carrot modules or XSLT
modules.
-
Carrot can also
import XQuery modules, but since this is not supported directly in
XSLT 2.0, the semantics depend on your target XSLT processor (e.g.,
<saxon:import-query> in Saxon and <xdmp:import-module>
in MarkLogic Server)
Carrot is still in the process of being defined more formally. The
current strategy for defining and implementing Carrot is as
follows:
-
Create a BNF grammar for Carrot
-
Hand-convert the EBNF grammar for XQuery expressions to
BNF
-
Extend the
resulting BNF to support Carrot definitions and
expressions
-
Use yapp-xslt
to generate the Carrot parser from the Carrot BNF
-
Write a
compiler in XSLT 2.0 to convert parsed Carrot modules to XSLT 2.0
modules
The
syntax for other top-level constructs, such as namespace
declarations, serialization options, and parameter definitions are
still being worked out. Some mock-up examples can be found at the
project's home page: http://github.com/evanlenz/carrot
Future directions
Carrot is both a practical tool and a research project. I'm trying
to find the right balance between innovation and sticking to the
syntax and/or semantics of XPath, XSLT, and XQuery. I'm excited by
the future possibility of using XML-oriented scripting languages in
the browser, as made possible by projects like Saxon-CE
and XQIB. I'm convinced that
XSLT's syntax is an obstacle to mainstream adoption as a browser
scripting language. Carrot, or something like it, could help
overcome such obstacles.
As a
research project, the ideas at the heart of Carrot may possibly
influence the longer-term W3C work, as XQuery and XSLT continue to
move closer to each other. I'm already quite satisfied by the
composability that Carrot provides in contrast to XSLT. That said,
I'm always itching for more features in the XPath/XQuery/XSLT
triumvirate. As a sample, here are two.
Simple mapping operator
I
think XPath needs a "simple
mapping operator" that behaves similarly to "/" except without
its restrictions and special behavior with regard to node
sequences. This is one possible extension that could be added to
Carrot, without having to wait for XSLT/XQuery 3.0 (if it's
even being considered for inclusion).
Mode merging
Another more recent idea (which would be straightforward to
implement in Carrot) would be "mode merging."
In XSLT, a single template rule can declare
itself to be a part of more than one mode. However, a single call
to apply-templates cannot invoke rules in more than one mode. The
ability to merge modes would provide a static mode extension
mechanism, the chief benefit of course being that you wouldn't have
to go add a new mode to each template rule's list of modes (and in
the case when it's in the default mode, go add mode="#default new-mode"
to each
rule).
In
XSLT:
<xsl:apply-templates mode="foo bar"/>
In Carrot:
^foo|bar()
This
would be especially handy in multi-stage transformations where each
stage of processing makes an incremental change to its input, but
some stages need to handle things slightly differently, for
example, to avoid transforming an already-converted element more
than once. Mode merging would allow you to invoke statically
determined subsets and supersets of rules.
Underlying language
development
Finally, Carrot is a project that can grow with the languages it is
based on. As various features are added in XSLT/XQuery 3.0, such as
JSON support or the ability to apply templates to sequences of
atomic values, Carrot will (happily) be updated accordingly.