Introduction
Carrot combines the best that XQuery and XSLT have to offer:
-
the friendly syntax and composability of XQuery expressions, plus
-
the power and flexibility of template rules in XSLT.
Carrot can also be (loosely) thought of as an alternative, more composable syntax for XSLT.
Background and influences
Carrot is not the first XSLT-inspired project to provide a shorter syntax than XSLT itself. Syntax shorthands have included Paul Tchistopolskii's XSLScript, Sam Wilmott's RXSLT, and another project called XSLTXT. Although none of these projects provided direct inspiration for Carrot, they all address one of the same desires that Carrot addresses: being able to program in XSLT more concisely. However, unlike these projects, Carrot addresses more than XSLT's verbosity. It also addresses XSLT's limited composability. For example, in XSLT you can't include an element constructor in a path expression (like you can in XQuery and Carrot) or apply templates inside a path expression (which you can uniquely do in Carrot).
A more direct inspiration was James Clark's proposal for Unifying XSLT and XQuery element construction. Written during the early days of the W3C activity on XQuery, that proposal suggested that XQuery and XSLT language constructs could be used interchangeably if XQuery used an XML-based syntax (via a simple document element wrapper). As we now know, things didn't turn out that way. Carrot takes essentially the opposite approach. Rather than making XQuery use an XML-based syntax like XSLT's, it makes XSLT (Carrot, actually) use a non-XML-based syntax like XQuery's.
Carrot is also inspired by Haskell's syntax, which defines functions using pattern-matching and an equation-like syntax.
Introduction by example
Carrot is best understood by example. Here's an example of XSLT's syntax for a template rule (henceforth "rule"):
<xsl:template match="para">
  <p>
    <xsl:apply-templates/>
  </p>
</xsl:template>
In Carrot, you'd write the above rule like this:
^(para) := <p>{^()}</p>;
There are a few things to note about the above. To define a rule in Carrot, you use the same operator that XQuery uses for binding variables (:=). Everything on the right-hand side up to the semicolon is an expression in Carrot. An expression in Carrot is simply an XQuery expression, plus some extensions. In this case, the expression is using the extended syntax for invoking rules:
^()
which is short for:
^(node())
just as:
<xsl:apply-templates/>
is short for:
<xsl:apply-templates select="node()"/>
All rules belong to a ruleset (equivalent to a "mode" in XSLT). The above examples use the unnamed ruleset (there's just one of these). Here's an example that belongs to a ruleset named "toc":
^toc(section) := <li>{ ^toc() }</li>;
The above is short for:
<xsl:template match="section" mode="toc">
  <li>
    <xsl:apply-templates mode="toc"/>
  </li>
</xsl:template>
Here's the identity transform in Carrot:
^(@*|node()) := copy{ ^(@*|node()) };
This recursively copies the input to the output, one node at a time.
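To make the recursion concrete, here is a minimal Python model of how the identity rule fires once per node, and how a more specific rule can override it. This is a sketch under stated assumptions, not Carrot's implementation: the Node class and the apply, shallow_copy, and serialize helpers are hypothetical names, attributes are left out, and "first matching rule wins" stands in for real conflict resolution.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                      # "element" or "text"
    name: str = ""                 # element name (elements only)
    text: str = ""                 # character data (text nodes only)
    children: list = field(default_factory=list)

def shallow_copy(node, content):
    """Model of copy{ ... }: copy the node itself, then fill it with `content`."""
    if node.kind == "element":
        return Node("element", name=node.name, children=content)
    return Node("text", text=node.text)   # a text node has no content to fill

def apply(nodes, rules):
    """Model of ^(nodes): the first rule whose pattern matches a node fires."""
    out = []
    for node in nodes:
        body = next(body for matches, body in rules if matches(node))
        out.extend(body(node, rules))
    return out

def serialize(node):
    if node.kind == "text":
        return node.text
    inner = "".join(serialize(c) for c in node.children)
    return f"<{node.name}>{inner}</{node.name}>"

# ^(@*|node()) := copy{ ^(@*|node()) };
identity = (lambda n: True,
            lambda n, rules: [shallow_copy(n, apply(n.children, rules))])

# ^(para) := <p>{ ^() }</p>;   (listed first so that it overrides the identity rule)
para_to_p = (lambda n: n.kind == "element" and n.name == "para",
             lambda n, rules: [Node("element", name="p",
                                    children=apply(n.children, rules))])

doc = Node("element", name="doc", children=[
    Node("element", name="para", children=[Node("text", text="Hello")]),
])

print(serialize(apply([doc], [identity])[0]))             # <doc><para>Hello</para></doc>
print(serialize(apply([doc], [para_to_p, identity])[0]))  # <doc><p>Hello</p></doc>
```

The second call shows the usual "modified identity" idiom: everything is copied unchanged except the nodes that a more specific rule claims.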
Here's a Carrot script that creates an HTML document with dynamic content for its title and body, converting <para> elements in the input to <p> elements in the output:
^(/) :=
  <html>
    <head>
      { /doc/title }
    </head>
    <body>
      { ^(/doc/para) }
    </body>
  </html>;

^(para) := <p>{ ^() }</p>;
As a comparison, here's what you'd have to write if you were using regular XSLT:
<xsl:transform version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <html>
      <head>
        <xsl:copy-of select="/doc/title"/>
      </head>
      <body>
        <xsl:apply-templates select="/doc/para"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="para">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:transform>
Just as in XSLT, rules in Carrot can be associated with more than one mode. In XSLT, this template rule belongs to two modes:
<xsl:template mode="foo bar" match="bang"/>
Here's the equivalent rule in Carrot, belonging to two rulesets:
^foo|bar(bang) := ();
Carrot definitions
A Carrot module consists of a set of unordered definitions. Unlike XQuery, there is no distinction between main modules and library modules. Likewise, a Carrot module has no "body." Instead, there are only definitions. Carrot is more like XSLT in this regard. Also unlike XQuery, Carrot modules need not be associated with a namespace.
There are three kinds of definitions in Carrot:
-
global variables,
-
functions, and
-
rules.
Global variables
A global variable definition is very similar to a variable declaration in XQuery, except that you don't need the "declare variable" verbiage. Whereas in XQuery you would write:
declare variable $foo := "a string value";
In Carrot you would instead write:
$foo := "a string value";
Functions
A function definition is just like a function declaration in XQuery except that you don't need the "declare function" verbiage and, instead of curly braces, you use the same binding operator (:=) as a variable definition. For example, whereas in XQuery, you would declare functions like this:
declare function my:foo() { "return value" };
declare function my:bar($str as xs:string) as xs:string { upper-case($str) };
In Carrot, you would instead write:
my:foo() := "return value";
my:bar($str as xs:string) as xs:string := upper-case($str);
Why not just use the regular XQuery syntax? Two reasons: conciseness (higher signal-to-noise ratio) and consistency (with the other two types of definitions).
Rules
The third type of definition is a rule. This corresponds to a template rule in XSLT. For example, this rule matches any element node (*):
^foo(*) := "return value";
Unlike a function definition, the "argument" of a rule definition ("*" in the above case) is not an (optional) formal parameter list; instead it is a required pattern (as XSLT defines a pattern). Thus, it's illegal to have an empty set of parentheses in a rule definition:
^foo() := "return value"; (: NOT LEGAL :)
Note the asymmetry with ruleset invocations, where it is legal to call ^foo(), which is short for ^foo(node()).
Of course, rules can also have parameters (just as template rules can have parameters in XSLT). The syntax for declaring these is very similar to an XQuery function parameter list, except that it comes after the pattern and is separated from the pattern by a semicolon:
^foo(* ; $str as xs:string) := concat($str, .);
Carrot also supports tunnel parameters, as in XSLT. To indicate a tunnel parameter, you add the keyword "tunnel" before the parameter:
^foo(* ; tunnel $str as xs:string) := concat($str, .);
Unlike XQuery functions, parameters in a rule are identified by name, not position. Thus the syntax for passing them looks very similar to how they are declared, and the order of parameters is insignificant. The following expression applies the "foo" ruleset to the context node, passing the tunnel parameter $str with the value "Hello":
^foo(. ; tunnel $str := "Hello")
What about conflict resolution among multiple matching rules? Carrot behaves the same as XSLT: rules with higher import precedence win, followed by rules with higher priority. Default priority is based on the syntax of the pattern, just as in XSLT. You can also specify the priority explicitly (right before the binding operator :=), as in the first rule of this example, which explicitly sets the priority to 1:
^author-listing( author[1] ) 1    := ^();
^author-listing( author )         := ", " , ^();
^author-listing( author[last()] ) := " and " , ^();
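The selection among these three rules can be modeled in a few lines of Python. This is only a sketch: the Rule class and the best_rule and author_listing helpers are hypothetical names, each pattern is hand-translated into a predicate over an author's position among its siblings, and the default priorities (0 for a bare name, 0.5 for a name with a predicate) are the values XSLT 2.0 assigns to patterns of these shapes. All rules are assumed to share a single import precedence.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    pattern: str                          # the Carrot pattern, for reference only
    matches: Callable[[int, int], bool]   # (position, last) -> does the pattern match?
    priority: float
    prefix: str                           # what the rule emits before the author's content
    precedence: int = 0                   # import precedence (same module here)

RULES = [
    Rule("author[1]",      lambda pos, last: pos == 1,    1.0, ""),       # explicit priority 1
    Rule("author",         lambda pos, last: True,        0.0, ", "),     # default: bare name
    Rule("author[last()]", lambda pos, last: pos == last, 0.5, " and "),  # default: predicate
]

def best_rule(pos, last):
    """Highest import precedence wins; ties are broken by highest priority."""
    candidates = [r for r in RULES if r.matches(pos, last)]
    return max(candidates, key=lambda r: (r.precedence, r.priority))

def author_listing(names):
    last = len(names)
    return "".join(best_rule(i, last).prefix + name
                   for i, name in enumerate(names, start=1))

print(author_listing(["Ann", "Bob", "Cy"]))  # Ann, Bob and Cy
print(author_listing(["Ann"]))               # Ann
```

The single-author case shows why the explicit priority matters: a lone author matches all three patterns, and without the priority of 1 the author[last()] rule would win and emit a stray " and ".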
Carrot expressions
The right-hand side of a Carrot definition, whether it be a variable, function, or rule, is a Carrot expression. The context for the expression evaluation is the same as it is for sequence constructors within a template rule in XSLT. For example, the context node is the node matched by the rule's pattern.
A Carrot expression is an XQuery expression with some extensions:
-
ruleset invocations: ^mode(nodes)
-
shallow copy{…} constructors
-
text node literals: `my text node`
Let's look at each of these extensions in turn and the rationale behind each one.
Ruleset invocations
Ruleset invocations (i.e., "apply-templates" in XSLT) are largely Carrot's raison d'etre. They are not possible in XQuery; thus, the extension is required. Not only that, but XSLT can't invoke rules (apply templates) in an expression either. In Carrot, all definitions are bound to an expression, so the only way to "do" anything is to write an expression. (Unlike XSLT, Carrot does not make a distinction between "instructions" and "expressions"; everything is an expression.)
Shallow copy constructors
Shallow copy constructors are possible in XSLT but not XQuery. The difference between a copy constructor and using an XQuery element constructor is that, in the latter case, the namespace context comes from the query rather than the source document. XQuery allows you to perform deep element copies from the source document, but not shallow copies. Without this ability, modified identity transforms are impractical in XQuery. The semantics of Carrot's copy constructor are essentially the same as XSLT's <xsl:copy> instruction. For example, when the context node is neither an element nor a document node (a text node or an attribute, say), it behaves the same as if a deep copy were being performed.
Note: XSLT 2.1/3.0 promises to add a "select" attribute to <xsl:copy> to make it convenient to perform a shallow copy of a node other than the context node. This is largely unnecessary in Carrot, since copy constructors can be easily composed within an expression, making it convenient to write, for example, foo/copy{…}.
Text node literals
Carrot also adds text node literals, using the back-tick (`) for the delimiter. This extension may at first seem to be of minimal value, since XQuery already allows you to construct text nodes using text{…}, and strings using quotes (or apostrophes). However, in practice, text node literals will often be the preferred syntax, as the following examples should make clear.
Consider the following template rules in XSLT:
<xsl:template mode="file-name" match="doc">doc</xsl:template>
<xsl:template mode="file-ext" match="doc">.xml</xsl:template>

<xsl:template match="/doc">
  <result>
    <xsl:apply-templates mode="file-name" select="."/>
    <xsl:apply-templates mode="file-ext" select="."/>
  </result>
</xsl:template>
In Carrot, you might naturally rewrite the above as follows:
^file-name(doc) := "doc";
^file-ext (doc) := ".xml";
^(/doc) := <result>{ ^file-name(.), ^file-ext(.) }</result>;
The problem is that this will produce an undesired result:
<result>doc .xml</result>
The extra space results because of the way in which sequences of atomic values are combined to make a text node in XQuery. Contiguous sequences of text nodes, on the other hand, are merged together without any intervening spaces, so you could fix things by using explicit text node constructors:
^file-name(doc) := text{"doc"};
^file-ext (doc) := text{".xml"};
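The two behaviors can be modeled side by side. The Python sketch below follows XQuery's rule for element content: each run of adjacent atomic values becomes one text node with single spaces between the values, and adjacent text nodes are then merged with nothing in between. The build_content function and its ("atomic", value) / ("text", value) item encoding are hypothetical, chosen only for illustration.

```python
def build_content(items):
    """Return the text an element would contain, given its content sequence.

    Each item is ("atomic", value) for an atomic value such as a string,
    or ("text", value) for a text node.
    """
    text_nodes = []
    run = []                      # pending run of adjacent atomic values

    def flush():
        # A run of atomic values becomes ONE text node, space-separated.
        if run:
            text_nodes.append(" ".join(run))
            run.clear()

    for kind, value in items:
        if kind == "atomic":
            run.append(str(value))
        else:
            flush()
            text_nodes.append(value)
    flush()
    # Adjacent text nodes are merged without any separator.
    return "".join(text_nodes)

# Rules returning strings: "doc", ".xml"
print(build_content([("atomic", "doc"), ("atomic", ".xml")]))  # doc .xml
# Rules returning text nodes: text{"doc"}, text{".xml"}
print(build_content([("text", "doc"), ("text", ".xml")]))      # doc.xml
```

Only the first call produces the unwanted space, which is why the choice between a string and a text node matters whenever two results end up next to each other in element content.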
The problem here is that covering your bases for what may be an edge case carries a large syntactic cost (six extra characters for every text node). If in 90% of cases using a string results in exactly the same behavior as using a text node, you will be strongly tempted as a user to use quotes instead of text{…} everywhere. However, you will get bugs in the remaining 10% of your code because of the way sequences of strings are concatenated to make a text node in XQuery.
Whereas it's more verbose in XQuery to construct a text node (using text{…}) than it is to return a string (using quotes), it's more verbose in XSLT to return a string (using <xsl:sequence>) than it is to return a text node (using a literal text node in the stylesheet). Text node literals in Carrot address this imbalance by making it equally convenient to create text nodes and strings. Thus, we naturally rewrite our Carrot definitions to get the desired result, without having to think about whether this is an edge case or not:
^file-name(doc) := `doc`;
^file-ext (doc) := `.xml`;
The existence of text node literals makes it easy to follow a simple rule: use text node literals when you are constructing part of a result document; use string literals when you know you want to return a string.
Expression semantics
Expressions in Carrot, unless otherwise noted here, are assumed to have the same semantics as in XQuery. Carrot operates on exactly the same data model as XQuery 1.0 and XPath 2.0.
One exception is that namespace attribute declarations on element constructors in Carrot do not affect the default element namespace for XPath expressions. Carrot is more like XSLT in this regard, in that it makes a distinction between the default namespace for XPath expressions over input documents ("xpath-default-namespace" in XSLT) and the default namespace for output documents, thereby correcting what is arguably a design bug in XQuery.
What about xsl:for-each, xsl:for-each-group, etc.?
Given that XQuery expressions do not include everything that it's possible to do in an XSLT template rule, that raises the question: What do all the XSLT instructions get mapped to in Carrot? In many cases, Carrot simply does not have an analogue. In some cases, that's because XQuery already provides a different way to achieve the same use case. For example, <xsl:for-each> does not have a direct analogue in Carrot. For iteration over a sequence, you can use "for" expressions, or even just "/" when applicable. The following Carrot (and XQuery) expression constructs a new <bar> element for each <foo> element, rendering <xsl:for-each> unnecessary for this case: foo/<bar/>. Similarly, Carrot does not support <xsl:sort>. For sorting sequences in Carrot, you would instead use "order by", as in XQuery. Local variables are defined using "let" expressions. And so on.
The biggest area not currently addressed by Carrot, and one that remains an open question, is how to perform grouping. There are a few answers to this question, not all mutually exclusive:
-
Extend Carrot to support grouping.
-
Import an XSLT 2.0 stylesheet when you need grouping.
-
Wait for grouping to be added to XQuery 3.0 expressions and use those.
At this stage, the operative answers to this question are the second and third.
Designing support for multiple output documents (corresponding to <xsl:result-document> in XSLT) and how it interacts with document{} node constructors is on my TODO list. (If you have ideas, I'd be happy to hear them.)
Implementation strategy
Carrot is being implemented by compilation to XSLT 2.0. Several things are worth noting about this:
-
Each Carrot module compiles to an XSLT 2.0 module.
-
Carrot can include and import other Carrot modules or XSLT modules.
-
Carrot can also import XQuery modules, but since this is not supported directly in XSLT 2.0, the semantics depend on your target XSLT processor (e.g., <saxon:import-query> in Saxon and <xdmp:import-module> in MarkLogic Server).
Carrot is still in the process of being defined more formally. The current strategy for defining and implementing Carrot is as follows:
-
Create a BNF grammar for Carrot:
  -
  Hand-convert the EBNF grammar for XQuery expressions to BNF
  -
  Extend the resulting BNF to support Carrot definitions and expressions
-
Use yapp-xslt to generate the Carrot parser from the Carrot BNF
-
Write a compiler in XSLT 2.0 to convert parsed Carrot modules to XSLT 2.0 modules
The syntax for other top-level constructs, such as namespace declarations, serialization options, and parameter definitions, is still being worked out. Some mock-up examples can be found at the project's home page: http://github.com/evanlenz/carrot
Future directions
Carrot is both a practical tool and a research project. I'm trying to find the right balance between innovation and sticking to the syntax and/or semantics of XPath, XSLT, and XQuery. I'm excited by the future possibility of using XML-oriented scripting languages in the browser, as made possible by projects like Saxon-CE and XQIB. I'm convinced that XSLT's syntax is an obstacle to mainstream adoption as a browser scripting language. Carrot, or something like it, could help overcome such obstacles.
As a research project, the ideas at the heart of Carrot may influence the longer-term W3C work, as XQuery and XSLT continue to move closer to each other. I'm already quite satisfied by the composability that Carrot provides in contrast to XSLT. That said, I'm always itching for more features in the XPath/XQuery/XSLT triumvirate. As a sample, here are two.
Simple mapping operator
I think XPath needs a "simple mapping operator" that behaves similarly to "/" except without its restrictions and special behavior with regard to node sequences. This is one possible extension that could be added to Carrot, without having to wait for XSLT/XQuery 3.0 (if it's even being considered for inclusion).
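To show what "without its restrictions and special behavior" would mean in practice, here is a small Python model contrasting the two operators. It is a sketch under stated assumptions: Node, path_step, and simple_map are hypothetical names, a node is reduced to a name plus its position in document order, and path_step mirrors the XPath 2.0 rules for "/" (node-only left operand, node results de-duplicated and sorted into document order, no mixing of nodes and atomic values).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    order: int     # position in document order
    name: str

def path_step(seq, step):
    """Model of E1/E2 in XPath 2.0."""
    if not all(isinstance(item, Node) for item in seq):
        raise TypeError("left operand of '/' must contain only nodes")
    results = [r for item in seq for r in step(item)]
    if all(isinstance(r, Node) for r in results):
        # Node results: duplicates removed, then sorted into document order.
        return sorted(set(results), key=lambda n: n.order)
    if not any(isinstance(r, Node) for r in results):
        return results                      # all atomic: kept as produced
    raise TypeError("'/' may not mix nodes and atomic values")

def simple_map(seq, step):
    """Model of the proposed operator: evaluate per item, concatenate, nothing else."""
    return [r for item in seq for r in step(item)]

root, a, b = Node(0, "root"), Node(1, "a"), Node(2, "b")
parent = lambda n: [root]                   # every node's parent is `root` here

print(path_step([b, a], parent))            # one `root`: duplicates collapsed
print(simple_map([b, a], parent))           # two `root`s: one per input item
print(simple_map([1, 2, 3], lambda x: [x * 2]))   # [2, 4, 6]; path_step would raise
```

The last line is the case "/" cannot express at all, since its left-hand operand must consist of nodes.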
Mode merging
Another more recent idea (which would be straightforward to implement in Carrot) would be "mode merging."
In XSLT, a single template rule can declare itself to be a part of more than one mode. However, a single call to apply-templates cannot invoke rules in more than one mode. The ability to merge modes would provide a static mode extension mechanism, the chief benefit of course being that you wouldn't have to go add a new mode to each template rule's list of modes (and in the case when it's in the default mode, go add mode="#default new-mode" to each rule).
In XSLT, a merged-mode invocation might look like this (it is not legal today):
<xsl:apply-templates mode="foo bar"/>
In Carrot:
^foo|bar()
This would be especially handy in multi-stage transformations where each stage of processing makes an incremental change to its input, but some stages need to handle things slightly differently, for example, to avoid transforming an already-converted element more than once. Mode merging would allow you to invoke statically determined subsets and supersets of rules.
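One possible reading of mode merging can be sketched in Python as follows. Everything here is an assumption made for illustration: the Rule class, the apply function, and the example rules are hypothetical, patterns are reduced to element names, and conflicts between rules drawn from different modes are resolved by ordinary priority, which is a choice the proposal itself leaves open.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    modes: frozenset   # the rulesets this rule belongs to
    name: str          # element name the rule matches
    priority: float
    output: str        # what the rule returns

RULES = [
    Rule(frozenset({"foo"}),        "bang", 0.0, "foo-bang"),
    Rule(frozenset({"bar"}),        "boom", 0.0, "bar-boom"),
    Rule(frozenset({"foo", "bar"}), "both", 0.0, "shared"),
]

def apply(modes, names):
    """Model of ^foo|bar(nodes): a rule is a candidate if it is in ANY requested mode."""
    wanted = set(modes)
    out = []
    for name in names:
        candidates = [r for r in RULES if r.name == name and r.modes & wanted]
        if candidates:   # unmatched nodes are simply skipped in this sketch
            out.append(max(candidates, key=lambda r: r.priority).output)
    return out

print(apply(["foo"], ["bang", "boom", "both"]))         # ['foo-bang', 'shared']
print(apply(["foo", "bar"], ["bang", "boom", "both"]))  # ['foo-bang', 'bar-boom', 'shared']
```

The merged invocation picks up the "bar" rule without any rule having to add a new mode to its own list, which is the static extension mechanism described above.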
Underlying language development
Finally, Carrot is a project that can grow with the languages it is based on. As various features are added in XSLT/XQuery 3.0, such as JSON support or the ability to apply templates to sequences of atomic values, Carrot will (happily) be updated accordingly.