Introduction
This paper tells three tales on a common theme: things motivated by one purpose are sometimes useful for a different purpose. When you have a software problem, you might suddenly realize that an obscure function you read about but have never used could be part of the solution. The other theme among the three tales is that they involve testing XML-related software. Two testing situations involve testing XSLT or XQuery using XSpec, while the third involves testing a schema using XSpec or BaseX. XSpec is an open-source software product for testing XSLT, XQuery, and Schematron code [XS]. BaseX is an open-source XQuery processor that includes functions and annotations for unit testing [B, UM].
If you use XSpec or BaseX for testing, you might be interested in the specific problems and solutions in these tales. Even if you don’t use those testing frameworks, you might find broader lessons that you can apply elsewhere.
Tale 1: The Cupboard Was Bare
The first tale is about XSpec tests for XSLT or XQuery code where a sequence is unexpectedly empty. Perhaps your test code tries to do something with the first paragraph of the third section of a document, but the document has only two sections. Empty sequences are a normal thing in XPath and the XML applications that use XPath, so you don’t necessarily get any feedback that the sought-after paragraph’s parent is nonexistent.
A sequence that is empty by mistake can lead to two categories of testing problems:
A. Tests pass despite the mistake, but they might not be doing what you think they’re doing. Tests that silently neglect to serve their core purpose give you a false sense of security about the health and coverage of the code you’re testing.
B. Tests fail due to the mistake, and the failures might be confusing for you to troubleshoot until you discover the mistake. This phenomenon is not as bad as the first one, but the confusion wastes your time.
Unexpectedly empty sequences can lead to unexpected results in non-testing situations, too, so the lesson we will learn about making assumptions explicit can be applied in other code where you use XPath.
Example of (A), Tests Passing Despite Mistake
Consider an XSpec scenario that calls an XSLT or XQuery function that produces a topic document having sections and paragraphs. XSpec stores the document in a variable named x:result. Suppose you expect at least three sections, and you want to verify that the third section does not contain any paragraphs. You can use XSpec syntax like one of these two <x:expect> elements:
The first syntax expresses a true/false condition using the empty function, and the verification passes if the condition is true. The second syntax uses the test attribute to filter the document down to a hypothetical paragraph in the third section of the topic and verifies that the filtering didn’t find anything; that is, the path expression led to the empty sequence that equals the select attribute.
When constructing these <x:expect> elements, presumably you thought $x:result/topic/section[3] would return at least one section. The label certainly describes the third section as if it exists. If it doesn’t exist, the verifications pass for the wrong reason, as follows:
- The path expression, $x:result/topic/section[3], returns an empty sequence.
- The longer expression, $x:result/topic/section[3]/para, also returns an empty sequence.
- The <x:expect> elements pass when you run the test.
- You receive no feedback that existence of the third section was a false assumption.
- You haven’t accomplished the objective of this test, which was to verify a characteristic of a section you thought existed.
Maybe the test really should have used a path like $x:result/topic/section[2]/para or $x:result/topic/descendant::section[3]/para, or a different value of the input argument to the function being tested. Maybe the XSLT or XQuery code has a bug that causes the third section to be missing. In any case, the lack of feedback makes it harder to discover the mistake.
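To see the silent pass outside XSpec, the two verification styles can be approximated in plain XQuery. This is a minimal sketch rather than XSpec’s actual comparison logic: the variable $result and the two-section topic are hypothetical stand-ins for $x:result, and deep-equal only approximates how XSpec compares a test result with the value of select.

```xquery
(: Hypothetical stand-in for $x:result: a topic with only two sections. :)
let $result := document {
  <topic>
    <section><para>One</para></section>
    <section><para>Two</para></section>
  </topic>
}
return (
  (: Style 1: a Boolean condition. True, because section[3] does not exist. :)
  empty($result/topic/section[3]/para),
  (: Style 2: compare the filtered result with the empty sequence. Also true. :)
  deep-equal($result/topic/section[3]/para, ()),
  (: The assumption that neither check verifies. False. :)
  exists($result/topic/section[3])
)
```

Evaluating the query should yield true, true, false: both checks succeed even though the section they are meant to describe is absent.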
Example of (B), Mistake Hindering Failure Investigation
Working with the actual result of a test scenario is not the only place where path expressions can have mistakes. Consider an XSpec scenario for testing an XSLT or XQuery function named f:proto-list that produces list markup with one list item for each node in an input parameter. The scenario loads an XML document from an external file named test-document.xml and uses an XPath expression to select nodes as the value of the parameter.
Suppose the document in test-document.xml has no <table> elements having the specified path, for example because the tables are all in subsections rather than first-level sections. In this case, the function parameter named items is an empty sequence, and the test probably fails. While troubleshooting, you might spend a lot of time investigating the XSLT or XQuery code in the function before realizing that the problem is either in the test’s <x:param> element or in the test document.
In Search of Prevention
The XSpec vocabulary is fairly small and does not have dedicated features to alert you when a path expression you provide evaluates to an empty sequence. Using the XSpec vocabulary, you have these options for detecting such an empty sequence:
- Insert extra <x:expect> elements to verify that a path evaluates to something non-empty, such as the following code to augment Figure 1. However, the extra verification leads to repetition or clutter in the test scenario.

    <x:expect label="Confirm that there is a third section" test="exists($x:result/topic/section[3])"/>

- Use the as attribute on some XSpec element, to declare a data type that can’t match an empty sequence (e.g., as="element()+"). For example, the <x:param> element below can replace the one in Figure 2.

    <x:param name="items" href="test-document.xml" select="topic/section/table" as="element(table)+"/>

  Declaring data types is generally a good idea. However, to attach an as attribute to an intermediate result, you’d have to declare it as a separate variable. Maybe that’s a good idea, too, or maybe it seems like overkill.
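The effect of a type declaration with the + occurrence indicator can be imitated in standalone XQuery with a typed variable binding. The sketch below assumes a hypothetical test document whose only table sits in a subsection; the names $doc and $items are illustrative and not taken from any real test suite.

```xquery
(: Hypothetical test document: the only table is inside a subsection,
   so the path topic/section/table selects nothing. :)
let $doc := document {
  <topic>
    <section>
      <section><table/></section>
    </section>
  </topic>
}
(: The occurrence indicator "+" rejects an empty sequence, so a processor
   is expected to stop here with type error XPTY0004 instead of handing
   an empty value to the code under test. :)
let $items as element(table)+ := $doc/topic/section/table
return count($items)
```

Changing the path to $doc/topic/section/section/table should let the query finish and return 1.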
What if you want an unobtrusive way to ensure that an intermediate result isn’t empty, without adding extra elements to the test or extra rows in the test report?
XPath Syntax Options
It turns out that XPath already has concise syntaxes that provide a way to say, “Alert me when this path evaluates to an empty sequence.” You can slip them into any XPath expression that you use in XSpec. You can even use them with an intermediate result like a portion of a path expression, without having to define a separate XSpec variable.
- The one-or-more function is a pass-through function for a non-empty input sequence but issues an error message for an empty input sequence.
- The exactly-one function works the same way but is stricter in the way its name implies. You can use it where you expect a sequence of one item instead of zero or multiple items.
- A treat expression asserts that a sequence has a certain data type. Although the syntax is often less readable than the use of the functions mentioned above, an advantage of this expression is that it lets you express more than just cardinality.
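The behavior of the three syntaxes can be observed side by side in a small XQuery sketch. The document, the helper function local:probe, and the choice to report only the local part of the error code are all assumptions made for illustration; try/catch requires XQuery 3.0 or later.

```xquery
declare namespace err = "http://www.w3.org/2005/xqt-errors";

(: Hypothetical document with exactly two sections. :)
declare variable $doc := document {
  <topic>
    <section><para>One</para></section>
    <section><para>Two</para></section>
  </topic>
};

(: Hypothetical helper: evaluate a zero-argument function and report either
   the number of items it returned or the code of the error it raised. :)
declare function local:probe($f as function() as item()*) as xs:string {
  try { "ok: " || count($f()) || " item(s)" }
  catch * { "error: " || local-name-from-QName($err:code) }
};

(
  (: Non-empty input passes through unchanged. :)
  local:probe(function() { one-or-more($doc/topic/section) }),
  (: Empty input raises an error. :)
  local:probe(function() { one-or-more($doc/topic/section[3]) }),
  (: Two items are one too many for exactly-one. :)
  local:probe(function() { exactly-one($doc/topic/section) }),
  (: treat as checks the sequence type, including its cardinality. :)
  local:probe(function() { $doc/topic/section[3] treat as element(section)+ })
)
```

A conforming processor should report "ok: 2 item(s)", then the error codes FORG0004, FORG0005, and XPDY0050, which are the codes the specifications assign to one-or-more, exactly-one, and treat, respectively.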
Outside XSpec, these syntaxes are useful when an XPath processor performs static type checking and you want to promise the processor up-front that a sequence will satisfy a certain data type condition at run time. Making a strict processor relax its vigilance is a use case that comes up in the specification [KXP] and in authoritative books like those by Michael Kay [K2] and Priscilla Walmsley [W].
In the XSpec situation that this tale is about, avoiding a static type error from a strict processor is not what’s going on. Instead, it’s the other way around, where you want a lax processor to be vigilant and tell you when something doesn’t have the cardinality or data type you expect. Some references for these syntaxes, such as [K2], mention this other use case, and you might encounter non-testing situations where these syntaxes help you construct XPath expressions that alert you to severe missing-data problems. I learned about the testing value of these syntaxes from the XSpec lead developer, GitHub user AirQuick.
XPath Syntax in the XSpec Examples
Returning to the earlier code examples, you can insert a call to the exactly-one function to make sure you find out if the third section fails to exist:
Also, you can insert a call to the one-or-more function to make sure you find out if the function parameter turns out to be empty:
With these changes, if the exactly-one or one-or-more function produces an error message when the test runs, the error gives you clear and valuable information that addresses the two categories of testing problems in this tale.
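Carried over to standalone XQuery, the guarded path expressions might look like the sketch below. It is not the XSpec markup itself: the two documents and the variable names are hypothetical, and the point is only where the function calls sit inside the paths.

```xquery
(: Hypothetical result document: the third section exists and has no paragraphs. :)
let $result := document {
  <topic>
    <section><para>One</para></section>
    <section><para>Two</para></section>
    <section/>
  </topic>
}
(: Hypothetical test document: one table in a first-level section. :)
let $test-document := document {
  <topic>
    <section><table/></section>
  </topic>
}
return (
  (: exactly-one guards the intermediate step; /para continues from its result. :)
  empty(exactly-one($result/topic/section[3])/para),
  (: one-or-more guards the value that would be passed as the items parameter. :)
  count(one-or-more($test-document/topic/section/table))
)
```

With these inputs the query should return true and 1. Removing the third section from $result, or moving the table into a subsection, should turn the corresponding expression into a dynamic error (FORG0005 or FORG0004, respectively) rather than a quiet pass or an empty parameter.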
Tale 2: Good Functions Make Good Neighbors
The second tale is about XSpec tests for XSLT code where you want to test a lot of tiny contexts that each produce a tiny result, and you don’t want the contexts and results to be dwarfed by the amount of test code overhead. An example I came across in a real project [US] was a set of three template rules in a certain mode. Each template matched an attribute node and produced an attribute node. Working together, these templates determined the value of the resulting (i.e., output) attribute, based on the value of the matching (i.e., input) attribute. The template shown in Figure 5 had the most general match attribute of the three templates. This template acted as a fallback, mapping 11 distinct input values to the same output.[1]
What are some ways to test 11 miniature mappings? The following simple XSpec scenario for one of the 11 mappings uses eight lines:
Doing the same thing 11 times uses 88 lines, which seems like a lot of code for something so simple. XSpec supports reusing scenarios either verbatim or with variable substitutions, and using those techniques can reduce the code from 88 lines to 71 or 52 lines, respectively. Here’s how the variable-substitution idea would look, where you would have 11 scenarios like the first one below, and they would all reuse the second scenario. (Assume the prefix v is bound to some user-defined namespace URI for test-specific variables.)
Even 52 lines might seem too long.
Compact, Scalable Test with Neighboring Contexts
We can reduce the test code to one 19-line scenario, by putting all 11 input items in a single <x:context> element, as follows:
In this scenario, the context is a sequence of 11 as-type attribute nodes, because the select="/*/@as-type" attribute is selecting the attribute nodes from the child elements of <x:context>. XSpec loops over the 11 nodes and applies templates to each. XSpec gathers the actual results together into a sequence of 11 in-json attributes, and that sequence is what the <x:expect> element uses for verification. If verification fails, the report shows the full sequences of actual and expected results, and we need to determine which of the 11 comparisons failed; that extra labor is one reason I do not use multiple-item contexts in many situations.
In this case, the expected value is identical for all the context nodes; it’s an in-json="string" attribute node. The <x:expect> provides this node in one child element and uses select="for $i in ($x:context) return /*/@in-json" to replicate the attribute node once per context item. The notation $x:context refers to a variable that XSpec populates with the 11-item context. (In XPath 4.0, an alternative could be select="replicate(/*/@in-json,count($x:context))".)
Now the scenario is fairly compact and would scale well if we needed to add a few more input values. This tale isn’t complete, though, because of an aspect of the scenario that might be problematic or seem philosophically questionable: the potential for the different items in the context to interfere with each other as the XSLT code accesses parts of the tree. Having the different items mingling in the XSpec markup makes the test code concise, but having them mingle when the XSLT runs is not desirable.
What makes interference during the XSLT execution a possibility is that the context is a sequence of 11 attribute nodes within a tree that includes all 11 elements as siblings of each other. To see evidence of this tree relationship, add the following message to the XSLT template and watch the console output from running the test.
Preventing Interference Among Contexts
How might a test make the attributes isolated, if they don’t start that way? XSpec lacks syntax for constructing new attribute nodes. XSpec does support helper functionality, such as an XSLT function that takes an attribute within a tree and creates a new, isolated attribute node.
However, a solution is even easier than that! In the XSLT 3.0 specification, the “Streaming” section lists functions named copy-of and snapshot that isolate parts of trees. While snapshot preserves the subtree’s ancestors and their attributes, copy-of returns the subtree only. Neither function includes siblings in its output. These functions are useful during stream processing, which imposes restrictions on access to nodes of a tree. Buffering a copy of a subtree in memory, with or without the ancestry of the subtree, enables freer access to it. The specification notes that each of these two functions “is available for use (and is primarily intended for use) when a source document is processed using streaming. It can also be used when not streaming” [K3].
This XSpec situation does not involve stream processing, and the two functions are useful not because of access restrictions or memory usage but for isolation and test cleanliness. If the <x:context> element in Figure 8 changes select="/*/@as-type" to select="/*/@as-type/copy-of()", the XSLT code sees isolated attributes instead of attributes of elements in a tree. The console messages from running the test no longer show the name of the element or its preceding sibling’s attribute value, because there is no element and hence no sibling element.
As a variation, changing select="/*/@as-type" to select="/*/@as-type/snapshot()" causes the XSLT code to see attributes that are attached to elements, but each element has no siblings. The console messages from running the test show the name of the element but not a preceding sibling’s attribute value, because there is no sibling element.
The point is that if siblings affected the XSLT template’s behavior in a way that could affect the test, using copy-of or snapshot in the XSpec code would take those siblings out of the view of the XSLT template. As a result, a test author would be able to write a compact scenario having a multiple-item context while preventing interference among the different items.
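The difference between attributes in a shared tree and isolated attributes can also be observed in XQuery. Because copy-of and snapshot are defined by XSLT 3.0 rather than by XQuery 3.1, the sketch below uses a computed attribute constructor as a stand-in for copy-of; the element name field and the attribute values are made up for illustration.

```xquery
(: Hypothetical context: three sibling elements, each carrying an as-type attribute. :)
let $doc := document {
  <field as-type="alpha"/>,
  <field as-type="beta"/>,
  <field as-type="gamma"/>
}
let $in-tree := $doc/*/@as-type
(: Stand-in for copy-of(): rebuild each attribute as a parentless node. :)
let $isolated :=
  for $a in $in-tree
  return attribute { node-name($a) } { string($a) }
return (
  (: Each in-tree attribute can reach the preceding siblings of its element. :)
  for $a in $in-tree return count($a/../preceding-sibling::*),
  (: The rebuilt attributes have no parent, hence no siblings to reach. :)
  for $a in $isolated return count($a/../preceding-sibling::*),
  every $a in $isolated satisfies empty($a/..)
)
```

The query should return 0, 1, 2 for the in-tree attributes, then 0, 0, 0 and true for the isolated ones, which mirrors what the console messages reveal before and after the change to the select attribute.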
Tale 3: Schema and Variations
The third tale is about testing a schema (say, Schematron, RelaxNG, or XSD) with easier maintenance of the documents that support the tests. One way to test a schema is to validate a series of valid and invalid documents, and check that the validation results are what you expect. The valid and invalid documents might be related to each other. I like to test with invalid documents that are close to valid ones because such pairs help confirm where the boundary of validity is in the schema.
If you follow that approach, you have a set of documents and a set of variations that make invalid documents into valid ones or vice versa. To ease maintenance, it would be nice to derive the variation documents programmatically from the originals, instead of doing manual copy-and-modify operations and then maintaining all the documents independently. Of course, the code that derives variations programmatically is something to maintain, so you want that code to be easier to maintain than the variations as independent documents.
You can certainly write some XSLT or XQuery code to create minor variations of documents. For instance, you can start with an identity transform and implement a system for specifying and then creating the variations you want. However, you don’t have to start from scratch, because there is already a standard way to create minor variations of documents. The XQuery Update Facility standard provides expressions that insert, delete, replace, and rename nodes. In addition to supporting that standard, the BaseX XQuery processor offers its own convenience operator for making updates with a streamlined syntax.
In the XQuery Update Facility 1.0 Requirements document [C], the first usage scenario is about updating persistent storage like a database. Other usage scenarios describe updates in the literal sense of bringing something up to date by adding new information or refreshing a status. While the requirements are not limited to time-oriented updates, newness is prominent in the descriptions. The testing usage in this tale originates from a different mindset. When creating variants of a document for schema testing, the point is not that a variant has fresher content but rather that it serves a different testing purpose compared to the original document. The variant is not necessarily better, and it’s up to you whether to programmatically produce an invalid variant from a valid document or a valid variant from an invalid document. You might pick a consistent direction of operation for your entire test suite or decide per document which direction is simpler to code.
Creating the Variations
Here are two examples of XQuery Update code for creating variants as persistent files, where the file:write and file:base-dir functions are specific to BaseX.
Both examples use this three-step procedure:
1. Read the original document using doc(), and keep a copy in memory.
2. Modify the copy in memory, using a sequence of one or more expressions from the XQuery Update vocabulary. If the invalid documents are nearly valid by design, these expressions are likely to be simple and few. The first example renames <d:code> elements as <d:literal>, assuming the query contains a namespace declaration that binds the d prefix to the namespace URI that the document from step 1 also uses. The second example moves a <d:abstract> element, by copying it to some location and deleting the original.
3. Write the result to a file different from the original file. Unlike some applications of XQuery Update, this situation does not modify the original file in place.
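Steps 1 and 2 can be sketched with the standard copy/modify/return expression from XQuery Update Facility 1.0. This is an illustration under assumptions, not the paper’s own code: the namespace URI is hypothetical, an inline document replaces the call to doc() so that the query is self-contained, and step 3 is left out because writing files depends on processor-specific functions.

```xquery
declare namespace d = "http://example.org/ns/doc";  (: hypothetical namespace URI :)

(: Step 1 stand-in: an inline document takes the place of doc("..."). :)
declare variable $original := document {
  <d:article>
    <d:info>
      <d:title>Sample</d:title>
      <d:abstract><d:para>Summary</d:para></d:abstract>
    </d:info>
    <d:para>Call <d:code>f()</d:code> first.</d:para>
  </d:article>
};

(: Step 2, variant 1: rename every d:code element as d:literal. :)
let $renamed :=
  copy $c := $original
  modify (
    for $e in $c//d:code
    return rename node $e as QName("http://example.org/ns/doc", "d:literal")
  )
  return $c

(: Step 2, variant 2: move d:abstract out of d:info by inserting a copy
   after d:info and deleting the original. :)
let $moved :=
  copy $c := $original
  modify (
    insert node $c/d:article/d:info/d:abstract after $c/d:article/d:info,
    delete node $c/d:article/d:info/d:abstract
  )
  return $c

return (
  count($renamed//d:code),
  count($renamed//d:literal),
  exists($moved/d:article/d:info/d:abstract),
  exists($moved/d:article/d:abstract)
)
```

On a processor that supports XQuery Update, the query should return 0, 1, false, true, and $original itself stays unchanged because all modifications apply to the copies.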
If you don’t mind using even more BaseX-specific functionality, you can streamline the syntax using the BaseX convenience operator, update [XB]. The three steps are the same, but they look a bit different in this syntax. Here is how the expressions above look using the update operator:
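As a rough idea of the streamlined form, the following BaseX-only sketch combines the update operator with file:write. The namespace URI and the output file name are hypothetical, an inline document again stands in for doc(), and the path handling assumes that file:base-dir() returns a directory path ending in a separator.

```xquery
(: BaseX-specific sketch: uses the "update" operator and the File Module. :)
declare namespace d = "http://example.org/ns/doc";  (: hypothetical namespace URI :)

(: Hypothetical output file, placed next to this query file. :)
declare variable $target := file:base-dir() || "variant-literal.xml";

(: Stand-in for doc("..."): a small inline document. :)
declare variable $original := document {
  <d:article>
    <d:para>Call <d:code>f()</d:code> first.</d:para>
  </d:article>
};

file:write(
  $target,
  (: Inside "update { }", the context item is the in-memory copy. :)
  $original update {
    for $e in .//d:code
    return rename node $e as QName("http://example.org/ns/doc", "d:literal")
  }
)
```

The copy, the modification, and the returned result are all implied by the operator, which is why the query body shrinks to a single function call.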
Using the Variations
If you have a module of functions that each use this three-step procedure, you can run all the functions in the module to create all the variations you need. With the variation files in hand, you are ready to run validation tests against the set of original documents and generated documents. The validation tests themselves can use whatever testing functionality you like, such as the Schematron support in XSpec or the BaseX modules for validation and unit testing.
Advantages of separating the variant creation from the validation testing include:
- Variant files in the file system are easy to inspect while you are fine-tuning your XQuery Update expressions and easy to validate manually while troubleshooting test failures.
- You can create variant documents using one processor and run validation tests using a different one. For example, you can use BaseX to create variant documents and then use Saxon to run XSpec tests for a Schematron schema. (Saxon supports XQuery Update but requires an enterprise license.)
On the other hand, disadvantages include:
- There are two processes to manage, and they must run sequentially.
- The variant creation process needs write permission to create the files wherever the
If you think the disadvantages outweigh the advantages, you can combine the XQuery Update expressions with the validation tests themselves. Here, we show an example that illustrates the combination approach, using two BaseX modules: validation and unit testing.
First, we declare namespaces and a global variable that stores the path to a RelaxNG schema.
Although it is not required, we find it useful to define helper functions that perform repeated tasks. These helper functions validate documents using the validate:rng-info function [V] and make test assertions using the unit:assert function [UM]. The validate:rng-info function returns a string sequence containing warnings and errors, if any. The test assertion hinges on whether this string sequence is empty (that is, the document is valid) or non-empty. In the unit:assert function, the second parameter is output in case of a test failure, so it should be something that helps you investigate the failure.
If you start with a valid document and use XQuery Update to make it invalid, your test functions look like the following pair. The second function in the pair uses XQuery Update code identical to Figure 13. Each function uses one of the helper functions defined above.
Going in the other direction, if you start with an invalid document and use XQuery Update to make it valid, your test functions look like the following. The second function in the pair uses XQuery Update code identical to Figure 14.
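Put together, a combined test module might look roughly like the following BaseX-specific sketch. The helper names mytest:expect-valid and mytest:expect-invalid follow the text; the namespace URIs, file names, and test function names are hypothetical, and RELAX NG validation in BaseX is assumed to have the Jing library available on the classpath.

```xquery
(: BaseX-specific sketch of a unit-test module; run it with the BaseX TEST command.
   Namespace URIs, file names, and test function names are hypothetical. :)
module namespace mytest = "http://example.org/ns/mytest";

declare namespace d = "http://example.org/ns/doc";

(: Path to the RELAX NG schema, assumed to sit next to this module. :)
declare variable $mytest:schema := file:base-dir() || "schema.rng";

(: Helper: the assertion passes when validation reports no messages. :)
declare function mytest:expect-valid($doc as node()) as empty-sequence() {
  let $messages := validate:rng-info($doc, $mytest:schema)
  return unit:assert(empty($messages), string-join($messages, "&#10;"))
};

(: Helper: the assertion passes when validation reports at least one message. :)
declare function mytest:expect-invalid($doc as node()) as empty-sequence() {
  let $messages := validate:rng-info($doc, $mytest:schema)
  return unit:assert(exists($messages), "Document validated, but errors were expected")
};

(: The unmodified document should be valid. :)
declare %unit:test function mytest:original-is-valid() {
  mytest:expect-valid(doc(file:base-dir() || "valid.xml"))
};

(: The variant with d:code renamed as d:literal should be invalid. :)
declare %unit:test function mytest:renamed-code-is-invalid() {
  mytest:expect-invalid(
    doc(file:base-dir() || "valid.xml") update {
      for $e in .//d:code
      return rename node $e as QName("http://example.org/ns/doc", "d:literal")
    }
  )
};
```

Passing the validation messages as the second argument of unit:assert in mytest:expect-valid means that a failure report carries the reasons the document was rejected.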
If you run the test module in BaseX, the result looks like this:
If you cause deliberate test failures by interchanging mytest:expect-invalid with mytest:expect-valid, the failures produce <failure> elements like the following. The line and column numbers point to the helper functions (not extremely useful when all the functions use the same helper functions), while the child <info> elements provide some useful information.
Conclusion
Features made for static type checking, streaming, and data freshening can play an unexpected role in testing. We saw how to say it in code when nonemptiness of a sequence is important, using one-or-more, exactly-one, and treat expressions. We saw how to isolate a subtree from interfering siblings and ancestors, using the snapshot and copy-of functions. As for XQuery Update, while it’s not a one-function-call solution, Tale 3 encourages us to look beyond terminology like “update” that might make it harder to see when a feature is a good fit in a non-primary use case.
References
[B] BaseX, https://basex.org/
[C] Chamberlin, Don and Jonathan Robie, Eds. XQuery Update Facility 1.0 Requirements, W3C Working Group Note 25 January 2011, https://www.w3.org/TR/xquery-update-10-requirements/
[K2] Kay, Michael. XSLT 2.0 and XPath 2.0 Programmer’s Reference, 4th Edition. Wiley: Indianapolis, IN, 2008.
[K3] Kay, Michael, Editor. XSL Transformations (XSLT) Version 3.0, https://www.w3.org/TR/xslt-30/
[KXP] Kay, Michael, Editor. XPath and XQuery Functions and Operators 3.1, https://www.w3.org/TR/xpath-functions-31/
[UM] Unit Functions, BaseX documentation, https://docs.basex.org/main/Unit_Functions
[US] US NIST metaschema-xslt repository, pull request 87, https://github.com/usnistgov/metaschema-xslt/pull/87/files. License: https://creativecommons.org/publicdomain/zero/1.0/
[XB] Updates, BaseX documentation, https://docs.basex.org/main/Updates
[V] Validation Functions, BaseX documentation, https://docs.basex.org/main/Validation_Functions
[W] Walmsley, Priscilla. XQuery, 2nd Edition. O’Reilly: Sebastopol, CA, 2015.