Introduction
REST is fast becoming the standard method by which different software systems interact with one another. As such, Rackspace produces a large number of RESTful services, both for internal and public consumption. Because these RESTful services act as integration points, they must be documented accurately and at a very high level of detail. As a REST service is being developed, the documentation must be kept up to date, and the accuracy of the documentation as it relates to the actual implementation of the service must be constantly validated. Once the REST service is released, however, the documentation becomes a fixed contract. At this point, there exists the possibility that an implementation may inadvertently drift from what is described in the documentation, which may cause clients to break. Also, when dealing with open source services, other implementations of the same service may be developed by third parties, and it's important for these implementations to behave according to the rules specified by the documentation so that clients can interact with multiple implementations without issue. The process of ensuring conformance to the documentation is complicated, tedious, and error prone.
To automate the process, we had the idea to use our existing XML documentation pipeline based on XProc, XSLT 2.0, DocBook and WADL, which already produced HTML and PDF documentation artifacts, to generate highly efficient validators as well. These validators can be used to check a RESTful service (and its clients) for conformance to the documentation at runtime in a very efficient manner. The validators are capable of using an XSD 1.1 implementation (either Xerces or Saxon) to validate not just the content of a request, but also to perform checks on URI templates and query parameters. The process illustrates the power of single sourcing, in that the same source that is used to produce human readable artifacts is also used to produce machine readable validation instructions. As a result, from the same artifact, we are able to:
- document APIs consistently and efficiently.
- produce different documentation artifacts such as user guides, specifications, and quick reference web pages.
- validate the accuracy of our API implementations and their clients.
- help describe the calls a particular user is authorized to make in an API.
The purpose of this paper is to describe our REST documentation pipeline and discuss how it was adapted to produce REST validators. We give a detailed overview of the transformation process, including a description of some of the optimizations performed to achieve efficient real time validation. We also describe some of the challenges we've encountered -- including testing the documentation pipeline.
Background
About REST
REST (REpresentational State Transfer) is an "architectural style" described by Roy Fielding in his PhD dissertation Fielding2000. REST seeks to adapt the architecture of the Web, along with its scalability, performance, and other advantages, as a platform for any kind of application. In REST, as on the Web, a resource is an item of interest. Each resource has an address (URI) and one or more representations (a file with a given media type). A client can interact with a resource through a URL and these interactions in turn act as the engine of application state because the representations provide hypermedia links offering alternative directions the client can take to proceed through the workflow. The RESTful ideal is that just as you interact with a Web site and choose which step to take next, without being frustrated that the layout of the page and links offered have changed since the last time you visited the site, so a client using a RESTful API can pick its way through a workflow without demanding a fixed, brittle contract. Likewise, the Web's mature and well-understood caching mechanisms can improve the performance and scalability of your application, protecting the server from spikes in traffic and the client from brief interruptions in the server's availability.
While many APIs describe themselves as RESTful, some RESTful APIs are more RESTful than others. At the most rudimentary level, APIs may employ one or more URIs, but still use only one method (GET, POST) to tunnel their requests over HTTP and do not represent a radical break from WS-* services. APIs that embrace the RESTful architecture more fully offer a larger number of resources, each with a URI, and map a variety of HTTP verbs to the Create Read Update Delete (CRUD) operations found in most applications. In these services, parameters are passed in with requests either as elements of the URI or as query parameters appended to the URI. The Web's caching infrastructure also comes into play. Finally, in an ideal more often discussed than achieved, services fully embrace the idea of hypermedia as the engine of application state (HATEOAS), relying on the client to navigate through the workflow in the same way an end-user at an e-commerce site would: by inspecting the available links and picking the appropriate one.
About Rackspace and OpenStack Developer Documentation
Rackspace is a hosting company that found itself well positioned to play a significant role in the sea change from traditional hosting to cloud computing started by Amazon Web Services. The Rackspace strategy involves creating an open source alternative to the closed, proprietary world of vendor lock-in that AWS represents. To that end, Rackspace partnered with NASA to create a collection of open source cloud software, called OpenStack. Since its launch in 2010, OpenStack has enjoyed rapid growth as an open source project with hundreds of companies and individuals participating. Since much of the code that we develop is contributed to OpenStack, we needed an approach to documentation that facilitated interchange and collaboration. DocBook and the XSLT stylesheets from the DocBook Open Repository provide the basis for our documentation tool chain.
Our documentation pipeline contains the typical components: a schema, authoring environment, source control system, build system, and hosted artifacts. In particular, we use:
- A variant of the DocBook 5.0 schema, called RackBook.
- A Java Web Start version of the oXygen XML editor with a custom framework to support RackBook and our specific needs. Contributors are free to use any text editor, but the customized version of oXygen provides many convenience features.
- Customizations on top of the DocBook XSLT stylesheets to produce PDF and HTML output branded for Rackspace, OpenStack, or other partners as necessary.
- An extended version of the Docbkx Maven plugin to manage build-time dependencies and perform the build logic to generate artifacts. We've extended Docbkx to incorporate Calabash so that we can preprocess our source code with XProc pipelines. Because OpenStack also relies on this tool, we have open sourced our version of the Maven plugin (clouddocs-maven-plugin).
- Internal git repositories for source control, with builds scheduled as Jenkins jobs.
Why WADL?
We evaluated frameworks that provide test consoles and some level of documentation for RESTful APIs, such as Apigee, Mashery, and Swagger, but found each lacking one or another important feature. We ultimately decided to leverage our DocBook-based tool chain to create human and machine readable descriptions of the APIs from the same source. The human readable artifacts are the documentation, and the machine readable WADL facilitates run-time validation of API calls via Open Repose, our open source RESTful proxy. There are a number of reasons why we chose WADL over an existing alternative description language or designing our own DSL:
- The WADL vocabulary includes a number of features that facilitate authoring and content reuse.
- Since WADL is XML, it fit easily into our existing DocBook pipeline.
- The format is extensible and thus it is simple to add support for new features and niche use-cases.
- The format is grammar agnostic. This means that a grammar can be specified in XML Schema for XML media types and JSON Schema for JSON media types.
- Additionally, grammars can be used not just to make assertions about the content of HTTP payloads as a whole, but also to make assertions about a particular subset of it.
- Grammars can also be used to make assertions over all aspects of an HTTP request or response: template parameters, matrix parameters, query parameters, and headers.
- WADL is specific to the HTTP protocol and captures all of that protocol's features. As such, it's a perfect markup language for semantically describing RESTful services.
Flexibility
One of the nice features of WADL is that it has
a very flexible schema especially when associating
URI paths to resources. For example, suppose we
have a very sparse API that allows a
GET
and DELETE
operation to occur only at the URI
https://test.api.openstack.com/path/to/my/resource
.
This can be represented in a WADL in the following
manner:
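A sketch of what such a tree-form WADL might look like (the namespace is the standard WADL 2009 namespace; the rest of the markup is illustrative):

```xml
<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://test.api.openstack.com">
    <resource path="path">
      <resource path="to">
        <resource path="my">
          <resource path="resource">
            <method name="GET"/>
            <method name="DELETE"/>
          </resource>
        </resource>
      </resource>
    </resource>
  </resources>
</application>
```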
Note that each individual path segment in the URI is represented by a <resource> element. While this approach would work well in cases where the API has a complex structure with many resources, it's overkill for this example. Instead, we can represent the API like this:
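A sketch of the equivalent flat form, under the same assumptions:

```xml
<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://test.api.openstack.com">
    <resource path="path/to/my/resource">
      <method name="GET"/>
      <method name="DELETE"/>
    </resource>
  </resources>
</application>
```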
Here, a single <resource> element covers multiple segments in the path all at once, in this case path/to/my/resource. Given the sparse API, this is far more convenient.
The WADL need not be entirely written in the form illustrated in Example 1 (tree form) or in the form in Example 2 (flat form). A WADL can contain resources that are simultaneously in both forms (mixed form) as illustrated here:
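A sketch of a mixed-form description along these lines (illustrative markup):

```xml
<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://test.api.openstack.com">
    <resource path="path/to/my">
      <resource path="resource">
        <method name="GET"/>
        <method name="DELETE"/>
      </resource>
    </resource>
  </resources>
</application>
```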
In Example 3 we have two <resource> elements, one for path/to/my and another for resource. The ability to intermix flat and tree forms allows the WADL author the flexibility to start with a simple API description and expand on it as the API grows in complexity.
Content Reuse
WADL contains a number of features that encourage content reuse. First, <method>, <representation>, and <param> elements can be specified separately from individual resources and can therefore be shared between them. For example, suppose that you have two resources, widgets and gadgets, and both of these resources contain a method to access metadata. You can repeat the method definition in both resources as illustrated in Example 4.
Repeating the method, however, is error prone and tedious. Instead, the method can be written once and referenced from the individual resources as illustrated in Example 5.
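A sketch of this referenced style (the getMetadata method id is illustrative):

```xml
<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://test.api.openstack.com">
    <resource path="widgets">
      <method href="#getMetadata"/>
    </resource>
    <resource path="gadgets">
      <method href="#getMetadata"/>
    </resource>
  </resources>
  <method id="getMetadata" name="GET"/>
</application>
```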
Note that the method is referred to by its id, and a hash (#) is used to denote the internal link. Multiple related methods may be shared between resources. One can express multiple methods together, as in Example 6, but this too can get tedious and error prone.
To alleviate this issue, WADL defines the concept of a resource_type. A resource_type contains common behavior that can be shared between multiple resources. This is illustrated in Example 7.
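A sketch of how such a resource_type might be declared and referenced via the resource's type attribute (the CommonResource id and method names are illustrative):

```xml
<application xmlns="http://wadl.dev.java.net/2009/02">
  <resources base="https://test.api.openstack.com">
    <resource path="widgets" type="#CommonResource"/>
    <resource path="gadgets" type="#CommonResource"/>
  </resources>
  <resource_type id="CommonResource">
    <method id="getMetadata" name="GET"/>
    <method id="updateMetadata" name="POST"/>
  </resource_type>
</application>
```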
Realistically, resource_types may capture many different methods. They may also introduce common sub-resources as illustrated in Example 8.
Note that <method>, <representation>, <param>, and <resource_type> elements need not appear in the same WADL; they may be linked in from an external WADL as illustrated below.
Finally, note that multiple resource_types may be associated with a single resource.
Inline Documentation
All elements defined by the WADL specification can be associated with inline documentation via the <doc> element. A simple illustration of this is shown in the example below.
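A minimal sketch of a <doc> element along these lines (the method and narrative text are illustrative):

```xml
<method id="getWidget" name="GET">
  <doc xml:lang="en" title="Get Widget">
    <p xmlns="http://www.w3.org/1999/xhtml">
      Returns a detailed representation of a single widget.
    </p>
  </doc>
</method>
```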
Note that Example 11 uses the XHTML grammar for narrative text. This is not a requirement; different narrative text grammars can be used. Also note the use of the xml:lang attribute: multiple <doc> elements can be used, each in a different language, to aid in the internationalization of the WADL. The title attribute can be used to give an overview of the documentation text.
Grammar Agnostic Assertions
WADLs contain a <grammars>
element that allows the association of grammars
such as XML Schema with a REST API. Grammars may
be included by means of the
<include>
element as
illustrated in Example 10 or they
may be written inline as shown in Example 12.
The WADL specification does not encourage the use of a specific schema grammar over another, but allows the usage of multiple grammars within the same WADL. This enables support for APIs where resources are exposed via multiple mediatypes such as XML and JSON.
Besides using grammars to make assertions about particular media types, they can also be used to make assertions about other aspects of the HTTP request and response. This is illustrated in Example 12.
Here, we define two simple types:
- An int with a value between 0 and 100 inclusive.
- A 36-character string that represents five groups of hexadecimal digits separated by hyphens.
URI paths such as path/to/my/resource/3bba8e68-8af5-11e1-ac65-17a552dd2535 and path/to/98 are valid according to the WADL, but URI paths such as path/to/my/resource/xyz and path/to/101 are not.
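A sketch of how these two simple types might be declared inline and bound to template parameters (the type names, target namespace, and tst prefix are illustrative):

```xml
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:tst="http://test.example.com/types">
  <grammars>
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://test.example.com/types">
      <!-- An int between 0 and 100 inclusive -->
      <simpleType name="Progress">
        <restriction base="int">
          <minInclusive value="0"/>
          <maxInclusive value="100"/>
        </restriction>
      </simpleType>
      <!-- A 36-character UUID: five hyphen-separated groups of hex digits -->
      <simpleType name="UUID">
        <restriction base="string">
          <length value="36" fixed="true"/>
          <pattern value="[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"/>
        </restriction>
      </simpleType>
    </schema>
  </grammars>
  <resources base="https://test.api.openstack.com">
    <resource path="path/to/my/resource/{id}">
      <param name="id" style="template" type="tst:UUID" required="true"/>
      <method name="GET"/>
    </resource>
    <resource path="path/to/{progress}">
      <param name="progress" style="template" type="tst:Progress" required="true"/>
      <method name="GET"/>
    </resource>
  </resources>
</application>
```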
Finally, we can use grammars to make assertions about individual sections of the mediatype by means of plain parameters. This is illustrated in Example 13.
Here we define that the response to the versionDetails method should contain an XML payload that validates against the element defined by the QName common:version. Additionally, we make an assertion that at the XPath /common:version/atom:link[@rel='self']/@href there should be a value that validates against the type xsd:anyURI. Furthermore, this URI should provide a link to a resource with a resource_type of VersionDetails.
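A sketch of how such a method might be written, using a plain parameter with a link child (the param name and status value are illustrative):

```xml
<method id="versionDetails" name="GET">
  <response status="200">
    <representation mediaType="application/xml" element="common:version">
      <param name="self" style="plain" required="true"
             path="/common:version/atom:link[@rel='self']/@href"
             type="xsd:anyURI">
        <link resource_type="#VersionDetails" rel="self"/>
      </param>
    </representation>
  </response>
</method>
```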
Note that plain parameters can also be used by mediatypes that are not XML based. Launchpad uses JSONPath to make similar assertions on its JSON based API. This is illustrated in Example 14.
Here we state that there should exist JSON attributes at the JSONPaths given by $['total_size'], $['start'], and $['entries']. Additionally, the fields at $['resource_type_link'], $['next_collection_link'], $['prev_collection_link'], and $['entries'][*]['self_link'] should contain links to other resources.
Note that Launchpad uses plain parameters to make assertions about various aspects of the JSON representation without having to rely directly on JSON Schema.
This ability to reference elements from external WADLs allows for common behavior to be shared between different APIs and has the potential to significantly accelerate the definition of APIs with common attributes and behaviors.
Extensibility
The WADL specification defines a fairly standard extensibility model of allowing elements and attributes in foreign namespaces. The example below illustrates a number of Apigee extensions, including one which indicates that authentication credentials are not required in this particular method call.
The ability to support extensions means that niche capabilities can be inserted into the WADL in a compatible manner.
Documentation from WADL
A WADL describes a RESTful API in a machine readable format. There is often a need, however, for humans to study and understand the available methods, parameters, and representations the API offers. Rather than manually reproducing that information in a documentation format and trying to keep these synchronized, clearly a literate programming approach is called for.
In pursuing our literate programming approach, we considered extending DocBook with custom markup that we could then use to generate WADL. This approach was attractive in that it would make authoring the contract feel like writing documentation. However, we worried that we would be reinventing the wheel and would spend too much time creating a content model that could be turned into WADL. We also felt that we would need to support round-tripping between our extended DocBook and the generated WADL. Ultimately, we decided to allow authors to embed WADL markup directly in the DocBook. To allow for this we:
- Added support for certain elements from the WADL namespace in our customization of DocBook.
- Added steps to our processing pipeline to turn the WADL elements into DocBook.
- The wadl:resources element wraps one or more resources, as it would in a normal WADL.
- The wadl:resource element wraps one or more methods and defines the location of the WADL and the id of the resource for the methods.
- The wadl:method element points to a method defined in the WADL. Each method becomes a section in the resulting DocBook. All of the methods with a common DocBook section ancestor become sections within that section. Alternatively, you can omit the wadl:method and the system will create sections for all of the methods that are children of the resource in the target WADL.
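A hypothetical sketch of what such an embedding might look like in a RackBook document. The href values and file name are assumptions based on the description above, not the exact RackBook syntax:

```xml
<section xmlns="http://docbook.org/ns/docbook"
         xmlns:wadl="http://wadl.dev.java.net/2009/02">
  <title>Volume Operations</title>
  <wadl:resources>
    <!-- Pull in two methods from the Volumes resource of an external WADL -->
    <wadl:resource href="os-volumes.wadl#Volumes">
      <wadl:method href="#listVolumes"/>
      <wadl:method href="#createVolume"/>
    </wadl:resource>
  </wadl:resources>
</section>
```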
To ease the burden on authors, we have created a Schematron rule in a custom framework for the oXygen editor to validate that references to the WADL point to ids that actually exist in the WADL.
In addition to pointing to resource and method combinations, it is also possible to point to a resource and have all the methods within that resource pulled into the document or to point to the WADL and have all of the resources and methods from the WADL pulled in.
Finally, instead of pointing to an external WADL, the author can simply include the entire wadl:resource and wadl:method(s) in the DocBook document directly.
The following image shows the section produced in the PDF output after processing Example 16:
The following is the WADL method that is being imported into the DocBook:
Note that while many of the attributes and elements map directly from the WADL representation to the PDF rendering, there exist some subtleties:
- Query parameters are rendered in two different ways: inline with the URI and in the Request Parameters table.
- Section, table, and example titles are automatically generated based on context. For example, the example title "List Volumes Response: XML" is generated because there is a representation example of mediaType application/xml contained within a method called "List Volumes".
- Parameter descriptions are also generated based on context.
- Other parameters, such as the tenantId parameter, are implied simply because of the URI the method is associated with.
- Note the use of the xsdxt:code extension to associate an example document with the API response.
WADL Normalization
As described in section “Content Reuse”, the
WADL format includes a number of features that
facilitate WADL authoring and content reuse. For
example, when creating a WADL, it is typical to put
all of the method
elements at the end of the document and refer to them
indirectly from the resources where they are used. You can
link to methods in a separate WADL. Likewise, you can
define resource_type
elements containing various methods and parameters.
Each resource can
then be declared to be of one or more resource types
to avoid repeated information. Finally, WADLs can
refer to XSD files to define data types used in the
WADL. The XSD files in turn often have a modular
design. While these features facilitate information
reuse and support "single point of truth" practices, a
side effect of the indirection is that WADLs can be
difficult to process. To address this issue, we
created a WADL normalizer which uses XSLT to flatten
out the WADL and associated XSDs. In the WADL, for
example, all references to method
s and
resource_type
s are resolved. Further
processing is therefore greatly simplified. We have
made this tool available as open source software.
Use cases for the WADL normalizer include:
- Preparing the WADL for use with tools like SoapUI. Limitations and bugs in SoapUI cause it to fail if certain features of the WADL are used. For example, if you use resource_types or refer to a method in an external WADL, SoapUI cannot load the WADL and throws an exception.
- Preparing the WADL before using it to generate human-readable documentation, as described in section “Documentation from WADL”.
- Open Repose uses the WADL normalizer as a preprocessing step before using the WADL for validation, as discussed in section “The Validation Pipeline”.
The WADL normalizer offers a number of options to control the formatting of the normalized wadl file, summarized by the usage below:
dcramer@anatine ~/rax/wadl-tools/src/test/scala (scalaCLI) $ normalizeWadl.sh -?
Usage: normalizeWadl.sh [-?fvx] -w wadlFile
OPTIONS:
 -w wadlFile: The wadl file to normalize.
 -f Wadl format: path or tree
    path: Format resources in path format, e.g. <resource path='foo/bar'/>
    tree: Format resources in tree format, e.g. <resource path='foo'><resource path='bar'>...
    If you omit the -f switch, the script makes no changes to the structure of the resources.
 -v XSD Version (1.0 and 1.1 supported, 1.1 is the default)
 -x true or false. Flatten xsds (true by default).
 -r keep or omit. Omit resource_type elements (keep by default).
One of the most important options is the format of the resource paths. By default, the normalizer leaves the path attributes on the resource elements unchanged. So if the source wadl contains a mixture of hierarchical and flat path attribute values, these are left untouched. In the following code listing, notice that some of the resource elements are nested and others have multiple items in the path attribute:
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <resources base="https://test.api.openstack.com">
    <resource path="a/b">
      <resource path="c">
        <method href="#foo"/>
      </resource>
    </resource>
    <resource path="d">
      <resource path="e/f"/>
    </resource>
    <resource path="g"/>
    <resource path="h/i/{j}/k">
      <param name="j" style="template" type="xsd:string" required="true"/>
      <method href="#foo"/>
    </resource>
    <resource path="h/i/{j}/k/l">
      <method href="#foo"/>
    </resource>
  </resources>
  <method id="foo"/>
</application>
However, you can also flatten out these paths or expand them so that all the resources are nested. Here's the same wadl in the path format:
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <resources base="https://test.api.openstack.com">
    <resource path="a/b/c" id="d12e4">
      <method xmlns:rax="http://docs.rackspace.com/api" rax:id="foo"/>
    </resource>
    <resource path="h/i/{j}/k" id="d12e9">
      <param xmlns:rax="http://docs.rackspace.com/api" name="j"
             style="template" type="xsd:string" required="true" rax:id=""/>
      <method xmlns:rax="http://docs.rackspace.com/api" rax:id="foo"/>
    </resource>
    <resource path="h/i/{j}/k/l" id="d12e12">
      <method xmlns:rax="http://docs.rackspace.com/api" rax:id="foo"/>
    </resource>
  </resources>
  <method id="foo"/>
</application>
The rax:id attributes have been added to preserve the original ids, which could not be duplicated in the normalized wadl without making the wadl invalid. These are required for down-stream processing when we generate documentation from the normalized wadl.
You can also expand the flat paths into a fully hierarchical tree format. The following example shows the same wadl expanded into the tree format:
<application xmlns="http://wadl.dev.java.net/2009/02"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <resources base="https://test.api.openstack.com">
    <resource path="a">
      <resource path="b">
        <resource path="c">
          <method xmlns:rax="http://docs.rackspace.com/api" rax:id="foo"/>
        </resource>
      </resource>
    </resource>
    <resource path="d">
      <resource path="e">
        <resource path="f"/>
      </resource>
    </resource>
    <resource path="g"/>
    <resource path="h">
      <resource path="i">
        <resource path="{j}">
          <param name="j" style="template" type="xsd:string" required="true"/>
          <resource path="k">
            <method xmlns:rax="http://docs.rackspace.com/api" rax:id="foo"/>
            <resource path="l">
              <method xmlns:rax="http://docs.rackspace.com/api" rax:id="foo"/>
            </resource>
          </resource>
        </resource>
      </resource>
    </resource>
  </resources>
  <method id="foo"/>
</application>
In addition, the normalizer can optionally flatten out XSDs by pulling in included XSDs and filtering based on the vc:minVersion and vc:maxVersion attributes.
Finally, you can optionally filter out resource_type elements from the normalized wadls. In some cases it is useful to preserve these elements, but they can cause problems for certain tools. Therefore a parameter is provided to filter out the resource_types.
The Validation Problem
As a REST service is being developed, the goal is to ensure that the documentation accurately matches the implementation. Once a REST service reaches a mature state, or is released publicly, the documentation becomes a contract; both clients and alternate implementations use this contract to guide their development. Interoperability between all parties requires that both the documentation and the implementation remain stable, even in the face of bug fixes, upgrades, enhancements, and the introduction of new features via extensions.
In the presence of these changes, there always exists the possibility that the implementation and its docs may inadvertently drift apart. This drift often introduces incompatibilities that can cause clients to fail. Thus, it's important to constantly test that the implementation and its documentation conform to one another. Unfortunately, our quality engineering teams are often not focused on document conformance, or on the intricate details of the REST/HTTP aspects of the service. Instead, these teams are focused on the functionality of the service itself. It's been our experience that a number of incompatibilities slip through the cracks as a result. What's more, because this focus on functional testing pays little to no regard to the documentation of the service, we've found cases where both the implementation and its tests drift away from the docs simultaneously. Adding to the complexity is the fact that service developers tend to allow their implementations to be flexible and loose when accepting messages from clients — this means that conformance and functional tests themselves may inadvertently drift from the service contract without notice.
One of our main goals is to better incorporate the documentation in the testing process in an automated way. The idea is to use the documentation pipeline to generate validation rules that can be checked in a layer between the function tests and the service itself. This is illustrated in Figure 19.
Here, the validator intercepts requests from the functional tests before they reach the service and checks them for strict conformance to the documentation. Likewise, the validator intercepts the responses from the REST service and validates them as well. There are several advantages to taking this approach:
- Because the validation rules are generated directly from the documentation, we can ensure that the validator strictly conforms to the docs.
- Because the validator sits between the functional tests and the service itself, it can be used to check for drift by both parties simultaneously.
- The technique can be easily incorporated into existing services — it does not require changes to existing functional tests.
Validating REST with Automata
Given any possible HTTP message, the validator in Figure 19 needs to be able to tell the difference between an HTTP message that meets all of the criteria defined in the documentation and one that does not. In other words, the validator must accept the subset of all HTTP messages that are valid according to the description in the source document. The validator also needs to categorize those messages that are not valid according to the error code that should be generated by the underlying REST service, so that the error code that the REST service produces can also be verified.
Accepting messages that meet some criteria is a common problem in computer science. One technique for solving the problem is to utilize an automaton. An automaton is a state machine that transitions from an initial start state to other states based on the current input. If after the message is read the machine is in an accept state, then the message is accepted, otherwise the message does not meet the required criteria.
The idea behind our validators is to translate documentation in the form of a WADL (either stand-alone or extracted from the contents of a RackBook document), into a representation of an automaton that can be used to validate messages as they are intercepted between functional tests and the REST service.
An example automaton is illustrated in Figure 20.
For simplicity, the automaton above is only checking for conformance to the URL path and the HTTP method. In this particular API, the only allowed operation is a GET on the path /path/to/record/{date}, where {date} is an xs:date as defined by W3C XML Schema W3CSchema2010. The start state is S0. There are three accept states:
- SA: accepts HTTP messages that follow the constraints defined by the API.
- d30U: accepts HTTP messages for which a 404 (Not Found) response should be expected from the REST service.
- d30M: accepts HTTP messages for which a 405 (Method Not Allowed) should be expected.
Starting at state S0, the process examines the URL path, one path segment at a time. The HTTP method is not examined until the path has been completely parsed. The transition from one state to another is based on input matches, except in the case of an ε transition, where a state is advanced without regard to the current input. In the machine above, U(x) matches the current URL path segment against x, where x may be a regular expression or a QName representing an XML Schema simple type. U!(x) matches a URL segment that is not accepted by x. On a successful match, the URL path is advanced to the next path element. After the URL path is completely read, the machine proceeds to read the HTTP method. Here, M(x) and M!(x) work the same as U(x) and U!(x), except they match against the HTTP method instead of the URL path segment. Because the automaton is only checking URL paths and methods, the process ends after the HTTP method has been read.
Table I illustrates the states that are transitioned based on an example set of inputs.
Table I

| Input | States Traveled | Result |
| GET /path/to/record/2001-01-02 | S0, d18e4, d18e5, d18e6, d18e7, d18e9, SA | 200 Okay |
| GET /my/path/ | S0, d30U, d30U, d30U | 404 Not Found |
| PUT /path/to/record/2001-01-02 | S0, d18e4, d18e5, d18e6, d18e7, d30M | 405 Method Not Allowed |
The Checker Format
REST Automata are internally represented by an XML format, which we call the checker format. An instance document in this format is illustrated in Figure 21.
This simple, terse, document type maps each state in the state machine, along with the edges that it connects to, into a single step element. All steps in the checker format contain an id attribute of type xs:ID which uniquely identifies the step in the machine and a type attribute which indicates the type of check the step performs. Note, that in Figure 21, there are two types of URL checks: the steps that check a path segment against a regular expression are of type URL, and those that check the segment against a simple XML Schema type are of type URLXSD. Connections between steps are represented by a next attribute of type xs:IDREFs. Additionally, most steps also contain a match or notMatch attribute that provides the details of how the a match is interpreted.
Currently there exist steps to match against all aspects of the HTTP request, including steps that match against request media types, that check the well-formedness of XML and JSON content, that validate XML via a W3C XML Schema (XSD) 1.1 validator, and that check for the presence of required elements by means of XPath expressions. The XPath steps are particularly important: a REST service may define many different types of elements, and the element attribute in a representation binds an element to a particular operation — this binding is enforced via an XPath expression step in the machine.
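A sketch of what a checker document for the automaton of Figure 20 might look like, using only the attributes described above (the namespace URI and the type names are illustrative; the state ids mirror Table I):

```xml
<checker xmlns="http://www.rackspace.com/repose/wadl/checker"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- URL steps check path segments; URLXSD checks against a simple type -->
  <step id="S0"    type="START"  next="d18e4 d30U"/>
  <step id="d18e4" type="URL"    match="path"     next="d18e5 d30U"/>
  <step id="d18e5" type="URL"    match="to"       next="d18e6 d30U"/>
  <step id="d18e6" type="URL"    match="record"   next="d18e7 d30U"/>
  <step id="d18e7" type="URLXSD" match="xsd:date" next="d18e9 d30M"/>
  <!-- After the path is consumed, the HTTP method is checked -->
  <step id="d18e9" type="METHOD" match="GET" next="SA"/>
  <step id="SA"    type="ACCEPT"/>
  <!-- Error states: unknown path (404) and disallowed method (405) -->
  <step id="d30U"  type="URL_FAIL"/>
  <step id="d30M"  type="METHOD_FAIL" notMatch="GET"/>
</checker>
```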
Another important checker step is the XSLT step, which performs a transformation of the request message before it is validated by the XSD step. It may seem odd that an XSLT transform may be required for validation, but the requirement comes into play in cases where the type of the request document needs to be restricted (or extended) based on the operation being performed. Let's say, for example, that a REST API dealing with Widgets allows a Widget to have all attributes when performing a PUT operation, but restricts the POST operation to Widget documents containing only a subset of the available attributes — this is a common pattern in REST services. The restriction on POST is specified in the WADL in Figure 22.
Note that we use a WADL plain parameter (described in section “Grammar Agnostic Assertions”) to connect a restricted type of the Widget (one which prohibits a number of attributes) to the POST operation, by specifying the WidgetForCreate type and an XPath indicating where the type should apply. In the automaton, this is translated into an XSLT step that modifies the request document by adding an xsi:type attribute, thus instructing the validator to validate the contents of the Widget message as an instance of the restricted type. In this case, the restriction is placed on the root element, but such restrictions can be placed on any element in an XML document.
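The effect of the generated XSLT step can be approximated in Python: stamp the element selected by the XPath with an xsi:type attribute so that a downstream XSD validator applies the restricted type. The widget document, its namespace, and the WidgetForCreate name here are hypothetical; a real transform would also bind the prefix used in the type's QName.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def restrict_root(doc_text, qname="tst:WidgetForCreate"):
    """Add xsi:type to the root element, mimicking the generated XSLT step."""
    root = ET.fromstring(doc_text)
    root.set(f"{{{XSI}}}type", qname)
    return ET.tostring(root, encoding="unicode")

out = restrict_root('<widget xmlns="http://example.test/widgets" id="1"/>')
```

The rewritten document then flows into the XSD step, where the validator enforces the restricted type's prohibitions.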
Optimization
The automata created to validate REST services may be quite complex, often involving many states and connections. As an example, Figure 25 illustrates the automaton for validating the OpenStack Compute API (excluding extensions).
Optimization stages may be introduced into the transformation process to reduce the number of states in the automaton. Each optimization stage is simply an XSLT transform that takes a checker document as input and produces a checker document with fewer states as output. Optimization stages can therefore be chained together. With each stage, the checker document produced must perform exactly the same function as the original unoptimized one.
Currently, optimization stages work by compressing redundant (or nondistinguishable) states into a single state. Redundant states occur quite frequently because the translation code that converts a WADL to a checker document does so without regard to the number of states produced, in order to keep the translation code simple. Figure 26 illustrates the effect of running these optimization stages on the Compute API. Note that the number of states is reduced dramatically.
Other optimization techniques are possible. For example an optimization stage may:
- Convert the simple type checks in URLXSD steps into regular expression URL steps.
- Compress multiple regular expression checks into a single check.
- Compress multiple boolean XPath expressions into a single expression.
- Compress XPath expressions, XSLT transformations, and XSD schema validation into a single schema-aware XSLT transform step.
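The core of the existing optimization, merging nondistinguishable states, can be sketched as a fixed-point pass over a step dictionary. This is an illustrative reconstruction in Python, not the actual XSLT stage: two states are treated as nondistinguishable when they perform the same check and lead to the same successors, so one is dropped and references to it are repointed.

```python
def merge_duplicates(steps):
    """Fold states with identical (type, match, next) into one state."""
    steps = {sid: dict(s) for sid, s in steps.items()}  # work on a copy
    changed = True
    while changed:                       # iterate to a fixed point
        changed = False
        seen = {}
        for sid, s in list(steps.items()):
            key = (s["type"], s.get("match"), tuple(sorted(s["next"])))
            if key in seen:              # duplicate of an earlier state
                keep = seen[key]
                del steps[sid]
                for other in steps.values():
                    other["next"] = list(dict.fromkeys(
                        keep if n == sid else n for n in other["next"]))
                changed = True
            else:
                seen[key] = sid
    return steps

# Hypothetical machine with two identical URL checks fanning out from S0.
machine = {
    "S0":  {"type": "START",  "match": None,   "next": ["a", "b"]},
    "a":   {"type": "URL",    "match": "path", "next": ["end"]},
    "b":   {"type": "URL",    "match": "path", "next": ["end"]},
    "end": {"type": "ACCEPT", "match": None,   "next": []},
}
opt = merge_duplicates(machine)  # "b" is folded into "a"
```

Because the pass repeats until nothing changes, merging two leaf states can expose their parents as duplicates on the next iteration, which is why chains of redundant states collapse as well.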
The Validation Pipeline
The validation pipeline is responsible for producing REST automata from WADLs. The pipeline is illustrated in Figure 27.
The first three parts of the pipeline are shared with our general documentation pipeline. These stages utilize XProc, XSLT 2, and XSD 1.1 to generate the final checker document. The final stage creates an immutable data structure from this document that is used by the Repose WADL proxy component to process and validate API requests within the Repose HTTP proxy. This stage is implemented by means of a specialized SAX handler. The data structure the handler produces is essentially a model of the automaton that can be executed by the Repose component when validating requests. The process for validating HTTP requests with the data structure avoids the use of global state; most context is passed via the call stack. Because of this, and because the internal data structure is immutable, the entire process is thread safe, meaning that it can validate multiple HTTP requests simultaneously in different threads of execution.
It's important to note that the validation pipeline above is a preprocessing step. Once the immutable data structure is created, it can be used to handle many HTTP requests simultaneously, at run-time, and in an efficient manner.
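The shape of that immutable, stack-driven model can be sketched with frozen dataclasses in Python; the step names and the simplified (kind, value) token stream here are hypothetical, and the real component is implemented on the JVM rather than in Python.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Step:
    """One state of the automaton; frozen makes the structure immutable."""
    id: str
    check: str                       # e.g. "URL", "METHOD", "ACCEPT"
    match: str
    next: Tuple["Step", ...] = ()

# A tiny machine: GET /path -> accept.
accept = Step("SA", "ACCEPT", "")
get    = Step("s2", "METHOD", "GET", (accept,))
path   = Step("s1", "URL", "path", (get,))
start  = Step("S0", "START", "", (path,))

def validate(step, tokens):
    """Recursive descent; all context lives on the call stack, so the same
    machine can be shared by many threads without locking."""
    if not tokens:                   # input consumed: must reach an accept
        return step.check == "ACCEPT" or any(
            n.check == "ACCEPT" for n in step.next)
    kind, value = tokens[0]
    return any(validate(n, tokens[1:])
               for n in step.next
               if n.check == kind and n.match == value)
```

Since neither `Step` instances nor `validate` mutate anything, concurrent calls cannot interfere with one another, which mirrors the thread-safety argument made above.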
Note also that the stages in the validation pipeline are configurable. There are configuration options that dictate:
- The strictness of the validation, which is determined by which validation steps are added to the final checker.
- Specialized options for each individual step. For example, should an XPath version 2 implementation be used in an XPath step? Should Saxon EE or Xerces be used for XSD 1.1 validation in the XSD step?
- The optimization stages to use.
Other Use Cases
The initial goal for creating validators from REST documentation is described in section “The Validation Problem”; however, once an implementation of the validation pipeline was created, it became evident that the pipeline can be used to solve other problems. Some of these additional use cases are described below.
Filtering and Error Reporting
REST services receive many millions of HTTP requests a day. Many of these requests are malformed — the result of user errors. Because a validator can detect malformed requests, it can filter these from the underlying REST implementation. What's more, the validator has enough context to respond with an appropriate error message. For example, rather than responding with simply a 404 (Not Found), the validator can respond with: 404 /path/to/widget not found, expecting "server | image" instead of "widget".
Our implementation has added support for this capability, and it has proved useful.
Authorization
Often different sets of users can access different parts of the API. For example, administrators may have the capability to reboot a server, but regular users may not. The idea here is to build on the filtering capabilities in the previous section and assign different sets of users different validators, so that administrators have a validator that accepts the reboot operation, and regular users have a validator that does not. The interesting thing about this use case is that the WADL becomes an authorization policy file.
API Coverage
Validators are capable of logging which states in the automaton have been visited. This information can be used to compute test coverage. If there are states in the automaton that are not visited by functional tests, then the functional tests are clearly missing something. Also, in production, the information can be used to gain insight into which parts of an API are frequently accessed by users.
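A minimal sketch of the coverage idea, with hypothetical state ids: collect the states visited by each request and diff the union against the full state set.

```python
# Hypothetical state ids for a small machine with GET and PUT branches.
ALL_STATES = {"S0", "s_path", "s_get", "s_put", "SA"}

visited_log = set()

def record(trace):
    """Accumulate the states visited while validating one request."""
    visited_log.update(trace)

record(["S0", "s_path", "s_get", "SA"])    # e.g. a GET functional test

uncovered = ALL_STATES - visited_log        # states no test has reached
coverage = len(visited_log) / len(ALL_STATES)
```

Here the PUT branch was never exercised, so `uncovered` flags it; the same accumulation in production yields a usage profile of the API instead of a test-coverage report.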
Conclusions
In this paper, we described the process by which we took an existing documentation pipeline for documenting REST APIs and extended it to support the creation of validators that can help confirm that the documentation accurately reflects reality and that an implementation does not drift from what is stated in the docs.
We took an intermediary approach to validation, in which messages are intercepted between client and server applications, and an automata-based programming approach that enables the efficient validation of messages at run-time. The technique opened up other use cases such as filtering, accurate error reporting, authorization, and API coverage. We feel that a key to the success of the project was choosing a tag set that semantically described all aspects of a RESTful service and contained author-friendly features.
In the next sections we describe some of the challenges we encountered while developing the validation pipeline and discuss some possible future extensions to the pipeline.
Challenges
XSD 1.1, early adoption
Because OpenStack APIs are extensible, describing the XML media types with XSD 1.0 was not an option; we needed the new open content features that are offered by the XSD 1.1 standard. Since OpenStack is an open and free platform, we have the goal of ensuring that everything we develop remains open and accessible to the community, while still remaining friendly to enterprise customers. To that end, we decided that our validators should support both the Xerces (open source) and the Saxon EE (proprietary) XSD 1.1 implementations.
We've had great success in ensuring that our validators interoperate with both implementations seamlessly; however, during the development of the pipeline we encountered a number of errors in the Xerces implementation. In fairness, the Xerces implementation is still labeled BETA, and the Xerces team has done a great job of resolving most of the issues we've discovered thus far. The major challenge for us, however, is that we have come to rely on XSD 1.1 features before there is a full, production-ready, free open source XSD 1.1 implementation.
Parsing XPaths in XSLT 2
WADL makes references to XPaths when describing plain parameters. These XPaths need to be copied faithfully as they proceed along the validation pipeline. Ensuring that the XPaths remain intact under all circumstances has proved difficult, given that there may be conflicts with namespace prefixes or, worse, contention for the default namespace.
Our implementation makes a best effort to keep XPaths valid, but there are still some edge cases that trip it up. We are debating whether to create an XSLT extension that will enable us to leverage an XPath 2 parser, or to write our own XPath 2 parser in XSLT, to resolve the issue.
Testing The Validation Pipeline
An early challenge we encountered was the need for a method of testing the validation pipeline itself. Because validation and our documentation pipeline both rely on the WADL normalizer, it is important to ensure that the output of the normalizer accurately reflects the intent of the original WADL author. To avoid regressions as we add new features, we implemented a functional testing framework that takes advantage of the XML features of Scala to perform transformations on simple WADLs and WADL+XSD combinations and compare the results with the expected output. The following code sample shows one of these tests.
Notice that the test infrastructure supports the scenario/given/when/then format of behavior-driven development.
Future Work
The goal described in section “The Validation Problem” is not entirely met, because we are still missing considerable checks on the HTTP response generated by the REST service. We plan on extending the automata to account for this. Additionally, we plan on adding JSONPath and JSON Schema stages, as only JSON well-formedness is currently tested. We also plan on adding support for other XML validation languages such as RELAX NG and Schematron. Finally, we plan to pursue some of the extended use cases we discussed in section “Other Use Cases”, such as authorization and API coverage.