Introduction
Unified Modeling Language (UML) has proved to be a powerful language for modeling software systems. Modeling is an important activity in the software development process. It is a good way of how to abstract from technical details of a software system and concentrate on structural, semantic and functional aspects of the system. In connection with UML, a methodology called Model-Driven Architecture (MDA) has been established. It proposes to start modeling a system at an abstract platform-independent level stripped from technical details and then specify the model with more technical details at a platform-specific level. From this level, actual executable code can be then generated (ideally) automatically. In practice, the step from the platform-specific model (PSM) to executable code requires usually some level of the designer's interaction, but nonetheless, the presence of the layered model is a significant improvement both during initial formation of the implementation and it's management in the future.
In our previous work (Nečaský 2012A, Nečaský 2012B), we have shown how beneficial it can be to interconnect the world of UML and MDA with the world of designing XML schemas. Our premise was the fact that complex software systems are often modeled with UML diagrams at the platform-independent level. In particular, a UML class diagram is suitable for modeling a conceptual schema of the application domain. Such schema is called schema in a platform-independent model or simply PIM schema in MDA terminology. The term model means a modeling language (the language of UML class diagrams in our case). Our method allows to exploit the existing PIM schema to design XML schemas in the system. Each XML schema is modeled as so called schema in a platform-specific model or simply PSM schema. A PSM schema is, again, a UML class diagram. It represents a part of the PIM schema and shapes it to the form of the aimed XML schema. From the PSM schema, the XML schema expressed in a selected language (we currently support XML Schema) can be derived automatically. We have implemented the method in our CASE tool eXolutio.
One advantage of such approach is that an XML schema designer takes an existing PIM schema and derives from it a PSM schema of the aimed XML schema. This is much easier then writing the XML schema manually even when the designer uses an XML schema visualization tool. Another advantage of using a PIM schema is that it prevents from inconsitencies between the XML schema and the conceptual schema.
Several UML/XML schema mapping approaches were proposed so far (Pagano 2009, Bauman 2009), the advantage of our approach is hidden in the fact that a software system usually does not presume only one XML schema, but a whole family of XML schemas each representing a selected part of the application domain (e.g. in the domain of of e-commerce, the system might contain following XML schemas: purchase order, product catalogue or customer detail). Moreover, several different XML schemas may overlap and share some part of the reality but may represent it differently (e.g., the concept of Customer in of e-commerce is shared by different XML schemas which represent it with different XML structures). Using our approach, the reality is modeled only once in the PIM schema and its different representations in different XML schemas are derived from the PIM schema as PSM schemas. When the designer examines a particular PSM schema, he can always track any used concept back to the common model and thanks to this fact never looses the "big picture". The management of a family of schemas is thus more efficient and error-proof. Moreover, our approach makes the XML schemas better readable - anyone can look up what real-world concepts modeled in the conceptual schema are represented with a selected part of a particular XML schema and, vice versa, how a selected real-world concept is represented in which XML schemas.
In this work, we extend our previous work towards modeling complex integrity constraints (ICs). ICs are a substantial part of every software system, equally important as a conceptual schema. However, most of them cannot be expressed with UML class diagrams and, therefore, Object Constraint Language (OCL, see OCL specification) was introduced for this purpose. OCL is a formal language which can be used to declare ICs over a UML model. The ICs are in a form of logical expressions and can check constraints such as "employee, who is a division manager, must have a collage degree" or "the store manager can get a bonus only when the profits have grow at least 10% compared to last year" (the following section contains more examples of integrity constraints and how they are expressed fromally in OCL) . Usaing OCL has two main benefits in comparison to the natural language:
-
unlike natural language, the formal expression is exact and unambiguous,
-
an expression in a formal modeling language can be used to generate code that verifies the validity of the IC in actual data or running program.
The authors of Dresden OCL toolkit (see Dresden OCL) have demonstrated how OCL ICs can be checked at runtime in Java (by translating OCL into Java code and adding triggers that verify the constraints when objects are changed) and also in relational databases (by translation of OCL expression into SQL statements, which allow to select records violating the IC).
When a system uses XML (e.g. for message exchange in communication or storing data), the ICs should be checked in the XML data and using XML technologies. In this work, we show how OCL ICs can be checked at runtime in XML documents using Schematron. The result of our research can be again profitably used by XML schema designers in cases when a PIM schema complemented with OCL ICs. Together with modeling the XML schemas as PSM schemas derived from the PIM schema, the designers can also easily convert OCL ICs from the PIM schema to the PSM schemas.
Having OCL ICs expressed at the platform-independent level the question is how they can be translated to the platform-specific level and from here to the XML schema level. The conversion to the platform-specific level is not covered in this paper. Just let us note that we have already implemented several methods for this conversion in our tool eXolutio but we do not describe them in this paper. Instead, we focus on the actual conversion of OCL ICs expressed at the platform-specific level to their suitable XML conterpart, which are Schematron schemas. Figure 1 shows an overall architecture of the system.
Schematron is a rule-based XML schema language based on XPath expressions. It is often used as a complement of grammar based schema languages (such as XML Schema, Relax NG or DTD), which define a proper vocabulary and validate the structure of the XML document, to check data and co-occurrence constraints - it serves a similar purpose, as does OCL in UML. Because the approach used in OCL to build expressions is rather different than the space of expression provided by XPath, it is necessary to extend XPath to provide comparable expressive power. We implemented the required extensions in our XSLT functions library called OclX. Since Schematron validation is usually implemented as several steps of XSLT processing, our extension will not require modification of the existing Schematron validation tools, except referencing OclX library. We will examine different approaches, how the required functions can be implemented, and show how several of the new constructs (such as higher-order functions, maps, or error-recovery instructions) proposed in the 3.0 drafts of XPath, XQuery and XSLT can serve our goal.
Outline
The rest of this work is organized as follows: section “OCL Constraints” gives brief introduction to OCL and shows examples demonstrating its expressive power. section “Iterator expressions” lists possibilities of translating iterator expressions, section “Tuples as maps” elaborates on tuples, section “Different kinds of collections, nested collections” on other types of collections than sequences. In section “Error recovery”, we describe, how semantics of OCL error handling can be achieved. section “Implementation” describes the implementation of OclX in our tool and the use of XProc for validation. In section “Expression rewriting and syntactic sugar”, we outline possibilities of rewriting certain classes of expressions. In section “Conclusion”, we conclude.
OCL Constraints
In this section, we will show on several examples the usage of OCL constraints.
Figure 2 shows a sample platform-independent schema of a chess leagues information system. [1].
Figure 3 shows some constraints defined for the PIM schema. We will not elaborate the syntax of OCL in detail in this paper. Briefly: OCL script consists of declaration of blocks of integrity constraints, each block starts with the selection of context class and contains one or more invariants for this class. The invariants are logical expression refereing to the concepts defined in the PIM schema. Standard OCL does not allow to specify an error message describing the violated constraint. However, this is a valued feature of Schematron which we did not want to lose. That is why we extended OCL to allow an optional message after each invariant definition. The message can contain subexpression, as is shown in IC PIM1.
From this schema, two platform-specific schemas (Figure 4, Figure 5) were created, each using the classes from Figure 2, but each for a different type of XML documents used in the system. XSDs obtained from the PSM schemas are shown alongside the figures.
Figure 4: A sample platform-specific schema modeling a type of XML documents for Matches and Players involved in a Tournament
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="tournament" type="Tournament" /> <xs:complexType name="Tournament"> <xs:sequence> <xs:element name="name" type="xs:string" /> <xs:element name="start" type="xs:dateTime" /> <xs:element name="end" type="xs:dateTime" /> <xs:element name="matches" type="Matches" /> <xs:element name="participatingPlayers" type="Players" /> </xs:sequence> </xs:complexType> <xs:complexType name="Matches"> <xs:sequence> <xs:element name="day" type="Day" maxOccurs="unbounded" /> </xs:sequence> </xs:complexType> <xs:complexType name="Players"> <xs:sequence> <xs:element name="player" type="Player" maxOccurs="unbounded" /> </xs:sequence> </xs:complexType> <xs:complexType name="Player"> <xs:sequence> <xs:element name="name" type="xs:string" /> <xs:element name="email" type="xs:string" /> </xs:sequence> </xs:complexType> ... </xs:schema> |
Figure 5: A sample platform-specific schema modeling a type of XML documents for Tournament Qualification
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="tournaments" type="Tournaments" /> <xs:complexType name="Tournaments"> <xs:sequence> <xs:element name="tournament" type="Tournament" minOccurs="0" maxOccurs="unbounded" /> </xs:sequence> </xs:complexType> <xs:complexType name="Tournament"> <xs:sequence> <xs:element name="name" type="xs:string" /> ... <xs:element name="qualification" type="Qualification" /> </xs:sequence> </xs:complexType> <xs:complexType name="Qualification"> <xs:attributeGroup ref="OpenTournament-att-opt" /> <xs:attributeGroup ref="League-att-opt" /> </xs:complexType> <xs:attributeGroup name="OpenTournament-att-opt"> <xs:attribute name="open" type="xs:boolean" use="optional" /> </xs:attributeGroup> <xs:attributeGroup name="League-att-opt"> <xs:attribute name="leagueName" type="xs:string" use="optional" /> </xs:attributeGroup> </xs:schema> |
Figure 6 shows, how the constraints from Figure 3 can be applied in the PSM schemas, where they are relevant. For the purposes of using OCL at the PSM layer, we added additional ways of navigation (to those defined in the standard) - parent and child_N, choice_N, set_N, seq_N (navigate to the n-th child, choice set or sequence in the content model of a class).
It is also possible to add additional constraints, which do not have a counterpart PIM constraint, for each PSM schema (e.g. constraint PSM5). In this paper, we will not show, how the relevant constraints are chosen for each PSM schema, but we will focus on how they can be verified in XML documents. (MARTIN: Tady ale zhruba rozved, jake jsou vyhody toho, ze to je na PIMu. Tj. odpovedi na otazku, proc si to nemuzu rovnou vyjadrit na PSM urovni?)
From the nature of the excerpt Figure 6, it can be seen that contexts and invariants in OCL play the same role as rules' contexts and asserts in Schematron. The core of the transition from OCL to Schematron thus lies in translating OCL expressions into XPath tests, preserving the semantics. Some expressions (e.g. literals, if-then-else, variables, arithmetic, string and boolean operations, some collection operations) may be mapped directly to a corresponding XPath expression or function.
For some operations, a corresponding function does not exists or has a different semantics
(e.g. OCL indexOf
behaves just like XPath index-of
when the searched
item occurs at most once in the searched sequence, but XPath version returns indexes
of all
occurrences, when there are more). For these cases, we created a set of functions
in OclX
library preserving the semantics.
Feature-call expressions are a fundamental part of OCL - starting at a variable, they
allow traversing the UML model along associations and accessing attributes. These
are mapped
to path expressions of XPath, which traverse the XML document. E.g. expressions
self.start
or self.matches.day
can be expressed as
$self/start
and $self/matches/day
respectively.
In the following sections, we will be dealing with those OCL constructs that cannot be expressed trivially - iterator expressions, tuples and nested sequences. We will also analyse the differences in error recovery in both languages.
Iterator expressions
In the examples above, a construct distinctive for OCL is used several times. It is
the
so-called iterator expression. An iterator expression is a certain kind of a function.
It is always
called on a collection object. It does not have ordinary parameters, instead each
usage
declares the iterator variable(s) it will use and a body
expression. OCL standard library contains several predefined iterator
expressions, the fundamental being iterate
. Its most general form is:
iterate(i; acc = {acc-init} | {body})
The semantics of the expression is as follows:
-
Expression
acc-init
is evaluated (in the actual context, i.e. it may reference any variables valid in the calling expression). The result is assigned to variableacc
, which, from now on, functions as an accumulator. -
For each member of the source collection (for which iterate is called), the member is assigned to variable
i
. Variablesi
andacc
(containing the value of accumulator from the previous iteration) are added to the context. -
Expression
body
is evaluated (the context now contains the variablesi
andacc
. The result is assigned to variableacc
and will be used in the next iteration. -
After the last iteration, the value in the accumulator is returned as a result.
There are several facts regarding iterator expressions:
-
There are two fundamental iterator operations -
closure
anditerate
. From these two,iterate
is not very often used directly by the designers working with OCL. More often, another, more specific, iterator expression is used. The importance of iterate lies in the fact that all other iterator (with the exception ofclosure
) expressions (and a majority of collection operations) can be defined in terms of the fundamental operationiterate
. E.g. operationexists(it|{body})
is defined asiterate(it; acc=false| {acc or {body}})
. -
Iterator expressions
forAll
andexists
(serving as quantifiers) together withnot
andimplies
make OCL expressions at least as powerful as first order logic . Operationclosure
increases the expressive power with the possibility to compute transitive closures. Operationiterate
allows to compute primitively recursive functions (for more on the expressive power, see Mandel 1999. -
Multiple iteration variables, such as in
c->forAll(v1,v2|v1 <> v2)
, are allowed for some expressions, but that is just a syntactic shortcut for nested calls (i.e.c->forAll(v1|c->forall(v2|v1<>v2))
). -
Collection operations define additional variables to
self
, which is global in the scope of the invariant, and these variables (iterators and accumulator) are local (they are valid in the subexpression only).
When we want to translate OCL ICs for XML, property 1 guarantees that showing how
to
express iterate
and closure
proves that other iterator expressions
can be expressed as well, because they can be defined using these two (although the
actual
implementation would be often more efficient when implemented directly). Property
3 relieves
us of the necessity of considering expressions with multiple iterators. Property 4
requires
the translation to abide the rules for context and scope of variables. Every expression
can
refer to the global variable self
and possibly other global variables declared
explicitly by the user. Iterator expressions define local variables.
There is no construct similar to iterator expression in XPath, which is used by
Schematron. In some special cases, iterate
can be translated to XPath
for
expression, but not in general - whereas iterate
can in each
iteration refer to the result of the previous iterations (through acc
variable),
in XPath for
, computations of every iteration are strictly separated. Thus, XPath
must be extended to fully support iterator expressions and in this section, we will
show
several approaches, how it can be achieved. Since "iterate" is a name of a construct
in OCL, a
name of its implementation in our library OclX and also a name of an XSLT instruction,
we will
distinguish them by using iterate
for the OCL construct,
oclX:iterate
for its implementation in OclX and by refering to the XSLT
instruction as to xsl:iterate
.
Iterators via generated functions
The principle of using accumulators in recursive calls is a routine practice in XSLT. The expression from the integrity constraint PSM4 from Figure 6 can be rewritten into an XPath expression and an accompanying XSLT function as follows:
The context of the expression (valid variables) is passed as parameters
(accumulator
, self
and iteration
). The value of
accumulator is computed using the combination of the two expressions: expression from
the
definition of exists
function (acc or (body)
) and from the actual
call of the function (m.start = self.start
), which is substituted for
body
.
In this way, it is possible to translate every iterator expression (closure
function can be expressed similarly). The negative side is that for each usage (call)
of an
iterator expression, a separate function has to be generated and each Schematron schema
must
be equipped with a specific set of functions that handle iterator expressions. The
generated
functions for the same iterator expression (e.g. exists
) would differ very
little (in the case of exists, only in the second argument of or
computing the
value of newAcc
). Our aim is to create a set of universal functions directly
corresponding to general iterator expresions, but for that, we will need some features
beyond the 2.0 versions of XPath and XSLT.
Iterators via dynamic evaluation
Function in Figure 7 can be looked upon as a template. As we
have pointed out in the previous paragraph, the generated functions for all iterator
expressions (except closure
) would differ very little - only in the line
computing the next value of the accumulator. Moreover, the functions generated for
different
calls of the same predefined iterator (e.g. all calls of exists
) would differ
even less - only in the body
($m/start eq $self/start
in the
example). Thus, the whole generated function can be looked upon as a sort of template.
If
only it were possible to create such "templated function", we would be able to use
the same
template for all calls, parameterized by body
.
One way to achieve this is by using dynamic evaluation. Each iterator expression can
then be mapped to a call of the same function, for which body
is passed as a
string. The value of body
is then computed dynamically. Figure 8 shows an implementation of iterate
using
dynamic evaluation.
Dynamic evaluation is a feature not present in standardized XSLT 2.0, but exists both
in
the form of cross-processor portable external libraries (such as EXSLT) or
vendor-specific extensions (e.g. in Saxon). Unfortunately, the semantics
of dynamic evaluation is not consistent in these cases, e.g. EXSLT allows to use free
variables in the evaluated expression (e.g. self
, m
and
acc
in the example) and they are bound to values defined in the context where
the function is called. On the other hand, in Saxon, all free variables are bound
to the
parameters passed to the call of evaluate
(as a consequence to this design
decision, the names of the free variables are coerced to be p1
, p2
etc. and there is a limit on the maximal amount of free variables allowed).
W3C answered the call for dynamic evaluation in the draft of XSLT 3.0 and introduced
a
new instruction - xsl:evaluate
. The semantics is closer to the one used by
Saxon, values of free variables are assigned using xsl:with-param
instructions
(the names of the parameters are in this case up to the user to decide). As it turns
out,
there is no perfect solution with dynamic evaluation, because there are these obstacles:
-
There is no upper limit on the amount of both free and bound variables the user can use in an OCL expression.
-
The names of the variables are also selected by the user.
The first one renders Saxon, with its limit on the number of free variables, not general
enough. All three approaches also share the same problem - the names of free variables
are
either coerced (Saxon) or they are defined statically (EXSLT, xsl:evaluate
). If
xsl:evaluate
allowed dynamic naming of the parameters, the following call
would be usable (the excerpt replaces the definition of newAcc
in the template
from Figure 8:
<xsl:evaluate xpath="$body"> <xsl:with-param name="{accumulatorVar}" select="$acc" /> <xsl:with-param name="{iterationVar}" select="$collection[$iteration]" /> <xsl:with-param name="self" select="$self" /> </xsl:evaluate>
Yet, even this solution works only when there are only three allowed free variables
in
body
expression - accumulator, iterator, and self.
Figure 9 shows, how exists
iterator expression
can be implemented (as function oclX:exists
) according to its specification in
the terms of iterate
(implemented as function oclX:iterate
, Figure 8). The body
expression is combined with the
expression defining exists iterator expression (acc or (body)
) using simple
string manipulation.
To conclude this discourse, we implemented a version of OclX library which uses dynamic evaluation. We used the extension function provided by Saxon. We solved the second problem by passing the values of all free variables in one value (a sequence of values) and replacing the references of free variables by expression indexing the sequence before the expression is evaluated. This solution works with XSLT 2.0 (+ Saxon extension). Passing more values in one sequence is a problem of itself, because empty values and values, which are sequences themselves, have to be encoded (more about this in section section “Tuples as maps”, the approach used there can also be applied for the encoding problem here). However, XSLT 3.0 offers another approach to tackle iterator expressions and free variables satisfactorily and elegantly, and those are higher-order functions.
Higher-order functions
The drafts of XPath 3.0 and XQuery 3.0 define a new kind of item in the common data model - function item. Funciton item is an item, that can be called. This makes functions in XPath and XQuery first-class citizens that can be results of expressions and passed as arguments to function calls, making XPath/XQuery full-fledged functional languages (technique for using higher-order functions also in XSLT 2.0 was shown in Novatchev 2006, but that is far from native). A prototype implementation of HOF is available in Saxon since version 9.3.
The OCL iterator expressions can be viewed as functions, which expect another function
as their parameter - i.e. higher-order functions[2]. In iterate
expression, body
can be looked upon as a
parameter, which expects a function. Figure 10 shows how
iterate
is implemented in OclX. We also used the new instruction -
xsl:iterate
- which is also proposed in the XSLT 3.0 draft and has a very
similar semantics to its OCL namesake.
The function's body
parameter expects a function with two bound variables,
the first one representing the accumulator, the second one the member of the iterated
collection for this iteration. Also, the definition of function oclX:exists
in
Figure 11 shows that this definition is general enough to be used
to define other functions in its terms. The implementation of closure
is
similar to that of iterate
, only it uses recursion.
Now let us compare the higher-order function approach to the dynamic evaluation approach. As in OCL, there is no limit on the number of free variables in XPath function items, so the mapping is seamless in this respect. Furthermore, the semantics of free variables in a higher-order function is that their values are obtained from the context at the place of the call - the same semantics OCL uses for free variables.
As an icing on the cake, we can utilize the reflection function
function-arity
(supported for function items) to allow iterator expressions
with more than one iterators directly (not via replacing them by nested calls). Figure 12 shows a multiple-iterator version of exists
(which is one of the few iterator expressions, which allows multiple iterators by
OCL
specification). We use two helper functions: getIndices
(which returns the n-th
number in a system with a given base
in the form of a zero-filled sequence) and
functionItemCall
, which assigns the values to the iterators and calls the
higher-order function. However, we are forced to fix the maximum amount of iterators
allowed, because a call of function item requires a fixed amount of parameters (of
course,
we could prolong the last if
as we like).
Local variables
The issue of local variables is closely related to iterator expressions, because they
are one of the two ways how local variables can be defined. Apart from iterator expressions
(which define local variables implicitly), OCL also contains let
expression,
which defines a variable valid in the scope of the expression. The semantics of local
variables is similar to that of XPath - the value is assigned in the moment of declaration
and does not change from that place on.
In XPath 2.0, the same effect can be achieved using for
, but only when the
value of the local variable is not a collection (for
would only perform one
cycle). In other cases, we must descend to inlining the value (since it is immutable).
Alternatively, when the variable is defined in the scope of the invariant (making
it global
variable for the whole invariant), it can be translated into XSLT variable definition.
XPath 3.0 again makes the translation easier, because it allows let/return
expressions, with semantics eqivalent to OCL let
.
Tuples as maps
In this section we propose how to deal with tuples (anonymous types) from OCL.
OCL allows the designer to combine values in expressions into tuples. Tuples can be
considered an anonymous type, which has a finite number of named parts. An example
of a tuple
may be Tuple { firstName = 'Jakub', lastName = 'Malý', age = 26 }
. The values of
the parts may be of arbitrary type, including collections and other tuples. The names
of parts
(firstName
, lastName
, age
in the example) must be
unique and are used to access the parts of the tuple in the expressions, similarly
to
attributes of classes, i.e. it is possible to write employees->collect( e | Tuple { name
= e.name, salary = e.salary })->select( t | t.salary > 2000)
, the result of this
expression would be a collection of tuples.
There is a noteworthy property coming with the possibility to create tuples: together
with
tuples, OCL was equipped with the operation product
, defined as follows:
product(c1:Collection(T1), c2:Collection(T2)) = self->iterate(e1; acc = Set{} | c2->iterate (e2; acc2 = acc | acc2->including( Tuple{first = e1, second = e2}) ) )
The result of product
is a collection of type Collection(Tuple(first:
T1, second: T2))
, which contains all possible pairs where the first compound comes
from collection c1
and the second collection c2
. This operation thus
finalizes the suite of equivalents of the constructs required for a language to be
relationally complete (see Codd 1972):
-
Select - can be expressed using
select
iterator expression, -
Project - can be expressed using
collect
iterator expression that creates a tuple with the projected attributes (see theemployees
example above, which, in fact, performs projection to attributesname
andsalary
) , -
Union - OCL has union operation as well,
-
Set difference - OCL has operation '
-
' working on sets, -
Cartesian product - can be expressed using
product
, -
(Rename) - can be expressed using
collect
in the same manner as project operation.
Thus, not only tuples can be used to write more concise expressions, but, together with the operation product, they increase the expressive power of the language to relational completeness.
As far as XPath 2.0 is concerned, there exists no construct that could be naturally
used
to represent tuples, they can only be partially simulated. One possibility is to use
sequences, where each item in the sequence corresponds to one part of the tuple, i.e.
Tuple { firstName = 'Jakub', lastName = 'Malý', age = 26 }
would be represented
as a sequence ('Jakub', 'Malý', 26)
.
But this solution is not completely satisfactory from the following reasons:
-
We loose "safety" and clarity in the expression, because we have to write
$t[1]
to represent the OCL expressiont.firstName
. -
When some part is missing (which is in OCL indicated by
null
value), we have to use some placeholder value to keep the length of the sequence constant (e.g.('null', 'null', 26)
). -
A part of a tuple in OCL can be of any type, including other tuples and collections. Here, this approach fails utterly, because all sequences are flattened in XPath. This also makes implementing
product
operation impossible, because it should return a collection of tuples, i.e. a collection of sequences.
Instead of representing tuples using sequences, an alternative would be to represent them using temporary documents, for example:
<Tuple> <firstName>Jakub</firstName> <lastName>Malý</lastName> <age>26</age> </Tuple>
This approach would overcome the first issue (t.firstName
would be
represented as $t/firstName
), the second issue (an empty element could represent
a missing part) and also the third issue (nesting is no problem here and collections
could be
encoded into trees as well). However, it brings two problems of its own:
-
The value which is about to become a part of a tuple, is copied to a temporary XML document. This would not hurt so much with atomic values, but would be a significant overhead, when the value was a node or a sequence of nodes in the input document (whole subtree would be copied in this case).
-
When a node is copied into the temporary document, its position in the document is lost, it would not be possible to navigate outside its subtree (e.g. using
parent
axis) and operations relying on node identity (such as theis
operator) would give unexpected results when applied on nodes from the input tree and from the temporary tree.
To conclude, restrictions inherent in XPath 2.0 data model prevent from satisfactory representation of tuples with semantics corresponding to OCL.
However, the possibility of extending the XPath/XQuery model with another item type
- map
- is being discussed and a prototype implementation is now available in Saxon (an alternative is to use XQuery implementation of maps using function items instead
of map
items, for details see Snelson 2011). The map type requires atomic values for
keys and allow items of any type as values. These properties of map type make it a
great
candidate for representing tuples, thus, we decided to create a version of OclX which
supports
tuples represented as maps. Strings containing the name of a tuple part can be used
as keys
(and the names of parts must be distinct in an OCL tuple as well). The tuple from
the example
would be represented as map{'firstName' := 'Jakub', 'lastName' := 'Malý', 'age' :=
26}
, expression t.firstName
would be represented as
$t('firstName')
. A value in a map can also be another map or sequence, which is
consistent with semantics of OCL tuples. Operation product
can be defined either
by translating the definition from specification (using two nested iterates
) or
via a much more succinct expression:
for $e1 in $collection1 return for $e2 in $collection2 return map{'first' := $e1, 'second' := $e2}
Different kinds of collections, nested collections
OCL defines an abstract type Collection and 4 different types of collections - Set, OrderedSet, Bag (in other languages sometimes called multiset) and Sequence. A member of any collection can be an arbitrary value, including another collection.
We treat all collections as sequences in OclX, yet it would be possible to represent the other kinds of collections using maps (or sequences as well).
Nested collections are a foreign concept to XPath data model. The disadvantages of encoding collections into temporary documents were discussed in section section “Tuples as maps”.
With the introduction of maps, there is a, rather ugly, way of encoding nested sequences
-
thanks to the possibility of using maps as values and members
of sequences. A nested sequence ((1,2),(3),()) could be encoded to map{'s' := (map{'s'
:= (1,2)}, map{'s' := (3)}, map{'s' := ()})}
, the expression returning number two
would be written as (map{'s' := (map{'s' := (1,2)}, map{'s' := (3)}, map{'s' :=
()})})('s')[1]('s')[2]
. Using this approach also requires the functions which
concatenate sequences not to use XPath operator ,
, because it flattens the
resulting sequence.
The double indexing (first to get the value from the map, second for getting the desired
member of the sequence) can seem confusing, but it can be in fact hidden behind a
library
function at(i)
(which OCL uses instead of the operator [i]
to get an
i-th member of a sequence). The wordy and a bit unclear way of creating a nested sequence
appears in those expression that create nested sequences using literals. This, however,
could
be eliminated by using preprocessing of the schemas before evaluating the expressions
(adding
another step to Schematron pipeline).
Error recovery
OCL as a language has a direct approach to "run-time" errors or exceptions. Errors
in
computation cause the result of the expression to be invalid
- a sole instance of
type OclInvalid. It conforms to all other types and any further computation with
invalid
results in invalid
- except for operation
oclIsInvalid
[3], which returns true
, when the computations results in
invalid
and false
otherwise. This operation thus provides the
only, very coarse-grained error checking (there are no error codes or exception types)
available in OCL. Unlike OCL computation, XPath/XSLT 2.0 processor halts when it encounters
a
dynamic error and there is no equivalent of oclIsInvalid
. It is also not possible
to instruct it to jump to the validation of the next IC when a validation of an IC
fails.
XSLT 3.0, however, introduces new instructions - xsl:try
and
xsl:catch
- which provide means of recovery from dynamic errors. With these
instructions, it is possible to implement oclIsInvalid as listed in Figure 13. We, again, utilize higher-order functions capabilities - the
expression is evaluated in a function call wrapped in try/catch. OCL expression
oclIsInvalid(1 / 0)
can be translated to oclX:oclIsInvalid(function() { 1
/ 0 })
. Optionally, our validation pipeline (fully introduced in section “Implementation”) allows to safe-guard the evaluation of each expression using
try/catch, so that the validation of another constraint may continue if a runtime
error occurs
and it is not contained by oclIsInvalid
.
Implementation
We implemented the work presented in this paper into our XML schema modeling tool eXolutio, the workflow is depicted in Figure 14. The tool incorporates a UML and OCL editor for both PIM and PSM schemas and implements algorithm for suggesting/translation of relevant constraints from PIM to PSM (steps 1-3, not covered in this paper) and translation from OCL (4) to Schematron schemas. The user may choose between schema-aware and non-schema-aware (which add data conversion for extracting typed values from the XML document) schema and between implementation of iterator expressions using dynamic evaluation or higher-order functions. The generated schema can be then used to validate an XML document. XProc pipeline is then used to perform the validation. It first executes the transformation steps from standard Schematron pipeline (5.), adds includes for OclX library (6.) and then validates the document (7.) with the resulting XSLT. The pipeline expects the schema (5.) and validated document (8.) on its input ports and writes validation result - a SVRL document - to its output port (9).
The tool itself (incl. examples), OclX library and the XProc pipeline are all available
for free download on the tools website. Finally, Figure 15 depicts the
translation of integrity constraints from the examples in Figure 6
(schema-aware, iterators via higher-order functions). The error messages defined in
OCL
translated into schematron error messages and their subexpressions into value-of
instructions (see translation of PSM1).
Expression rewriting and syntactic sugar
In many cases, iterator expressions and functions, can be expressed using a native
XPath
construct, e.g. forAll
can be expressed as every/satisfies
, select
by application of a filter etc. The current version of the algorithm does not provide
any of
such rewritings, but in the following work, we plan to examine the possibilities of
such
rewritings and offer them to the user, where applicable.
OCL expressions have a useful property that they can be read from left to right thanks
to the arrow notation '->'
for collections and dot
'.'
for calling operations . When translated to XPath, the
readability decreases (the object before the arrow/dot corresponds to the first parameter
of
the function. Also, the notation of higher-order functions in XPath is not so transparent.
If
desirable, it would be possible to preserve OCL notation even for Schematron schemas
(and
change it to usual XPath notation in another preprocessing step).
A valued feature of Schematron is the possibility to specify human-friendly error messages for violated integrity constraints. However, OCL does not provide such functionality in the current version of the standard (2.3.1). Since custom error messages were strongly advocated (e.g. in Malaika 2009), we extended recognized OCL grammar to allow specification of error messages.
Conclusion
In this paper, we inquired into the possibility of using OCL for modeling integrity constraints over XML data. We have discussed several ways of how non-trivial OCL expressions can be expressed using XPath with our proposed extension in a form of a XSLT function library OclX. The library itself contains a useful set of functions, which can be used on their own and provide a handy toolset with considerable expressive power. However, the main contribution is the possibility to check OCL integrity constraints in XML data. Our approach allows using the integrity constraints defined in the UML model also for XML, which supports consistency of the system as a whole (the constraints do not have to be rewritten by hand and managed manually after each change). It helps to bridge the gap between the platform-independent conceptual model and its applicability at runtime.
Since translated OclX constraints are, in structure, very simlar to OCL constraints, they will be easy to comprehend even for a modeller less familiar with XML technologies.
In the future, we want to thoroughly examine the possibilities of rewriting the expressions in order to use native XPath constructs to express iterator expressions where possible and allow the user to choose from several ways of expressing a certain constraint. As another part the future work, we plan to extend our approach for schema evolution and document adaptation (see Maly 2011) by utilizing OCL integrity constraints.
Acknowledgement
This work was supported by GAČR grant no. P202/10/0573.
References
[OCL specification] OCL Specification, version 2.3.1. Object Management Group (OMG). January 2012. http://www.omg.org/spec/OCL/2.3.1/
[Dresden OCL] Software Technology Group at Technische Universität Dresden. Dresden OCL - OCL support for your modeling language http://www.dresden-ocl.org/
[Pagano 2009] Pagano, Dennis, and Anne Brüggemann-Klein. Engineering Document Applications — From UML Models to XML Schemas. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). http://www.balisage.net/Proceedings/vol3/html/Bruggemann-Klein01/BalisageVol3-Bruggemann-Klein01.html doi:https://doi.org/10.4242/BalisageVol3.Bruggemann-Klein01.
[Bauman 2009] Bauman, Bruce Todd. “Prying Apart Semantics and Implementation: Generating XML Schemata directly from ontologically sound conceptual models.” In In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). http://www.balisage.net/Proceedings/vol3/html/Bauman01/BalisageVol3-Bauman01.html. doi:https://doi.org/10.4242/BalisageVol3.Bauman01.
[Nečaský 2012A] Martin Nečaský, Jakub Klímek, Jakub Malý, Irena Mlýnková, Evolution and change management of XML-based systems, Journal of Systems and Software, vol 85. 2012, ISSN 0164-1212, doi:https://doi.org/10.1016/j.jss.2011.09.038. http://www.sciencedirect.com/science/article/pii/S0164121211002524
[Nečaský 2012B] Martin Nečaský, Irena Mlýnková, Jakub Klímek, Jakub Malý, When conceptual model meets grammar: A dual approach to XML data modeling, Data & Knowledge Engineering, vol 72. 2012, ISSN 0169-023X, doi:https://doi.org/10.1016/j.datak.2011.09.002. http://www.sciencedirect.com/science/article/pii/S0169023X1100125X
[Mandel 1999] L. Mandel, M. Cengarle. On the Expressive Power of OCL. In FM'99 Formal Methods, volume 1708 of LNCS. Springer Berlin / Heidelberg, 1999. http://dl.acm.org/citation.cfm?id=730476
[Saxon] Saxonica. Saxon - the XSLT processor. http://saxonica.com/
[EXSLT] C. Stewart, C. Bayes, J. Fuller, U. Ogbuji, D. Pawson, J. Tennison. EXSLT - community initiative to provide extensions to XSLT. http://www.exslt.org/
[Novatchev 2006] Dimitre Novatchev. Higher-Order Functional Programming with XSLT 2.0 and FXSL. In Extreme Markup Languages 2006, Montréal, Québeck. http://conferences.idealliance.org/extreme/html/2006/Novatchev01/EML2006Novatchev01.html
[Snelson 2011] J. Snelson. An implementation of a red/black tree built using XQuery 3.0 higher order functions, and an associative map library built on top of it. https://github.com/jpcs/rbtree.xq
[Maly 2011] J. Malý, I. Mlýnková, M. Nečaský. XML Data Transformations as Schema Evolves. Proceedings of ADBIS 2011: Advances in databases and infromation systems. September 2011. Lecture Notes in Computer Science. Springer. http://dl.acm.org/citation.cfm?id=2041783
[eXolutio] eXolutio - a Tool for Conceptual Modeling of XML. http://eXolutio.com
[Malaika 2009] S. Malaika, Ch. Pichler. Enforce basic document structure with XML constraint checking. IBM. http://www.ibm.com/developerworks/library/x-consthealth/index.html
[Codd 1972] E. F. Codd. Relational Completeness of Data Base Sublanguages. In: Database Systems, Prentice Hall and IBM Research Report RJ 987, San Jose, California, 1972.
[1] Examples for this paper are based on Mazeiar Salehie's OCL Tutorial.
[2] Yet OCL does not have the full power of functional languages, since it does not allow functions to be results of expressions.
[3] To be accurate, another operation - oclIsUndefined
- behaves equally to
oclIsInvalid
when the argument is invalid
, but it also returns
true, when the result of the computation is null
.