Gotti, Fabrizio, Kevin Heffner and Guy Lapalme. “XSDGuide - Automated Generation of Web Interfaces from XML Schemas: A Case Study for
Suspicious Activity Reporting.” Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). https://doi.org/10.4242/BalisageVol15.Gotti01.
Balisage: The Markup Conference 2015 August 11 - 14, 2015
Balisage Paper: XSDGuide – Automated Generation of Web Interfaces from XML Schemas: A Case Study for
Suspicious Activity Reporting
Kevin Heffner is President of Pegasus Research &
Technologies, a Montreal-based company specializing in flight
simulation and training, constructive simulations, unmanned/autonomous systems
and command & control.
Guy Lapalme is Professor of Computer Science at the Université de Montréal
(Laboratory for Applied Research in Computational Linguistics), where he has
been a faculty member since 1980. He is a leading expert in the computer
processing of human language. He has published on many aspects of the subject
including spelling correction, dictionary editing, text generation, automatic
summarization, information extraction, opinion mining and machine translation
tools. His career combines innovative research and outreach to the practical
world through long-term collaboration with partners from both the academic and
industrial worlds. Recently, he was awarded an Honorary Doctorate from the
Université de Neuchâtel (Switzerland) and a Lifetime Achievement Award from the
Canadian Artificial Intelligence Association.
This article presents XSDGuide, a
software prototype aimed at facilitating the creation of user interfaces consistent
with a data model expressed as a set of XML schemas. XSDGuide was developed while
researching intelligent user interfaces for data entry associated with the
production of Suspicious Activity Reports (SARs) conforming to NIEM-SAR, an
XML-based information-dissemination framework. These SARs communicate potentially
suspicious or unlawful incidents to the appropriate authorities. The XSD schemas
defining a specific SAR are fed to XSDGuide, which then automatically creates user
interface guides, rendered on a web page. The user can interact with this
application to populate the report’s fields, validate the SAR being created and save
the report as a valid XML instance. Validation is a two-step process, where a
JavaScript ruleset created from the schema pre-validates the document in the browser
before it is sent for full validation to the back end, which relies on a traditional
full-fledged validator. Despite the prototype’s limitations, the HTML interfaces
that are generated allow users to inspect and become familiar with complex schemas
and also to produce validated XML instance documents for the purposes of
experimentation and testing.
Suspicious activity reporting refers to the process by which
members of the law enforcement and public safety communities as well as members of the
general population communicate potentially suspicious or unlawful incidents to the
appropriate authorities. This reporting has been identified as one part of a broader
Information Sharing Environment (ISE) as defined by [1]. The ISE
establishes a framework to support reporting, tracking, processing, storage and
retrieval of terrorism-related suspicious activity reports (SARs). The ISE initiative
builds upon the foundational work by the US Departments of Justice and Homeland Security
that have collaborated to create the National Information Exchange Model (NIEM), which
has received approval from the governments of the US [2] and of
Canada [5].
The SAR is one of a set of messages supported by the NIEM. In particular, the NIEM
has developed a specific model for suspicious activity reporting, the NIEM-SAR [3]. Preliminary NIEM-SAR prototypes have shown great promise for
information sharing for a broad range of activities, but several areas requiring
improvement were noted in December 2011[1]. In particular, faster response times are needed to get information into the
system, to process the information and to make it available to users. SAR capabilities
are already operational in some local jurisdictions in the US. However, these systems
lack the ability to process information automatically and therefore require significant
manual intervention in data centers. The current work proposes the use of adaptive user
interfaces as a potential means for reducing the workload related to producing and
processing SAR data.
The main functionality of the XSDGuide prototype presented in this paper is to assist
the user in the creation of valid suspicious activity reports (SARs) compliant with
the
NIEM-SAR framework. In so doing, it allows the user to become familiar with the business
rules in a manner that is more efficient than browsing the XSD documents.
The design and implementation presented here do not make any assumptions about who the
user is, although users fall into two broad categories. The first category includes users
registered with an agency and who have known skills, proficiency and expertise. They
have specific access and privileges according to the role that they play in their
organization, e.g. a police officer or an airport security agent. The second category
consists of users not registered with an agency and who may or may not have Public
Safety and Security domain-specific knowledge or skills. The typical unregistered user
is assumed to be a member of the general public.
In the following sections, first the NIEM-SAR framework is described, including the
implementation constraints faced by a SAR authoring tool. Then the general architecture
of the prototype is presented as well as the various interface guides that are created
based on the input XSD schemas. Afterward, the XML validation and file saving steps
are
described. Finally, the last section describes XSDGuide’s limitations and suggests
perspectives for future work.
NIEM-SAR and Suspicious Activity Reporting
Information Exchange Package Documentation and XML Schemas
Using the NIEM-SAR framework involves creating and using an IEPD (Information
Exchange Package Documentation). An IEPD is designed to transmit informational needs
for a given domain, and is typically created by experts in this field. Their task is
to build a data model describing the environment in which suspicious activity
reporting occurs. Dedicated software tools are used to create IEPDs, for instance
Cameo Enterprise Architecture with the NIEM plugin[2].
The IEPD is a zip archive containing XSD schemas capturing the data model, as well
as extensive documentation. Instances of these schemas are the actual suspicious
activity reports (SARs).
Only a few NIEM-SAR IEPDs are freely available on the web. Here are some
examples:
“Suspicious Activity Reporting (SAR) for Local and State Entities IEPD
v1.1.1” from the Bureau of Justice Assistance (BJA).[3]
“Suspicious Activity Report” from the “Texas Department of Public Safety,
Crime Records Service” [5]
They are quite complex, both because of the large number of types they define and
because the SARs they propose are quite intricate. To give an idea of the
elaborateness of this architecture, the IEPD “ISE-FS-200-version-1.5 Suspicious
Activity Reporting (SAR)” mentioned above contains 74 XSD files defining 196 simple
types and 658 complex types. IEPDs make heavy use of inheritance (with abstract
classes) and substitution groups. The package is organized so as to include numerous
libraries from the NIEM core types, from which these customized classes are derived
or augmented.
Reconciling the need to comply with such a standard with the need for timely
creation of SARs is quite delicate.
The Case for an Enhanced User Interface
The creation of XML instances meeting the constraints expressed in XML schemas is
not a new problem. Existing solutions range from the very simple text editor to the
dedicated IDE.
Over the last few years, the Eclipse IDE[6] has developed extensive tools to manipulate XML, including the creation
of XML instances validated by schemas. XML editors like Oxygen[7] are extremely helpful in guiding the creation of XML instances (like
SARs). Oxygen notably offers context-dependent autocomplete suggestions,
documentation and live validation of the document. Such editors are an excellent
solution for IT specialists, but are difficult for the average person to use.
Oxygen does offer an MS-Word-like author mode that works very well for a set of
recognized schemas (such as Docbook), but reverts to tag-based editing when a
document associated with an arbitrary schema is created.
XSDGuide’s General Architecture
The general principle behind the prototype XSDGuide is shown in Figure 1.
An XML schema (XSD format) is first selected by the user who wants to create a
suspicious activity report conforming to that schema[8]. As mentioned in the introduction, these schemas are rarely standalone and,
for instance, in the NIEM ecosystem they usually come packaged as IEPDs. These IEPDs are
zip archives that contain, among other documents, the SAR schema as well as any
necessary XSD schemas imported through import or include
statements. These imported documents are the required libraries on which the SAR schemas
are built, and act somewhat as an SDK. As long as the imported XSD schemas can be found
using the specified absolute or relative URLs, XSDGuide can readily manage such an
archive.
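The condition that imported schemas must be locatable can be illustrated with a short sketch. It is written in Python for brevity and is not taken from XSDGuide, which is implemented in Java; the function name unresolved_references is hypothetical. It lists the relative schemaLocation values of top-level xs:import and xs:include statements that do not point to a file inside the archive. Absolute URLs are simply skipped here.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def unresolved_references(archive):
    """Return (schema file, schemaLocation) pairs that are missing from the zip archive."""
    names = set(archive.namelist())
    missing = []
    for name in sorted(names):
        if not name.lower().endswith(".xsd"):
            continue
        root = ET.fromstring(archive.read(name))
        for child in root:
            if child.tag not in (XS + "import", XS + "include"):
                continue
            location = child.get("schemaLocation")
            if location is None or "://" in location:
                continue  # no location hint, or an absolute URL: not checked in this sketch
            # Relative locations are resolved against the folder of the importing schema.
            target = posixpath.normpath(
                posixpath.join(posixpath.dirname(name), location))
            if target not in names:
                missing.append((name, location))
    return missing
```

An empty result means that every relative reference in the archive can be followed without leaving the archive.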
Once the schemas are read, two additional inputs are needed: the user is prompted to
specify which one of the schemas contains the root element, and what this element is
within the file. XSDGuide uses the schemas provided to build three components (middle of
Figure 1).
User interface guides are created for elements defined in the
schemas. These guides are at the heart of the prototype. Each one holds all the
information required to create a user-friendly UI element. They mostly correspond to
information for a given element (XML schema base type, cardinality, etc.) but not
always. For instance, an <xsd:choice> corresponds to a UI guide
allowing the user to pick one of the elements proposed by the choice. Importantly, the
guides maintain information about their child guides too. This hierarchical organization
mirrors that of the XML schema. It is noteworthy that these guides are independent of
the rendering medium. They are abstract in their nature, and could be rendered in, say,
a standalone application or a web page. Their HTML rendering is described in
section “HTML Rendering and Data Entry”.
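As an illustration of such a medium-independent, hierarchical guide, here is a minimal Python sketch. It is an assumption about what a guide could hold, not XSDGuide's actual Java classes: the class name UIGuide, its fields, and the base types given to UniqueId and Title are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UIGuide:
    """Abstract description of one UI element, independent of how it is rendered."""
    name: str
    kind: str = "element"             # "element", "attribute" or "choice"
    base_type: Optional[str] = None   # e.g. "xs:string"; None for complex content
    min_occurs: int = 1
    max_occurs: Optional[int] = 1     # None stands for "unbounded"
    documentation: str = ""
    children: List["UIGuide"] = field(default_factory=list)

    def walk(self, depth=0):
        """Yield (depth, guide) pairs in document order, mirroring the schema hierarchy."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


# Part of the running example; the base types are assumed for illustration.
report = UIGuide("SuspiciousActivityReport", children=[
    UIGuide("Metadata", children=[
        UIGuide("UniqueId", base_type="xs:string"),
        UIGuide("Title", base_type="xs:string"),
        UIGuide("RelatedSarList", min_occurs=0),
    ]),
    UIGuide("Data"),
])
```

A renderer, whether for HTML or for a desktop toolkit, would only need to traverse this tree with walk() and emit one widget per guide.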
An XML document can be created on demand whenever the user needs
the prototype to create an actual instance document (a SAR in our case), driven by
the
schema provided earlier. Each element of the document is tied to the specific user
interface guide (see previous paragraph) that facilitated its creation. Ultimately,
the
user interacts with the rendered UI guide in order to inject values into the
corresponding XML element.
A validator built on Java’s XML validation library is also
constructed from the schema(s) provided. This is the most straightforward use of such
a
schema, and it allows the prototype to validate the SAR being built by the user.
Validation messages are presented to the user, in order to elicit an appropriate
response.
HTML Rendering and Data Entry
XSDGuide strives to facilitate the creation of suspicious activity reports (SARs).
To
achieve this, the various UI guides created from the underlying XML schema(s) must
be
rendered in a user-friendly way, while at the same time
enforcing the various constraints expressed in the schema.
It is important to note that our library is not tied to any specific rendering of
the
interface guides. Indeed, one of XSDGuide’s design principles was to create a Java
library that would handle most of the processing associated with the tasks at hand.
The
user-visible part could then either be materialized as a standalone, Swing-like
application or, as we did here, as a web application.
We opted for the latter solution, because we felt that an HTML rendering lent itself
naturally to the representation of nested elements (the XML nodes). Since there were
also time constraints to the coding of the application, HTML provided a way to
fast-track the development of an aesthetically pleasing GUI. A web application has other
advantages, including portability across all operating systems and most hardware,
including smartphones. This portability is typically difficult to achieve using the
usual GUI SDKs, including Swing. Moreover, the majority of users are already familiar
with web applications (e.g. Gmail, Facebook, etc.).
The main drawback of this approach is the necessary separation of the implementation
logic between server-side and client-side elements, as well as the additional
networking component between the two.
Figure 2 shows the complete web interface created by
XSDGuide from a single XML schema file (SAR-RALI.xsd, see code below for an
excerpt). We created this schema for illustration purposes in this article. It is
a
simplified schema allowing the creation of basic SARs, while preserving the general
philosophy and terminology of the SAR schemas found in IEPDs. It is worth noting that
XSDGuide can fully process the latter IEPDs.
The same interface adapts itself to smartphone screens, as seen in Figure 3.
Its navigation bar features the following menus.
New report: Creates a new SAR based on the XSD schema selected (as well as
on the specified root element).
Schema manager: Allows the user to upload a new XSD schema or XSD schema
archive (.zip) to the application. See section “Schema Management”.
Switch to simple view: Toggles between simple and advanced views.
More rendering examples are available on our website[9].
<!-- Excerpt of schema SAR-RALI.xsd used as the running example here. Two complex elements are listed. -->
<xs:element name="SuspiciousActivityReport">
  <xs:annotation>
    <xs:documentation>A structure that describes a SAR Report </xs:documentation>
  </xs:annotation>
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="sarrali:Metadata" />
      <xs:element ref="sarrali:Data"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="Metadata">
  <xs:annotation>
    <xs:documentation>A structure that describes Metadata about a related SAR</xs:documentation>
  </xs:annotation>
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="sarrali:UniqueId"/>
      <xs:element ref="sarrali:Title"/>
      <xs:element ref="sarrali:SubmissionSystem"/>
      <xs:element ref="sarrali:Author"/>
      <xs:element ref="sarrali:CreationDateTime"/>
      <xs:element name="DisseminationCriteria" type="sarrali:DisseminationCriteriaType"/>
      <xs:element ref="sarrali:RelatedSarList" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Application Architecture
Figure 1 shows the general client-server
architecture implemented by XSDGuide. The back end implements most of the
XML-related logic, including the creation of the XML document, the management of
schemas and the exploration of their constraints. These features are made possible
by Java libraries from the Apache Xerces™ Project[10]. The back end also includes a lightweight web server (implemented with
Apache Jetty) responding to queries made from the front end, written in JavaScript
and leveraging the popular frameworks JQuery[11] and Bootstrap[12].
Implementation Details
For this project, although we were writing a web application, we opted to implement
most of the logic within the Java backend. This means that most of the models for
the XML document and the corresponding schemas are maintained there, and that the
HTML rendering is carried out in the backend. This is consistent with the fact that
we wanted to be able to render the UI guides within the Java library we created,
instead of relying too heavily on the client-side JavaScript to carry out this task.
Moreover, we wanted this rendering step to be as “close” as possible to the Java
models and validators to simplify the design.
The interaction steps involved when creating a new document can further explain
XSDGuide:
1. The user visits the page and the Jetty server produces an interface
essentially made out of static elements. HTML elements are laid out with
Bootstrap, which simplifies the creation of the UI, and provides an elegant
responsive interface (e.g. for smartphones).
2. When the user clicks “New report”, JQuery is used to create an AJAX
request to the backend to create a new element. The request specifies the
name of the XSD file stored in the backend (e.g. SAR-RALI.xsd)
and the root element of the XML document (e.g.
http://rali.iro.umontreal.ca/sarrali:SuspiciousActivityReport).
3. The server parses the corresponding schema, and creates a new XML document
in memory. This is done using the Apache Xerces API and the
javax.xml package, which both play key roles in this
project. The server then returns the id of the new document, as well as the
id of the newly created root element.
4. The JavaScript client asks for the HTML rendering of the elements it
wants to display (here, the root element as well as any non-optional child
elements).
5. The server replies with a snippet of HTML for each element. This snippet
includes the rendering of the associated interface guide (see following
section), as well as additional information on the possible child elements
and attributes. The JavaScript library is responsible for parsing and
positioning this code snippet within the HTML document. It also adds event
listeners to the guide in order to validate the data the user enters into
the newly created controls.
The backend offers a few simple services, called with AJAX from the user’s
browser. These include the standard CRUD operations on an element, as well as the
validation operation on the document. The validation process is described in
section “XML Validation”.
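The CRUD services can be pictured with the following in-memory sketch. It is written in Python rather than in the Java used by the prototype, and the class DocumentStore and its method names are hypothetical. Each element receives an id when it is created, and later requests refer to that id, as in the interaction steps above. The HTTP layer and the schema checks are left out.

```python
import itertools
import xml.etree.ElementTree as ET


class DocumentStore:
    """Keeps one XML document in memory and addresses its elements by id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._elements = {}  # element id -> Element
        self._parents = {}   # element id -> parent id (None for the root)

    def _register(self, element, parent_id):
        element_id = next(self._ids)
        self._elements[element_id] = element
        self._parents[element_id] = parent_id
        return element_id

    def create_document(self, root_name):
        return self._register(ET.Element(root_name), None)

    def create_child(self, parent_id, name):
        child = ET.SubElement(self._elements[parent_id], name)
        return self._register(child, parent_id)

    def read(self, element_id):
        return ET.tostring(self._elements[element_id], encoding="unicode")

    def update(self, element_id, text):
        self._elements[element_id].text = text

    def delete(self, element_id):
        # Forget the descendants first so that no stale ids remain.
        for child_id in [i for i, p in self._parents.items() if p == element_id]:
            self.delete(child_id)
        parent_id = self._parents.pop(element_id)
        element = self._elements.pop(element_id)
        if parent_id is not None:
            self._elements[parent_id].remove(element)
```

A client would call create_document once, keep the returned id, and then issue create_child, update and delete requests against the ids it has received.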
When implementing the HTML form elements needed by the prototype, we briefly
considered using XForms[13], which provides sophisticated form markup to gather, validate and
process XML data within web pages (among other document types). However, to our
knowledge, none of the popular web browsers supports XForms natively, and the W3C
recommendation seems to have been eclipsed in part by HTML5 controls. The latter
were also investigated and, while they provide valuable support for the validation
of some constraints, they cannot implement some of the simplest XSD rules. For
instance, an HTML5 input control with a type of “number” cannot constrain the number
of digits after the decimal point (fractionDigits in XSD). For these
reasons, and simply because we wanted to retain full control over this
implementation, we used traditional web forms augmented with JavaScript
controls.
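To make the fractionDigits example concrete, here is a small Python sketch of such a check. It is not the prototype's JavaScript code, and the function name check_decimal is hypothetical. It assumes that the facet applies to the value rather than to its spelling, so trailing zeros after the decimal point are ignored.

```python
import re

# Lexical form of xs:decimal: optional sign, digits, optional fractional part.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def check_decimal(text, fraction_digits=None):
    """Return an error message, or None when `text` is acceptable."""
    value = text.strip()
    if not _DECIMAL.fullmatch(value):
        return "not a decimal number"
    _, _, fraction_part = value.partition(".")
    # Assumption: trailing zeros do not count toward fractionDigits.
    significant = fraction_part.rstrip("0")
    if fraction_digits is not None and len(significant) > fraction_digits:
        return f"at most {fraction_digits} digits are allowed after the decimal point"
    return None
```

With fraction_digits=2, the value "1.75" is accepted while "1.755" is rejected, a distinction that an HTML5 number input alone does not enforce.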
Interface Guides
This section explains how XSDGuide transforms XSD-defined constraints into usable
UI
guides for the end user. The rendering of these constraints is a—sometimes
difficult—compromise between enforcement of the constraint expressed in the schema
and
the need for an accessible user interface.
General Principle
The general principle of the UI guides is to establish a mapping between an
element or attribute type (whether it be named or anonymous) and a specific UI
widget or widget group. For instance, if an element type is derived (either directly
or not) from the base type xs:string, then it makes sense to render it
as a text field in HTML. Furthermore, if additional constraints (e.g. a regular
expression pattern) control the content of the element, it is desirable that these
constraints be present when the user fills out the fields so as to provide valid
information as early as possible in the SAR authoring process.
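The mapping from a type to a widget can be sketched as a walk up the derivation chain until a built-in type with a known widget is reached. The Python code below is only an illustration: the widget names, the table WIDGET_FOR_BUILTIN, the dictionary BASE_OF and the sarrali type names in it are hypothetical and do not come from XSDGuide or from SAR-RALI.xsd.

```python
# Hypothetical widget chosen for a few XSD built-in types.
WIDGET_FOR_BUILTIN = {
    "xs:string": "text",
    "xs:boolean": "checkbox",
    "xs:date": "date-picker",
    "xs:decimal": "number",
}

# Hypothetical derivation table: type name -> the type it is derived from.
BASE_OF = {
    "sarrali:WeightValueType": "sarrali:MeasureValueType",
    "sarrali:MeasureValueType": "xs:decimal",
}


def widget_for(type_name, base_of=BASE_OF):
    """Follow the derivation chain until a built-in type with a known widget is found."""
    seen = set()
    current = type_name
    while current is not None and current not in seen:
        if current in WIDGET_FOR_BUILTIN:
            return WIDGET_FOR_BUILTIN[current]
        seen.add(current)  # guards against a cyclic table
        current = base_of.get(current)
    return "text"  # fallback: a plain text field
```

Here widget_for("sarrali:WeightValueType") yields "number", because the type is indirectly derived from xs:decimal.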
Element Nesting
Element nesting (e.g. a complex type containing sub-elements) is presented as a
set of nested HTML elements, so that the user understands the compositionality of
the complex elements. For instance, in Figure 2, a
SuspiciousActivityReport root element is composed of an element
Metadata, clearly visible as a nested box inside the element
SuspiciousActivityReport. The element Metadata also
contains sub-elements. The simple-typed sub-elements (e.g. UniqueId)
appear as simple text fields inside Metadata, while complex ones (e.g.
SubmissionSystem) appear as sub-boxes.
To allow the user to customize this nested view, a triangle icon to the right of a
complex element’s name hides or shows the child elements.
Element Documentation
Element documentation included in xs:documentation schema elements is presented
to the user either as text (under the element heading) or as tooltips when hovering
over fields corresponding to element information. When collecting
xs:documentation, we traverse the complete type hierarchy for a
given element to gather as much documentation as possible, from the base class down
to the current element. Additional help is provided by the prototype itself, for
instance when an xs:choice is encountered, in order to help the user
make sense of the choice that is presented to them (see section “xs:choice and Substitution Groups”).
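The traversal of the type hierarchy can be approximated with Python's standard XML parser, as in the sketch below. It is a simplification and not the prototype's Xerces-based code: the function collect_documentation is hypothetical, it only looks at named types declared in a single schema document, and it matches type names by their local part without resolving namespace prefixes.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def _own_doc(node):
    """Documentation attached directly to a schema component."""
    docs = node.findall(f"{XS}annotation/{XS}documentation")
    return [d.text.strip() for d in docs if d.text]


def collect_documentation(schema, element_name):
    """Return documentation strings ordered from the most basic type down to the element."""
    types = {t.get("name"): t for t in schema
             if t.tag in (XS + "complexType", XS + "simpleType")}
    element = next(e for e in schema.findall(XS + "element")
                   if e.get("name") == element_name)
    docs = _own_doc(element)
    visited = []
    type_name = element.get("type")
    while type_name is not None:
        local = type_name.split(":")[-1]
        node = types.get(local)
        if node is None or local in visited:
            break  # built-in type, type from another schema, or a cycle
        visited.append(local)
        docs = _own_doc(node) + docs  # documentation of base types goes first
        derivation = node.find(f".//{XS}extension")
        if derivation is None:
            derivation = node.find(f".//{XS}restriction")
        type_name = derivation.get("base") if derivation is not None else None
    return docs
```

The returned list can be joined into the text shown under the element heading, or split between the heading and a tooltip.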
Number of Occurrences
XML schemas specify the number of occurrences of attributes and elements through
different types of rules. Examples of this are xs:sequence rules, where
each child element can appear from 0 to any number of times. By default, the minimum
and the maximum number of occurrences are both set at 1. They can
often be overridden using the minOccurs and maxOccurs
attributes.
These limits are not all explicitly stated to the user. For instance, when a new
element is created, each child element whose minimum number of occurrences is n (with
n at least 1) is also created, n times. In Figure 2, for
instance, the creation of the SuspiciousActivityReport causes the
creation of one Metadata element and one Data element. The
Metadata element is then populated recursively in the same way.
When the schema allows the user to pick the number of elements, the user can click
on links like the one labeled “Add new RelatedSarList (optional)” in Figure 2. The element is then dynamically added to the
current report (and possibly populated with mandatory sub-elements). If the user
tries to add more elements than allowed by the schema, a warning appears.
The user can also delete unwanted elements by clicking a “Trash” icon that becomes visible when the
user hovers over an element. The element is then removed dynamically from the report.
The user cannot remove more elements than the schema allows.
The Java back end naturally mirrors the changes made in the web page, by creating
and deleting elements in its in-memory representation of the SAR.
The same is true of attributes whose presence or absence can be customized (in
this case, the occurrence count is either 0 or 1).
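The occurrence rules described in this section can be summarized by the following Python sketch. It is an illustration under stated assumptions, not the prototype's code: the MODEL table is a hand-written, partial stand-in for what would be read from the schema, the function names are hypothetical, and new children are simply appended without regard to the order imposed by xs:sequence.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: parent -> [(child, minOccurs, maxOccurs)];
# a maxOccurs of None stands for "unbounded".
MODEL = {
    "SuspiciousActivityReport": [("Metadata", 1, 1), ("Data", 1, 1)],
    "Metadata": [("UniqueId", 1, 1), ("Title", 1, 1), ("RelatedSarList", 0, 1)],
}


def create(name):
    """Create an element and, recursively, its mandatory children (minOccurs times each)."""
    element = ET.Element(name)
    for child_name, min_occurs, _max in MODEL.get(name, []):
        for _ in range(min_occurs):
            element.append(create(child_name))
    return element


def _bounds(parent, child_name):
    for name, low, high in MODEL.get(parent.tag, []):
        if name == child_name:
            return low, high
    raise KeyError(f"{child_name} is not allowed in {parent.tag}")


def add_child(parent, child_name):
    _low, high = _bounds(parent, child_name)
    if high is not None and len(parent.findall(child_name)) >= high:
        raise ValueError(f"no more than {high} {child_name} allowed")
    child = create(child_name)
    parent.append(child)  # simplification: schema order is not enforced
    return child


def remove_child(parent, child):
    low, _high = _bounds(parent, child.tag)
    if len(parent.findall(child.tag)) <= low:
        raise ValueError(f"at least {low} {child.tag} required")
    parent.remove(child)
```

Calling create("SuspiciousActivityReport") produces the root with one Metadata and one Data child; the optional RelatedSarList only appears after an explicit add_child call.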
Enumerations
In suspicious activity reports, there are numerous places in the schema where
experts in safety and security have elaborated exhaustive lists pertaining to the
description of entities. For example, there are 26 possible colors of gun finishes
defined by NIEM-SAR. It is critical that such enumerations be presented in a
user-friendly way. The current implementation translates enumerations into simple
dropdown lists. A tooltip presents the documentation for each element of the
enumeration, when it is available.
Data Entry Widgets
One way of minimizing the risk of entering invalid data in a SAR is to provide
widgets and UI cues guiding the input of valid values in fields. These widgets can
also alert the user when a value is incorrect as soon as a field loses focus.
We put a lot of effort into detecting the base type for most simple elements in
order to implement these UI guides. For instance, a field based on an
xs:ID type will alert the user when the id provided is not unique
in the document. Figure 4 shows some UI guides for some of the
primitive types referenced by an XML schema.
xs:choice and Substitution Groups
xs:choice rules and substitution groups are schema constraints that
differ in nature but are rendered similarly in the user interface. This is an
interesting instance where the potential complexity of the schemas is hidden from
the user, who sees two different constraint types expressed in the same way: a
simple logical disjunction (an or).
A choice model group (xs:choice) is used within a complex type to
specify a set of element types from which a single element can be selected. A
substitution group consists of a set of element types. When an element type
associates itself with a substitution group (by specifying a
substitutionGroup attribute), it is a valid substitution for the
referenced element type.
Figure 5 shows the listing (top) defining the type
LengthType for our schema. This type includes an
xs:choice alternation. The figure shows the interactions the user
can have with the control derived from this type. The user can either specify the
height of an individual as a MeasurePointValue (a single value) or as a
MeasureRangeValue (a range).
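The following Python sketch shows how both constructs can be reduced to the same thing, a list of alternatives. It is not XSDGuide's implementation. The schema fragment is hypothetical: the LengthType declaration only reuses the element names mentioned above (the actual listing is in Figure 5), and the Location substitution group is invented for the example.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment, for illustration only.
SCHEMA = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="LengthType">
    <xs:choice>
      <xs:element name="MeasurePointValue" type="xs:decimal"/>
      <xs:element name="MeasureRangeValue" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
  <xs:element name="Location" abstract="true"/>
  <xs:element name="Address" substitutionGroup="Location"/>
  <xs:element name="Coordinates" substitutionGroup="Location"/>
</xs:schema>
""")


def local(qname):
    """Drop the namespace prefix of a qualified name."""
    return qname.split(":")[-1]


def choice_alternatives(schema, type_name):
    """Elements offered by the first xs:choice found in a named complex type."""
    for ctype in schema.findall(XS + "complexType"):
        if ctype.get("name") == type_name:
            choice = ctype.find(f".//{XS}choice")
            if choice is None:
                return []
            return [local(e.get("name") or e.get("ref", ""))
                    for e in choice.findall(XS + "element")]
    return []


def substitution_alternatives(schema, head_name):
    """Global elements that declare `head_name` as their substitution group."""
    return [e.get("name") for e in schema.findall(XS + "element")
            if local(e.get("substitutionGroup", "")) == head_name]
```

Both functions return a plain list of element names, so a single "pick one of" control can serve either case.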
XML Validation
XML validation ensures that the suspicious activity report being written conforms to
the underlying XML schema or schemas. In XSDGuide, it is a two-step process. It first
involves the logic built into the front end, then, if no errors are found, that of the
back end. Figure 1 shows the two constituents.
Validation is invoked when the user clicks on the navigation bar item “Validate XML”.
The user is oblivious to whether the error messages emanate from the front end or the
back end. In both cases, they are shown at the top of the page.
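The control flow of this two-step process fits in a few lines. The Python sketch below is only a schematic view: the names validate, title_is_present and fake_back_end are hypothetical, the document is reduced to a dictionary of field values, and the back end is replaced by a stub that counts how often it is called.

```python
def validate(document, front_end_rules, back_end_validator):
    """Run the cheap field-level rules first; call the full validator only if they pass."""
    errors = []
    for rule in front_end_rules:
        errors.extend(rule(document))
    if errors:
        return errors  # reported immediately, the back end is never contacted
    return back_end_validator(document)


# Hypothetical rule and back end, for illustration only.
def title_is_present(document):
    return [] if document.get("Title") else ["Title is required"]


def fake_back_end(document):
    fake_back_end.calls += 1
    return []  # pretend that the schema validator found nothing


fake_back_end.calls = 0
```

Either way the caller receives a single list of messages, which is why the user cannot tell which of the two steps produced them.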
Validation Carried out by the Front End, in the User’s Browser
When the front end is built, not only are visual elements laid out for the user to
interact with, but validation rules are created in the JavaScript logic running in
the user’s browser. These rules are built client-side by relying on information
provided by the server indicating the base type of the field (XSD’s built-in
datatypes), as well as additional restrictions.
Here are some examples of the rules implemented for elements and
attributes.
When elements have a number of occurrences of at least one, or when
attributes are marked required, then the front end will check for their
presence.
Types deriving from xs:ID are checked to make sure they
are well-formed and unique in the document.
xs:IDREF values should reference an existing xs:ID
in the document.
Decimal and floating-point numbers are deemed valid if they are
consistent with possible minimum and maximum values.
Regular expressions restricting the content of text-based data are
used to validate strings.
The validation feedback for an element of type xs:ID is shown in
Figure 4 (top). Whenever an error is found for a specific
field, it is highlighted in red and a short description of the problem is presented
to the user.
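A few of the rules listed above can be sketched as follows. The code is in Python for readability, whereas the prototype implements its rules in JavaScript; the function names and messages are hypothetical, and the well-formedness test is an ASCII-only approximation of the names that XML allows for identifiers.

```python
import re
from collections import Counter

# ASCII-only approximation of an XML name without colon.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def check_ids(ids, idrefs):
    """Check that ids are well-formed and unique, and that every reference has a target."""
    errors = []
    counts = Counter(ids)
    for value, count in counts.items():
        if not NCNAME.fullmatch(value):
            errors.append(f"ID '{value}' is not well-formed")
        if count > 1:
            errors.append(f"ID '{value}' is not unique")
    for ref in idrefs:
        if ref not in counts:
            errors.append(f"IDREF '{ref}' does not match any ID")
    return errors


def check_range(text, minimum=None, maximum=None):
    """Check a number against optional inclusive bounds."""
    try:
        value = float(text)
    except ValueError:
        return ["not a number"]
    if minimum is not None and value < minimum:
        return [f"value must be at least {minimum}"]
    if maximum is not None and value > maximum:
        return [f"value must be at most {maximum}"]
    return []


def check_pattern(text, pattern):
    """The whole string must match, as XSD patterns are implicitly anchored."""
    return [] if re.fullmatch(pattern, text) else ["value does not match the expected format"]
```

Each function returns a list of messages, empty when the field is valid, which maps directly onto the red highlighting and short descriptions shown next to a field.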
Validation taking place in the front end is especially concerned with the data
entered in the different fields provided to the user. In other words, the structure
of the document itself, e.g. the nesting of elements and their respective number of
occurrences, is not validated client-side. Indeed, the user would be hard-pressed to
find a way to circumvent those rules while creating a report, since interactions
that would create such validation errors are prohibited.
Consequently, when validation is invoked by the user, the front end checks if data
entered in each field is consistent with the rules found for it. These rules were
manually crafted for most data types and elements, but still constitute a
best effort. For this reason, the Apache Xerces validator in the
back end is always run on the document once the front end has deemed it
error-free (see following section).
The advantages of first running the validation on the front end are twofold. This
scheme allows for a quasi-immediate response from the browser, without having to
send the document over the network and wait for the validator messages. Moreover,
this validation can be carried out interactively as the user is typing data, which
allows quick rectifications of the data just entered, fresh in the user’s
mind.
Validation Carried out by the Back End, XSDGuide’s Java Engine
As mentioned in section “XSDGuide’s General Architecture”, XSDGuide
builds a full-fledged XML validator from the XML schema(s) selected by the user to
create the SAR. This validator is put to good use during this second step, and any
remaining validation errors are captured and sent back to the user.
At this point in the development of the prototype, this validation still leaves
room for improvement. The principal problem is that, when validation fails, the
error messages are not clearly tied to the offending field(s) (unlike the
messages produced in the step described in the previous section). See section “Current Limits and Perspectives” for more on this.
Saving the Suspicious Activity Report
At any time during the creation of the SAR, the user can save the
report by clicking the appropriate navigation menu item. This triggers the download of
the XML document being edited. The document root contains the association to the
corresponding XSD schema, through an xsi:schemaLocation attribute. The URL
of the schema points to the XSDGuide server, which acts as a schema server, for the
referenced schema and its possible XSD dependencies.
This allows the validation of the SAR using external tools, such as <oXygen>, which
dereferences the schema URL and proceeds with validation.
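The association between the saved document and its schema can be sketched with Python's standard library as follows. The namespace is the one quoted earlier for the running example, but the schema URL is a hypothetical placeholder for the address of an XSDGuide server, and the function new_report is not part of the prototype.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
SARRALI = "http://rali.iro.umontreal.ca/sarrali"
SCHEMA_URL = "http://localhost:8080/schemas/SAR-RALI.xsd"  # hypothetical server address

ET.register_namespace("xsi", XSI)
ET.register_namespace("sarrali", SARRALI)


def new_report():
    """Create a root element that points validators to the schema's location."""
    root = ET.Element(f"{{{SARRALI}}}SuspiciousActivityReport")
    # xsi:schemaLocation holds pairs of "namespace URI, schema document URL".
    root.set(f"{{{XSI}}}schemaLocation", f"{SARRALI} {SCHEMA_URL}")
    return root


saved = ET.tostring(new_report(), encoding="unicode")
```

An external tool that opens the saved file can then fetch the schema from the URL given in the second half of the attribute value.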
Schema Management
In order to demonstrate the versatility of XSDGuide, we implemented a feature allowing
the user to upload their own schema (or schema archive) in order to create SARs based on
new XSD schemas. The user only has to click “Add XML Schema” to upload a new XSD
file. It is also possible to upload an entire zip archive containing the XSD
file as well as its dependencies (mainly specified through import and
include statements). Uploading an entire archive is quite useful in our
case, since most complete SAR schemas are saved in IEPD zip files (see section “NIEM-SAR and Suspicious Activity Reporting”).
Whether uploaded alone or alongside its dependencies, each XSD file is validated
before the operation can proceed. The validation consists in the compilation of the XSD
file using the relevant Apache Xerces functions. If validation fails for at least one
file, the operation aborts and the user is shown the offending file name and validation
error(s).
Current Limits and Perspectives
In its current stage of development, XSDGuide is still a prototype, and our effort
focused on making sure that most NIEM-SAR IEPD schema rules are recognized and correctly
processed. However, the XSD standard taken as a whole is quite vast, and consequently
there are various XSD validation rules that are yet to be implemented. Furthermore,
some
features are lacking from the overall application.
XSD Rules to Implement
Some of XML schema’s constraints are not yet implemented in XSDGuide. They were
either rarely seen in the IEPDs we worked with, or posed difficult ergonomics
problems. We describe some of them below and give an idea of the prevalence of these
rules in the IEPD “ISE-FS-200-version-1.5 Suspicious Activity Reporting (SAR)”
mentioned in section “NIEM-SAR and Suspicious Activity Reporting”.
The subtle distinctions between the text-like types xs:string, xs:NCName, xs:NMTOKEN and
xs:token have not been implemented. Only xs:token and
xs:string are used in the IEPD ISE-FS-200.
The implementation of some facets for numbers and text (e.g. whiteSpace, length,
totalDigits) is incomplete.
The regular expression language used in XSD to validate text content has not been
entirely ported from the schema to the user interface. This proved difficult because
the W3C XML Schema standard defines its own regular expression flavor, and some
patterns cannot be copied verbatim from the schema specification to JavaScript. For
instance, the character class subtraction construct ([...-[...]]) does not exist
in JavaScript. For now, only simple regular expressions are copied from XSD to
JavaScript. Only one pattern is used in the entire ISE-FS-200 IEPD.
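To illustrate what "only simple regular expressions are copied" could mean in practice, here is a conservative sketch of such a filter. It is an assumption about one possible approach, not the rule XSDGuide applies: the class XsdPatternPorter and its method are hypothetical. The sketch rejects patterns that use character class subtraction, the XSD-specific escapes \i, \c, \p and their negations, or a bare ^ or $ (literal characters in XSD, anchors in JavaScript). Accepted patterns are wrapped in ^(?:...)$ because an XSD pattern must match the whole value. Escapes such as \d and \w are let through even though they cover more characters in XSD than in JavaScript.

```java
import java.util.Optional;

// Hypothetical helper: decides whether an XSD pattern can be reused in JavaScript.
class XsdPatternPorter {

    /** Returns a JavaScript regex source for simple patterns, or empty if not portable. */
    static Optional<String> toJavaScriptSource(String xsdPattern) {
        boolean inClass = false;
        for (int i = 0; i < xsdPattern.length(); i++) {
            char c = xsdPattern.charAt(i);
            if (c == '\\') {
                if (i + 1 >= xsdPattern.length()) {
                    return Optional.empty();            // dangling backslash
                }
                char next = xsdPattern.charAt(i + 1);
                if ("pPiIcC".indexOf(next) >= 0) {
                    return Optional.empty();            // XSD-specific escape
                }
                i++;                                    // skip the escaped character
            } else if (c == '[') {
                if (inClass) {
                    return Optional.empty();            // nested class: subtraction
                }
                inClass = true;
                if (i + 1 < xsdPattern.length() && xsdPattern.charAt(i + 1) == '^') {
                    i++;                                // negated class, same in both flavors
                }
            } else if (c == ']') {
                inClass = false;
            } else if (!inClass && (c == '^' || c == '$')) {
                return Optional.empty();                // literal in XSD, anchor in JavaScript
            }
        }
        // XSD patterns are implicitly anchored at both ends.
        return Optional.of("^(?:" + xsdPattern + ")$");
    }
}
```

A pattern for which the method returns empty would simply be left to the back-end validator instead of being checked in the browser.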
For the time being, the number of occurrences for the model groups xs:sequence,
xs:choice or xs:all can only be 1. There are no
xs:choice or xs:all rules in the example IEPD and the
cardinality of xs:sequence is always 1. However, the IEPD contains
56 substitution groups. Consequently, we focused our efforts on these use cases.
Expanding on this to include other cardinalities should not be difficult.
Elements with mixed content (mixed="true") constitute a particularly
arduous constraint when it comes to producing an appropriate UI guide. Fortunately,
they rarely appear in IEPDs (they are absent from all IEPDs we studied). Nonetheless,
they represent an interesting challenge.
An element with mixed content may contain text, usually interspersed with nested
elements. The UI guide should make it clear that the user can type arbitrary text
and can insert nested elements within that text. The current interface
choices implemented by XSDGuide make it difficult to provide this type of guide. We
could have given the user the possibility of inserting tags inside the free
text, but we opted to avoid tags as much as possible, since they require a
level of computer literacy that is not to be expected from the average user.
Figure 6 shows an idea for a mixed content guide. A text
area allows the user to enter free text, and buttons allow the creation of nested
elements within the text. Whenever the user clicks on these nested elements, the
complete element appears below the text area, and behaves like any other element
guide.
Additional Features
SAR Loading
For now, the most important feature lacking from the prototype is the ability
to load a previously saved SAR. While it is possible to create an XML instance
of a given XSD schema, the interface does not allow the user to open such an
instance and edit it in the interface. There are no specific conceptual hurdles
to implementing this; we simply could not complete this feature
within the short timeframe allotted for this project. Obviously, such a feature
is essential if our prototype is to be rolled out in a production
setting.
Validation Feedback
During validation, the feedback provided by the back end is too generic and
does not clearly indicate the offending fields or values to the user. Unlike the
error messages provided by the front end, these messages do not come with
visual feedback such as highlighting the fields at the origin of the
problem. This is bound to disconcert the user.
Traditional text-based XML editors like <oXygen/> do not suffer from this
problem, since the validation API provided by Apache Xerces associates line and
column numbers with each validation problem. The editor can then highlight the
problem in the code. In our case, we cannot benefit from such clear indications,
since the SAR document is not text-based: it is kept as an in-memory DOM. One
solution to this is to inspect the post-schema-validation
infoset (PSVI)[14] provided by the API. After validation, the PSVI includes
assessment outcome information that can offer the
validation status of some elements and attributes. It then becomes a matter of
mapping these statuses back to the interface so that the user understands the
corrections needed.
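As a rough illustration of mapping validation problems back to nodes of an in-memory DOM, the sketch below uses a simpler mechanism than the PSVI inspection proposed above: while validating a DOMSource, the Xerces validator exposes the element being processed through the property http://apache.org/xml/properties/dom/current-element-node. This is an assumption-laden sketch, not XSDGuide's implementation: the class DomErrorLocator and its methods are hypothetical, and the code assumes that the JAXP validator in use is Xerces-based and honors this property.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: associates each validation error with a DOM element.
class DomErrorLocator {
    // Xerces property; assumed to be supported by the validator in use.
    static final String CURRENT_ELEMENT =
            "http://apache.org/xml/properties/dom/current-element-node";

    /** Validates the DOM and returns the first error message recorded for each element. */
    static Map<Node, String> locateErrors(Schema schema, Document doc)
            throws SAXException, IOException {
        Map<Node, String> errors = new LinkedHashMap<>();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            @Override public void error(SAXParseException e) throws SAXException {
                // Ask the validator which element it was processing when the error occurred.
                Object current = validator.getProperty(CURRENT_ELEMENT);
                if (current instanceof Node) {
                    errors.putIfAbsent((Node) current, e.getMessage());
                }
            }
        });
        validator.validate(new DOMSource(doc));
        return errors;
    }

    /** Convenience wrapper for experiments: builds the schema and the DOM from strings. */
    static Map<Node, String> fromStrings(String xsdText, String xmlText) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsdText)));
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for schema validation of a DOM
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlText)));
            return locateErrors(schema, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The keys of the returned map are the DOM elements themselves, so an interface that keeps a mapping from DOM nodes to form fields could highlight the corresponding fields. The PSVI would give finer-grained information, such as the validity of individual attributes.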
Other Schemas
In theory, XSDGuide is not tied to a specific schema. In practice, however, it has
been designed to implement constraints found in our test set. The limitations one is
bound to encounter when loading new schemas in our prototype have been outlined
earlier in this article. Besides the (admittedly important) fact that not all
constraints are implemented, other considerations are to be examined in order to
tackle non-SAR schemas.
One of the most complex problems we see is that XML schemas can be used (and
abused) to encode data models in ways that do not lend themselves well to the
automated generation of a user interface. For instance, an XSD file may encode a
data model featuring multiple inheritance through custom-made elements that only
make sense to the application that created the schema. One way to encode such
“proprietary” information is through the <xs:appinfo> element in XSD. For instance,
a software tool could create <xs:appinfo> child elements like <myapp:baseclass
qualifiedname="basetypename"> to achieve a data model with multiple
inheritance. XSDGuide’s corresponding interface would not be able to translate this
clearly, simply because these extra layers of meaning are not accessible
to the schema processor.
In these cases, it’s difficult to imagine how a program like XSDGuide could be
useful. Additional resources would need to be provided along with the XSD
schema. Creating a generic tool in these conditions becomes arduous, if possible
at all.
Conclusion
The XSDGuide prototype we have presented here was aimed at facilitating the creation
of suspicious activity reports by public safety and security experts as well as by
members of the public, in a timely fashion. Moreover, one of our aims was to design a
tool allowing users to inspect and understand complex schemas by using familiar user
interface controls.
In spite of some limitations, we feel that the prototype is sufficiently developed to
clearly showcase the possibilities that intelligent user interfaces offer to achieve
these goals. XSDGuide proposes a way to materialize schemas created within the NIEM-SAR
framework into a concrete user interface in a web application. The latter can be used to
create validated SARs but also to explore the data model defined by these schemas, by
parsing their constraints and rendering them in a uniform, user-friendly manner.
A formal evaluation of the prototype (probably after some improvements whose nature is
outlined in section “Current Limits and Perspectives”) should be carried out in order
to objectively assess the usefulness of the software. This evaluation could measure the
time needed to create the same SAR using XSDGuide versus a more traditional approach.
The quality of this SAR should be evaluated as well. Ultimately, however, the approach
we propose here can only be judged when it is integrated in the full information
processing pipeline. This pipeline goes from the creation of the SAR, to the
data centers where information is stored and cross-referenced,
and back to users in the field in the form of notifications, warnings, etc.
A recurring question during the development of the software presented here was the
quality of the standards used. While NIEM-SAR is undoubtedly an exceedingly well
thought-out framework, the complexity that arises from such exhaustiveness can be
perplexing for the authors of SARs. Moreover, some of the implementation choices made
in the XSD files are debatable. For instance, certain elements allow free text when they
should probably have been enumerations, or complex types constrain the order of
sub-elements when it is unnecessary. These questions arise simply because creating
schema-backed XML documents is an excellent way of putting these schemas to the test.
An interesting prospect for the project is the collection of data through the
creation of SARs. With a tool such as our prototype, it becomes possible to envision
a SAR creation campaign soliciting the help of interested parties (e.g. law enforcement
agencies). Such data could prove invaluable in the creation of additional guides during
SAR creation, like autocomplete features based on previously entered values. XSDGuide
would then act as a “bootstrapping” tool in the implementation of a more advanced SAR
authoring tool.
References
[1] Information Sharing Environment (ISE) Functional
Standard (FS) Suspicious Activity Reporting (SAR) Version 1.5
[2] Adoption and Use of the National Information Exchange
Model (NIEM)