The Organization for the Advancement of Structured Information Standards
The OASIS membership consortium was first founded as SGML Open in 1993, years before the XML syntax was even considered. At the 1998 meeting where the organization was being renamed for fear of being tied to SGML, there was a push to use “XML” in the name. But the organization avoided being typecast again by adopting the new name put forward by Jon Bosak, the father of XML. Indeed, OASIS does not exist only to develop XML standards. Jon chaired the first meetings to develop what has become the “Technical Committee Process” for the running of groups of members creating work products for localized or global use. OASIS has matured into a world-class standards development organization, backed by dedicated and talented staffers who are supporting a myriad of committees that are creating standards being used internationally.
Vocabulary ecosystems have developed around the work products of many OASIS technical committees using this process, including those for office documents (OASIS ODF - Open Document Format), for technical documentation (OASIS DITA - Darwin Information Typing Architecture, and OASIS DocBook), and for business documents (OASIS UBL - Universal Business Language).
The OASIS TC Process is recognized for its quality by other standards developing organizations: an OASIS Standard is automatically accepted for consideration should it be put forward to, for example, ANSI in the US (the American National Standards Institute), or ISO/IEC internationally (the Joint Technical Committee 1 of the International Organization for Standardization and the International Electrotechnical Commission). Accordingly for ISO/IEC, because OASIS is a Publicly Available Specification (PAS) submitter, the OASIS organization and its committee process give a community a pathway to ISO standardization for its normative work products.
The Universal Business Language (UBL) is a good example of a work product developed within the OASIS TC Process, successfully deployed around the world, and standardized as an ISO/IEC standard.
The Universal Business Language vocabulary ecosystem introduction - conveying business information
Business documents such as purchase orders, invoices, waybills, etc. are exchanged around the world. The Universal Business Language establishes a structured vocabulary for such procurement and transportation documents so that communities and users need not create such structures themselves. Moreover, interoperability is promoted when many communities base their structured business documents on the same vocabulary.
Jon Bosak, then of Sun Microsystems, established the UBL committee in 2001. The work was funded exclusively by the volunteer participation of committee members. The first version seen deployed publicly was 0.7, by the government finance ministry in Denmark. Unsurprisingly, the most active members of the committee at that time were from Denmark. Version 1.0, with only eight document types, was released shortly after; it just wasn’t released in the time frame needed by the Danes to legislate its use in government invoicing. Based on some tough lessons learned in version 1.0, development immediately began on version 2.0 to create a framework on which to expand the scope and utility of the specification.
Now approved as ISO/IEC 19845:2015, UBL 2.1 is a family of 65 business documents around a common library of business objects. The recently released UBL 2.2 was finalized with 81 business documents and a richer common library than found in UBL 2.1. To maintain its availability and relevance, by design, each minor version of UBL is strictly backwards compatible with all previous versions in the same major version. That is, every schema-valid instance of UBL 2.0 is a schema-valid instance of UBL 2.1, and every schema-valid instance of UBL 2.1 is also a schema-valid instance of UBL 2.2. This ensures the UBL ecosystem can grow continuously and user communities can migrate organically to updated minor versions of the UBL specification without impacting other users. Moreover, the design of UBL accommodates different communities’ requirements through a number of tailoring techniques.
The very nature of the use of business documents such as purchase orders, invoices and waybills implies the need for an ecosystem of product developers servicing end users who need to conduct business using an information vocabulary. Consider a choreography governing the exchanges between a Buyer and a Seller:
And there is not just the Buyer and the Seller in a business scenario. Consider the many roles described by UN/CEFACT, the United Nations Centre for Trade Facilitation and E-Business, outlining a number of possible roles engaging in the Buy-ship-Pay process in addition:
Regardless of the sector environment, business information is conveyed from a sending role to a receiving role as a transaction within a profile of choreography. The sender has its own business practices developed over time to meet its obligations. The receiver could have very different business practices because its obligations and its history differ from those of the sender. Traditionally the exchange of the paper business document bridges the two environments.
Employing a digital exchange removes the challenge of printing and interpreting the printed content, though it does not remove the challenge of starting off with correct information. But if the information is correct, then using digital technologies can drastically reduce the opportunities for incorrect information ending up in the receiver’s business practices. The sender marshals their information out of their application into the syntax that is transported to the receiver who unmarshals the information from the syntax into their different application. The choreography doesn’t change and the business practices don’t change, but the integrity of the information exchanged is greatly improved.
All of the aspects described above fit into the ISO/IEC Open-edi Reference Model, ISO/IEC 14662, first developed starting in 1992. While the abbreviation for “electronic data interchange” historically is often associated with financial information, Open-edi has always been agnostic of the nature of the information being exchanged. From the introduction of ISO/IEC 14662 one reads:
The field of application of Open-edi is the electronic processing of business transactions among autonomous multiple organizations, authorities or individuals within and across sectors (e.g. public/private, industrial, geographic). It includes business transactions which involve multiple data types such as numbers, characters, images and sound.
The Open-edi Reference model is independent of specific:
information technology implementations;
business content or conventions;
business activities;
parties participation in business activities.
In this depiction, the Open-edi reference model is described in the left column. The centre column outlining the components of an Open-edi configuration is from ISO/IEC 15944-20. The right column enumerates specifications available to address the two Open-edi aspects of information representation: bundles of semantic content, and data in syntax.
Open-edi describes two “views” of electronic business (the rows in the diagram): the business operational view and the functional services view. The business operational view (BOV) describes the abstract properties of the environment, the scenarios, the roles in the scenarios and the bundles of information conveyed between roles. The functional services view (FSV) describes the concrete machine-processable properties of user data representation of information bundles, the choreographies engaged by the roles in the scenarios of the environment and the transport of the content between the parties.
Also shown in the diagram, in particular in the rightmost column, is the bridging of the business specification of the information objects and definitions to the machine-processable specification of the binding of the information objects to actual syntax representations suitable for applications to produce and ingest. The two examples of syntax-independent information bundle description technologies cited are the UN/CEFACT Core Component Technical Specification (CCTS) https://www.unece.org/cefact/codesfortrade/ccts_index.html and the Unified Modeling Language (UML). The three examples of syntax technologies cited are the text-oriented XML and JSON, and the binary-oriented ASN.1. The technology that bridges the two is the set of naming and design rules governing the creation, from the business view of information bundles (the models), of the functional view of user data (the syntax).
This bridging is accomplished in a rigorous mechanical fashion, producing robust and accurate document constraint expressions without the need for hand-crafting. For UBL, the technical committee formalized and standardized the OASIS Business Document Naming and Design Rules (NDR) http://docs.oasis-open.org/ubl/Business-Document-NDR/v1.1/csprd01/Business-Document-NDR-v1.1-csprd01.html for the application of CCTS and the realization of schema artefacts from declarative models of the information. As an example of the work product of one OASIS technical committee being used by another, these NDR are also being used by the OASIS Business Document Exchange Technical Committee (BDXR) for work on the business document envelope and exchange header envelope projects.
Information described by CCTS takes three forms to be expressed as a hierarchical tree of business objects. The Aggregate Business Information Entity (ABIE) is the shape of the branch of the information tree. The Association Business Information Entity (ASBIE) is an instance of the branch of the information tree. The Basic Business Information Entity (BBIE) is a leaf of the information tree. CCTS modeling is not based on syntax, thus allowing different syntactic expressions of the information tree. The UBL TC has normatively standardized on an XML serialization of the CCTS information tree, and has published non-normative alternative expressions of UBL in JSON schema for JSON syntax, and in ASN.1 binary syntax.
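The relationship between the three kinds of information entity, and the way a syntax-independent model can be rendered in more than one syntax, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class layout, the `to_xml`, `to_json` and `conforms` helpers and the `BuyerParty` example are hypothetical, and the output is not the normative UBL XML serialization nor the committee's published JSON expression.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ABIE:
    """Aggregate BIE: the shape of a branch (the children it may hold)."""
    name: str
    child_names: List[str]


@dataclass
class BBIE:
    """Basic BIE: a leaf of the information tree, carrying a value."""
    name: str
    value: str


@dataclass
class ASBIE:
    """Association BIE: an instance of a branch that follows an ABIE shape."""
    name: str
    shape: ABIE
    children: List[Union["ASBIE", BBIE]] = field(default_factory=list)


def conforms(branch: ASBIE) -> bool:
    """Check that every child of the branch is permitted by its shape."""
    return all(child.name in branch.shape.child_names for child in branch.children)


def to_xml(node) -> ET.Element:
    """Render the tree as XML elements (hypothetical, namespace-free)."""
    elem = ET.Element(node.name)
    if isinstance(node, BBIE):
        elem.text = node.value
    else:
        for child in node.children:
            elem.append(to_xml(child))
    return elem


def to_json(node):
    """Render the same tree as JSON-ready Python values."""
    if isinstance(node, BBIE):
        return node.value
    return {child.name: to_json(child) for child in node.children}


# Hypothetical example data, not taken from the UBL common library.
party_shape = ABIE("Party", ["Name", "CountryCode"])
buyer = ASBIE("BuyerParty", party_shape,
              [BBIE("Name", "Example Buyer Ltd"), BBIE("CountryCode", "DK")])

xml_text = ET.tostring(to_xml(buyer), encoding="unicode")
json_text = json.dumps({buyer.name: to_json(buyer)})
```

Both `xml_text` and `json_text` are produced from the one model, which is the point of keeping the model free of syntax.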
With this standards-based foundation used to create the comprehensive UBL specification, considerations must be made when deploying the work in different scenarios across the world-wide ecosystem.
Deploying the Universal Business Language vocabulary across the ecosystem
The UBL Technical Committee recognized that even when two communities are using the very same UBL structures, the business contexts of those communities will govern different values to be used. These values might be in code lists, identifier lists, contextual value constraints, etc. Accordingly, the only two normative components of the UBL standard are the semantics of the standardized constraints, and the document and business structures expressed in XSD schemas. There are no enumerations in any of the UBL schemas. Only the structures are standardized, not the values that go into those structures. Business value constraints can change on a daily or even hourly basis and it would not be at all desirable to require schemas to be modified and reintegrated into production processes so rapidly.
Accordingly, the UBL committee non-normatively suggests that UBL documents run through a two-pass validation phase before any application code acts on the content. In this diagram, phase 1 shows the application of the structural constraint checks (both element structure and the lexical element/attribute content structure) using XSD, and phase 2 shows the application of value constraint checks, for example in XSLT:
The use of ISO/IEC 19757-3 Schematron is common in UBL communities for the expression of the value constraints. To help with the generation of the Schematron expressions, the OASIS Code List Representation Technical Committee has published the genericode XML vocabulary for the expression of lists of coded values, and the Context/value Association (CVA) XML vocabulary for the expression of XPath contexts to which genericode and other value expressions are applied. Free tools are available to transform the CVA and genericode files into Schematron, and then translate the Schematron into the XSLT for runtime use.
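The separation of the two passes can be sketched in Python using only the standard library. Several things here are assumptions made for illustration: the first pass is merely a stand-in for XSD validation (it checks well-formedness and the document element, nothing more), the second pass is written directly in Python rather than generated from CVA and genericode files, and the `ALLOWED_CURRENCIES` list and the trimmed sample invoice are hypothetical.

```python
import xml.etree.ElementTree as ET

INVOICE_NS = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Hypothetical community code list. It lives outside any schema, so it can
# change without the schemas being touched.
ALLOWED_CURRENCIES = {"DKK", "EUR"}


def pass_one_structure(xml_text):
    """Stand-in for XSD validation: well-formedness and document element only."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != f"{{{INVOICE_NS}}}Invoice":
        raise ValueError("unexpected document element: " + root.tag)
    return root


def pass_two_values(root):
    """Return a list of value-constraint violations (empty when all pass)."""
    problems = []
    for elem in root.iter(f"{{{CBC}}}DocumentCurrencyCode"):
        code = (elem.text or "").strip()
        if code not in ALLOWED_CURRENCIES:
            problems.append(f"currency code {code!r} not in community code list")
    return problems


def validate(xml_text):
    """Run the structural pass, then the value pass, before any business logic."""
    root = pass_one_structure(xml_text)
    return pass_two_values(root)


# Trimmed sample for illustration; a real invoice needs many more elements.
SAMPLE = f"""<Invoice xmlns="{INVOICE_NS}" xmlns:cbc="{CBC}">
  <cbc:ID>INV-1</cbc:ID>
  <cbc:DocumentCurrencyCode>USD</cbc:DocumentCurrencyCode>
</Invoice>"""

problems = validate(SAMPLE)
```

Here the sample passes the first (structural) stand-in check and fails the second, because `USD` is not in the hypothetical community list. Changing the list changes the outcome without any change to the structural check.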
But new users and communities considering UBL almost always raise a common issue regarding the magnitude of the published specification. Why is the UBL vocabulary so big and how can it be used effectively?
Enabling communities to work effectively with UBL
When Jon Bosak founded the UBL committee, he was fully aware that one committee’s definition of the information components for electronic commerce would never be able to meet every business requirement globally. Nor should it try to do so, though the effort can be made to support as many as possible. However, such a vocabulary can have particular features that would allow the vocabulary to be a basis on which every business document information requirement globally could be accommodated. The resulting specification for UBL accommodates all of this, and version 2.2 of UBL includes 81 document types and 4600 distinct semantic business objects realized as elements in those document types.
Firstly, consider the Pareto principle, also called “the 80/20 rule”. UBL is designed with the Pareto 80/20 principle in mind: the committee believes that 80% of world business can run with only 20% of the UBL business objects. The other 80% of UBL exists in order to address less-common but still accepted business requirements for the defined document types. This enables yet more of world business to work with the standardized UBL business objects, though most people won’t need them. Moreover, the design of UBL incorporates user-defined extensions available to address in a standardized UBL document all of the remaining unaddressed requirements not available in UBL business objects. Finally, the common library utilized by the UBL document types is available to be used by user-defined document types that are not included in the UBL suite. All this should allow the UBL vocabulary to find a home in all business environments.
To manage these three concepts, the nomenclature used in UBL deals with extension schemas, subset schemas, and additional schemas.
To accommodate business objects that are not found anywhere in UBL, the user community can create extension schemas and embed content conforming to those schemas. Every UBL document type has an extension point as a home for arbitrary content from multiple sources. Under this extension point is a scaffolding of metadata describing the apex of an information structure of arbitrary XML content. A sending application adds the extension information under the extension point, and the receiving application looks under the extension point only for those extensions that it recognizes. All unrecognized extensions in a UBL document are ignored by the processing application.
In UBL the extension point is the very first child of the document element in every document type. This is important for streaming applications to be able to consume and consider all extension information before encountering standardized content just in case the extension content impacts on the semantics to be interpreted by the receiving application.
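A receiving application's treatment of the extension point might look like the following Python sketch. The sample document is heavily trimmed, the extension URIs and the `my:LineColour` content are hypothetical, and the helper name `recognized_extensions` is an invention for this illustration; only the UBL namespace URIs and the `UBLExtensions`, `UBLExtension`, `ExtensionURI` and `ExtensionContent` element names are meant to reflect the published scaffolding.

```python
import xml.etree.ElementTree as ET

NS = {"ext": "urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2"}

# Hypothetical identifier of the one extension this receiver understands.
KNOWN_URI = "urn:example:extension:line-colour"


def recognized_extensions(root, known_uris):
    """Yield (uri, content element) for each extension the receiver recognizes.

    Unrecognized extensions are skipped silently, as described above.
    """
    container = root.find("ext:UBLExtensions", NS)
    if container is None:
        return
    for extension in container.findall("ext:UBLExtension", NS):
        uri = extension.findtext("ext:ExtensionURI", default="", namespaces=NS).strip()
        content = extension.find("ext:ExtensionContent", NS)
        if uri in known_uris and content is not None:
            yield uri, content


# Trimmed sample: only the extension point is shown.
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2">
  <ext:UBLExtensions>
    <ext:UBLExtension>
      <ext:ExtensionURI>urn:example:extension:line-colour</ext:ExtensionURI>
      <ext:ExtensionContent>
        <my:LineColour xmlns:my="urn:example:my">blue</my:LineColour>
      </ext:ExtensionContent>
    </ext:UBLExtension>
    <ext:UBLExtension>
      <ext:ExtensionURI>urn:example:extension:unknown</ext:ExtensionURI>
      <ext:ExtensionContent><other xmlns="urn:example:other"/></ext:ExtensionContent>
    </ext:UBLExtension>
  </ext:UBLExtensions>
</Invoice>"""

root = ET.fromstring(SAMPLE)
found = list(recognized_extensions(root, {KNOWN_URI}))
recognized_uris = [uri for uri, _ in found]
```

Only the first extension is returned; the second is present in the document but ignored because its URI is not one the receiver knows.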
There are some non-UBL business concepts that have already been standardized outside of OASIS and have had established XML schemas developed under the formal governance of others. It is not UBL’s intent to re-express those concepts using CCTS. Rather, the extension point is a home for XML constructs from foreign vocabularies using non-UBL namespaces or no namespaces. An example of this is digital signatures. The UBL Technical Committee has published the scaffolding necessary to embed W3C Digital Signature structures, using the W3C namespaces and structures and schemas, inside the extension point of any UBL document.
But for those non-UBL business concepts that have not already been standardized elsewhere, users need to be able to augment the UBL document to include such information in their own extension. While there is no obligation to use CCTS, doing so is consistent with the rest of UBL. Moreover, the user community may wish, then, to submit their CCTS-based designs to the committee for consideration under UBL’s governance rules. The hierarchical tree structure of an extension with custom information for a line item is depicted in the following diagram.
Note in that diagram how the line item identifier is copied into the extension so that the customized information in the extension can be associated with the standardized information in the UBL business objects. Being considered for the future UBL 2.3 is making each and every aggregate extensible by having an optional <ext:UBLExtensions> element at every branch of the tree, not only at the document element. This contextualizes the extended information at the location of the UBL structure where the standardized constructs are being augmented. This relieves the need to use other means by which to associate the extended information at the beginning of the document with the standard information found deep inside the document. This was proposed for consideration after receiving feedback from implementers regarding some awkwardness, though not technical deficiency, of the current approach to UBL extensions.
To user communities considering adopting UBL, a challenge of a vocabulary with 4600 distinct semantic business objects is the determination of the base 20% that applies to their situation, and which of the other 80% might also apply. To address this the community needs to create a subset schema for their users. With a subset schema, every schema-valid instance of the subset schema is also a schema-valid instance of the full UBL schema, but the end-user dealing with the subset is not overwhelmed by the entire UBL suite. However, communities need to remember that subset schemas should play only a limited role in a deployed solution.
Consider Postel’s law, cited in http://www.cookcomputing.com/blog/archives/000551.html, that states:
In general, only a subset of a protocol is actually used in real life. So, you should be conservative and only generate that subset. However, you should also be liberal and accept everything that the protocol permits, even if it appears that nobody will ever use it.
Jon Postel, 1979, re: TCP/IP
This protocol-related principle can be applied conceptually to an XML vocabulary ecosystem. The senders of UBL should be constrained by the constraints of the subset, but the receivers should not be so constrained. Receivers should accept all of UBL because there is no guarantee that only users of the subset will be sending them content. Through some stage of value validation (perhaps by Schematron creating XSLT, or by culling the input instance of undesired constructs and then using the subset schema) the UBL-valid document can be checked before the receiving application acts on the content. This is shown in this diagram:
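The culling option mentioned above can be illustrated with a short Python sketch. It is a deliberate simplification: a real subset is defined per context in the document tree, whereas this hypothetical `cull` helper decides by element name alone, and the `SUBSET_TAGS` set and the trimmed incoming invoice are invented for the example.

```python
import xml.etree.ElementTree as ET

CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Hypothetical community subset: the only elements its own senders generate.
SUBSET_TAGS = {f"{{{CBC}}}ID", f"{{{CBC}}}DocumentCurrencyCode"}


def cull(element, allowed_tags):
    """Remove, in place, every descendant whose tag is outside the subset."""
    for child in list(element):  # copy the list, since we remove while iterating
        if child.tag in allowed_tags:
            cull(child, allowed_tags)
        else:
            element.remove(child)
    return element


# Trimmed incoming document from a sender who is not using the subset.
INCOMING = f"""<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="{CBC}">
  <cbc:ID>INV-1</cbc:ID>
  <cbc:Note>Outside the subset, but still valid UBL</cbc:Note>
  <cbc:DocumentCurrencyCode>EUR</cbc:DocumentCurrencyCode>
</Invoice>"""

culled = cull(ET.fromstring(INCOMING), SUBSET_TAGS)
remaining = [child.tag.split("}")[1] for child in culled]
```

The receiver accepts the full document, drops what it does not use, and can then hand the culled result to checks written against the subset. Whether silently dropping content is acceptable is a business decision for the community, not something this sketch settles.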
Finally, user communities can create additional CCTS-based document types that share the use of the common library of aggregate (ABIE - branch shapes), association (ASBIE - branch instances) and basic (BBIE - leaf instances) information entities. Additional schemas importing the UBL common library into CCTS-based non-UBL documents can also incorporate non-UBL supplemental library constructs. These supplemental library constructs can use common library constructs, but common library constructs cannot be modified to use the supplemental library constructs.
Of course using an abstract modeling technique, such as CCTS in the case of UBL, raises the question of how to get the actual runtime validation artefacts expressed according to the naming and design rules. These are the schemas and other constraint expressions that applications will use in the generation and validation of the syntax. The UBL Technical Committee uses free tools available on GitHub to create XSD schemas, OASIS Context/value Association expressions, and JSON schemas. Depicted in this diagram is the CCTS model collaboratively modified by committee members as a Google Docs spreadsheet, downloaded as an OASIS ODF spreadsheet, and transformed into an OASIS genericode serialization that is then transformed into the many artefacts published by the committee.
See http://goo.gl/DgMAqy for a description of the process and links to the free tools used to create the validation artefacts.
This resulting environment effectively services a global community of users using UBL in different ways, while still retaining a base level of commonality and conformance, much like the web world. Jon Bosak has said that he wanted UBL to be “the HTML of e-commerce”: a commonly-understood freely-available base vocabulary on which user communities can tailor their specific solutions without the overhead of starting from scratch. It also lets end-users leverage work products created by a cadre of global and regional experts who have followed an open and transparent process using effective development tools.
Leveraging the OASIS TC process, tools and resources to create UBL
The quality and global acceptance of the UBL committee work products are evidence of the results of collaborating within an effective standards development process.
Important in any development of such an open specification, in order to gain the trust of potential users in the ecosystem, are three critical aspects: governance, transparency and availability. The rules of engagement and obligations by contributors are formalized by the governance of the project. The openness of the development process to public scrutiny is needed for transparency. The openness of the work product is characterized by its unfettered availability (recognizing that even “mandatory registering for a free copy” is a barrier to availability).
The internationally-recognized OASIS Technical Committee (TC) process at http://www.oasis-open.org/policies-guidelines/tc-process (accredited by ANSI in the US and ISO as suitable for creating national and international standards) is an ideal framework under which one would create and run a committee of members publishing work products for local or global use.
Jon Bosak chaired the first OASIS committee to hone the definition of the TC process. The objective he set was to be general enough that “if Japanese subway operators wanted to get together to create an XML vocabulary for interchanging scheduling information, the process should be straightforward and flexible enough that they would find a home at OASIS to do so”. (Author: I don’t think any Japanese subway operators actually did so, but it exemplified the kind of framework OASIS was striving for.)
The process has matured and become very successful, and OASIS offers assistance to technical committees to help promote membership in the TCs. The legal counsel at OASIS has ensured that the important issues of copyright and intellectual property rights involved in group developments of open-use standards are appropriately accommodated by member participation agreements and by non-member submission agreements. This gives the user community confidence to exploit OASIS work products without concerns of losing their investment in the technology to claims from third parties.
Having worked out such IPR issues, the TC process and procedures protect the work product from being blind-sided by IPR claims (provided that the TC members respect their membership obligation and members of the public only use the Public Comment list to submit, which has obligations built in to subscribing to the list). The OASIS process dictates that all meeting agendas, minutes, TC mail list and documents be transparently open to the public at all times. OASIS puts no encumbrances on using the work products, not even “register to use”, and puts all work products in the publicly-accessible file repository. Ownership of the resulting specification rests with OASIS, but the specification is fully open.
The charter for a new technical committee needs to spell out the purpose of the new group and the expected work products. The five member companies required to form a committee must be identified, and it is in the interests of stakeholders to find at least another three charter members, hopefully more. If one had only a particular geographic focus for the economic sector, some interest in participating might be raised internationally if other geographical areas had similar interests, thus making the new committee’s work products globally-interesting.
The OASIS TC process for public review and creating a committee specification is extensive and rigorous. The TC administration support of wikis, JIRA ticket management (very important for building and maintaining the specification), a document repository and a file repository are all available for use by a TC at no charge. Other development tools are also available. There is no software to install or maintain. Public visibility is mandated for all of the project’s actions: meeting agendas and minutes, member rosters, discussions, document drafts and final versions, committee specifications and distribution artefacts.
A technical committee can be arranged with subcommittees responsible for certain domains, and the subcommittees make recommendations to the technical committees to include in deliverables.
Given that OASIS is an accredited ISO/IEC JTC 1 Publicly-Available Specification (PAS) submitter, the option is there to make a work product an ISO standard. For example, UBL 2.1 is now ISO/IEC 19845:2015, a recognized ISO Standard. ODF is another example of an OASIS Standard that has become an ISO standard, initially ISO/IEC 26300:2006 and now split into many parts.
All committee work must be performed transparently. One can find the UBL 2.1 vocabulary information model, expressed using CCTS, at https://docs.google.com/spreadsheets/d/1amzk8jn1boD2q3ze9rR14PVB6OGDyHTc2pQl92JutvE/view. The use of Google Drive allows international members of the committee to collaboratively edit the content simultaneously. For archive purposes, periodic snapshots of the ever-changing live document are made and stored in the OASIS repository. This ensures a transparent history of the evolution of artefacts and prohibits the modification of the historical artefacts after the act of publishing.
Using the UBL TC example, these are the OASIS artefacts and resources related to the essentials of open specifications development:
- governance: the rules of engagement and development
- transparency: the committee member mail list
- availability: the artefact repository (no registration required)
- intellectual property: https://www.oasis-open.org/policies-guidelines/ipr
These are the resources available to the user community:
- committee home page: announcements and overview
- charter: focus of the committee objectives
- outside input: the public comment list
- an unmoderated developer community mail list
- a community web site (overseen by a UBL subcommittee)
- a Wikipedia entry (overseen by a UBL subcommittee)
These are the resources available to committee members but publicly visible in the interests of transparency:
- a document repository for interim committee work and development snapshots
- a JIRA issue tracker
- a source development GitHub repository
- a collaborative wiki
- a committee calendar
- a balloting framework
- an XML document publishing vocabulary and publishing stylesheets
Challenges faced by the UBL committee
Overall, the biggest challenge faced by the committee is the availability of time for individual members to contribute to the efforts. Membership in the committee has waned and waxed. Non-voting members have access to all tools and have all messages pushed to them. Voting members participate in ballots regarding committee direction and work product development. Voting privileges are accorded to those members who are actively participating in meetings. Voting privileges are lost when active participation drops, but they are easily recovered by participating once again.
Understandably, this always depends on the commitment of members’ management to volunteering their staff to an effort that may benefit their organization only indirectly. As demonstrated by Denmark in the early days, their determination to shape UBL into a useful tool for their immediate objectives justified their contribution of time and effort in participation in the community. The end result met their requirements while at the same time establishing interoperability with others who also decided to base their work on UBL. Those considering participating in the committee can point to this success when presenting their rationale to their own organizations.
To be fair to all, face-to-face meetings need to be held in turn at locations around the world. This can significantly add to the costs of participation and to the time taken away from one’s organizational obligations. The frequency of international meetings depends on the need for productive time together as a group.
Logistically, the globally-distributed membership presents a challenge for all members to speak together at the same time between the face-to-face meetings. This is addressed by splitting a single weekly meeting into two teleconferences: the Pacific Call and the Atlantic Call. The Pacific Call is attended by members in North and South America and the Pacific Rim. The Atlantic Call is attended by members in North and South America and in Europe. Both calls are held on the same Wednesday considering UTC time, which for those in North and South America puts the Pacific Call on their Tuesday evenings. Preliminary discussions and proposed decisions are tabled during the Pacific Call for subsequent discussion and change or endorsement by the Atlantic Call. This is not perfect, as decisions can be postponed if important issues are raised during the Atlantic Call without consideration by those in the Pacific Call.
This shifts a lot of responsibility to members to use the many tools made available by OASIS. The tools themselves work well and are well-maintained by OASIS staff, and a lot of effort is put into making many and varied tools available to committees who may work better in one way or in another. And the tools reinforce the transparency to the public regarding the inner-workings of the committee and the decisions being made. But, in particular for UBL, it is a challenge to get members to use JIRA tickets effectively to appropriately record their observations and proposed dispositions of issues that are raised. As mentioned above, the work products present abstract business concepts and the technical artefacts are synthesized using software being maintained by very few. The value in UBL is in what UBL defines, not in the artefacts that support that definition. Not all committee members are well versed in the online collaborative tools, nor have they developed the discipline to use the tools effectively and in a timely fashion.
Open and free standards, all for only the price of membership
A common thread in all of this is the yeoman effort made by Jon Bosak to create an effective standards development process and to use that process to create a world-class work product collaborating with a diverse team of dedicated committee members who value their influence on creating the specifications.
While there is zero cost to obtain or use OASIS work products, and zero cost to publicly comment on OASIS work products, there is a justifiable cost of membership to participate directly in the OASIS standardization TC process. All of the enumerated list of tools and support environments comes just with the creation of a new technical committee, being supported by OASIS TC Administration, and so justifies the cost of active participation (see https://www.oasis-open.org/join/categories-dues for details).
And while the OASIS Universal Business Language (UBL) is very large and encompassing of many of the world’s business document information requirements, the vocabulary design methodology can accommodate an ecosystem’s requirements through a base definition, an extension methodology, a subset approach, and the leveraging by additional business documents. It has become, indeed, the HTML of e-commerce.
Together, the process and the end result illustrate the creation of and the use of a world-class markup vocabulary ecosystem that can be mimicked when addressing one’s own requirements for such a solution.