Introduction to the DITA Vocabulary
The DITA standard (Darwin Information Typing Architecture) was originally developed within IBM in the late 1990s as an XML application to support the authoring and production of modular documentation, especially documentation for IBM software and hardware products intended primarily for web delivery. DITA builds on architectural ideas developed for IBM's IBM ID Doc document type, which had been developed in the early 1990s as an SGML replacement for IBM's GML-based BookMaster product. IBM donated DITA to OASIS Open in 2003 and DITA 1.0 was published as an OASIS standard in 2005. The current version is DITA 1.3, published in 2015. The DITA Technical Committee is currently working on DITA 2.0.
DITA is used widely in a number of industries, including software, hardware, publishing, and government.
DITA's driving requirements are:
-
Modularity: The ability to author atomic units of content that stand alone and that can be reused in different contexts. DITA calls these modules "topics".
-
Reuse: The ability to reuse content at either the module (topic) level or at the element level. Reuse can be within a single publication or across multiple publications.
-
Interoperability: The ability for documents with different local document types and different element types to be used together within the same publication and to be processed with the same set of tools with a minimum of document-type-specific code.
-
Hyperlinking: The ability to create rich hyperlinks within and across modules.
Around 1989 a meeting was held among the major software vendors of the time, including IBM, Digital Equipment, HP, Groupe Bull, and others, hosted by Fred Dalrymple and chaired by Eve Maler, with the goal of defining a common markup vocabulary to enable the interchange and interoperation of documentation among the various vendors. Eliot Kimber and Wayne Wohler from IBM attended. IBM's takeaway from the meeting, based on the initial analysis prepared by Maler and Jeanne El Andaloussi, was that there was a core set of elements common to all documents: titled divisions, paragraphs, lists, figures, tables, etc., but that every group used different names for these elements.
Out of this meeting many of the attendees went on to develop DocBook through the Davenport group. Wohler and Kimber formed the core members of the team at IBM that developed IBM ID Doc, along with Don Day, the founding Chair of the DITA Technical Committee, and Simcha Gralla.
IBM was and continues to be a federation of many different divisions, acquired companies, product groups, and so on, all of which have both common requirements and local requirements.
Our experience with BookMaster, which was a centrally defined and managed monolithic application intended to be used across all groups within IBM, was that the monolithic approach did not work at scale and was unnecessarily restrictive. By the time we started developing IBM ID Doc, BookMaster reflected more than 600 element types across a number of distinct information and publication types and was still growing. We needed something that would satisfy the common requirements and ensure consistency and interoperability of content and supporting tooling without limiting the ability of individual groups to quickly satisfy their local requirements.
The recognition of a universal base set of semantics coupled with HyTime's architectural forms facility gave us the answer: as long as all elements ultimately map back to one of the base types and conform to the minimal content model and attribute requirements of the base types, interoperability and interchange would be assured, while still allowing different groups to optimize the markup they use with minimal constraints. This idea then became the basis for IBM ID Doc.
BookMaster also had fairly sophisticated reuse and hyperlinking mechanisms, at least for the time, and those requirements were also supported in IBM ID Doc, updated to take advantage of SGML technology and HyTime's features for enabling linking, addressing, and re-use in an SGML context.
IBM ID Doc used the architectural forms mechanism from the ISO/IEC HyTime standard to define a layered architecture by which a core set of base element types could be formally extended to define new element types that were processable in terms of their base types. As an SGML standard, HyTime required the use of SGML-specific features, such as SGML declarations, features that were not retained in XML.
DITA replaced IBM ID Doc's HyTime-based architectural forms with a simpler mechanism that uses attributes to declare each element's type and relationship to its base types. This is DITA's @class attribute, which simply specifies the ancestry of a given element as an ordered sequence of tokens, one for each ancestor and one for the element type itself. The syntax of the DITA @class attribute was designed specifically to work with CSS attribute selectors, in particular, the "~=" (token) selector.
This formal declaration of ancestry means that every element can be understood and processed in terms of any ancestor (or itself) by simple inspection of the @class attribute value, avoiding the need for more complex declaration mechanisms as used in HyTime and IBM ID Doc, at the cost of requiring every element to carry the @class attribute or for documents to be parsed with grammars that can supply default values for attributes.
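To show how little machinery this inspection needs, here is a minimal Python sketch of @class matching. It assumes the DITA 1.x convention that @class values begin with "-" for structural types or "+" for domain types; the function names are hypothetical, and <para> is the example dbPara domain element used in Domain Integration below.

```python
from __future__ import annotations


def class_tokens(class_value: str) -> list[str]:
    """Split a DITA @class value into its type tokens, most general first.

    The leading "-" (structural) or "+" (domain) marker is dropped.
    """
    tokens = class_value.split()
    if tokens and tokens[0] in ("-", "+"):
        tokens = tokens[1:]
    return tokens


def is_type(class_value: str, type_token: str) -> bool:
    """True if the element is of type `type_token` or specializes it.

    Like the CSS selector [class~="topic/p"], this is a whole-token
    match: "topic/p" does not match the "topic/ph" token.
    """
    return type_token in class_tokens(class_value)


# Hypothetical <para> from the dbPara domain, specialized from topic/p.
para_class = "+ topic/p dbPara-d/para "

print(is_type(para_class, "topic/p"))        # True: processable as <p>
print(is_type(para_class, "dbPara-d/para"))  # True: its own type
print(is_type(" - topic/ph ", "topic/p"))    # False: no substring matching
```

The whole-token test is the reason the @class syntax pairs well with the CSS "~=" selector: a plain substring test would wrongly treat <ph> as a kind of <p>.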
One interesting side effect of this attribute-based declaration mechanism is that DITA documents do not require grammars to be processed, or even necessarily to be validated, as all the information needed to understand any conforming DITA document in terms of its DITA-defined semantics is explicit in the document itself. Any conforming DITA document can be transformed into a document where all the element types are base types defined in the DITA standard, which can then be validated against the standard DITA grammars, enabling validation of conformance to at least the minimum requirements defined by the DITA standard.
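The generalization described above can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not the generalization process of any particular DITA tool: it assumes @class values are present in the instance (rather than defaulted from a grammar) and ignores namespaces and the generalization of specialized attributes.

```python
import xml.etree.ElementTree as ET


def generalize(root: ET.Element) -> ET.Element:
    """Rename each element that carries @class to its base (first-token) type."""
    for elem in root.iter():
        class_value = elem.get("class")
        if not class_value:
            continue
        tokens = [t for t in class_value.split() if t not in ("-", "+")]
        # The first token names the most general ancestor, e.g. "topic/p" -> "p".
        elem.tag = tokens[0].split("/", 1)[1]
    return root


doc = ET.fromstring(
    '<concept class="- topic/topic concept/concept " id="c1">'
    '<title class="- topic/title ">Example</title>'
    '<conbody class="- topic/body concept/conbody ">'
    '<para class="+ topic/p dbPara-d/para ">Hello</para>'
    "</conbody></concept>"
)
# Element names become topic, title, body, and p; @class is left in place,
# so the original specialization is still recorded in the result.
print(ET.tostring(generalize(doc), encoding="unicode"))
```

The result could then be validated against a base topic grammar, which is the conformance check the paragraph above describes.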
There are several intended audiences for DITA customization:
-
People configuring their local DITA environment to reflect local requirements by doing "configuration", for example, omitting DITA modules that are not needed. This audience is not necessarily a dedicated DITA practitioner or document type designer.
-
People configuring their local DITA environment to add additional constraints on top of existing DITA document types, basically a continuation of the previous item ("constraints"). This requires more grammar facility but can be supported through interactive tools, as the activity is fundamentally the process of either removing things you don't want, changing repeating OR groups to sequences, or making optional elements or attributes mandatory.
-
Specialists defining new structural types (maps and topics) or new mix-in modules (domains), that provide new attributes or element types that are specializations of existing types ("specialization"). This requires more traditional document type analysis and implementation skills, although some simple types of specialization are quite easy and do not require any specialized skills beyond the ability to create simple grammar modules.
The use of customization varies widely within the DITA user community: some organizations refuse to do any customization, using the grammars as provided by the DITA Technical Committee, while others specialize almost every element type they need. Simple configuration is fairly common, in part because in practice it is a prerequisite for any other kind of customization. Specialization is less common, although many DITA users will do simple specializations such as defining new conditional attributes. Modern DITA-aware content management systems require some configuration and specialization in order to use CMS-specific attributes and elements, such as attributes that capture CMS-specific object identifiers.
The DITA standard defines two base types of document: maps and topics.
Topics are the atomic unit of content authoring and delivery. A topic must have a title and may have a body that contains content elements and may have nested topics, creating a titled hierarchy within a single topic document. Topics may also have descriptive metadata.
Maps are collections of hyperlinks that serve to create some kind of publication structure, such as a traditional book structure, a web site, or some other structure for whatever purpose. The links within a map may be to DITA topics or to any non-DITA resource. Maps can also define links among the resources linked to by the map (external links in XLink terminology but using a different syntactic approach).
Topics can be published in isolation but are usually combined with other topics in the context of maps.
As of DITA 1.3, RELAX NG is the grammar language used for the master DITA grammars, with DTD and XSD versions generated automatically. However, most DITA users use DTD-based grammars, for both historical and practical reasons. XSD grammars are less used but are needed by tools that only understand XSD, such as some XML editors.
Because DITA relies heavily on attributes with default values, using RELAX NG for DITA requires an implementation of the RELAX NG DTD Compatibility specification, which until recently was not generally available. George Bina has implemented support for DTD compatibility in Java, making it generally available to Java-based tools, which account for the vast majority of DITA processing implementations. Since then, direct use of RELAX NG for DITA documents has started to grow in the community, although it is still a tiny fraction of DTD-based use.
The DITA Technical Committee implemented an RNG-to-DTD-and-XSD conversion tool for generating DTD and XSD versions of all the TC-defined modules and document type shells. This tool is available on GitHub [RNG2DTD].
Modularity and Customization
DITA defines a modular architecture for grammars, independent of the grammar technology used. The RELAX NG schemas defined by the DITA standard are normative, with DTD and XSD versions generated from the RELAX NG grammars. All three grammar languages reflect the same modular architecture, although XSD 1.0 limitations on extension and override make the XSD implementation pattern slightly different from the RNG and DTD patterns, which are as similar as it is possible for them to be.
The DITA specification defines the following module types:
-
Structural modules define top-level types, either maps or topics. Map types represent top-level document types because maps cannot be literally nested within a single document instance. Topic types represent either top-level document types or subelements because topics can be literally nested within a single document instance.
-
Domain modules define sets of element types that can be "mixed in" to structural types to add new element types or attributes. The element and attribute types defined in domain modules are always specializations so they serve to extend the base grammar such that the domain-provided types are allowed anywhere their base types are allowed.
-
Constraint modules define constraints on the structural and domain types included within a given DITA document type. Constraint modules can impose any constraint as long as the result is no less constrained than the base. For example, a constraint can change an OR group into a sequence or disallow optional elements but cannot allow elements where they would not otherwise be allowed or make mandatory elements optional.
An essential aspect of the DITA architecture is that DITA grammar modules are invariant for a given version in time, meaning that every copy of a given module should be identical. That is, one should never directly modify any DITA grammar module. All customization is thus done indirectly through the customization facilities defined by the DITA specification. The invariance of modules is essential to making DITA interchange and interoperation work. It also means that, in theory, documents need only name the modules they use; processors could dynamically construct the actual grammars needed to do validation, although no such tools have been developed to date.
The DITA standard also defines a set of grammar coding patterns that, while not normative, are reflected in the grammar modules developed by the DITA technical committee and by most DITA practitioners. This tends to make the implementation details of DITA grammars remarkably consistent across the DITA community. It also enables automated tools that can work with DITA grammars reliably.
DITA modules are "integrated" in the context of document type "shells" that serve to combine a set of either map or topic modules with zero or more domain modules and zero or more constraint modules. Map and topic types may not be combined within the same document type because map documents may not literally contain topics.
The DITA standard defines the concept of a "DITA document type", which is simply a unique set of modules.
Two documents that use the same set of modules by definition have the same DITA document type, irrespective of the actual grammar files, if any, used to validate documents.
DITA document elements use an attribute, @domains, to declare the modules used (or allowed or expected to be used) with the document. Thus any two DITA documents can be compared to determine whether they reflect the same DITA document type. This makes the DITA document type completely independent of any particular grammar file.[1]
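A minimal Python sketch of that comparison follows. It assumes the DITA 1.3 @domains syntax of parenthesized groups, such as "(topic hi-d)" for an element domain and "a(props deliveryTarget)" for an attribute domain, and treats two documents as having the same DITA document type when they declare the same set of groups in any order. The function names are hypothetical.

```python
import re

# One @domains group with an optional one-letter prefix, e.g. "(topic hi-d)"
# or "a(props deliveryTarget)" (the "a" marks an attribute domain).
GROUP = re.compile(r"([a-z]?)\(([^)]*)\)")


def domain_set(domains_value: str) -> frozenset:
    """Normalize a @domains value into an order-insensitive set of groups."""
    return frozenset(
        prefix + "(" + " ".join(body.split()) + ")"
        for prefix, body in GROUP.findall(domains_value)
    )


def same_document_type(a: str, b: str) -> bool:
    """Same DITA document type if both documents declare the same modules."""
    return domain_set(a) == domain_set(b)


doc1 = "(topic hi-d) (topic ut-d) a(props deliveryTarget)"
doc2 = "a(props deliveryTarget)  (topic ut-d) (topic hi-d)"
print(same_document_type(doc1, doc2))            # True: same modules, different order
print(same_document_type(doc1, "(topic hi-d)"))  # False: fewer modules
```

No grammar file is consulted at any point: the comparison uses only values carried by the documents themselves.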
DITA customization involves three basic types of modification to the base declarations:
-
For any element type or the attributes @base and @props, allowing specializations of that type to occur wherever the base type is allowed (domain extension)
-
For any element type, allowing constraint of its content model
-
For any element type, allowing constraint of its attribute list
Within content models, every element type is represented by an extensible or over-ridable component: named pattern (RNG), parameter entity (DTD), named model group (XSD). Individual attributes are not extensible, so there is no need to represent them using extensible components.
Content models and attribute lists are defined using over-ridable components, making it easy to override them in order to impose constraints (or as easy as it can be for XSD 1.0, which is not always very easy due to limitations in the XSD redefine feature).
In addition to general content model configuration, each topic type provides an over-ridable component for defining the set of topic types that may be literally nested within the topic, if any. Each topic type module defines a default value for this component (typically just allowing the topic type to nest itself, if nesting is allowed at all) and then document type shells may override this configuration as needed.
Domain Integration
A key aspect of DITA customization is "integrating" domains.
Domain modules provide new element types that are specializations of base types (and that are not themselves map or topic or any specialization of map or topic).
Domain elements are "mixed in" such that anywhere a given domain-provided element's base is allowed, the domain-provided element is also allowed. This makes integration easy but means that domain elements can occur anywhere their base is allowed, which may not always be what is desired. In that case, constraints can be used to limit where domain-provided elements can occur.
For example, consider a domain "dbParaDomain" that defines a specialization <para> of the base element type <p> (paragraph). When the domain is integrated into a DITA document type shell, the element type <para> will be available wherever <p> is allowed.
In DTD syntax this is done in the document type shell by overriding the declaration of the parameter entity for the <p> element to also include <para>:
<!-- Document type shell -->
...
<!-- Inclusion of base element type parameter entity declarations -->
<!ENTITY ... SYSTEM ...>
%...;
<!-- Inclusion of dbParaDomain parameter entity declarations -->
<!ENTITY ... SYSTEM ...>
%...;
...
<!ENTITY % p "p | %dbPara-d-p;" >
...
<!-- Inclusion of base element type element type declarations -->
<!ENTITY ... SYSTEM ...>
%...;
<!-- Inclusion of dbParaDomain element type declarations -->
<!ENTITY ... SYSTEM ...>
%...;
<!-- End of document type shell -->
Where the parameter entity %dbPara-d-p is declared as:
<!ENTITY % dbPara-d-p "para" >
(The name "dbPara-d-p" is read as "specializations of <p> provided by the dbPara domain".)
Within a content model, any reference to "%p;" now expands to "p | para":
<!ENTITY % body.content
"(%p; |
%fig; |
%table; |
%section;)*
"
>
If the desire on the part of the document type shell author is to allow <para> but not <p>, that can be done in the shell by simply omitting "p |" from the declaration of the %p parameter entity:
<!-- Only allow <para>: -->
<!ENTITY % p "%dbPara-d-p;" >
Now references to %p; will expand to "para", not "p | para".
This omission of <p> in the shell is technically a constraint but the DITA standard does not require a separate module file for it.
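The net effect of integrating a domain, and of omitting the base element in the shell, can be modeled abstractly in Python. This sketch computes only the resulting expansions of each base element reference; it is not part of any DITA tool, and the function and domain table are hypothetical.

```python
from __future__ import annotations


def integrate(base_types: list[str],
              domains: dict[str, dict[str, list[str]]],
              omit_base: frozenset[str] = frozenset()) -> dict[str, list[str]]:
    """Compute what each base element reference (e.g. %p;) expands to in a shell.

    domains maps a domain name to {base element: [its specializations]};
    omit_base lists base elements the shell leaves out (a shell-level constraint).
    """
    expansion = {}
    for base in base_types:
        allowed = [] if base in omit_base else [base]
        for provided in domains.values():
            allowed.extend(provided.get(base, []))
        if not allowed:
            # Omitting a base with no specializations would leave nothing
            # to allow wherever the base is referenced.
            raise ValueError(f"nothing left to allow in place of <{base}>")
        expansion[base] = allowed
    return expansion


# Hypothetical dbPara domain: provides <para> as a specialization of <p>.
domains = {"dbPara-d": {"p": ["para"]}}

print(integrate(["p", "ph"], domains))
# {'p': ['p', 'para'], 'ph': ['ph']}
print(integrate(["p", "ph"], domains, omit_base=frozenset({"p"})))
# {'p': ['para'], 'ph': ['ph']}
```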
RELAX NG Configuration
RELAX NG makes combining DITA grammar modules about as easy as it can be. Unfortunately, because DITA also uses DTDs and it must be possible to generate those DTDs from the RELAX NG grammars, DITA RNG grammars defined by the DITA Technical Committee cannot use RNG features that are not available in DTDs, such as <notAllowed> patterns or context-specific patterns.
However, DITA RNG grammars can take advantage of an important RELAX NG feature, the ability for one pattern to unilaterally extend another pattern. This allows DITA domain modules to be "self integrating". It is this feature of RELAX NG that motivated the Technical Committee to make RNG the master grammar language for DITA from which DTD and XSD versions are generated. Self-integrating domains make setting up new DITA document type shells about as easy as it can be for an otherwise unaided human.
Each element type has a corresponding named pattern that includes the element type itself:
<define name="p"> <ref name="p.element"/> </define>
Domain modules define patterns that include all the element types in the domain that are specializations of a given base element:
<define name="dbPara-d-p"> <choice> <ref name="para.element"/> </choice> </define>
The domain can then extend the element type pattern using the domain-defined element choice pattern:
<define name="p" combine="choice"> <ref name="dbPara-d-p"/> </define>
This makes the effective value of the "p" pattern:
<define name="p"> <choice> <ref name="p.element"/> <ref name="para.element"/> </choice> </define>
If the desire is to omit the base element but allow specializations, then the base type's pattern must be redefined in the document type shell:
<grammar ...>
  ...
  <div>
    <include href="topicMod.rng">
      <define name="topic-info-types">
        <ref name="topic.element"/>
      </define>
      <define name="p">
        <!-- No p allowed, only specializations -->
        <ref name="dbPara-d-p"/>
      </define>
    </include>
    ...
  </div>
  ...
</grammar>
RELAX NG document type shells are just sets of references to modules plus any constraints that can or should be defined in the shell, rather than in separate modules. RNG shells must also provide special declarations for attributes of type ID due to a quirk in the RELAX NG design.
Because domain modules are self integrating, there is no need for separate domain integration patterns as there is for DTDs.
In addition, RELAX NG only requires a single file for each module, while DTDs require two files for each structural and element domain module, one for parameter entities and one for the element type and attribute declarations. Attribute domains only require a single file in DTD syntax and in RNG.
Map type grammars only involve the inclusion of domain modules and constraints because maps cannot nest the way topics can.
Topic modules also allow configuration of the allowed topic nesting for each topic type integrated into the document type shell:
<grammar ...>
  ...
  <div>
    <include href="topicMod.rng">
      <define name="topic-info-types">
        <ref name="topic.element"/>
      </define>
    </include>
    ...
  </div>
  ...
</grammar>
Here the shell simply allows the topic type "topic" to nest itself. If the shell included other topic types it could allow those to be nested as well.
Each topic type provides its own topic-type-specific topic nesting pattern, allowing different topic types within the same shell to have different nesting rules.
This is the one place in DITA where a document type shell can make the document type less constrained rather than more constrained. However, it makes sense because maps, via hyperlinks, can create arbitrary hierarchies of topics of any type, so allowing topics to literally nest within a single XML document is really more of a convenience for authoring or storage and any constraint on topic nesting imposed by a shell is not (directly) enforceable for topics combined using maps.
Constraints that are not defined directly in the document type shell are implemented by replacing the shell's reference to a module with a reference to a constraint module, which in turn includes the original module and redefines patterns within that inclusion:
<grammar ...>
  <!-- Shell for the constrained task topic type -->
  ...
  <div>
    <a:documentation>CONTENT CONSTRAINT INTEGRATION</a:documentation>
    <include href="strictTaskbodyConstraintMod.rng">
      <define name="task-info-types">
        <ref name="task.element"/>
      </define>
    </include>
  </div>
  ...
</grammar>
Where strictTaskbodyConstraintMod.rng is:
<grammar ...>
  <div>
    <a:documentation>CONTENT MODEL OVERRIDES</a:documentation>
    <include href="taskMod.rng">
      <define name="taskbody.content">
        <optional> <ref name="prereq"/> </optional>
        <optional> <ref name="context"/> </optional>
        <!-- section omitted -->
        <optional>
          <choice>
            <ref name="steps"/>
            <ref name="steps-unordered"/>
            <!-- steps-informal omitted -->
          </choice>
        </optional>
        <optional> <ref name="result"/> </optional>
        <optional> <ref name="tasktroubleshooting"/> </optional>
        <optional> <ref name="example"/> </optional>
        <optional> <ref name="postreq"/> </optional>
      </define>
    </include>
  </div>
</grammar>
The constraint module includes the base module being constrained, in this case the TC-defined taskMod.rng, and redefines any patterns defined within the referenced module (or any modules it references). This is an example of constraining an element's content model by overriding the element's content model pattern.
The base declaration for taskbody.content is:
<define name="taskbody.content">
  <zeroOrMore>
    <choice>
      <ref name="prereq"/>
      <ref name="context"/>
      <ref name="section"/>
    </choice>
  </zeroOrMore>
  <optional>
    <choice>
      <ref name="steps"/>
      <ref name="steps-unordered"/>
      <ref name="steps-informal"/>
    </choice>
  </optional>
  <optional> <ref name="result"/> </optional>
  <optional dita:since="1.3"> <ref name="tasktroubleshooting"/> </optional>
  <zeroOrMore> <ref name="example"/> </zeroOrMore>
  <zeroOrMore> <ref name="postreq"/> </zeroOrMore>
</define>
Comparing the two versions of the taskbody.content pattern, you can see that the constrained version omits <section> and <steps-informal>, replaces the initial repeating OR group with a sequence, and allows at most one <example> and one <postreq>.
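As an informal check that the constrained model accepts only a subset of what the base model accepts, both taskbody.content patterns can be approximated as Python regular expressions over the sequence of child element names. This is only an illustration of the two content models, not how DITA validation works; the pattern names and the allows() helper are hypothetical.

```python
from __future__ import annotations

import re

# Content models approximated as regular expressions over child element
# names, each name followed by a single space.
BASE = re.compile(
    r"((prereq|context|section) )*"
    r"((steps|steps-unordered|steps-informal) )?"
    r"(result )?(tasktroubleshooting )?(example )*(postreq )*"
)
STRICT = re.compile(
    r"(prereq )?(context )?"
    r"((steps|steps-unordered) )?"
    r"(result )?(tasktroubleshooting )?(example )?(postreq )?"
)


def allows(model: re.Pattern, children: list[str]) -> bool:
    """True if the sequence of child element names matches the model."""
    return model.fullmatch("".join(name + " " for name in children)) is not None


children = ["context", "prereq", "steps"]
print(allows(BASE, children))    # True: the base model is a repeating OR group
print(allows(STRICT, children))  # False: the constraint requires prereq before context
```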
DTD Syntax Customization
DTD customization is similar to RNG customization structurally but has to account for the limitation in DTDs that parameter entities must be declared before they can be referenced and the first declaration of a given parameter entity name wins.
This means that, except for attribute domains, all modules require two files, one for parameter entities and one for element type and attribute list declarations.
In addition, domain element integration must be done in document type shells, as shown above.
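The "first declaration wins" rule is what forces this layout. The following Python sketch is a simplified model of parameter entity processing, not a DTD parser: it assumes, as in the shell shown in Domain Integration, that the domain's parameter entities and the shell's override of %p; are seen before the base module's default declaration of %p;. The declaration lists and helper names are hypothetical.

```python
from __future__ import annotations

import re


def declare_all(declarations: list[tuple[str, str]]) -> dict[str, str]:
    """Process parameter entity declarations in the order the parser sees them.

    As in XML DTDs, the first declaration of a name is binding; later
    declarations of the same name are ignored.
    """
    entities: dict[str, str] = {}
    for name, value in declarations:
        entities.setdefault(name, value)
    return entities


def expand(text: str, entities: dict[str, str]) -> str:
    """Repeatedly replace %name; references (no cycle detection)."""
    ref = re.compile(r"%([\w.-]+);")
    while ref.search(text):
        text = ref.sub(lambda m: entities[m.group(1)], text)
    return text


# Shell order: domain entities, then the shell's override, then the base default.
shell_first = [
    ("dbPara-d-p", "para"),
    ("p", "p | %dbPara-d-p;"),
    ("p", "p"),  # base module default: ignored, already declared
]
# Wrong order: the base default is seen first, so the override is ignored.
module_first = [
    ("dbPara-d-p", "para"),
    ("p", "p"),
    ("p", "p | %dbPara-d-p;"),
]
print(expand("(%p;)*", declare_all(shell_first)))   # (p | para)*
print(expand("(%p;)*", declare_all(module_first)))  # (p)*
```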
Constraint modules have the additional challenge that they must declare every parameter entity referenced by the parameter entities the constraint module overrides, which can make for a lot of cutting and pasting (the RNG-to-DTD conversion tool automates this cutting and pasting for a number of constraint patterns).
Otherwise, the customization pattern is conceptually the same as for RNG:
-
Document type shells include the structural and domain modules that make up the document type, as well as any constraint modules.
-
Every element type has a corresponding parameter entity used for domain integration.
-
Every element type has corresponding %*.content and %*.attlist parameter entities that can be overridden to constrain the content model or attribute list of that element type.
XSD Syntax Customization
XSD customization is complicated by the need to use the XSD 1.0 redefine facility, which allows redefinition of groups in a way that is conceptually similar to RNG pattern redefinition.
However, the XSD 1.0 redefine feature presents a couple of challenges:
-
The feature is defined ambiguously such that different processors can implement it in incompatible ways, only one of which works for DITA, which happens to be the way that the Apache Xerces parser implements it.
-
The requirement for "particle preservation" in redefined models.
The particle preservation requirement is defined as follows: The definitions within the <redefine> element itself are restricted to be redefinitions of components from the redefined schema document, in terms of themselves. That is,
-
Type definitions must use themselves as their base type definition;
-
Attribute group definitions and model group definitions must be supersets or subsets of their original definitions, either by including exactly one reference to themselves or by containing only (possibly restricted) components which appear in a corresponding way in their redefined selves.[2]
This requires a workaround where you refactor the original sequence into a sequence of named groups that then allow redefinition.
XSD 1.1 includes a new feature, override, that allows for direct specification of the kinds of constraints DITA needs. Unfortunately, the XSD 1.1 specification is not widely implemented so the DITA standard cannot use it for TC-defined XSD grammars.
For DITA 2.0 the TC has decided not to provide modular XSD versions of the TC-defined modules, although it may provide non-modular versions as a convenience. Non-modular XSDs are schemas that do not use redefine: they incorporate any constraints directly in place of the original base declarations, avoiding the need for redefines or overrides. It should be relatively straightforward to generate a single-file XSD version of any RNG document type shell.
See [Kimber1] for details.
Interchange and Interoperability
For DITA, interchange and interoperability apply to the following areas:
-
Interchange and interoperability of documents
-
Interchange and interoperability of working grammars
-
Interchange and interoperability of processing
-
Interchange and interoperability of knowledge
Interchange and Interoperability of Documents
DITA maximizes interchange and interoperability of documents by ensuring that any conforming DITA document can be processed in at least a minimal but correct way by any general-purpose DITA processor. DITA's hyperlink-based approach for combining individual topics into complete publications allows any DITA document to be used with any other DITA documents.
DITA provides two primary forms of re-use:
-
Use of topics by reference from maps
-
Use of individual elements within topics or maps by reference.
Reuse of topics from maps is not inherently constrained, meaning any DITA topic can be used from any map. Maps can be designed to impose constraints on the kinds of topics allowed by a particular kind of reference and, through specialization of the hyperlinking elements in a map, specific structural rules can be imposed, but the vocabulary details of topics do not impose any constraints on how topics may be used from maps.
Reuse of individual elements is constrained such that a given element can only re-use an element of the same type or a more specialized type. This rule ensures that the effective document resulting from the reuse is still valid with respect to the document type of the using document. Compare with XInclude, which allows any element of any type to be used in any context where the grammar allows xi:include to occur.
The DITA standard as originally defined imposed more strict constraints on element-level reuse, requiring that the DITA document types of the two documents involved be "compatible" such that the document type of the element being reused was not less constrained than the document type of the document making the reuse reference. The intent was to ensure that constraints imposed on the using document were not circumventable by the reuse.
In practice, this constraint has rarely been enforced by tools or desired by user communities. It leads to annoying limitations, for example, being unable to reuse elements in more-general topic types from more-specialized types where the reuse would otherwise be fine in the context of the local content rules.
In DITA 1.3 the constraint requirement was relaxed so that unconstrained reuse is now the default behavior.
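The element-type rule for element-level reuse stated earlier in this section can be checked from @class values alone. This Python sketch checks only that type rule; it ignores address resolution and the document-type compatibility check, which DITA 1.3 no longer requires by default. The helper names and the dbPara example are hypothetical.

```python
from __future__ import annotations


def class_tokens(class_value: str) -> list[str]:
    """@class type tokens, most general first, without the -/+ marker."""
    return [t for t in class_value.split() if t not in ("-", "+")]


def can_reuse(referencing_class: str, referenced_class: str) -> bool:
    """The referenced element must be the same type as, or a specialization of,
    the referencing element: its @class ancestry must include the referencing
    element's own (last) type token."""
    own_type = class_tokens(referencing_class)[-1]
    return own_type in class_tokens(referenced_class)


P = " - topic/p "
PARA = "+ topic/p dbPara-d/para "  # hypothetical dbPara specialization
PH = " - topic/ph "

print(can_reuse(P, PARA))  # True: <p> may reuse the more specialized <para>
print(can_reuse(PARA, P))  # False: <para> may not reuse the more general <p>
print(can_reuse(P, PH))    # False: unrelated types
```

Unlike an XInclude check, which only asks whether xi:include may occur at that point, this rule guarantees the reused element is valid wherever the referencing element was.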
Interchange and Interoperability of Grammars
DITA's modular approach to grammar organization allows grammar modules to be interchanged reliably because the defining modules are never modified (every copy of a given version in time of a module should be identical). The coding patterns and extension mechanisms used in the DITA grammar files allow DITA modules to be used together with a minimum of effort.
In the context of a DITA-aware tool like OxygenXML, using new DITA grammars is as simple as deploying the grammar-providing plugins to the DITA Open Toolkit used by Oxygen. Those document types can then immediately be used to create new documents, edit documents that use those document types with full DITA functionality automatically available (because Oxygen's configuration is specialization aware and thus can be applied to any DITA document without further configuration effort), and apply DITA processing to those documents.
Interchange and Interoperability of Processing
Because specialization-aware processors can handle any DITA document in at least a minimal way, processing is inherently interchangeable at that level. The DITA standard also defines requirements for invariant processing where processing must be consistent to ensure interoperability and consistency of results, for example, address resolution and use-by-reference resolution. It also provides processing suggestions for elements that most users would expect to be processed or rendered a certain way.
Beyond that, the modular nature of DITA grammars maps naturally to modular software approaches, such as plugin-based frameworks. Where such software exists, such as DITA Open Toolkit and OxygenXML, processing for new specializations can usually be added by providing software modules that simply extend the base processing to handle the specializations as needed.
In addition, because all DITA elements can be processed in terms of their base, specializations that do not require any special processing do not require configuration or processing support simply to account for a new element type or attribute.
For example, having defined a new specialization module and packaged it as an Open Toolkit plugin, you need only deploy that plugin to the Open Toolkit used by OxygenXML: OxygenXML then immediately enables visual editing of the new specialization by applying fallback processing to all specializations. If the new specialization requires some special configuration, such as unique styling, that can be added by defining a new OxygenXML document type framework that extends the built-in DITA framework, reusing all the existing style sheets and only requiring new styles for the new specializations where the base styling is not what you want.
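Fallback processing amounts to dispatching on the most specialized @class token that has a rule. DITA Open Toolkit's XSLT does the equivalent with template match patterns that test @class tokens, for example contains(@class, ' topic/p '). The Python sketch below is a hypothetical illustration of the idea: the rule table and the HTML-like output are assumptions, not any tool's actual behavior.

```python
from __future__ import annotations

from typing import Callable

# Processing rules keyed by @class token. Only base types are registered,
# so any specialization without its own rule falls back to its base rule.
BASE_RULES: dict[str, Callable[[str], str]] = {
    "topic/p": lambda text: f"<p>{text}</p>",
    "topic/ph": lambda text: f"<span>{text}</span>",
}


def process(class_value: str, text: str,
            rules: dict[str, Callable[[str], str]] = BASE_RULES) -> str:
    """Apply the rule for the most specialized @class token that has one."""
    tokens = [t for t in class_value.split() if t not in ("-", "+")]
    for token in reversed(tokens):  # most specialized first
        if token in rules:
            return rules[token](text)
    raise LookupError(f"no processing rule for {class_value!r}")


para = "+ topic/p dbPara-d/para "  # hypothetical specialization of <p>
print(process(para, "Hello"))  # <p>Hello</p>: fallback to the topic/p rule

# Adding a rule for the specialization overrides the fallback.
custom = {**BASE_RULES, "dbPara-d/para": lambda text: f'<p class="para">{text}</p>'}
print(process(para, "Hello", custom))  # <p class="para">Hello</p>
```

The design mirrors the framework extension described above: new rules are added only for specializations whose base rendering is not wanted, and everything else keeps working unchanged.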
Interchange and Interoperability of Knowledge
The coding patterns for DITA grammar modules and document type shells defined in the DITA standard mean that the knowledge of how to use, configure, and customize DITA grammars is reusable and interoperable. That is, any person who understands the DITA coding patterns should be able to immediately understand and use the document type shells, specialization modules, and constraint modules developed by any other DITA-aware person. These coding patterns also enable automatic and interactive tools that make it easier to work with or generate DITA grammars. For example, Jang Graat has implemented an interactive tool for defining new constraint modules. The tool generates the RNG for the constraint, from which the DTD and (with some limitations) XSD versions can be generated.
Finally, the organizational patterns for DITA grammars also provide a general pattern for packaging DITA grammars with entity resolution catalogs for use with tools, as implemented by the open-source DITA Open Toolkit. DITA document type shells and grammars can be packaged into Open Toolkit plugins. Open Toolkit then automatically combines them with other document-type-providing plugins under a single master entity resolution catalog. Because Open Toolkit is both cross-platform and open source, any person or tool can use it, which effectively makes it a de facto standard for packaging and using DITA grammars.
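The following toy resolver makes the catalog mechanism concrete. It handles the OASIS XML Catalogs `<uri>` and `<nextCatalog>` entries and shows how a master catalog can delegate to per-plugin catalogs, so that a URN such as urn:oasis:names:tc:dita:rng:topicMod.rng resolves to a local file. The sketch makes several simplifying assumptions:

- catalogs are held in memory as strings;
- relative `uri` values are returned unresolved;
- other entry types (`public`, `system`, `rewriteURI`, and so on) are ignored.

The catalog names and file paths are invented for illustration and do not describe how DITA Open Toolkit actually lays out its catalogs.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# OASIS XML Catalogs namespace.
CAT_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

def resolve_uri(name: str, catalog_id: str,
                catalogs: Dict[str, str]) -> Optional[str]:
    """Resolve a URI using <uri> entries, then <nextCatalog> delegates.
    Toy version: no cycle detection and no base-URI handling."""
    root = ET.fromstring(catalogs[catalog_id])
    # 1. Direct <uri name="..." uri="..."/> mappings in this catalog.
    for entry in root.iter(f"{{{CAT_NS}}}uri"):
        if entry.get("name") == name:
            return entry.get("uri")
    # 2. Otherwise ask each delegated catalog, in document order.
    for nxt in root.iter(f"{{{CAT_NS}}}nextCatalog"):
        found = resolve_uri(name, nxt.get("catalog"), catalogs)
        if found is not None:
            return found
    return None

# Hypothetical master catalog delegating to two plugin catalogs.
CATALOGS = {
    "master.xml": f"""<catalog xmlns="{CAT_NS}">
  <nextCatalog catalog="dita13/catalog.xml"/>
  <nextCatalog catalog="dbpara/catalog.xml"/>
</catalog>""",
    "dita13/catalog.xml": f"""<catalog xmlns="{CAT_NS}">
  <uri name="urn:oasis:names:tc:dita:rng:topicMod.rng" uri="dita13/rng/topicMod.rng"/>
</catalog>""",
    "dbpara/catalog.xml": f"""<catalog xmlns="{CAT_NS}">
  <uri name="urn:pubid:rng:dbParaDomain" uri="dbpara/rng/dbParaDomainMod.rng"/>
</catalog>""",
}

print(resolve_uri("urn:pubid:rng:dbParaDomain", "master.xml", CATALOGS))
```

Adding a plugin in this model means adding one `<nextCatalog>` line to the master catalog. The grammars themselves keep referring to stable URNs.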
DITA Customization How To
This section demonstrates how to:
- Remove an element
- Add a new inline element
- Add a new block element
- Constrain an attribute value or the data type of an element
- Constrain the content model of a block element
- Define a new top-level document type
Remove An Element
Removing an element in DITA means disallowing it in every context. In DITA terms this is a constraint. How the constraint is implemented depends on whether the disallowed element is a base element and, if so, whether it has associated domain-provided specializations. For specialized elements, the details of the constraint depend on whether the element is defined in a domain or in a topic or map type.
For domain-provided specializations, disallowing the element means omitting it from the domain-defined domain integration pattern or parameter entity.
For example, the DITA "highlight" domain provides the <b> and <i> elements. You want to disallow these two elements (but allow other elements from the domain, such as <u> and <line-through>).
For RELAX NG you override the domain-defined pattern that adds the domain-provided elements to the element-type-name pattern for the base type (<ph> in this case):
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:dita="http://dita.oasis-open.org/architecture/2005/">
  <div>
    <a:documentation>INCLUDE MODULES</a:documentation>
    <include href="urn:oasis:names:tc:dita:rng:topicMod.rng">
      <define name="topic-info-types">
        <ref name="topic.element"/>
      </define>
    </include>
    ...
    <include href="urn:oasis:names:tc:dita:rng:highlightDomain.rng">
      <define name="hi-d-ph">
        <choice>
          <!-- Omit b and i:
          <ref name="b.element"/>
          <ref name="i.element"/>
          -->
          <ref name="line-through.element" dita:since="1.3"/>
          <ref name="overline.element" dita:since="1.3"/>
          <ref name="sup.element"/>
          <ref name="sub.element"/>
          <ref name="tt.element"/>
          <ref name="u.element"/>
        </choice>
      </define>
    </include>
    ...
  </div>
</grammar>
Within the highlightDomain module this declaration adds the domain-contributed specializations of <ph> to the base <ph> element-type-name pattern:
<define name="ph" combine="choice"> <ref name="hi-d-ph"/> </define>
Redefining the "hi-d-ph" pattern removes <b> and <i> from every content model that refers to the "ph" pattern and would otherwise have allowed them.
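As a loose mental model, you can reduce each pattern to the set of element names it allows. In that model, `combine="choice"` behaves like set union, and overriding "hi-d-ph" changes every pattern built from it. The sketch below is a hypothetical simplification that ignores ordering, cardinality, and everything else RELAX NG expresses. `inline_content` stands in for any content model that refers to "ph".

```python
from typing import FrozenSet

# The highlight-domain specializations of <ph> listed in the example above.
HI_D_PH: FrozenSet[str] = frozenset(
    {"b", "i", "line-through", "overline", "sup", "sub", "tt", "u"})
# The constraint redefines hi-d-ph without <b> and <i>.
CONSTRAINED_HI_D_PH: FrozenSet[str] = HI_D_PH - {"b", "i"}

def ph_pattern(hi_d_ph: FrozenSet[str]) -> FrozenSet[str]:
    """'ph' after integration: base <ph> combined (choice) with hi-d-ph."""
    return frozenset({"ph"}) | hi_d_ph

def inline_content(hi_d_ph: FrozenSet[str]) -> FrozenSet[str]:
    """Stand-in for any content model that refers to 'ph' (plus <keyword>)."""
    return frozenset({"keyword"}) | ph_pattern(hi_d_ph)

print(sorted(inline_content(CONSTRAINED_HI_D_PH)))
# prints ['keyword', 'line-through', 'overline', 'ph', 'sub', 'sup', 'tt', 'u']
```

The content model itself never changes. Only the definition it refers to does, which is why a single override is enough.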
For DTDs, the same constraint is implemented by simply replacing the reference to %hi-d-ph; in the domain-integration parameter entity with the list of element types from the highlight domain to be included:
<!-- ============================================================= -->
<!-- DOMAIN EXTENSIONS -->
<!-- ============================================================= -->
<!-- Omit b and i: -->
<!ENTITY % ph "ph | line-through | overline | sup | sub | tt | u ">
To disallow a base element type that has domain-provided specializations, you simply remove the base element type from its element-type-name pattern (RNG) or domain integration parameter entity (DTD).
For RNG you can override the element-type-name pattern to use <notAllowed>:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  ...
  <div>
    <a:documentation>INCLUDE MODULES</a:documentation>
    <include href="../../base/rng/topicMod.rng">
      <!-- Disallow the base <p> element -->
      <define name="p">
        <notAllowed/>
      </define>
      <define name="topic-info-types">
        <ref name="topic.element"/>
      </define>
    </include>
    <include href="dbParaDomainMod.rng"/>
    ...
  </div>
</grammar>
For DTD you simply omit the element type from the domain integration parameter entity:
<!-- ============================================================= -->
<!-- DOMAIN EXTENSIONS -->
<!-- ============================================================= -->
<!-- Omit p: -->
<!ENTITY % p "%dbPara-d-p;" >
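DTD integration entities have a fixed shape: the base element type, if it is still allowed, followed by each domain's contribution entity. That regularity makes them easy to generate. The helper below is a hypothetical sketch, not part of any DITA tool. When the base element is excluded, it reproduces the constrained declaration shown above. It refuses to produce an empty entity.

```python
from typing import List

def integration_entity(base: str, domain_entities: List[str],
                       include_base: bool = True) -> str:
    """Build a DTD domain integration declaration for one base element type.

    include_base=False models a constraint that disallows the base element
    while keeping its domain specializations."""
    alternatives = [base] if include_base else []
    alternatives += [f"%{name};" for name in domain_entities]
    if not alternatives:
        # An empty entity would leave dangling '|' connectors in every
        # content model that references it.
        raise ValueError(f"%{base}; would be empty")
    body = " | ".join(alternatives)
    return f'<!ENTITY % {base} "{body}" >'

print(integration_entity("p", ["dbPara-d-p"]))
# prints <!ENTITY % p "p | %dbPara-d-p;" >
print(integration_entity("p", ["dbPara-d-p"], include_base=False))
# prints <!ENTITY % p "%dbPara-d-p;" >
```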
If the base element to be disallowed does not have any domain-provided specializations, then for DTDs you cannot simply set its domain integration parameter entity to "". Doing so produces invalid content models wherever the parameter entity is referenced.
Thus, for DTDs you must override the declaration of any parameter entity that references the element's domain integration parameter entity to omit the reference to it and the connectors associated with it. Fortunately, this can be done automatically when generating the DTD modules from the RELAX NG modules.
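You can see both the problem and the fix by expanding a content model as plain text. The sketch below is a hypothetical string-level simulation: the XML parser does the real parameter-entity expansion, and the model `(%p; | %ul; | %ol;)*` is invented for illustration. Substituting "" for %p; leaves a dangling connector. Removing the reference together with its connector yields a valid model.

```python
import re
from typing import Dict

ENTITY_REF = re.compile(r"%([\w.-]+);")

def expand(model: str, entities: Dict[str, str]) -> str:
    """Substitute each %name; reference once (no nested expansion)."""
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], model)

def has_empty_alternative(model: str) -> bool:
    """Detect '( |', '| |' or '| )', i.e. a choice with a missing member."""
    return re.search(r"\(\s*\||\|\s*\||\|\s*\)", model) is not None

def omit_reference(model: str, name: str) -> str:
    """Remove %name; together with its adjacent '|' connector."""
    ref = re.escape(f"%{name};")
    model = re.sub(ref + r"\s*\|\s*", "", model)  # "%p; | x" becomes "x"
    model = re.sub(r"\s*\|\s*" + ref, "", model)  # "x | %p;" becomes "x"
    return model

MODEL = "(%p; | %ul; | %ol;)*"
ENTITIES = {"p": "", "ul": "ul", "ol": "ol"}

print(expand(MODEL, ENTITIES))                         # prints ( | ul | ol)*
print(expand(omit_reference(MODEL, "p"), ENTITIES))    # prints (ul | ol)*
```

This toy `omit_reference` does not handle a group whose only member is removed. That case would require removing the enclosing group as well.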
To disallow elements defined in map or topic modules, you simply override the content model patterns or parameter entities that include the element to be disallowed.
For example, to disallow the base element <section> from generic topics, you would define a constraint module like so:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <a:documentation>Constraint on generic topic to disallow use of sections within body</a:documentation>
  <include href="urn:oasis:names:tc:dita:rng:topicMod.rng">
    <define name="body.content">
      <zeroOrMore>
        <choice>
          <ref name="body.cnt"/>
          <ref name="bodydiv"/>
          <ref name="example"/>
          <!-- Disallow section:
          <ref name="section"/>
          -->
        </choice>
      </zeroOrMore>
    </define>
  </include>
</grammar>
In a document type shell the constraint module is referenced in place of the reference to the constrained module:
<grammar xmlns="http://relaxng.org/ns/structure/1.0" ...>
<div>
<a:documentation>INCLUDE MODULES</a:documentation>
<include href="topicBodyNoSectionConstraintMod.rng">
<define name="topic-info-types">
<ref name="topic.element"/>
</define>
</include>
...
</div>
...
</grammar>
The DTD equivalent uses a constraint module that overrides the declaration of %body.content; to omit <section>:
<!-- Constraint to disallow section within body -->
<!ENTITY % body.content
"(%body.cnt; |
%bodydiv; |
%example;)*"
>
This constraint module is then included in the document type shell before the reference to the base topic.mod file:
...
<!-- ============================================================= -->
<!-- CONTENT CONSTRAINT INTEGRATION -->
<!-- ============================================================= -->
<!ENTITY % topicBodyNoSection SYSTEM "topicBodyNoSectionConstraint.mod" >
%topicBodyNoSection;
<!-- ============================================================= -->
<!-- TOPIC ELEMENT INTEGRATION -->
<!-- ============================================================= -->
<!ENTITY % topic-type PUBLIC "-//OASIS//ELEMENTS DITA 1.3 Topic//EN" "../../base/dtd/topic.mod" >
%topic-type;
...
Add a New Inline or Block Element
In DITA adding a new element that is not itself a new topic or map type and is not specific to a new topic or map type means defining a new domain module that provides the element type. The domain is then integrated into document type shells to make the element available wherever its base element is allowed. Constraints can be used to allow the element only in specific contexts or to disallow it from specific contexts.
Using the "DocBook paragraph" domain as an example, the RELAX NG domain module would be:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <moduleDesc xmlns="http://dita.oasis-open.org/architecture/2005/">
    <moduleTitle>DocBook para Domain</moduleTitle>
    <headerComment>
      Provides &lt;para&gt;, a specialization of &lt;p&gt; mirroring the DocBook element type for paragraphs.
    </headerComment>
    <moduleMetadata>
      <moduleType>elementdomain</moduleType>
      <moduleShortName>dbPara-d</moduleShortName>
      <modulePublicIds>
        <dtdMod>urn:pubid:dtd:elements:dbParaDomain</dtdMod>
        <dtdEnt>urn:pubid:dtd:entities:dbParaDomain</dtdEnt>
        <xsdMod>urn:pubid:xsd:dbParaDomain</xsdMod>
        <rncMod>urn:pubid:rnc:dbParaDomain</rncMod>
        <rngMod>urn:pubid:rng:dbParaDomain</rngMod>
      </modulePublicIds>
      <domainsContribution>(topic dbPara-d)</domainsContribution>
    </moduleMetadata>
  </moduleDesc>
  <div>
    <a:documentation>DOMAIN EXTENSION PATTERNS</a:documentation>
    <define name="dbPara-d-p">
      <choice>
        <ref name="para.element"/>
      </choice>
    </define>
    <define name="p" combine="choice">
      <ref name="dbPara-d-p"/>
    </define>
  </div>
  <div>
    <a:documentation>ELEMENT TYPE NAME PATTERNS</a:documentation>
    <define name="para">
      <ref name="para.element"/>
    </define>
  </div>
  <div>
    <a:documentation>ELEMENT TYPE DECLARATIONS</a:documentation>
    <div>
      <a:documentation>LONG NAME: Para</a:documentation>
      <define name="para.content">
        <zeroOrMore>
          <ref name="para.cnt"/>
        </zeroOrMore>
      </define>
      <define name="para.attributes">
        <ref name="univ-atts"/>
        <optional>
          <attribute name="outputclass"/>
        </optional>
      </define>
      <define name="para.element">
        <element name="para" dita:longName="Paragraph">
          <a:documentation>DocBook-style paragraph</a:documentation>
          <ref name="para.attlist"/>
          <ref name="para.content"/>
        </element>
      </define>
      <define name="para.attlist" combine="interleave">
        <ref name="para.attributes"/>
      </define>
    </div>
  </div>
  <div>
    <a:documentation>SPECIALIZATION ATTRIBUTE DECLARATIONS</a:documentation>
    <define name="para.attlist" combine="interleave">
      <ref name="global-atts"/>
      <optional>
        <attribute name="class" a:defaultValue="+ topic/p dbPara-d/para "/>
      </optional>
    </define>
  </div>
</grammar>
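The `class` default value declared at the end of this module is what lets a processor that knows nothing about the domain treat <para> as a <p>. In DITA 1.x the value begins with "+" for domain specializations and "-" for structural types (topic and map types and their elements). That marker is followed by one module/element token per level of ancestry. The following minimal parser for that format is hypothetical; the `ClassInfo` and `parse_class` names are illustrative only.

```python
from typing import List, NamedTuple, Tuple

class ClassInfo(NamedTuple):
    is_domain: bool                    # True for "+", False for "-"
    ancestry: List[Tuple[str, str]]    # (module, element) pairs, base first

def parse_class(class_value: str, element_name: str) -> ClassInfo:
    """Parse a @class value such as '+ topic/p dbPara-d/para '."""
    tokens = class_value.split()
    if not tokens or tokens[0] not in ("-", "+"):
        raise ValueError(f"@class must start with '-' or '+': {class_value!r}")
    ancestry = []
    for token in tokens[1:]:
        module, sep, name = token.partition("/")
        if not (sep and module and name):
            raise ValueError(f"malformed @class token: {token!r}")
        ancestry.append((module, name))
    # The last token must name the element that carries the attribute.
    if not ancestry or ancestry[-1][1] != element_name:
        raise ValueError(f"@class does not end with <{element_name}>")
    return ClassInfo(tokens[0] == "+", ancestry)

info = parse_class("+ topic/p dbPara-d/para ", "para")
print(info.is_domain, info.ancestry[0])  # prints True ('topic', 'p')
```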
The domain module is then simply included into any document type shell that wants to allow it:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
xmlns:svg="http://www.w3.org/2000/svg">
...
<div>
<a:documentation>INCLUDE MODULES</a:documentation>
<include href="urn:oasis:names:tc:dita:rng:topicMod.rng">
<define name="topic-info-types">
<ref name="topic.element"/>
</define>
</include>
<include href="dbParaDomainMod.rng"/>
...
</div>
</grammar>
Constrain an Attribute Value or Element Data Type
Because the TC-defined DITA grammars are limited to what DTDs can express, the DITA standard does not define any element data types. If you use RNG or XSD as your working grammar syntax, you can add element data type constraints with a constraint module that uses RNG lexical patterns or XSD data type constraints.
For attributes, constraining a value is a matter of overriding the declaration of the attribute in a constraint module.
The DITA grammar coding conventions do not provide general parameterization of individual attribute declarations, so constraining an individual attribute requires overriding the pattern or parameter entity that provides the attribute declaration.
If the attribute is a common attribute used by multiple element types with the same base definition it will normally be in a pattern with related attributes, for example, the "display-atts" pattern:
<div>
<a:documentation>COMMON ATTRIBUTE SETS</a:documentation>
<define name="display-atts">
<optional>
<attribute name="scale">
<choice>
<value>50</value>
<value>60</value>
<value>70</value>
<value>80</value>
<value>90</value>
<value>100</value>
<value>110</value>
<value>120</value>
<value>140</value>
<value>160</value>
<value>180</value>
<value>200</value>
<value>-dita-use-conref-target</value>
</choice>
</attribute>
</optional>
<optional>
<attribute name="frame">
<choice>
<value>all</value>
<value>bottom</value>
<value>none</value>
<value>sides</value>
<value>top</value>
<value>topbot</value>
<value>-dita-use-conref-target</value>
</choice>
</attribute>
</optional>
<optional>
<attribute name="expanse">
<choice>
<value>column</value>
<value>page</value>
<value>spread</value>
<value>textline</value>
<value>-dita-use-conref-target</value>
</choice>
</attribute>
</optional>
</define>
</div>
To constrain the @expanse attribute to just the values "column" and "page" you would define a constraint module that has a copy of the display-atts pattern with the modified definition of @expanse:
<grammar xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <a:documentation>
    Limits @expanse attribute to page and column
  </a:documentation>
  <include href="urn:oasis:names:tc:dita:rng:topicMod.rng">
    <define name="display-atts">
      <optional>
        <attribute name="scale">
          <choice>
            <value>50</value>
            <value>60</value>
            <value>70</value>
            <value>80</value>
            <value>90</value>
            <value>100</value>
            <value>110</value>
            <value>120</value>
            <value>140</value>
            <value>160</value>
            <value>180</value>
            <value>200</value>
            <value>-dita-use-conref-target</value>
          </choice>
        </attribute>
      </optional>
      <optional>
        <attribute name="frame">
          <choice>
            <value>all</value>
            <value>bottom</value>
            <value>none</value>
            <value>sides</value>
            <value>top</value>
            <value>topbot</value>
            <value>-dita-use-conref-target</value>
          </choice>
        </attribute>
      </optional>
      <optional>
        <attribute name="expanse">
          <choice>
            <value>column</value>
            <value>page</value>
            <!-- Omit spread and textline -->
            <value>-dita-use-conref-target</value>
          </choice>
        </attribute>
      </optional>
    </define>
  </include>
</grammar>
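The DITA standard requires a constraint to be at least as restrictive as the declaration it replaces. Every value in a constrained enumeration must therefore also be allowed by the base declaration. A consistency check along those lines might look like the sketch below, which is hypothetical. Its second check flags a dropped -dita-use-conref-target. That check is an assumed house rule that mirrors the constraint module above, not a requirement stated here.

```python
from typing import List, Set

CONREF_TOKEN = "-dita-use-conref-target"

# Values copied from the base display-atts pattern and the constraint module.
BASE_EXPANSE = {"column", "page", "spread", "textline", CONREF_TOKEN}
CONSTRAINED_EXPANSE = {"column", "page", CONREF_TOKEN}

def check_constrained_values(base: Set[str], constrained: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the constraint looks OK."""
    problems = []
    extra = constrained - base
    if extra:  # a constraint may only remove values, never add them
        problems.append(f"values not allowed by the base: {sorted(extra)}")
    if CONREF_TOKEN in base and CONREF_TOKEN not in constrained:
        problems.append(f"{CONREF_TOKEN} was dropped")  # assumed house rule
    return problems

print(check_constrained_values(BASE_EXPANSE, CONSTRAINED_EXPANSE))  # prints []
```

When the base declaration is CDATA, as with @outputclass, any enumeration is a valid restriction, so only the second check applies.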
If an attribute only occurs on a single element type or has a unique declaration for a given element type, then you would override the element type's *.attributes pattern.
For example, the @outputclass attribute is available on almost every element and is declared as CDATA everywhere. To restrict @outputclass on, say, the <keyword> element to a specific set of values, you would redeclare the "keyword.attributes" pattern in a constraint module:
<grammar xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <a:documentation>
    Define specific values for @outputclass on keyword.
  </a:documentation>
  <include href="urn:oasis:names:tc:dita:rng:topicMod.rng">
    <define name="keyword.attributes">
      <optional>
        <attribute name="keyref"/>
      </optional>
      <ref name="univ-atts"/>
      <optional>
        <attribute name="outputclass">
          <choice>
            <value>class1</value>
            <value>class2</value>
            <value>class3</value>
          </choice>
        </attribute>
      </optional>
    </define>
  </include>
</grammar>
Constrain the Content Model of a Block Element
Every element type has a *.content pattern that defines the content model for that element type. Thus constraining the content model for any element is a matter of redefining the *.content pattern in a constraint module. The coding pattern is the same for all element types.
For example, the base definition of the <fig> content model is:
<define name="fig.content">
  <optional>
    <ref name="title"/>
  </optional>
  <optional>
    <ref name="desc"/>
  </optional>
  <zeroOrMore>
    <choice>
      <ref name="figgroup"/>
      <ref name="fig.cnt"/>
    </choice>
  </zeroOrMore>
</define>
A constraint module that makes <title> and <desc> required is:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <a:documentation>
    Require title and desc for figure
  </a:documentation>
  <include href="urn:oasis:names:tc:dita:rng:topicMod.rng">
    <define name="fig.content">
      <!-- Require title and desc -->
      <ref name="title"/>
      <ref name="desc"/>
      <zeroOrMore>
        <choice>
          <ref name="figgroup"/>
          <ref name="fig.cnt"/>
        </choice>
      </zeroOrMore>
    </define>
  </include>
</grammar>
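Before deploying a constraint like this, it can help to find existing content that the constraint would make invalid. The following sketch, which uses a hypothetical function name, scans a document for figures whose first two children are not <title> and <desc>. It matches on @class tokens such as " topic/fig ", which also catches specializations of <fig>. The surrounding spaces prevent <figgroup> from matching. The sketch assumes @class is present in the parsed markup, for example in normalized output, because Python's xml.etree does not apply grammar-supplied default attributes.

```python
import xml.etree.ElementTree as ET
from typing import List

def figs_missing_title_or_desc(xml_text: str) -> List[str]:
    """Return the @id (or '?') of each fig-based element that does not
    start with <title> followed by <desc>."""
    root = ET.fromstring(xml_text)
    problems = []
    for elem in root.iter():
        # Trailing space keeps " topic/fig " from matching " topic/figgroup ".
        if " topic/fig " not in elem.get("class", ""):
            continue
        kids = [child.get("class", "") for child in elem]
        ok = (len(kids) >= 2
              and " topic/title " in kids[0]
              and " topic/desc " in kids[1])
        if not ok:
            problems.append(elem.get("id", "?"))
    return problems

SAMPLE = """<topic id="t" class="- topic/topic ">
  <title class="- topic/title ">Sample</title>
  <body class="- topic/body ">
    <fig id="good" class="- topic/fig ">
      <title class="- topic/title ">Caption</title>
      <desc class="- topic/desc ">Description</desc>
    </fig>
    <fig id="bad" class="- topic/fig ">
      <title class="- topic/title ">No description</title>
    </fig>
  </body>
</topic>"""

print(figs_missing_title_or_desc(SAMPLE))  # prints ['bad']
```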
Define a New Top-Level Document Type
In DITA a new top-level document type can mean either a new DITA document type, meaning a new combination of existing modules, or a new specialized map or topic type intended to be used as a root element.
Defining a new document type shell is a matter of creating references to the appropriate modules and including any shell-defined constraints.
For a new map or topic type specialization, the minimum is a copy of the appropriate base map or topic type's declaration module (RNG) or modules (DTD) with the base map or topic element type name changed to the specialized name. For example, to define a new topic type "chapter" that is otherwise identical to the base <topic> topic type, you would simply copy the topicMod.rng file to a new file, e.g., chapterMod.rng, update all declarations that refer to the element type "topic" to refer instead to the topic type "chapter", and remove the declarations of all other element types:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="urn:oasis:names:tc:dita:rng:vocabularyModuleDesc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<grammar xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <moduleDesc xmlns="http://dita.oasis-open.org/architecture/2005/">
    <moduleTitle>Chapter Topic Type</moduleTitle>
    <headerComment>Represents a chapter within a publication</headerComment>
    <moduleMetadata>
      <moduleType>topic</moduleType>
      <moduleShortName>chapter</moduleShortName>
      <modulePublicIds>
        <dtdEnt></dtdEnt>
        <dtdMod></dtdMod>
        <xsdMod></xsdMod>
        <xsdGrp></xsdGrp>
        <rncMod></rncMod>
        <rngMod></rngMod>
      </modulePublicIds>
    </moduleMetadata>
  </moduleDesc>
  <div>
    <a:documentation>ARCHITECTURE ATTRIBUTES</a:documentation>
    <define name="arch-atts">
      <optional>
        <attribute name="dita:DITAArchVersion" a:defaultValue="1.3"/>
      </optional>
    </define>
  </div>
  <div>
    <a:documentation>INFO TYPES PATTERNS</a:documentation>
    <define name="chapter-info-types">
      <ref name="info-types"/>
    </define>
    <define name="info-types">
      <ref name="topic.element"/>
    </define>
  </div>
  <div>
    <a:documentation>ELEMENT TYPE NAME PATTERNS</a:documentation>
  </div>
  <div>
    <a:documentation>ELEMENT TYPE DECLARATIONS</a:documentation>
    <div>
      <a:documentation>LONG NAME: Chapter</a:documentation>
      <define name="chapter.content">
        <ref name="title"/>
        <optional>
          <ref name="titlealts"/>
        </optional>
        <optional>
          <choice>
            <ref name="shortdesc"/>
            <ref name="abstract"/>
          </choice>
        </optional>
        <optional>
          <ref name="prolog"/>
        </optional>
        <optional>
          <ref name="body"/>
        </optional>
        <optional>
          <ref name="related-links"/>
        </optional>
        <zeroOrMore>
          <ref name="topic-info-types"/>
        </zeroOrMore>
      </define>
      <define name="chapter.attributes">
        <attribute name="id">
          <data type="ID"/>
        </attribute>
        <ref name="conref-atts"/>
        <ref name="select-atts"/>
        <ref name="localization-atts"/>
        <optional>
          <attribute name="outputclass"/>
        </optional>
      </define>
      <define name="chapter.element">
        <element name="chapter" dita:longName="Chapter">
          <a:documentation>The &lt;chapter&gt; element represents a chapter within a publication</a:documentation>
          <ref name="chapter.attlist"/>
          <ref name="chapter.content"/>
        </element>
      </define>
      <define name="chapter.attlist" combine="interleave">
        <ref name="chapter.attributes"/>
        <ref name="arch-atts"/>
        <ref name="domains-att"/>
      </define>
      <define name="idElements" combine="choice">
        <ref name="chapter.element"/>
      </define>
    </div>
  </div>
  <div>
    <a:documentation>SPECIALIZATION ATTRIBUTES</a:documentation>
    <define name="chapter.attlist" combine="interleave">
      <ref name="global-atts"/>
      <optional>
        <attribute name="class" a:defaultValue="- topic/topic chapter/chapter "/>
      </optional>
    </define>
  </div>
</grammar>
When defining a new top-level topic type you would normally also define at least one document type shell for it:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="urn:oasis:names:tc:dita:rng:checkShell.sch" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-model href="urn:oasis:names:tc:dita:rng:vocabularyModuleDesc.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:dita="http://dita.oasis-open.org/architecture/2005/"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <moduleDesc xmlns="http://dita.oasis-open.org/architecture/2005/">
    <moduleTitle>Chapter Topic Type Shell</moduleTitle>
    <headerComment xml:space="preserve"> Shell for chapter topics </headerComment>
    <moduleMetadata>
      <moduleType>topicshell</moduleType>
      <moduleShortName>chapter</moduleShortName>
      <shellPublicIds>
        <dtdShell>urn:pubid:example.org:dita:dtd<var presep=":" name="ditaver"/>:chapter.dtd</dtdShell>
        <rncShell>urn:pubid:example.org:dita:rnc:chapter.rnc<var presep=":" name="ditaver"/></rncShell>
        <rngShell>urn:pubid:example.org:dita:rng:chapter.rng<var presep=":" name="ditaver"/></rngShell>
        <xsdShell>urn:pubid:example.org:dita:xsd:chapter.xsd<var presep=":" name="ditaver"/></xsdShell>
      </shellPublicIds>
    </moduleMetadata>
  </moduleDesc>
  <div>
    <a:documentation>ROOT ELEMENT DECLARATION</a:documentation>
    <start>
      <ref name="chapter.element"/>
    </start>
  </div>
  <div>
    <a:documentation>DOMAINS ATTRIBUTE</a:documentation>
    <define name="domains-att" combine="interleave">
      <optional>
        <attribute name="domains"
          a:defaultValue="(topic abbrev-d) (topic chapter) (topic equation-d) (topic hazard-d) (topic hi-d) (topic indexing-d) (topic markup-d xml-d) (topic markup-d) (topic mathml-d) (topic pr-d) (topic relmgmt-d) (topic svg-d) (topic sw-d) (topic ui-d) (topic ut-d) a(props deliveryTarget)"/>
      </optional>
    </define>
  </div>
  <div>
    <a:documentation>MODULE INCLUSIONS</a:documentation>
    <include href="urn:oasis:names:tc:dita:rng:topicMod.rng"/>
    <include href="chapterMod.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:abbreviateDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:deliveryTargetAttDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:equationDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:hazardDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:highlightDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:indexingDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:markupDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:mathmlDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:programmingDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:releaseManagementDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:softwareDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:svgDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:uiDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:utilitiesDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:xmlDomain.rng"/>
    <include href="urn:oasis:names:tc:dita:rng:xnalDomain.rng"/>
  </div>
  <div>
    <a:documentation>ID-DEFINING-ELEMENT OVERRIDES</a:documentation>
    <define name="any">
      <zeroOrMore>
        <choice>
          <ref name="idElements"/>
          <element>
            <anyName>
              <except>
                <name>chapter</name>
                <name>topic</name>
                <nsName ns="http://www.w3.org/2000/svg"/>
                <nsName ns="http://www.w3.org/1998/Math/MathML"/>
              </except>
            </anyName>
            <zeroOrMore>
              <attribute>
                <anyName/>
              </attribute>
            </zeroOrMore>
            <ref name="any"/>
          </element>
          <text/>
        </choice>
      </zeroOrMore>
    </define>
  </div>
</grammar>
Creating this shell is largely an exercise in cut and paste.
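Because the @domains value follows a simple grammar, a tool can parse it, for example to compare the declared domains against the modules a shell actually includes. The minimal sketch below uses a hypothetical function name. It assumes only the DITA 1.3 syntax seen in the shell above: parenthesized groups of module names, with an "a" prefix marking attribute domains.

```python
import re
from typing import List, Tuple

GROUP = re.compile(r"(a?)\(([^)]*)\)")

def parse_domains(value: str) -> Tuple[List[List[str]], List[List[str]]]:
    """Split a @domains value into (element domains, attribute domains).
    Each entry lists module names from the base to the contributing module."""
    element_domains: List[List[str]] = []
    attribute_domains: List[List[str]] = []
    for prefix, body in GROUP.findall(value):
        target = attribute_domains if prefix == "a" else element_domains
        target.append(body.split())
    return element_domains, attribute_domains

elements, attributes = parse_domains(
    "(topic hi-d) (topic chapter) (topic markup-d xml-d) a(props deliveryTarget)")
print([group[-1] for group in elements])  # prints ['hi-d', 'chapter', 'xml-d']
print(attributes)                          # prints [['props', 'deliveryTarget']]
```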
References
[DITA 1.3] Darwin Information Typing Architecture (DITA) Version 1.3 Part 3: All-Inclusive Edition Plus Errata 02, OASIS Open, 2018. http://docs.oasis-open.org/dita/dita/v1.3/dita-v1.3-part3-all-inclusive.html.
[Kimber1] Kimber, Eliot and George Bina, RELAX NG and DITA: An Almost Perfect Match, presented at Balisage: The Markup Conference 2014, Washington, DC, August 5 - 8, 2014. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). doi:https://doi.org/10.4242/BalisageVol13.Kimber01. https://www.balisage.net/Proceedings/vol13/html/Kimber01/BalisageVol13-Kimber01.html.
[RNG2DTD] Kimber, Eliot, DITA RELAX NG to DTD and XSD converter, https://github.com/oasis-open/dita-rng-converter.
[1] While this ability to know the DITA document type for documents without the use of grammar files is interesting and unique to DITA, as far as we know, no tools actually make use of it as the practical need for grammars means that most DITA documents have associated grammars, at least for authoring and management purposes. In addition, one of the intended use cases for declaring modules in this way, imposition of re-use constraints, turns out to not be that useful in practice. For this reason, the DITA Technical Committee has decided to make the @domains attribute optional in DITA 2.0.
[2] XML Schema Part 1: Structures Second Edition, clause 4.2.2 Including modified component definitions