How to cite this paper
White, David. “Smart Content for High-Value Communications.” Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). https://doi.org/10.4242/BalisageVol15.White01.
Balisage: The Markup Conference 2015
August 11 - 14, 2015
Balisage Paper: Smart Content for High-Value Communications
David White
Dave White has been with Quark Software Inc. for 7 years and is currently CTO. Dave
has been in the XML (and SGML) authoring and publishing software business since 1994,
including 13 years at Arbortext in a variety of sales, product management, and business
development roles.
Copyright © 2015 Quark Software Inc.
Abstract
Seventeen years after the original XML specification reached recommendation status, and more
than 30 years since SGML solutions had proven the rich value and significant return
on investment for technical documentation, there is still a relatively low number
of XML-based publishing system deployments for non-technical, high-value communications.
Even though marketing departments, product managers, and enterprise publishing departments
face challenges similar to those that documentation departments have addressed, the
value of automated publishing from structured content has eluded these additional
audiences. For these teams of non-technical subject matter experts and the communications
departments that support them, there continue to be too many roadblocks on the path
to an XML-based dynamic publishing solution. Quark's Smart Content methodology and
RNG schema are meant to address the needs of non-technical communicators by rethinking
what must fundamentally differ for this new user base to join the dynamic publishing
community.
Table of Contents
- Introduction
- Purpose
- XML Authoring Usability: Restrictive Content Models for Use of Blocks and Inlines
- XML Authoring Usability: Archetypes
- XML Consistency: The use of Metadata beyond XML Attributes
- Controlling Order and Occurrence Validation by Type Attribute Value
- Authoring Usability: Component Reuse
- An Overview of the Smart Content Schema
- Formal Groups of Blocks: <section>
- Blocks: <p>
- Inlines: <tag>
- Miscellaneous Elements: <table>, <image>, <video>, <xref>, lists, emphasis, etc.
- Metadata: <meta>
- Under Consideration: Block Combinations and Simple Block Combinations: <bodyDiv>
and <simpleBodyDiv>
- Summary
- Smart Content Schema Sample
Introduction
There are two fundamental areas that must be addressed to attract a non-technical
communications market to XML-based Dynamic Publishing: the usability of authoring
in semantically-rich XML; and the ability of an XML publishing engine to address the
needs of content types that do not easily fit into the design constraints of documentation
and reports. Publishing engine constraints can only be solved by the software engineers
that build these engines. The XML schema for document input is generally orthogonal
to the types of formatting features a publishing engine can support. Therefore, Smart
Content focuses primarily on authoring usability and, similar to DITA, can simplify
the implementation of a solution by providing a well-thought-out base from which to
start an implementation.
There are many non-techdoc content types that could be well served by an XML-based
publishing system. These content types share many aspects with technical documentation,
including but certainly not limited to the following:
- content volume
- publishing frequency of both new and revised documents
- information sets with differences specific to a particular audience
- language translation needs
- opportunities to reuse or repurpose content components across different publications
We have seen successful implementations in Financial Services for investment research
reports, fund fact sheets, ratings reports, and insurance guidelines; in government
for the support and development of laws; in manufacturing for product marketing materials;
and across many industries for standard operating procedures. Many of these content
types also happen to be simpler than techpubs documents in the sense that they have
fewer block and inline markup types and often have less need for complex and restrictive
content models. However, their requirements for presentation style on output may be
as demanding as, more demanding than, or at least very different from those of technical
documentation.
One of the strongest characteristics shared across all of these content types is that
the content authors have a primary role in their company that is not authoring. They
are subject matter experts; they are often not technical-minded (in a software technology
sense); authoring is, at best, secondary to their business function; and the frequency
with which they author may vary widely, from once a year to a few hours a day.
These authors have traditionally used MS Word as their primary content creation tool.
Some may have used on-line tools such as Google Docs. Common to all of these word
processing applications is the freeform and style-driven nature of the user experience.
Write a sentence, apply a named-style or build up a style from a formatting UI, add
a table or graphic. There are almost no rules except the limitations of the features
available within the tool. Want to apply a Chapter Title style to a sentence with
an inline Table? Sure, why not. Hand-crafted use of style is easy and inviting. However,
anyone in the XML-content community knows why this is a problem, and it's the same
problem when one attempts to deploy automation to any hand-crafted process: Garbage-in,
Garbage-out. There is no consistent and reliable way to deploy automation without
control and structure of the inputs.
Authoring with structure and validation adds intelligence to content and enables powerful
automation typically in the form of multi-channel publishing. Publishing automation
benefits include efficiencies associated with the “single source of truth”, re-use,
and repurposing, all of which contribute to faster time-to-market. Automation also
provides significant quality improvements like improved reading comprehension, style
consistency, and higher message relevance to a particular audience.
XML has been widely deployed for a variety of content and document applications with
different purposes. Even MS Word uses Office Open XML as the file format for a variety
of MS Office applications. As anyone that tries to automate publishing from MS Word
files can explain, XML isn't the complete answer. The answer for "how to automate
content processes" has been and continues to be the deployment of a rules-driven, semantic-XML
content process where order and occurrence, as well as meaning and purpose, are clearly
defined within the content at a very fine-grained level, even down to a single character
if required. Semantic XML enables a software program to automatically use, filter,
index, or transform the enriched, XML source to many outputs for multiple uses with
high-speed, high-quality, high-value, and low risk of failures.
If semantic, structured XML is the input, then authoring directly in XML is the most
direct path to success. But converting an audience of non-technical, occasional authors
from a free-form, style-based word processing tool to using a complex, rules-driven
XML authoring tool is a very big challenge. The problem, defined through consistent
feedback from authors who have tried to make the transition, is usability. XML rules
constrict the author within new and fine-grained boundaries where these authors have
never previously experienced limitations.
Purpose
The goal of presenting Quark's Smart Content model is to gauge the interest in our
methodology and ultimately to see if there is community support to initiate a public
standardization effort.
XML Authoring Usability: Restrictive Content Models for Use of Blocks and Inlines
If automation provides much of the value for content processes, and automation requires
structured and validated input to be successful, then rules must be enforced at the
content authoring stage. A close analogy is validated HTML form input, which gives
users instant feedback while they type data into a field. However, writing long-form
prose is quite different from entering data into a form, so the way the rules are
expressed has to be considerably different.
Most XML authoring usability issues are caused by some form of "hidden rule." A hidden
rule is any restriction or boundary that the software enforces but that the author
can understand (or even know exists) only by reading documentation, manipulating one
or more user interface widgets, and/or trying an action only to have the action fail.
XML schema for content authoring have many structures
that create hidden rules. One very significant difference between XML and free-form
word processing is the use of containment and nested hierarchy which is quite foreign
to non-technical authors. Most XML authoring tools minimize the impact of containment
by deploying structure-tree UIs such as a table of contents for section/topic/chapter
type divisions. Depending on the level of markup included in a structure-tree view
(sections only, all markup, user configurable), a non-technical author may still find
the usability challenging. Also, a structure-tree view may not solve the problem
for all containment contexts, for example identifying a run of multiple paragraphs
as a sidebar or callout.
As a result, even in the best long-standing XML authoring applications, authoring
is easiest when the "tags-on view" is used, because this gives immediate visual context
to the structure of the markup used in the document and provides hints to the rules
the user can or should expect, though that knowledge still requires repeated experience
with the system to be truly internalized. Tags-on view only partially helps; it does
not solve the full problem of hidden rules, and the extra visual distraction caused
by the display of markup boundaries is too foreign and unpleasant for these authors.
As soon as tags-on view is a requirement to improve usability, non-technical authors
rebel. For additional
insight into the challenges and opportunities for authoring usability improvement
see Flynn, Peter. “Could authors really write in XML one day?” Presented at Balisage:
The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of
Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol.
10 (2013). doi:10.4242/BalisageVol10.Flynn02.
Free-form word processing tools have very few hidden rules because nearly any content
is allowed in nearly any location within a document. When an author hits one of the
few hidden rules that do exist, the experience is unpleasant and frustrating. In
MS Word, for example, when the cursor is placed at the end of a pre-existing run of
bold text, the user is never quite sure whether additional typed characters will inherit
the bold styling. There appear to be several hidden rules controlling
emphasis application that depend on whether the entire paragraph is selected and how
the cursor was placed (mouse click, arrow from within bold region, arrow from outside
of bold region, directly next to non-space character, with space character between
bold and cursor, etc.). MS Word has other hidden rules when using tab-space, multi-column
flows, and nested and outline lists, to name a few.
Hidden rules in XML authoring are problematic for some very common and frequently
used actions: moving and reordering content through cut/copy/paste and drag/drop,
adding structural content such as section/title, and adding specific types of content
such as lists, tables, and multi-media. The usability problems caused by hidden rules
are often exacerbated by the over-specification of order and occurrence rules at the
block and inline markup level in the XML schema definition.
One simple example of over-specification is the restriction of inline markup within
Title elements. XML makes it easy to define a schema which limits the markup that
can be used within a Title. Within the same XML document type a schema may allow
for the use of inlines within a paragraph such as keyword, bold, italic, underline,
company, name, trademark, location, and possibly more. But often an XML schema developer
may come to the conclusion that Title elements should not contain any of these elements
because the output formatting will ignore them. By extension, a developer may choose
to avoid tempting an author to use these elements in the first place by excluding
them from the content model definition for Title.
Take the following simple example of a title and a paragraph:
<title>How to Make</title>
<para>Begin with the ingredients from the <keyword>Thanksgiving Recipe</keyword>.</para>
If the user selects and copies the phrase ‘the <keyword>Thanksgiving Recipe</keyword>.’
and pastes it after 'Make' in the <title>, then the authoring tool might block that
paste, because the controlling schema doesn’t allow <keyword> inside a <title> element.
That’s frustrating, and worse, the reason for the failed paste is often hidden from
the user: they can’t figure out why it’s blocked, so they think the tool is broken.
Of course a trained, full-time technical author would have a good idea what happened,
would turn on “show tags” (they probably started work with tags displayed) in their
tool of choice, and would select only the text they wanted, skipping the keyword tag.
This is a simple example, but many similar use cases exist. It’s a problem the Quark
team refers to as “gross-edits,” and it is a significant issue when it prevents a
business user from authoring with the ease they are used to in a word processing tool.
That ease of use, and even the openness to adopting a structured authoring tool, is
predicated on NOT showing the XML structure in an XML way.
By limiting the content model of Title, the well-meaning XML schema developer has
just created a new hidden rule, discoverable only when the author places the cursor
in a Title element and uses the insert-inline-markup UI widget. The author will also
bump against this hidden rule when trying to paste text copied from a para that contains
one or more forbidden inline markups, and, problematically, many XML authoring tools
will neither allow the paste nor provide any meaningful feedback to the author on
why the action was canceled [note: it is also of high value to have the authoring
tools improve the amount of feedback provided for these types of conditions]. Alternatively,
if the schema allows the inline markup within Title, the author may not get the results
they expect when that inline markup is ignored during output transformations. This
is certainly a trade-off and one that requires careful consideration. Our experience
in these cases: improving usability by reducing user steps has the highest value for
authors, as the ultimate success of any tool lies in its adoption.
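To make the hidden rule concrete, the following Python sketch shows the kind of check an authoring tool might run before accepting a paste. The CONTENT_MODELS table and the paste_allowed() function are hypothetical illustrations and are not taken from any particular tool or from Smart Content. They assume a restrictive schema in which <title> accepts text only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately restrictive content models: the child elements
# each element may contain. <title> accepts text only.
CONTENT_MODELS = {
    "title": set(),
    "para": {"keyword", "b", "i"},
}


def paste_allowed(target: str, fragment: str) -> bool:
    """Return True if every top-level element in the pasted fragment is
    allowed as a child of `target` under CONTENT_MODELS."""
    # Wrap the fragment so that mixed text and markup parses as one element.
    wrapper = ET.fromstring(f"<paste>{fragment}</paste>")
    allowed = CONTENT_MODELS.get(target, set())
    return all(child.tag in allowed for child in wrapper)


copied = "the <keyword>Thanksgiving Recipe</keyword>."
print(paste_allowed("para", copied))   # True
print(paste_allowed("title", copied))  # False: the paste is refused
```

A False result carries no explanation. Unless the tool reports which element failed, the author sees only a paste that did not happen.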
This is just one example. There are many other contexts where the restriction of content
models in the XML schema creates new hidden rules which may seem logical and helpful,
but actually create more authoring usability problems than the value the restrictions
may offer. While modifying the authoring tool(s) to improve the user experience in
these use cases could improve usability to some degree, to do so would require the
tool to provide the user more feedback about the underlying markup and content model
restrictions with choices to resolve the copy/paste actions when source and target
content models do not match. While this may be acceptable and preferred in some content
types and for some authors, for non-technical, occasional authors it would just move
their usability frustration to the "extra resolution step" which they are not used
to in free-form word processing. It makes sense to limit the number of hidden rules
that are created in the first place. The system can offload the resulting content
structure challenges downstream to an automated step, such as when creating output.
For this reason, the base architecture of the Smart Content schema only allows for
the definition of which blocks and inlines can be used at a section archetype level.
Importantly, there are no controls for order and occurrence of blocks and inlines
within a section. You can use any block at any time as frequently as desired and
you can place blocks in any order. The same is true with inline markup. This significantly
reduces the opportunity for the XML schema developer to over-specify content models
that will reduce authoring usability. And while this does not solve all usability
issues, it is a significant step in the right direction.
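A minimal Python sketch of this rule follows. It assumes a hypothetical configuration format (SECTION_TYPES) and the <section>/<body>/<p>/<tag> elements described later in this paper. The check asks only whether each block and inline type is allowed in the section type. It deliberately ignores order and occurrence.

```python
import xml.etree.ElementTree as ET

# Hypothetical section-type configuration: which block and inline types
# may be used. There is intentionally no order or occurrence information.
SECTION_TYPES = {
    "Purpose": {"blocks": {"Intro", "Body"}, "inlines": {"Company", "Trademark"}},
}


def disallowed_types(section: ET.Element) -> list[str]:
    """List the block and inline type values not allowed in this section's type."""
    model = SECTION_TYPES[section.get("type")]
    body = section.find("body")
    if body is None:
        return []
    problems = []
    # Only membership is tested, so blocks and inlines may appear in any
    # order and any number of times.
    for p in body.iter("p"):
        if p.get("type") not in model["blocks"]:
            problems.append(f"p:{p.get('type')}")
    for tag in body.iter("tag"):
        if tag.get("type") not in model["inlines"]:
            problems.append(f"tag:{tag.get('type')}")
    return problems


section = ET.fromstring(
    '<section type="Purpose"><title>Purpose</title><body>'
    '<p type="Body">Details from <tag type="Company">Quark</tag>.</p>'
    '<p type="Intro">An Intro block after a Body block is fine.</p>'
    '<p type="Body">So is a second Body block.</p>'
    '</body></section>'
)
print(disallowed_types(section))  # []
```

The design choice is visible in what the function does not check. It does not look at the position of a block or count how often it appears, so the schema developer has no way to add a hidden order or occurrence rule.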
XML Authoring Usability: Archetypes
Developing an authoring system that supports and enforces arbitrary XML schema definitions
with complex structure and varying content models, and that is both performant and
highly usable on a wide variety of computing platforms (contributing to the content
process from mobile devices with varying computing power is clearly a growing requirement),
is a very big challenge. While it's relatively easy for a batch XML parser to validate
an XML document instance against a specific XML schema and report errors, it is much
more difficult to provide the same capabilities in real time during authoring. This
is due both to "arbitrary XML" support and to the nature of XML parser processing
expectations. To minimize some of this complexity, existing XML schema such as DITA
have utilized an extensible information archetype model. This has enabled many tools
to offer DITA support tailored for a specific purpose without having to claim support
for arbitrary XML schema. The base DITA schema is highly complex, reflecting its original
target of solving recurring problems in technical publications, though there have
been and continue to be efforts to offer a simplified version that would be appropriate
for non-technical documents.
The main advantage of a system based on archetypes is that an application can apply
default processing to any markup which has an assigned root class. In arbitrary XML
schema authoring implementations, the software must provide an additional configuration
file that describes the basic processing for each XML element, e.g. <myBlock> should
have a hard return before and after. With an archetype-based system, system implementation
work is reduced since there can be fewer configuration files to develop, test, and
deploy. A challenge with archetype-based systems arises when the system's base archetypes
do not include the information type an implementation needs as a starting point.
Like XHTML and DITA, the Smart Content schema starts with a set of common elements
that nearly every document type will contain. These are section, block (<p>), inline
(<tag>), table, image, cross reference, reference notes (e.g. footnote, endnote, etc.),
and metadata. Other content types that are available by default include a variety
of emphasis and lists. Currently being discussed are extensible models for bodyDiv
in which the content model is fixed (e.g. a figure) and a simpleBodyDiv, where the
content model is any block(s) from the parent section and would enable simple semantic
wrapping of a sequence of blocks.
The advantage of having a set of base elements as information archetypes is that the
system can treat customizations of these base types with common processing. Some examples
include:
- all types of Section appear in a TOC
- all types of Para get white space before and after
- all types of cross-reference can utilize the same source and target selection user
interface
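As an illustration of the first point, the Python sketch below builds a table of contents by matching only the base element name <section>. It never consults the type value, so a newly configured section type needs no extra processing code. The function name and the indentation format are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def toc(section: ET.Element, depth: int = 0) -> list[str]:
    """Return indented TOC lines for a section and all nested sections."""
    lines = []
    title = section.find("title")
    if title is not None:
        # itertext() also picks up text inside inline markup in the title.
        lines.append("  " * depth + "".join(title.itertext()).strip())
    # Every <section> child is handled the same way, whatever its type.
    for child in section.findall("section"):
        lines.extend(toc(child, depth + 1))
    return lines


doc = ET.fromstring(
    '<section type="Report"><title>Annual Report</title><body/>'
    '<section type="Purpose"><title>Purpose</title><body/></section>'
    '<section type="Risk"><title>Key Risks</title><body/></section>'
    '</section>'
)
print(toc(doc))  # ['Annual Report', '  Purpose', '  Key Risks']
```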
In Smart Content, each of these base archetypes can be extended through RNG configuration
for custom semantics. Similar to XHTML's "class" attribute, the persistent form of
a custom semantic for Smart Content is attribute-based: <section type="Purpose">.
There are two reasons for this:
- It's extremely friendly to HTML developers, is easily transformed to XHTML, and can
support direct presentation in HTML browsers using CSS techniques similar to those
used with the XHTML class attribute.
- Using a type attribute instead of changing the element name provides an implementation
methodology that better supports cut/copy/paste/drag/drop of elements between document
contexts. This approach also avoids the traditional pain of parser errors associated
with an invalid move, and thereby skips the common frustration encountered when an
element is out of context in its new location. When the value of a Type attribute
is used for order and occurrence control, it may be easier, using available parsers,
to resolve the issue in a more user-friendly manner and with high performance, because
the system does not have to validate all XML elements within the fragment at once.
If the base type is allowed, then it is a simple matter of assigning an available
Type value. However, this also requires some unique features of RNG and would be difficult
to implement in systems that support only DTD or XSD schema languages.
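The first point can be sketched in Python. Because custom semantics live in the type attribute, converting to XHTML is largely a matter of renaming base elements and copying type to class. The XHTML_NAME mapping below is a hypothetical choice for illustration. The sketch ignores <meta> handling.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical mapping of Smart Content base elements to XHTML element names.
XHTML_NAME = {"section": "div", "title": "h2", "body": "div", "p": "p", "tag": "span"}


def to_xhtml(root: ET.Element) -> ET.Element:
    """Return a copy in which each base element is renamed to an XHTML
    element and its type attribute becomes the class attribute."""
    out = copy.deepcopy(root)
    for elem in out.iter():
        elem.tag = XHTML_NAME.get(elem.tag, elem.tag)
        custom_type = elem.attrib.pop("type", None)
        if custom_type is not None:
            elem.set("class", custom_type)
    return out


src = ('<section type="Purpose"><title>Why</title><body>'
       '<p type="Intro">Hello <tag type="Company">Quark</tag>.</p></body></section>')
print(ET.tostring(to_xhtml(ET.fromstring(src)), encoding="unicode"))
# <div class="Purpose"><h2>Why</h2><div><p class="Intro">Hello <span class="Company">Quark</span>.</p></div></div>
```

In a browser, a CSS class selector such as div.Purpose can then style every Purpose section without any knowledge of the Smart Content schema.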
XML Consistency: The use of Metadata beyond XML Attributes
For many years, XML document systems, authoring tools, and schema developers have
struggled with the limitations of using XML attributes to capture rich metadata. For
example, even a simple multi-value attribute must be expressed in XML as a delimited
text value, such as: <section security-audience="Employees; Partners; Customers">.
As a result, custom programming must be used to define the user experience that presents
and constrains multi-value attributes for the author, to validate a multi-value attribute,
and finally to process a multi-value attribute value, such as when publishing a document.
The problem is multiplied if the attribute values should have hierarchy such as when
describing geo-based regions: <section geo-audience="North America:CN,US;EMEA:UK,IR;">
However, most of these use cases can be handled if the metadata is treated as an XML
fragment rather than as XML attributes:
<section>
<meta>
<attribute type="security-audience">
<value>Employees</value>
<value>Partners</value>
</attribute>
</meta>
</section>
Smart Content assumes that sections and inlines (and, in the future, blocks) can have
a <meta> element directly after the start tag and that, for any cut/paste, publish,
or other process, that metadata should be treated in the same way that XML attributes
are treated. They are "children" of the element and apply to the element as a whole.
While this does express attributes in a verbose way, it enables the use of existing
XML tools for editing, validating, and processing metadata while providing much richer
expression, constraints, and validation without requiring custom processing. It does,
however, require tools that process the XML to implement the "lock" of the <meta> fragment
to the element that directly contains it, for example when copying and pasting text
at the beginning of an element that has metadata.
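The following minimal Python sketch shows why the fragment form is easier to process. The fragment's values come back as a list through ordinary XML navigation. The delimited attribute needs its own parsing rules for the separator, whitespace, and empty entries. The function names are hypothetical, and meta_values() assumes the <meta>/<attribute type>/<value> structure shown above.

```python
import xml.etree.ElementTree as ET


def meta_values(elem: ET.Element, name: str) -> list[str]:
    """Return the <value> texts of one metadata attribute on `elem`.

    Only the element's own (direct child) <meta> is consulted, so metadata
    belonging to nested sections or inlines is not mixed in."""
    meta = elem.find("meta")
    if meta is None:
        return []
    for attribute in meta.findall("attribute"):
        if attribute.get("type") == name:
            return [(value.text or "").strip() for value in attribute.findall("value")]
    return []


def delimited_values(attr_value: str) -> list[str]:
    """Custom parsing that a delimited attribute value forces on every tool."""
    return [v.strip() for v in attr_value.split(";") if v.strip()]


section = ET.fromstring(
    "<section><meta><attribute type='security-audience'>"
    "<value>Employees</value><value>Partners</value>"
    "</attribute></meta></section>"
)
print(meta_values(section, "security-audience"))  # ['Employees', 'Partners']
print(delimited_values("Employees; Partners; Customers"))  # ['Employees', 'Partners', 'Customers']
```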
Controlling Order and Occurrence Validation by Type Attribute Value
XML parsers are built to validate the structure of the document using the element
names of a document. Parsers also validate the value of attributes, but attribute
validation is atomic: the test checks only whether the attribute value is valid, regardless
of where in the document structure, or even to which element, the
attribute is attached.
The goal of maximizing authoring usability first and minimizing developer work second
is important in this context. Nearly all XML document schema are defined and customized
by modifying the element names. This of course works great for the parser to validate
and control the authoring experience, but it causes significant editing problems when
copying and pasting fragments of XML. The problem is that more than one element of
an XML fragment within a Paste buffer might not be valid at the target location.
The problem might be solved (and there have been many attempts) through authoring
tool development. On a paste of an element into a new context, the authoring tool
has to validate the entire structure of the paste-buffer fragment against the new
target content location. The fragment root element might be invalid, or a child of
the fragment root might be invalid, or the entire fragment structure might not be
valid. Solving all of these cases programmatically ranges from difficult to nearly
impossible. The simple answer, then, is to disallow the paste, thus placing the burden
of resolution on the author while increasing his/her frustration. The next best and
reasonably feasible programmatic answer is to shut-off the real-time validation parser,
allow the paste, and then hope that the author can figure out how to re-assign the
element names with very little guidance. And of course, they would have to turn the
tags view on to do this work.
But the frequency of these problems can be dramatically reduced by using archetype
elements with Type attribute values to control order and occurrence. Then, assuming
that the base elements are allowed nearly everywhere in the document structure, a
paste can occur, the element structure validation parser is satisfied, and only the
Type attribute values might be invalid. Providing a user interface for tracking invalid
Type attribute values is relatively simple, though above and beyond normal XML parser
processing.
As a reminder, Smart Content's methodology allows defined blocks and inlines anywhere
within a given Section type. Assuming a Section type has at least one block and one
inline defined, copying and pasting across Sections with differing Section type definitions
is always allowed, and a Type attribute value user interface can resolve the result
in one of two ways: if only one type value is available, it is set automatically;
if multiple type values are allowed, the user is alerted through simple formatting
and other user interface controls that action is required to redefine the type value
from a list of available types.
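The reconciliation step described above can be sketched in Python. The sketch assumes the element structure is already valid, so it examines only block type values. ALLOWED_BLOCKS and reconcile_paste() are hypothetical names. A real tool would also handle inline types and present the flagged blocks in its user interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical block types allowed per section type.
ALLOWED_BLOCKS = {
    "Purpose": {"Intro"},
    "Risk": {"Warning", "Body"},
}


def reconcile_paste(fragment: ET.Element, target_type: str) -> list[ET.Element]:
    """Adjust block types after pasting `fragment` into a section of `target_type`.

    Returns the blocks whose type the author still has to choose."""
    allowed = ALLOWED_BLOCKS[target_type]
    needs_choice = []
    for p in fragment.iter("p"):
        if p.get("type") in allowed:
            continue
        if len(allowed) == 1:
            # Only one option: assign it automatically.
            p.set("type", next(iter(allowed)))
        else:
            # Several options: flag the block for the author.
            needs_choice.append(p)
    return needs_choice


pasted = ET.fromstring('<body><p type="Body">Text copied from a Risk section.</p></body>')
print(reconcile_paste(pasted, "Purpose"))  # []
print(pasted.find("p").get("type"))        # Intro
```

Because only attribute values change, the paste itself never fails. The worst case is a short list of blocks that need a type chosen from a menu.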
Authoring Usability: Component Reuse
One of the most heavily marketed features of an XML authoring and publishing system
is the use of content components: the ability to reference an external asset of any
type (xml fragment, image, etc.) and the system resolves that reference as if the
target asset is "inline" with the master document. This "single-source reuse" (versus
traditional copy/paste or re-authoring) has an extremely positive impact on the ROI
of a solution. It reduces the time and costs of content maintenance, enables parallel
authoring and review at a sub-document level, decreases cost of content language translation
by reducing the amount of content sent to a translator, and generally improves the
quality of the content by increased consistency through synchronization of component
edits to all referring parent documents.
Componentization is also one of the most complex features of XML when it comes to
system deployment, and it has spawned a whole marketplace of Component Content Management
Systems. The more fine-grained the content that can be targeted for reuse-by-reference
(e.g. Paragraph versus Section), the harder it is for authors to understand what impact
their changes are going to have on the system. For this reason, and again with the
filter of Smart Content's target market, componentization is limited to Sections and
various special object types such as images and tables; Block Combinations are under
consideration. Smart Content has adopted the componentization syntax of DITA, using
"conref" attributes to define the target content, though XLink or another system-specific
syntax is under consideration.
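The following minimal Python sketch resolves conref references at publish time. Several details are assumptions made for illustration only. The reference form is a simplified document#id (DITA's own conref syntax also addresses elements within a topic, as file#topicid/elementid). The documents come from an in-memory library of parsed trees, and resolution is a single non-recursive pass. The Smart Content schema does not define any of these details.

```python
import copy
import xml.etree.ElementTree as ET


def resolve_conrefs(root: ET.Element, library: dict[str, ET.Element]) -> ET.Element:
    """Return a copy of `root` in which each element carrying a conref
    attribute is replaced by a copy of the element it references.

    Assumes conref values of the hypothetical form "document#id"."""
    out = copy.deepcopy(root)
    # Collect (parent, child) pairs first; the tree is modified afterwards.
    refs = [(parent, child) for parent in out.iter() for child in parent
            if child.get("conref")]
    for parent, child in refs:
        doc_key, _, target_id = child.get("conref").partition("#")
        matches = [e for e in library[doc_key].iter() if e.get("id") == target_id]
        if not matches:
            raise KeyError(f"unresolved conref: {child.get('conref')}")
        replacement = copy.deepcopy(matches[0])
        replacement.tail = child.tail  # keep any text that followed the reference
        index = list(parent).index(child)
        parent.remove(child)
        parent.insert(index, replacement)
    return out


library = {"shared.xml": ET.fromstring(
    '<section type="Disclaimer" id="disc"><title>Disclaimer</title><body/></section>')}
master = ET.fromstring(
    '<section type="Report"><title>Q1 Report</title><body/>'
    '<section conref="shared.xml#disc"/></section>')
published = resolve_conrefs(master, library)
print(published.find("section/title").text)  # Disclaimer
```

Keeping the reference at Section granularity, as the paper recommends, means an author editing the shared Disclaimer section can reason about every publication that pulls it in.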
An argument can be made that there are use cases in any document type for support
of more granular component referencing. However, the added complexity for the non-technical,
occasional author is likely not worth the trade-off unless the implementor develops
use-case-specific customizations. It's clear that XML techniques can solve almost
any challenge, but not always in ways that allow for easy adoption, implementation,
training, maintenance, and, most critically, authoring usability. The Smart Content
schema does not currently define component boundaries, only the reference syntax,
so an implementation can determine how and when to enable component references.
An Overview of the Smart Content Schema
The Smart Content schema, defined in the RNG schema language, is heavily influenced
by information-archetype vocabularies such as DITA, in that Smart Content provides
a base vocabulary of information elements that can be extended through configuration.
However, Smart Content differs significantly from DITA in how those custom types are
defined and how they persist in XML syntax. In this regard, Smart Content is more
like XHTML. A significant benefit of being XHTML-like is that the implementation can
be more easily understood and rapidly adopted by the large population of web developers.
The Smart Content methodology expressed in the schema has the goal of guiding system
implementers away from creating document structures that cause authoring usability
problems while also attempting to solve some of the long-standing limitations of XML
markup as applied to complex, authored documents. An example of the latter is the
application of element-level metadata that is richer than simple XML attributes support.
The following is an overview of the significant base elements:
Formal Groups of Blocks: <section>
The base element for a formal (i.e. having a Title) group of blocks is <section>.
The intent of Section is similar to HTML div or DITA topic. In the RNG, Section is
used to define a custom-typed container for a group of blocks, the list of blocks
and inlines that can be used within the Section, and the metadata elements for the
Section. Typically, Sections will also start with a title element, which makes the
boundaries of the section easy to identify and provides a handle for Section navigation
such as a hyperlinked table of contents.
The use of "section" as the semantic for a group of blocks was chosen as the best
compromise given that:
- Division or <div> is heavily used in HTML for a wide variety of mostly programmatic
purposes and heavily overloaded in the XHTML domain
- Topic or <topic> might invite unnecessary confusion between DITA and Smart Content
- Chapter, Article, and Part all have specific definitions in a variety of contexts
which may or may not overlap with Smart Content's usage
Sections have the common form of:
<section type="mySection">
<meta>...my metadata fragment...</meta>
<title>Title Text</title>
<body>...run of blocks...</body>
[...zero or more sub-sections...]
</section>
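The common form can be checked with a short Python sketch. The section skeleton has a fixed order, while the blocks inside <body> do not. The sketch reads the form as meta?, title?, body, section*, treating <meta> and <title> as optional and <body> as required. That reading is an assumption, as is the function name.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical reading of the common form: meta?, title?, body, section*
SECTION_FORM = re.compile(r"(meta )?(title )?body( section)*")


def follows_section_form(section: ET.Element) -> bool:
    """Check this section's child sequence, then each sub-section recursively."""
    names = " ".join(child.tag for child in section)
    if SECTION_FORM.fullmatch(names) is None:
        return False
    return all(follows_section_form(sub) for sub in section.findall("section"))


good = ET.fromstring(
    '<section type="A"><meta/><title>T</title><body/>'
    '<section type="B"><title>U</title><body/></section></section>'
)
print(follows_section_form(good))  # True
print(follows_section_form(ET.fromstring(
    '<section type="A"><body/><title>T</title></section>')))  # False
```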
There is currently no distinguishing characteristic between Section as a component
and Section as a document. One use of a Section may be as a root for a publication
and the same Section may also be a component child of another Section for a different
publication. Smart Content leaves the definition of which Sections can be considered
a root for a publication up to system implementation.
Sections can be configured in the following ways:
- Section type, persisted in XML documents as <section type="mySection">
- List of Blocks allowed within the section
- List of inline elements allowed within the section
- Metadata that applies to the section as a whole, persisted as an XML fragment just
after the start tag: <section type="mySection"><meta>...my metadata fragment...</meta><title>...</title></section>
Note that the definition of a Section does not allow control over the order and
occurrence of blocks or inlines. If a block or inline is defined in a Section model,
it can be used in any order and at any frequency. Under consideration is an exception
to this rule for <bodyDiv>, defined below.
Blocks: <p>
The base element for blocks is <p> (as in "paragraph"). Blocks are intended to hold
runs of text and inline elements in any combination.
Blocks can be typed in the following ways:
-
Block type, persisted in the XML document as <p type="myBlock">
-
A future consideration is to allow metadata that applies to the block as a whole,
persisted as an XML fragment just after the start tag: <p type="myBlock"><meta>...my metadata fragment...</meta><t>...Paragraph text and
inline content...</t></p>
Note that the definition of a block does not control the order or occurrence of
inlines. Inlines are defined at the section level and apply to all blocks within
a defined section. A potential exception to this rule is <bodydiv>.
Note that the pattern of <element><meta></meta><t></t></element>
is used consistently in Smart Content for mixed content models. It enables clear
and consistent addressing of the element as a whole, the metadata for the element,
and its mixed content.
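The hypothetical paragraph below applies this pattern at both the block and inline levels. Block-level <meta> is still only a future consideration, so that part is an assumption. The "note" block type comes from the SOP sample's paragraph types. The "role" inline type and the keyword value are invented.

```xml
<!-- Hypothetical: block-level <meta> is only a future consideration -->
<p type="note">
  <meta>
    <attribute name="keywords"><value>approval</value></attribute>
  </meta>
  <!-- Mixed content: text runs and typed inlines inside <t> -->
  <t>Claims over the limit require sign-off from the <tag type="role"><t>Finance Supervisor</t></tag> before payment.</t>
</p>
```

Because the element, its metadata, and its text always sit in the same three positions, a tool can address "the note", "the note's metadata", or "the note's text" without knowing the block type.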
Inlines: <tag>
The base element for Inlines is <tag>. Inlines are intended to hold runs of text and
other inlines in any combination. They are used to call out unique semantics for a
phrase within a block or inline.
Inlines can be typed in the following ways:
-
Inline type, persisted in the XML document as <tag type="myInline">
-
Metadata that applies to the inline as a whole, persisted as an XML fragment just
after the start tag: <tag type="myInline"><meta>...my metadata fragment...</meta><t>...inline text and element content...</t></tag>
Note that the definition of an inline does not control the order or occurrence of
nested inlines. Inlines are defined at the section level and apply to all blocks and
inlines within a defined Section. An exception to this rule is the use of Block
Combinations.
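The fragment below is a sketch of a typed inline that carries its own metadata and nests a second inline. The "product" and "trademark" inline types and the "sku" attribute are hypothetical. Whether a given nesting is valid would depend on the inlines configured for the enclosing Section.

```xml
<p>
  <t>Install <tag type="product">
      <!-- Metadata about this phrase only, not the whole paragraph -->
      <meta>
        <attribute name="sku"><value>WP-100</value></attribute>
      </meta>
      <!-- A nested inline inside the product phrase -->
      <t><tag type="trademark"><t>Widget Pro</t></tag> edition</t>
    </tag> before continuing.</t>
</p>
```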
Miscellaneous Elements: <table>, <image>, <video>, <xref>, lists, emphasis, etc.
These elements are common to most XML document markup languages. Smart Content mainly
follows XHTML markup for these elements. The exception is <table>, which is currently
a modestly modified version of the CALS Exchange table model. The modification supports
capturing additional styling information, such as that generated when converting a
Microsoft Excel table to CALS Exchange.
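For readers unfamiliar with the CALS Exchange model, the sketch below shows a minimal table in that model. The Smart Content styling additions are not shown because their markup is not described here. The cell content is placeholder text.

```xml
<!-- Standard CALS Exchange table structure; Smart Content's styling
     additions are not shown -->
<table>
  <title>Price List</title>
  <tgroup cols="2">
    <!-- Proportional column widths: the first column is three times wider -->
    <colspec colname="c1" colwidth="3*"/>
    <colspec colname="c2" colwidth="1*"/>
    <thead>
      <row><entry>Item</entry><entry>Price</entry></row>
    </thead>
    <tbody>
      <row><entry>Widget</entry><entry>10.00</entry></row>
      <row><entry>Gadget</entry><entry>25.00</entry></row>
    </tbody>
  </tgroup>
</table>
```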
A few additional notes on objects:
-
Block objects (table, image, lists, etc.) can be used anywhere a block can be used
within a Section
-
Inline elements (xref, emphasis) can be used anywhere an inline is allowed, that is,
within a text run
-
<image> is treated as an inline, but can be expressed as a block by being the only
child of a block
-
Lists are <ol> and <ul>; a future consideration is to allow typed lists and additional
structures, including multiple paragraphs per list item or specific content models
such as a definition list
-
A list can be a child of a list item, so that nested lists can be expressed
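The sketch below combines two of these rules. An <image> is treated as a block by being the only content of a <p>, and a list is nested inside a list item. The XHTML-style <li> with direct text content is an assumption, because the exact list item content model is not specified here. The asset href is hypothetical but follows the form used in the figure example below.

```xml
<body>
  <!-- An inline <image> treated as a block: it is the block's only content -->
  <p><t><image href="qpp://assets/111"/></t></p>
  <!-- Assumed XHTML-style list; a nested <ul> sits inside an <li> -->
  <ul>
    <li>Prepare the claim
      <ul>
        <li>Attach receipts</li>
        <li>Record the cost center</li>
      </ul>
    </li>
    <li>Submit for approval</li>
  </ul>
</body>
```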
Metadata: <meta>
User-defined element metadata is captured using XML element structures. The <meta>
element is allowed after the start tag of many base types, including sections, inlines,
and tables. Because metadata is captured as XML fragments, complex metadata structures
can be captured and persisted without stepping outside XML processing tools.
To further simplify implementation, meta has a content model that enables the automatic
creation of a user interface to capture or view metadata.
Note: At this time, XML attributes are reserved for use by the Smart Content processing
system to support typing of base elements and other system metadata such as element
IDs. Under consideration is the use of additional system-level attributes for specific
Smart Content purposes. One example being considered is a "level" attribute that would
support free-form increasing and decreasing of block indentation. This could enable
the creation of outline-like structures without requiring the overhead of additional
nested Section definitions whose only difference is their position in an explicit
or implicit hierarchy. The value of reducing containment structure overhead for authoring
usability and performance may be significant.
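To make the "level" idea concrete, the hypothetical fragment below captures an outline as a flat run of blocks. The attribute name comes from the proposal above. The integer values and their meaning as indentation depth are assumptions.

```xml
<!-- Hypothetical: "level" is only under consideration.
     Indentation is recorded per block instead of by nesting Sections. -->
<body>
  <p level="1"><t>Prepare the claim</t></p>
  <p level="2"><t>Attach receipts</t></p>
  <p level="2"><t>Record the cost center</t></p>
  <p level="1"><t>Submit for approval</t></p>
</body>
```

With this approach, indenting or outdenting a block changes one attribute value instead of moving the block into or out of a nested Section element.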
Simple attributes are defined as follows:
<meta><attribute name="system"><value>disclosure</value></attribute></meta>
Multi-value attributes:
<meta><attribute name="outputs"><value>print</value><value>web</value></attribute></meta>
A Collection element with Member elements can be used to build a repeating metadata
structure:
<meta>
  <collection name="contributors">
    <member name="contributor">
      <attribute name="role">
        <value>Supervisor</value>
      </attribute>
      <attribute name="name">
        <value>Sam Markup</value>
      </attribute>
    </member>
    <member name="contributor">
      <attribute name="role">
        <value>Legal</value>
      </attribute>
      <attribute name="name">
        <value>Marcia Tag</value>
      </attribute>
    </member>
  </collection>
</meta>
A Group element can be used to allow for a presentation that highlights a set of related
metadata such as drawing a box around them with a shaded background:
<meta>
  <group name="CompanyInfo">
    <attribute name="company"><value>Quark Software Inc.</value></attribute>
    <attribute name="phoneNumber"><value>+1 303-894-8000</value></attribute>
  </group>
</meta>
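The simplified RNG sketch below shows the meta content model implied by these examples. It is an assumption reconstructed from the instances above, not the actual Smart Meta module. The element and attribute names follow the examples, while the cardinalities (for instance, at least one value per attribute) are guesses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical, simplified sketch of the meta content model -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="meta"/>
  </start>
  <!-- <meta>: any mix of attributes, collections, and groups -->
  <define name="meta">
    <element name="meta">
      <zeroOrMore>
        <choice>
          <ref name="attribute"/>
          <ref name="collection"/>
          <ref name="group"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
  <!-- <attribute>: a named field with one or more <value> children -->
  <define name="attribute">
    <element name="attribute">
      <attribute name="name"/>
      <oneOrMore>
        <element name="value"><text/></element>
      </oneOrMore>
    </element>
  </define>
  <!-- <collection>: a named, repeating list of <member> records -->
  <define name="collection">
    <element name="collection">
      <attribute name="name"/>
      <zeroOrMore>
        <element name="member">
          <attribute name="name"/>
          <oneOrMore><ref name="attribute"/></oneOrMore>
        </element>
      </zeroOrMore>
    </element>
  </define>
  <!-- <group>: related attributes meant to be presented together -->
  <define name="group">
    <element name="group">
      <attribute name="name"/>
      <oneOrMore><ref name="attribute"/></oneOrMore>
    </element>
  </define>
</grammar>
```

Every leaf in this model is a named <attribute> holding <value> children. A form generator could therefore map each attribute to an input field, each collection to a repeatable record, and each group to a visual box. That is the kind of automatically generated user interface described above.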
Under Consideration: Block Combinations and Simple Block Combinations: <bodydiv>
and <simplebodydiv>
The base element for Block Combinations is <bodydiv>. Block Combinations, like Sections,
define a group of blocks. Unlike Sections, they enable fixed content models of blocks
and inlines with both order and occurrence control. Block Combinations would not follow
the default processing of Sections and therefore would not be used to generate an
overall document navigation structure like a TOC. Block Combinations might be used
to generate a "list of typed Block Combinations."
An example Block Combination is a traditional formal "Figure" which has a title element,
an image element, and a caption element. As a group, Figure should be considered
one object, such that add, delete, copy, and paste actions always include all three
elements. Smart Content currently supports a figure Block Combination:
<bodydiv type="figure">
  <p type="title"><t>Title Text</t></p>
  <p><t><image cx="80%" cy="" href="qpp://assets/110"/></t></p>
  <p type="desc"><t>Description Text</t></p>
</bodydiv>
Under consideration is the generalization of this idea, such that Block Combinations
can be typed and the RNG allows full content model control within a <bodydiv> structure.
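The standalone RNG grammar below is a sketch of that control. It fixes the order and occurrence of the three blocks in the figure example: exactly one title block, one image block, and one description block, in that order. It is written for illustration only and does not use the Smart Content base modules. The attribute list for <image> is an assumption based on the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative standalone grammar; not built on the Smart Content modules -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="figure"/>
  </start>
  <!-- Child patterns of <element> form a sequence: order is fixed and each
       block occurs exactly once, unlike the free order of blocks in a Section -->
  <define name="figure">
    <element name="bodydiv">
      <attribute name="type"><value>figure</value></attribute>
      <element name="p">
        <attribute name="type"><value>title</value></attribute>
        <element name="t"><text/></element>
      </element>
      <element name="p">
        <element name="t">
          <ref name="image"/>
        </element>
      </element>
      <element name="p">
        <attribute name="type"><value>desc</value></attribute>
        <element name="t"><text/></element>
      </element>
    </element>
  </define>
  <!-- Attributes assumed from the figure example -->
  <define name="image">
    <element name="image">
      <attribute name="href"/>
      <optional><attribute name="cx"/></optional>
      <optional><attribute name="cy"/></optional>
    </element>
  </define>
</grammar>
```

The figure instance above validates against this grammar. A figure missing its title, or with the description placed before the image, would be rejected.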
While Figure is a simple example, Block Combinations could be extended to support
a variety of other models, including authoring slideshows (lists of images or figures
with an expected output behavior), input forms, and more. It is a general mechanism
that lets any processing system recognize that a block combination fragment is special
and has special processing requirements.
For an XML schema developer, the obvious question might be, "Why only allow control
over block and inline order and occurrence in block combinations? In all other document
schemas these are available for the entire model." The answer is twofold. First, authoring
usability is a primary and fundamental goal, so Smart Content "encourages" the limited
use of restrictive content models by making the definition of such a model an exception
rather than the norm. Second, Block Combinations are a powerful concept for an entire
class of content types that can trigger custom user experiences in authoring, publishing,
and interactive consumption.
Simple Block Combinations are more like HTML Div elements: they would enable authors
to "wrap" any existing collection of blocks and provide a semantic type to the collection.
One example is a Sidebar where the content allowed in a Sidebar is any content that
is allowed within the Section. So there would be no additional content model definitions
within a simple block combination. The markup under consideration is:
<simplebodydiv type="sidebar">[any blocks and inlines allowed in section]</simplebodydiv>
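The claim that a simple block combination adds no new content model can be sketched in RNG. In the simplified grammar below, <simplebodydiv> reuses the same "block" pattern that the Section body allows. The grammar is illustrative only. The single-paragraph "block" definition stands in for the full set of blocks that a real Section type would configure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative grammar: a single-paragraph "block" stands in for the
     full set of blocks a real Section type would configure -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <ref name="section"/>
  </start>
  <define name="section">
    <element name="section">
      <attribute name="type"/>
      <element name="title"><text/></element>
      <element name="body">
        <oneOrMore>
          <choice>
            <ref name="block"/>
            <ref name="simplebodydiv"/>
          </choice>
        </oneOrMore>
      </element>
    </element>
  </define>
  <!-- The wrapper adds only a semantic type; its content is the same
       "block" pattern the Section body uses, so no new content model -->
  <define name="simplebodydiv">
    <element name="simplebodydiv">
      <attribute name="type"/>
      <oneOrMore>
        <ref name="block"/>
      </oneOrMore>
    </element>
  </define>
  <define name="block">
    <element name="p">
      <optional><attribute name="type"/></optional>
      <element name="t"><text/></element>
    </element>
  </define>
</grammar>
```

Under this grammar, <simplebodydiv type="sidebar"> can wrap any run of blocks that would be valid directly in the <body>. Adding a new block type to the Section therefore makes that type available in sidebars automatically.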
Summary
The Smart Content Schema is both simple in its design and powerful in its flexibility.
With a focus on the user, the model aims at broad adoption of XML toolsets and at
simplifying the addition of intelligence to authored content. Successfully addressing
the needs of non-technical business users and occasional writers requires a significant
rethinking of traditional XML content implementations. Some of the changes are difficult
to accept for XML experts and purists [of which this writer was one]. While there
may be many ways to solve a set of specific problems, the Smart Content methodology
is one opportunity to improve on traditional uses of XML with a new audience in mind.
We invite you to share your feedback on the model as well as any interest in working
with us toward standardization of the model.
Smart Content Schema Sample
The technical details of the Smart Content Schema are less important at this stage
than the goals and methodology. However, codifying the methodology required an evaluation
of multiple schema languages, and RNG (RELAX NG) was selected for its flexible and
powerful support of inheritance. RNG enables a natural implementation of the base
elements and intended configuration types.
The root schema is currently modularized as Smart Content (base section model, base
p block, and root of the schema), Smart Meta (for meta content model definitions of
attribute, value, collection, member, group), and Smart Inlines (for tag definitions).
The following sample RNG defines the typed content model of a Section named "SOP":
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Component models referenced by the SOP content model below -->
  <include href="../SOP Purpose/SOP Purpose.rng"/>
  <include href="../SOP Scope/SOP Scope.rng"/>
  <include href="../Procedure/Procedure.rng"/>
  <include href="../SOP Legal/SOP Legal.rng"/>
  <include href="../SOP Background/SOP Background.rng"/>
  <!-- Combined by choice with any start patterns from the included files -->
  <start combine="choice">
    <ref name="sop"/>
  </start>
  <define name="sop">
    <!-- Nested grammar: the base section model is configured by overriding
         its named patterns; parentRef refers back to definitions in this file -->
    <grammar>
      <include href="Smart-Section.rng">
        <!-- Section type value, persisted as <section type="sop"> -->
        <define name="section-type">
          <value>sop</value>
        </define>
        <!-- Block (p) types allowed in this Section -->
        <define name="para-types">
          <parentRef name="sop.para-types"/>
        </define>
        <!-- Inline (tag) types allowed in this Section -->
        <define name="tag-types">
          <parentRef name="all.tag-types"/>
        </define>
        <define name="section-tags">
          <parentRef name="all.section-tags"/>
        </define>
        <!-- Required sequence of the included SOP component patterns -->
        <define name="content-model">
          <parentRef name="purpose"/>
          <parentRef name="bginfo"/>
          <parentRef name="scope"/>
          <oneOrMore>
            <parentRef name="procedure"/>
          </oneOrMore>
          <oneOrMore>
            <parentRef name="legalnotice"/>
          </oneOrMore>
        </define>
        <!-- Metadata for the Section as a whole -->
        <define name="section-meta">
          <parentRef name="lang"/>
          <parentRef name="audience"/>
          <parentRef name="keywords"/>
          <parentRef name="contribs"/>
          <parentRef name="dates"/>
          <parentRef name="organization"/>
          <parentRef name="permissions"/>
        </define>
        <!-- Metadata for blocks within the Section -->
        <define name="para-meta">
          <parentRef name="keywords"/>
        </define>
      </include>
    </grammar>
  </define>
  <!-- to be created in a common file with choice option as in meta tags file-->
  <define name="sop.para-types">
    <choice>
      <value>heading</value>
      <value>note</value>
      <value>lq</value>
    </choice>
  </define>
</grammar>