How to cite this paper
Usdin, B. Tommie, Deborah A. Lapeyre, Laura Randall and Jeffrey Beck. “Graceful Tag Set Extension.” Presented at Balisage: The Markup Conference 2016, Washington, DC, August 2 - 5, 2016. In Proceedings of Balisage: The Markup Conference 2016. Balisage Series on Markup Technologies, vol. 17 (2016). https://doi.org/10.4242/BalisageVol17.Usdin01.
Balisage: The Markup Conference 2016
August 2 - 5, 2016
Balisage Paper: Graceful Tag Set Extension
B. Tommie Usdin
Mulberry Technologies, Inc.
B. Tommie Usdin is President of Mulberry Technologies, Inc., a consultancy
specializing in XML and SGML. Ms. Usdin has been working with SGML since 1985
and has been a supporter of XML since 1996. She chairs Balisage:
The Markup Conference. Ms. Usdin has
developed DTDs, Schemas, and XML/SGML application frameworks for applications in
government and industry. Projects include reference materials in medicine,
science, engineering, and law; semiconductor documentation; historical and
archival materials. Distribution formats have included print books, magazines,
and journals, and both web- and media-based electronic publications. She is
co-chair of the NISO Z39.96, JATS: Journal Article Tag Suite Working Group. You
can read more about her at http://www.mulberrytech.com/people/usdin/index.html
Deborah A. Lapeyre
Mulberry Technologies, Inc.
Ms. Lapeyre is an XML architect; a teacher of XML, XSLT, and Schematron; an expert
in XML vocabulary design and DTD and schema development. She has been developing systems
that manipulate tagged documents since 1980, working with SGML since before it was
standardized, and with XML from the beginning. Ms. Lapeyre was one of the principal
architects and the lead writer of the NLM DTDs and now plays that role for JATS, BITS,
and NISO STS. She has designed tag sets for encyclopedias, semiconductor specifications,
collections of historical materials, and technical documentation for tractors and
heavy equipment.
Laura Randall
Laura Randall is a Technical Information Specialist at the National Center for Biotechnology
Information at the US National Library of Medicine. She has been involved with markup
languages since late last century and currently spends her time on the PubMed Central
project. Her most notable achievement of late is receiving the designation Bringer of Food
from her three rescued black cats, Vader, Tater, and Spud.
Jeffrey Beck
Jeff Beck is a Technical Information Specialist at the National Center for
Biotechnology Information at the US National Library of Medicine. He has been
involved in the PubMed Central project since it began in 2000. He has been
working in print and then electronic journal publishing since the early 1990s.
Currently he is co-chair of the NISO Z39.96 JATS Standing Committee and is a
BELS-certified Editor in the Life Sciences.
Laura Randall’s and Jeffrey Beck’s contribution to the Work was done as part of their
official duties as NIH employees. Consequently, this Work is in the public domain;
no copyright may be established in the United
States. 17 U.S.C. § 105. If Publisher intends to disseminate the Work
outside the U.S., Publisher may secure copyright to the extent authorized
under the domestic laws of the relevant country, subject to a paid-up,
nonexclusive, irrevocable worldwide license to the United States in such
copyrighted work to reproduce, prepare derivative works, distribute
copies to the public and perform publicly and display publicly the work,
and to permit others to do so.
Abstract
Tag Sets, or XML Vocabularies, are often created from other Tag Sets or
Vocabularies. Users expect significant efficiencies from using derived or “based on”
vocabularies, including the ability to intermingle the documents in databases, to use
tools created for the original Tag Set with minimal additional work, and to adopt
rendering/formatting applications and change only those aspects specific to the new
vocabulary. Some model changes create compatible documents, which can interoperate
gracefully with documents tagged to the source specification. Some model changes are
disruptive. We discuss what types of changes can be integrated into existing XML
environments and which may be disruptive.
Table of Contents
- Adopt and Adapt
  - Adoption
  - Adaptation
- JATS Compatibility Guidelines
- Things that Must Match to Maintain Compatibility
  - Respect the Semantics
  - Use the Same Style of Nesting/Recursion for Sections
  - Maintain Distinction Between Elements and Attributes
  - Whitespace Handling
    - Element-like whitespace
    - Data-like whitespace
    - Preserved whitespace
  - ID, IDREF, and IDREFS
  - Alternatives or Media-specific Content
- Things that Don’t Seem to Matter in Compatible Modeling
  - EMPTY Elements versus Content-containing Ones
  - Has Metadata
  - Is Metadata
  - Sections and Section-like Structures
  - Role in the Document
  - Attribute Value Types (other than ID and IDREF)
- Conclusions
We live in a time of Tag Set extensions.
There was a time when organizations planning a conversion to XML, or planning to move a
new document type to XML, assumed that the process would involve creating a tag set for
that document type. The costs of creating that new tag set usually included an outside
expert to create and document the tag set, internal subject experts to assist in document
analysis, and programmers to customize the editing, database, and formatting tools to
work with the new tag set.
Now, the assumption is that for any new XML application there is an existing public
tag set that meets their needs, or meets them closely enough. Most organizations don’t
consider a new bespoke tag set, and some consider the choice of public tag set so obvious
that they don’t waste time exploring other options. Even among those who do explore their
options, the default assumption seems to be that there is a model they can adopt to meet
their needs. Many publishers with older bespoke tag sets have converted to a public one.
There are a lot of good reasons to adopt instead of developing from scratch. The most
important are:
Cost to Develop and Document
|
Vocabulary development in real-life complex domains is a multi-year
multi-person project that requires the time and skills of subject matter experts
as well as XML expertise. Costs include identifying not only the
structures and types of information that are key to the expected data usage of this
community, but also structures that are common in documents and needed for the applications
and publications to be made from these documents.
A group developing a subject-specialized vocabulary in a subject area is
likely to do a better job modeling aspects relating to the subject matter
than to normal prose structures — partly because specialists are more interested in
their own subject matter and partly because modeling common prose structures
is likely to feel like a waste of time to them. We have seen subject matter
experts sigh, turn on their phones, or even leave the room when a lively
discussion of the metadata needed to identify the subject of one of their
reports turned to a discussion of what types of lists they would need in the
prose portions of the same documents.
Also surprising are the cost and time required
to document a vocabulary well enough that tagging and usage will be consistent.
Adoption of a vocabulary out of the box enables a community to avoid all
of these costs. Adoption and adaptation enable the community to spend its
energy (and time and money) modeling only those structures that are unique
to the community and to document only the new or revised structures.
|
Cost of tool customization
|
While it is possible to create XML documents using XML editing tools out
of the box, and it is possible to store, search, and retrieve XML documents
using an XML database as it is shipped, neither of these provides an
attractive user experience, especially for people who are not very
comfortable with the syntax of XML. There is a significant investment in customizing
tools to work with XML documents. Some of these customizations are specific to the
type of document, but many are
specific to each element, element in context, or element with attribute value.
Users of a new tag set can save a significant amount of time and money if they do
not have to tell their editing tool when elements should be displayed to an editor
as blocks and which as in-line; which are list items, and what text should be generated
on display. Similarly, if they do not have to tell their database which elements contain
non-textual material (such as TeX) and which should be considered higher value for
search result ranking (perhaps titles and table column heads) a lot of set-up time
can be saved.
|
Cost of formatting and display development
|
Technically, it is also possible to format an XML document for human
consumption without customizing the formatting software, but it is unlikely
that the documents will be recognizable or useful. One of the major
advantages people hope for when adopting a vocabulary is to be able to use,
or at least start from, existing formatting applications to make common display formats
such as HTML and PDF.
|
Availability of experienced staff and vendors
|
It is far easier to work in an environment in which one can hire
experienced staff and in which service vendors are familiar with your
requirements. Of course, you could train all of your staff members from
scratch, but that takes time and resources and significantly increases the
loss when they leave. Similarly, if you develop an XML vocabulary from the
bottom up, you will be able to find vendors to create, manage, and host your
documents, but you will have to pay them to learn your vocabulary and needs,
pay them to train their staff, and pay them to customize their tools and
processes. If you adopt an existing vocabulary, you will have to work with
your staff and vendors on any variations you prefer and teach them about any
customizations you have made.
|
Pressure from tool vendors, service suppliers, and XML community
|
XML is rarely created and used strictly in-house any longer. Numerous partners will be
involved in creating and using it, including tagging vendors, publishing partners, and
aggregators. Using a tag set that is familiar to these partners
simplifies these relationships and may significantly reduce costs and errors because
there is less need to explain the XML model and how it is used and less need for exception
processing. (Many organizations choose a particular vocabulary because a particular
vendor requires it or a particular tool creates or ingests it.)
|
Adopt and Adapt
There are some situations in which users, and whole user sectors, can adopt an XML
model and use it comfortably. However, in many cases, it is more accurate to describe
the process as “Adopt and Adapt” than simply “Adopt”.
Adoption
A user who has exactly the situation envisioned when a tag set was developed may well
be able to simply use it. A user who wants to encode their system manuals in XML may find
DocBook works well for them as published, and they will gain the added value of being able
to use existing user interface layers on tools and formatting stylesheets.
Similarly, a user who wants to send their journal articles to an archive or document
repository may be required to use JATS (ANSI/NISO Z39.96-2015), and may even be provided
with guidelines that specify which of the JATS tag sets to use and how to use its optional
features.
A user who wants to participate in an existing data interchange process may be
required to use the tag set used by the existing participants regardless of comfort. For
example, a user who wants to include their poster and pamphlet content in a publication
locator service based on XML-tagged technical reports will have to find a way to tag
those posters and pamphlets using the vocabulary used for the technical reports.
Adaptation
A community that wants to begin interchanging XML documents may find that there is
no
existing tag set and community of practice that exactly meets their needs. Some members
of the community may be using XML, but if they have not worked together when developing
their
practices it is likely that they have different approaches. Even if the individual
members have adopted public models, they may not have adopted the same public
model.
The Standards Community is an example of a community that is currently working on
developing a shared XML model
for interchange of documents among the participants. For an
excellent overview of this process, see NISO STS Project Overview and Update
[Wheeler et al. 2016]. In this case,
various members of the community already use DocBook-, DITA-, XHTML-, and JATS-based
models, and at least one has done a TEI-based pilot project. None found any of the
public
models met their needs out-of-the-box; all adapted the models they had adopted. This
community is now working to create an interchange tag set that will serve all of their
needs. They are starting with a tag set created by one of the participants (ISO) that
was developed by adopting and adapting JATS [ISO 2016]. This process is,
we believe, typical of the way shared tag sets are being developed now.
Public models have been developed and documented with the assumption that they will be
adapted. NIEM describes itself as a “framework” and provides tools for “domains” to use to
develop “information exchange packages” [NIEM 2016]. DITA includes the Specialization
feature, which enables users to extend the tag set and use DITA processors that are
unaware of the extension [Eberlein et al. 2010]. The Text Encoding Initiative Guidelines
describe “clean” and “unclean” modifications and provide a tool for creating extended
TEI-based vocabularies [TEI 2016]. JATS documents how the tag sets can be modified
[NCBI 2015] and provides terminology to identify and distinguish between JATS-Based and
JATS-Conforming extensions [ANSI/NISO 2015].
As users adopt (by choice or fiat) a customizable model and begin to adapt that model
to meet their needs, they are faced with decisions that may have far-reaching
consequences. It is not uncommon for users to come to regret customization decisions
made early in the adaptation process. In some cases, there is considerable
discussion of options, and a choice is made between what are known to be imperfect
options. In other cases, however, the customizers do not even know that they
are creating problems for themselves and their users down the line.
If your adapted tag set is for use in isolation, most of these guidelines are
irrelevant to your project and usage. If you intend to craft or customize tools as
needed and are unconcerned
about how your adapted tag set will work with existing tools, others of these guidelines
are irrelevant. If you are going to train all of the people who will create, manage,
use, and archive your documents, others of these guidelines are irrelevant. If you
and
your documents are on a technologically isolated deserted island and expect to remain
so, none of this matters to you; do what you want as you want.
Most tag set adapters want the documents that use their adopted/adapted tag set to play
nicely with others. They want to be able to store their documents in databases alongside
documents tagged with the source tag set or other adaptations of it and to be able to
search them all as one coherent collection. They want to be able to use tools such as
editors with customized user interfaces by adding only those features needed for the new
structures in their documents. They want to be able to use formatting and display tools
for the existing documents by adding handling for any new structures, if that. (With a
DITA specialization, even that should be unnecessary.)
JATS Compatibility Guidelines
We, the authors of this paper, have been inspired by the ways in which JATS is being
extended, and we are occasionally surprised by problems people who have adapted JATS have
reported. We have been drafting a set of Guidelines [Usdin et al. 2016]
for people extending JATS to help them understand which adaptations will integrate
gracefully into existing JATS environments and how to tell if an adaptation might bite
them later. To our surprise, this was not always obvious. Many types of adaptation that
we initially assumed would be problematic seem to be fine, and a few types of changes
that seem innocuous can create significant surprises at later stages of the document life
cycle.
The principles articulated in this paper are based on the work done to develop the
JATS Compatibility Guidelines [Usdin et al. 2016], and many of the examples
are taken from JATS and the JATS Compatibility Guidelines. However, readers who intend
to create a JATS-compatible tag set are referred to those Guidelines; this paper is not
a substitute for those Guidelines. We also hope that the JATS work and the thought that
went into creating those Guidelines are more widely applicable.
Things that Must Match to Maintain Compatibility
Respect the Semantics
Starting from first principles, when using or extending a tag set, respect the
semantics of the starting structures. This should be obvious, but an amazing number
of XML users think that they are doing no harm by repurposing an element or
attribute they would not use for the original purpose.
They don’t call it tag abuse, but that is what it is. Sometimes blatant, sometimes with
a story justifying bending the meaning of a structure for convenience, tag abuse is
rarely a good short-term strategy and virtually always a bad long-term strategy.
Tag abuse is using an element or attribute for content for which it was not intended.
Tags are often abused when users are trying to control display. For example, it is common
to use several empty <p> elements in HTML to produce some blank space on the screen. There
are not several empty logical paragraphs in the document; this is tag abuse to achieve
screen formatting. Similarly, using a block-quote element to emphasize instructions,
making them stand out from the prose around them, may achieve an acceptable display at
the cost of junking up searches for block-quotes and hiding the content from a search for
instructions.
If you need to store the country in which some people live, and you don’t use the phone
number element for foreigners, you could put their country names in the phone number
element. We have seen this done. So, what happens when you start to validate phone
numbers? Or when you decide that you can make phone calls across state lines and need a
place to put the phone numbers for those people? Can your database list the countries for
all authors? What about when a formatting engine inserts the usual punctuation for a
phone number into those country names and displays them?
If your starting tag set has a tag called <state> for “state or province”, do not create
an attribute called @state with the possible values “solid”, “liquid”, “gas”, or “plasma”.
Your “state” does not technically infringe on the original “state”, but it will confuse
people. Call your attribute @state-of-matter or some such.
Sometimes tag abuse happens from a coincidence of names, when a new user does not check
the semantics and is misled by a homonym. “Oh,” they think, “I need an element for what
the witness said at the trial and there is a <statement> element,” not noticing that
<statement> is defined as a logical proof or hypothesis.
Use the Same Style of Nesting/Recursion for Sections
There are, generally speaking, three styles of modeling nested sections in XML:
- In the recursive model, sections contain sections, which can contain sections,
  which can contain sections. Display styling of the section headers is based on
  analysis of the location of the section in the section hierarchy.
- In the nested-with-explicit-levels model, sections level 1 may contain sections
  level 2 which may contain sections level 3, etc.
- In the non-nested-with-explicit-levels model, sections level 1 may be followed by
  sections level 2 which may be followed by sections level 3, but these may come in
  any order and are not nested.
The section logic is fundamental to complex prose documents, and mixing section logic in
the same environment creates the opportunity for significant confusion. People, and
software, can get very confused if it is not clear, for example, whether sections are
nested or not, or whether the level of nesting should be computed from the level of
sections in which a section is contained or derived from the name of the section. Worst
of all is a model in which sections that have explicitly named levels are sometimes
nested at other levels. (Yes, this does occur in real documents.)
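To make the difference between these styles concrete, here is a minimal Python sketch of how a tool might compute heading depth under each of them. The element names `sec`, `sec1`, `sec2`, and `title` are hypothetical, chosen only for illustration; they are not taken from any particular tag set.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample documents, one per modeling style.
RECURSIVE = "<body><sec><title>A</title><sec><title>B</title></sec></sec></body>"
NESTED_EXPLICIT = "<body><sec1><title>A</title><sec2><title>B</title></sec2></sec1></body>"
FLAT_EXPLICIT = "<body><sec1><title>A</title></sec1><sec2><title>B</title></sec2></body>"


def depths_by_nesting(xml_text):
    """Recursive style: depth is computed from the number of enclosing sections."""
    found = []

    def walk(element, depth):
        for child in element:
            if child.tag == "sec":
                found.append((child.findtext("title"), depth + 1))
                walk(child, depth + 1)
            else:
                walk(child, depth)

    walk(ET.fromstring(xml_text), 0)
    return found


def depths_by_name(xml_text):
    """Explicit-level styles: depth is read from the element name (sec1, sec2, ...)."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        match = re.fullmatch(r"sec(\d+)", element.tag)
        if match:
            found.append((element.findtext("title"), int(match.group(1))))
    return found
```

In this sketch, a tool written only for the recursive style finds no sections at all in either explicit-level document, which is the kind of silent failure that mixing section logic in one environment invites.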
Maintain Distinction Between Elements and Attributes
In the XML world there are people who argue that the distinction between elements
and attributes is arbitrary and that, since it is easy to transform one to the other
using XSLT, vocabulary developers should feel free to use either at any time for any
purpose. This may be so if the vocabulary is being developed in a vacuum, but if a
new or modified vocabulary is intended to interoperate with another vocabulary, this
is very much not so! While attributes are often used to control display, and their
values may be used either to prompt selection of generated text or be displayed,
their use in display is significantly different from element content. Similarly, while
there are times when element content is not displayed, the default in most
(text-based) applications is that element content is displayed to the reader.
In most databases, attribute values are indexed, searched, and displayed differently
from element content. Also, in most XML editing systems, attribute values are entered
and
displayed differently from element content.
If content in the source vocabulary is element content, keep it as element
content. If it is attribute content, keep it as attribute content. If there is a
need in a new vocabulary to change the form of content in a source vocabulary from
element to attribute or vice versa, we recommend using a different name for the new
structure and documenting its relationship to the content in the source vocabulary.
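As a rough illustration of why the distinction matters to downstream tools, the following Python sketch mimics a simple text-based renderer or full-text indexer that collects character data only. The element and attribute names and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source vocabulary: the publisher name is element content.
SOURCE = "<article><publisher>Example Press</publisher><p>Some text.</p></article>"
# Hypothetical adaptation: the same information moved into an attribute.
ADAPTED = '<article publisher="Example Press"><p>Some text.</p></article>'


def displayed_text(xml_text):
    """Collect character data the way a simple text-based renderer or indexer might."""
    root = ET.fromstring(xml_text)
    return " ".join(t.strip() for t in root.itertext() if t.strip())
```

Under these assumptions, `displayed_text(SOURCE)` includes the publisher name, while `displayed_text(ADAPTED)` does not: the information is still in the document, but a tool built for the source vocabulary no longer sees it where it expects to.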
Whitespace Handling
In XML, some whitespace is significant and some is insignificant. How whitespace is
handled has a serious impact on the ability to re-use tools among documents in a
heterogeneous collection. If elements in a tag set extension do not have the same
whitespace handling properties as the display tools were developed to expect, there will
be unfortunate (and in some cases surprising) effects on the display of the document
content.
Three whitespace handling types are listed below. A compatible tag set extension must not
change the whitespace handling type for any existing element.
Element-like whitespace
Content models that contain only elements (no characters) have insignificant whitespace.
That is, XML tools may create or destroy whitespace in these models with, by definition,
no effect on the document, how it is handled, or how it is displayed.
Data-like whitespace
Content models that contain character data or mixed content contain significant
whitespace. That is, XML tools may fold the whitespace (collapse multiple whitespace
characters into a single space character), but they may not create or destroy any
whitespace nodes.
Preserved whitespace
Content models defined to preserve whitespace are character or mixed content models where
the whitespace nodes must not be folded. Each whitespace character in the XML must be
preserved. Usually this is used for alignment of code or other preformatted content.
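The three handling types above can be sketched as a single function that shows what a tool is permitted to do to a text node under each type. This is an illustrative Python sketch only; the function name and the type labels are assumptions made for the example.

```python
import re


def normalize_text_node(text, kind):
    """Return a text node as a tool may legitimately rewrite it for each handling type."""
    if kind == "element-like":
        # Only whitespace can legally appear here, and a tool may discard it.
        return ""
    if kind == "data-like":
        # Runs of whitespace may be folded to one space, but never removed entirely.
        return re.sub(r"[ \t\r\n]+", " ", text)
    if kind == "preserved":
        # Every whitespace character must survive unchanged.
        return text
    raise ValueError(f"unknown whitespace handling type: {kind}")
```

If an extension moves an element from one type to another, a tool that still applies the old rule will either fold whitespace that was meant to be kept or keep whitespace that was meant to be folded.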
ID, IDREF, and IDREFS
Rendering and behavior, especially the rendering and behavior of links, are often
dependent on the ID/IDREF relationship. If an attribute that has a type of ID in the
source vocabulary is changed to any other type, rendering tools may not process the links
appropriately.
Changing from IDREF to IDREFS or vice versa is not a concern. The number of pointers will
not affect compatibility. Changing the direction of the pointer or obscuring the pointer
is the concern here.
We have actually seen one instance in which a user reversed the uses of IDs and IDREFs,
creating documents that looked similar to those in the source vocabulary. The result was
chaotic; it turned out that the XSLT that created the HTML version of these documents
relied on the ID/IDREF mechanism MOST of the time, but occasionally simply treated the
attribute values as values. So, SOME of the links worked as expected and some did not.
(On further thought, this is as much the fault of an inconsistent transformation as a
surprising document; all of these links should probably have failed!)
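A minimal Python sketch of the kind of link resolution a rendering tool performs may help here. It assumes, purely for illustration, that identifiers are carried in an attribute named `id` and pointers in an attribute named `rid`; the sample document is hypothetical, and the standard-library parser used here does not read attribute types from a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two cross-references, one of which points nowhere.
DOC = """<article>
  <p>See <xref rid="f1 t1"/> and <xref rid="f9"/>.</p>
  <fig id="f1"/>
  <table-wrap id="t1"/>
</article>"""


def resolve_links(xml_text):
    """Map each pointer value to the tag of its target, or None if it dangles."""
    root = ET.fromstring(xml_text)
    targets = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    resolved = {}
    for el in root.iter():
        # Splitting on whitespace handles one pointer (IDREF) or several (IDREFS) alike.
        for ref in (el.get("rid") or "").split():
            target = targets.get(ref)
            resolved[ref] = target.tag if target is not None else None
    return resolved
```

Because the pointer value is split on whitespace, one pointer or several are handled identically, which is consistent with the observation that moving between IDREF and IDREFS is harmless. Reversing which attribute holds the identifier and which holds the pointer, by contrast, would leave the `targets` table wrong and the links unresolved.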
Alternatives or Media-specific Content
In the world of prose documents, it is assumed that the reader should
have access to all content. However, there are situations in which that is not the
case. For example, it is common to provide several versions of the same graphical
object: one for high resolution or full-screen display, one for display on small
devices such as hand-helds, a thumbnail for navigation, and perhaps a very high
resolution or black & white version for print. In counting the number of figures
in a document, this figure should be counted once — not as many times as there are
media- or use-specific versions — and only the most appropriate for the display media
should be rendered. Similarly, it is becoming common for journals to
publish author names both in the language and script of the journal and in the
language and script of the author’s home environment. This person should be counted
only once in specifying the number of authors of the paper and, more importantly,
this paper should only count once when calculating the author’s influence.
Any structure in the original vocabulary that is provided to wrap two or more alternative
structures must be used in the same way in all compatible vocabularies.
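The counting and selection behavior described above can be sketched in Python as follows. The markup is a simplified, hypothetical example loosely modeled on a figure that wraps several versions of one graphic; in particular the `use` and `href` attributes are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one figure with three media-specific versions, one with a single graphic.
DOC = """<body>
  <fig id="f1">
    <alternatives>
      <graphic href="f1-large.png" use="screen"/>
      <graphic href="f1-small.png" use="handheld"/>
      <graphic href="f1-print.tif" use="print"/>
    </alternatives>
  </fig>
  <fig id="f2"><graphic href="f2.png"/></fig>
</body>"""


def count_figures(root):
    """Count figure wrappers, not the individual graphics inside them."""
    return len(root.findall(".//fig"))


def pick_graphic(fig, medium):
    """Choose one graphic for the medium; fall back to the first one available."""
    graphics = fig.findall(".//graphic")
    for graphic in graphics:
        if graphic.get("use") == medium:
            return graphic.get("href")
    return graphics[0].get("href") if graphics else None
```

A tool written this way counts two figures even though the document contains four graphics. If an extension used the wrapper for something other than alternatives, the same tool would silently drop all but one of the wrapped items.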
Things that Don’t Seem to Matter in Compatible Modeling
In drafting the JATS Compatibility Meta-Model Description [Usdin et al. 2016] we
considered quite a few areas of conformance that, on further examination, proved to be
unnecessary to create document models that were compatible for our purposes. There are
recognizable, classifiable distinctions that just turn out not to matter for these
purposes.
EMPTY Elements versus Content-containing Ones
One obvious element differentiator was EMPTY elements versus those with #PCDATA, element,
or mixed content. Element content is indeed unique, but data characters, mixed content,
and EMPTY are all the same, since characters are, by definition, optional in XML. An
element with a #PCDATA model or mixed content may have nothing in it, and will look the
same as an EMPTY element in the document. Thus, the following categories are
uninteresting in this context:
Structures that contain character data only
|
Elements that may not have internal markup. In many tag sets, Date may not have internal
markup.
|
Structures that contain character data and phrase-like
structures
|
Paragraph is often allowed to contain character data and phrase-like structures such
as Italic, Place Name, or Cross Reference, but not allowed to contain larger nesting
structures such as lists and figures.
|
Structures that contain character data, phrase-like structures, and
block-level objects
|
In some tag sets there are structures that may contain character data, phrases, and
block-like structures. For example, paragraphs may be allowed to contain lists, boxed
text, display equations, block quotes, tables, or figures.
|
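The point that an element with a character-data or mixed model may look exactly like an EMPTY element can be checked with a short Python sketch. The element names in the usage below are hypothetical, and the parser used here is the standard-library one, which does not consult a DTD.

```python
import xml.etree.ElementTree as ET


def is_effectively_empty(xml_text):
    """True when the parsed element has no character data and no child elements."""
    element = ET.fromstring(xml_text)
    return element.text is None and len(element) == 0
```

Under these assumptions, `is_effectively_empty("<break/>")` and `is_effectively_empty("<p></p>")` are both true, and once parsed the two forms of an empty `<p>` serialize identically, so a tool working from the document alone cannot tell a declared-EMPTY element from one that merely happens to have no content.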
Has Metadata
Some structures (whole documents, authors, boxed-text, appendices) may have
metadata, and there are other structures that are unlikely to have metadata (italic,
break, address-line). However, on analysis, we found that there are circumstances in
which almost any structure could have metadata (at least an ID or IDREF that associates
this structure with others), and that this does not affect interoperability as we were
looking at it.
Is Metadata
In many tag sets, some elements are only used in the metadata of a document (journal in
which published) while others are only used in the narrative text (figure). But in most
tag sets there are many elements that can be used both to describe the document in which
they occur and to describe other documents (copyright, digital identifiers, publication
date), so this distinction is not just unimportant, it often changes over time.
Sections and Section-like Structures
It seemed intuitively obvious that an element that had the section structure in one
vocabulary should have a section structure in a compatible vocabulary; that, for example,
if a Boxed-text could contain not only paragraph-like structures but also nested headed
sections in a source vocabulary, it should in any compatible vocabularies. But since
those nested sections are, or could be, optional in the source vocabulary, documents
without them can clearly be handled by the tools and formatter, because we believe that a
subset of an element model is always conforming. Thus, it is not necessary that
compatible vocabularies allow nested sections in all of the places that the source
vocabulary does.
Conversely, we considered requiring that nested sections be allowed only in the places
where they are allowed in the source vocabulary, and found that this, too, is not a
requirement. If a tool or format is data driven (what in XSLT-speak is called
push-processed), it should be able to accommodate sections in new locations, provided
they have the same style as the sections already present in the vocabulary.
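The data-driven approach can be sketched in Python as a renderer that handles each element wherever it happens to occur, rather than looking for sections only in expected places. In XSLT this would be written as template rules; the Python form, the element names (`sec`, `title`, `p`), and the heading-marker output are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def render(element, depth=0):
    """Data-driven rendering: each element is handled wherever it happens to occur."""
    lines = []
    if element.tag == "sec":
        depth += 1
    for child in element:
        if child.tag == "title":
            # Heading level follows the number of enclosing sections.
            lines.append("#" * max(depth, 1) + " " + (child.text or ""))
        elif child.tag == "p":
            lines.append(child.text or "")
        else:
            # Sections and unknown containers are simply descended into.
            lines.extend(render(child, depth))
    return lines
```

Given a hypothetical document in which a section appears inside a `boxed-text` element, a location the renderer was never told about, the nested section is still rendered with the correct heading level because the same recursive section style is used.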
Role in the Document
Structures can easily be grouped by their role in a document, and it is tempting to
think that structures must play the same role in all document types in order to be
compatible. We found that this is not so, and that while it might be interesting to
group structures by their roles in documents, these roles do not seem to affect interoperability:
Paragraph-like
|
Elements that may be used at the same structural level as a Paragraph (<p>), for example, inside of a section. This would include many block-level structures
such as figures and tables.
|
Preformat-like
|
Elements that have the Preserve whitespace model, which is often used for Code and
sometimes for poetry.
|
Emphasis-like with Toggle
|
Inline elements that may be toggled on and off with recursion. In some tag sets, Italic
toggles. That is, if an Italic tagged phrase appears in a context that would be displayed
in italic anyway, the Italic tagged phrase is NOT displayed in italics to retain the
typographic emphasis.
|
Emphasis-like without Toggle
|
Inline elements that do not toggle on and off with recursion. Some structures must
be displayed as tagged even if the context they are in would have that display. For
example, Sans Serif often does not toggle.
|
Bibliographic Identifier-like
|
Structures that identify the document, such as ISSNs, ISBNs, author names, or volume
and issue numbers
|
Grouping-structures
|
Structures that contain several related structures but that have no formatting consequences
themselves. For example, Article Metadata may be grouped separately from Issue Metadata,
and Keywords may be grouped into a Keyword Group
|
Footnote-like
|
Structures that are generally displayed as footnotes; these may include Footnote, Author
Note, Funding Source, and Corresponding Author Address.
|
Milestone-like
|
Elements that are used to identify locations in the document or that are used in pairs
to indicate the start and end of some portion of a document, typically that cannot
be simply wrapped in an element because of overlap problems. Milestones may be Revision
Start and Revision End, or simply Pull Quote.
|
Structures that can have labels and/or titles
|
Many, but not all, block type structures can have labels and/or titles. For example,
Block Quotes, Boxed Text, Sections, Bibliographies, Lists, and Figures can have labels
and titles in many tag sets.
|
EMPTY elements
|
Elements that mark a location in the document or that may have attributes but no
element content.
|
Structures that have accessibility data
|
The ability to provide alternate text or long descriptions may be available for Figures,
Graphics, Equations, Tables, and a variety of other structures.
|
Structures that have attribution and/or permissions or licensing
data
|
Structures such as Articles, Boxes, Sections, Tables, and Appendices may have information
about who wrote them or who may use them and under what conditions.
|
Attribute Value Types (other than ID and IDREF)
Even in a DTD, it is possible to type attribute values, and in XSD and RNG attribute
value types can be quite strongly specified. We know (see above) that it is
critical that attributes of type ID remain of type ID and that attributes of type IDREF
or IDREFS remain of type IDREF or IDREFS in order for documents to be compatible.
However, that leaves many other attribute types.
Some processing may be tied to specific values of attributes, and if none of the
expected values is present the processing may fail. For example, if a formatter
renders <styled-content view="GrIt"> as green italic and that value is not
present, the formatter will not render the content in green and italic. However, we
see no disruption from:
- Adding or removing items from a specified value list
- Changing a CDATA attribute to one with a specified value list, or vice versa
- Changing a NMTOKEN or NMTOKENS attribute to CDATA or vice versa
- Changing the value of a #FIXED attribute or changing a #FIXED attribute
to CDATA or a specified value list
We came to the conclusion that most attribute typing is useful in the creation of
correct documents as specified by the content creator, but is not essential to the
storage, management, or rendering of the documents.
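As a rough illustration of the formatter example above, the following Python sketch
looks up rendering properties by attribute value. The table KNOWN_VIEWS, the function
style_for, and the CSS-like property names are hypothetical; only the element name, the
view attribute, and the GrIt value come from the example in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical formatter table: attribute values this formatter knows how to render.
KNOWN_VIEWS = {
    "GrIt": {"color": "green", "font-style": "italic"},
}


def style_for(elem):
    """Return rendering properties for elem based on its view attribute.

    An unknown or missing value yields an empty dict, so the caller falls
    back to default rendering instead of failing.
    """
    return KNOWN_VIEWS.get(elem.get("view"), {})


known = ET.fromstring('<styled-content view="GrIt">text</styled-content>')
unknown = ET.fromstring('<styled-content view="BlBd">text</styled-content>')
untyped = ET.fromstring('<styled-content>text</styled-content>')

known_style = style_for(known)
unknown_style = style_for(unknown)
untyped_style = style_for(untyped)
```

Under these assumptions an unrecognized or missing value simply yields no special
styling; the content itself is still available to store, manage, and render.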
Conclusions
The first public draft of the JATS Compatibility Meta-Model
Description [Usdin et al. 2016] was released in July 2016. We anticipate that the
assumptions we have made in this work will be tested through the process of public
review and comment. We hope that we will be prompted to improve the content of the
guidelines to make them more effective and to improve the descriptions of them to make
them clearer and easier to implement.
Although some of the compatibility principles we describe, such as whitespace handling,
ID/IDREF consistency, and maintaining the meaning of object names, are applicable for
testing tag set compatibility in general, we were working specifically on compatibility
of extensions to ANSI/NISO Z39.96-2015 JATS.
We welcome your comments on this conference paper and, more importantly, on the document
at the NISO site.
References
[Wheeler et al. 2016] Wheeler, Robert, Bruce Rosenblum, and Lesley West. 2016.
NISO STS Project Overview and Update. In Journal Article Tag Suite Conference
(JATS-Con) Proceedings 2016. Bethesda (MD): National Center for Biotechnology Information (US). http://www.ncbi.nlm.nih.gov/books/NBK350146/.
[ISO 2016] International Organization for Standardization (ISO). 2016.
Welcome to the ISO Standards Tag Set (ISOSTS). Accessed April 19.
http://www.iso.org/schema/isosts/.
[NIEM 2016] National Information Exchange Model (NIEM). 2016. NIEM. Accessed
April 19. https://www.niem.gov/.
[Eberlein et al. 2010] Eberlein, Kristen James, Robert D. Anderson, and Gershon Joseph,
eds. December 2010. Darwin Information Typing Architecture (DITA) Version
1.2. Organization for the Advancement of Structured Information Standards
(OASIS) Standard. http://docs.oasis-open.org/dita/v1.2/os/spec/DITA1.2-spec.html.
[TEI 2016] Text Encoding Initiative (TEI). 2016. Personalization and
Customization. In P5: Guidelines for Electronic Text Encoding and
Interchange. Version 3.0.0. Last modified March 29, revision 89ba24e.
http://www.tei-c.org/release/doc/tei-p5-doc/en/html/USE.html#MD.
[NCBI 2015] National Center for Biotechnology Information (NCBI),
National Library of Medicine (NLM). 2015. Modifying This Tag Set.
Journal Archiving and Interchange Tag Library NISO JATS Version 1.1 (ANSI/NISO
Z39.96-2015). Last modified December.
http://jats.nlm.nih.gov/archiving/tag-library/1.1/chapter/implementor.html.
[ANSI/NISO 2015] American National Standards Institute/National Information
Standards Organization (ANSI/NISO). 2015. ANSI/NISO Z39.96-2015, JATS: Journal
Article Tag Suite, version 1.1. Baltimore: National Information Standards
Organization. http://www.niso.org/apps/group_public/download.php/15933/z39_96-2015.pdf.
[Usdin et al. 2016] Usdin, B. Tommie, Deborah A. Lapeyre, Laura Randall, and Jeffrey Beck. 2016. JATS Compatibility Meta-Model Description. Draft Version 0.7. 32 p. http://www.niso.org/apps/group_public/document.php?document_id=16764&wg_abbrev=jats-sc.