Lapeyre, Deborah A. “Customizing JATS (Journal Article Tag Suite).” Presented at Symposium on Markup Vocabulary Customization, Washington, DC, July 29, 2019. In Proceedings of the Symposium on Markup Vocabulary Customization. Balisage Series on Markup Technologies, vol. 24 (2019). https://doi.org/10.4242/BalisageVol24.Lapeyre01.
Symposium on Markup Vocabulary Customization July 29, 2019
Balisage Paper: Customizing JATS (Journal Article Tag Suite)
Deborah A. Lapeyre
Mulberry Technologies
Deborah Aleyne Lapeyre is a Senior Consultant for Mulberry Technologies, Inc., a consulting
firm specializing in helping their clients toward better publishing through XML, XSLT,
and Schematron solutions. She and Tommie Usdin serve as architects and Secretariat
for JATS (ANSI NISO Z39.96-2015 Journal Article Tag Suite), BITS (Book Interchange
Tag Suite), and NISO STS (ANSI/NISO Z39.102-2017, STS: Standards Tag Suite). Debbie
teaches JATS customization, hands-on XML, XSLT, Schematron, and schema/DTD construction
courses, as well as numerous technical and business-level introductions to XML, JATS,
Schematron, and NISO STS. Debbie has been working with XML and XSLT since their inception
and with SGML since 1984 (before SGML was finalized as an ISO standard). In a previous
life, she wrote code for systems that put ink on paper and programmed in, taught,
and documented a proprietary generic markup system named “SAMANTHA”. Hobbies include
Balisage: The Markup Conference and hosting pumpkin-carving parties.
Document Type: The core JATS Document Type is a journal article and the ANSI/NISO JATS Tag Sets
are journal article tag sets, which define XML elements and attributes to describe
the content and/or the metadata of journal articles. Such articles may include: research
articles; subject review articles; non-research articles; editorials; letters; product,
software, and book reviews; obituaries; and the peer reviews or author responses included
with an article.
Although originally just for journal articles, JATS-based tag sets have been built
for: books (BITS: Book Interchange Tag Suite), standards (NISO STS, ISO STS), technical
reports, conference proceedings, magazines and newsletters, and even posters.
Purpose: Provides a common XML format to preserve the intellectual
content of journal articles (independent of the format of initial publication).
Expected Uses: Conversion target, archival storage, and interchange
Expected Users: Publishers, aggregators, vendors, web-hosts, libraries, and archives who produce,
interchange, and store journal article content
When: ANSI/NISO Z39.96-2019 JATS: Journal Article Tag Suite (current)
Customization Mechanism: The JATS Journal Article Tag Sets are distributed in DTD form, XSD form, and RELAX
NG form, but they are maintained as DTDs. The customization mechanism for DTDs is
modularization and Parameter Entities, with customization-specific information overriding
JATS-default information. This paper will describe, explain, and illustrate this mechanism.
Specific customization samples are provided in the Appendix Sample JATS Customizations.
The Journal Article Tag Suite (JATS) is a growing set of modules that define the elements and attributes from
which new tag sets can be developed. Think of the Suite as a build-a-tag-set kit. Individual
Journal Article Tag Sets are built using the modules of the Suite and any necessary new tag-set-specific modules.
Journal Article Tag Suite for Journal Articles
The core JATS document type is a journal article, and the ANSI/NISO JATS Tag Sets
are journal article tag sets. The JATS standard (ANSI/NISO Z39.96-2019 JATS: Journal
Article Tag Suite) defines XML elements and attributes that describe the content and/or
metadata of journal articles (including research articles; subject review articles;
non-research articles; editorials; letters; product, software, and book reviews; and
included peer reviews or author responses), with the intent of providing a common format
in which publishers,
vendors, web hosts, libraries, and archives can produce, exchange, and store journal
article content.
The intent of the Tag Suite is to preserve the intellectual content of journal articles
independent of the format in which that content was originally delivered. Although
it does not model any particular sequence or textual format, the Tag Suite enables
a JATS user to capture structural and semantic components of existing tagged material.
JATS was originally developed as a conversion target, because, at that time, most
of the large journal publishers used their own proprietary formats and a way was needed
both for them to interchange content with each other and for archives, libraries,
aggregators, hosters, and vendors to accept content readily from all of them.
Originally, JATS was designed for STEM articles (Science, Technology, Engineering,
Mathematics), but it is now used internationally for journals of all disciplines.
In addition, we have seen JATS tag sets built for:
books [BITS (Book Interchange Tag Suite)]
standards [NISO STS (ANSI/NISO Z39.102-2017, STS Standards Tag Suite)]
technical reports
conference proceedings
magazines and newsletters
posters
JATS Article Tag Sets
There are (as of August 2019) three official journal article instantiations of the
Suite, loosely called the JATS ‘Tag Sets’. These tag sets are built from the elements
and attributes defined in the Suite and are intended to provide models for archiving,
interchange, processing, publishing, and authoring journal article metadata and/or
article content.
Archiving (Journal Archiving and Interchange Tag Set) was designed to be a conversion target
from other tag sets. The loose models, no required elements, attributes that accept
any text value, user-defined name/value pairs in the metadata, generic escape-hatch
elements in the narrative content, and information-classing attributes, make it easy
(during an XML-to-XML conversion) to rename structures while mostly preserving both
publishing sequence (reading order) and semantic intent. [Archiving is also known
as “Green” from the color palette of its Tag Library documentation.]
Publishing (Journal Publishing Tag Set, aka Blue) is a moderately prescriptive tag set to provide a standard format for publishers
to regularize and optimize data for their internal repositories, for production processing,
and for the initial XML-tagging of journal articles, usually as converted from an
authoring format such as Microsoft Word. Publishing is also used by archives and vendors
that wish to regularize their content, rather than to accept the sequence and arrangement
presented to them by any particular publisher.
Because this Tag Set is intended to regularize data, the model includes fewer elements
and fewer tagging choices than JATS Archiving. These more limited choices produce
more consistent data structures that provide a single location of information in a
document to simplify searching and that make it easier to display and to produce derivative
products. [Publishing is also known as “Blue” from the color palette of its Tag Library
documentation.]
Authoring (Article Authoring Tag Set) was designed for a user to create a new journal article
using model-driven tools (tools that read, interpret, and apply schemas in real time)
and submit that article to journals for publication consideration. Because Authoring
is optimized for use with tools, it is the most prescriptive tag set in the JATS
Suite. It includes many elements whose content must occur in a specified order and
limits the options for formatting. For example, Authoring does not allow explicit
numbering on list items or citations. These are considered to be formatting decisions
determined by a journal’s editorial style and are not appropriate for inclusion in
the XML by an author. [Authoring is also known as “Pumpkin” from the color palette
of its Tag Library documentation.]
These three JATS Journal Article Tag Sets are available in several flavors:
With MathML 2.0 or with MathML 3.0 (never with both)
With or without the OASIS CALS Table Model (All public JATS tag sets can use an XHTML-based
Table Model)
And all the permutations and combinations of those four choices
Thus there are four Archiving Tag Sets, four Publishing Tag Sets, and two Authoring
Tag Sets. (Authoring does not provide the CALS table option.)
Other JATS-based Tag Sets
In addition to the three JATS Journal Article Tag Sets (10 flavors in all), there
are NISO- or NLM-sponsored tag sets, as well as many non-public subsets and supersets.
BITS (Book Interchange Tag Suite)
NISO STS (ANSI/NISO Z39.102-2017, STS: Standards Tag Suite)
ISOSTS (ISO Standards Tag Set)
BITS (Book Interchange Tag Suite)
BITS is JATS for books: a superset of JATS Archiving intended for journal publishers
who are already using JATS. BITS has two top-level elements: <book> and <book-part> (a book part is a major component of a book, called something like chapter, module,
lesson, part, etc.). BITS grew from demand for a JATS-compatible Book Model by JATS
users who also publish books and want to maintain their books using the same structure
and semantics when possible. BITS enables JATS users to use familiar (JATS) tools
for books, mix books and articles in databases and presentation systems, use JATS
articles as book content (e.g., as a chapter or a section in a chapter), manage large
books in multiple files, or publish collections of books (e.g., series).
BITS is a much larger tag set than JATS, although the narrative content is largely
the same for both. BITS adds book-specific metadata and is more flexible than JATS,
because there is more variety in books than in articles. BITS adds XInclude to accommodate
larger documents managed in pieces, and BITS can support cut-&-paste from JATS (i.e.,
a JATS <article> can become a BITS <book-part> with a few tweaks).
There are two public BITS Book Tag Sets, one using only XHTML-inspired tables and
one using OASIS CALS tables. BITS only uses MathML 3.0. BITS is funded by NCBI for
NLM and used in the NLM Bookshelf project.
NISO STS (ANSI/NISO Z39.102-2017, STS: Standards Tag Suite)
NISO STS describes the metadata and the full content of normative standards documents
(international, national, organizational, and SDO-produced). NISO STS is intended
for standards publishing and interoperability and may also be used for non-normative
materials such as guides and handbooks,
although it was not designed for non-normative material. NISO STS has two top-level
elements: <standard> (for standards documents) and <adoption> (for standards documents adopting and embedding other standards documents).
NISO STS was based on ISOSTS, which was based on JATS Publishing. There are two NISO
STS Tag Sets available: the Interchange version (without CALS tables) and the Extended
version (with CALS tables), each of which is available with either MathML 2.0 or MathML
3.0. NISO STS is funded jointly by the American Society of Mechanical Engineers
(ASME) and ASTM International, with support from NISO.
JATS History
JATS roots go back to 1985 and the very first DTD (an SGML DTD) ever published. It
was produced (in advance of any SGML parsers) by the Association of American Publishers
as one of three tag sets (journal articles, books, and journals). The AAP DTD (also
known as the AAP Electronic Manuscript Standard and the AAP/EPSIG standard) was ratified as the U.S. standard ANSI/NISO Z39.59 in 1988. By 1993, the AAP Article DTD had metamorphosed into the international
ISO 12083 Electronic Manuscript Preparation and Markup, which was reworked over the next few years and reemerged as ANSI/NISO 12083-1995, which was a nearly complete rewrite of the original ISO 12083.
In December 2001, the Harvard University Library under a grant from the Mellon Foundation
commissioned a report to address the feasibility of developing a common structure
(model/tag set) that could reasonably represent the intellectual content of journal articles. The resulting E-Journal Archival DTD Feasibility Study for the Harvard University E-Journal Archiving
Project came to the conclusion that yes, a single model/tag set was possible and probably
desirable, but that a model to meet that need did not then exist.
In 2002, NCBI (National Center for Biotechnology Information) of the National Library
of Medicine began work on a single model for journal articles, thereafter called the
NLM DTD, based on the 1998-2002 PMC DTD that had been written for PubMed Central.
This DTD was (and is) in use and widely adopted even outside the realm of STEM articles.
NLM gave NISO the NLM DTD to use as a start to JATS and a NISO JATS Working Group
was formed. The first JATS (NISO Z39.96.201x version 0.4) was released by NISO on
March 30, 2011 as a Draft Standard for Trial Use. There was a 6-month public Comment Period, and after the comments were resolved,
the JATS Working Group released JATS version 1.0 in 2012 (ANSI/NISO Z39.96-2012).
JATS became a continuous maintenance standard at NISO, under the JATS Standing Committee,
which resolves user requests and publishes Committee Drafts once or twice a year.
JATS 1.1 was issued in 2015 (JATS 1.1 ANSI/NISO Z39.96-2015), and JATS 1.2 was issued
in 2019 (JATS 1.2 ANSI/NISO Z39.96-2019).
JATS Adoption
JATS is how much of the world publishes/interchanges journal articles. JATS is in
use in at least 25 countries world-wide (US, UK, Germany, France, Australia, Japan, Russia, Brazil,
Egypt, etc.), and most middle-sized and small publishers world-wide publish journal
articles in JATS. All of the huge publishers, who typically use their own bespoke
tag sets, produce JATS for interchange or archival deposit. Many public archives accept
(or require) JATS, for example Australian National Library, British National Library,
Europe PMC, ITHAKA/JSTOR, Library of Congress (US), PubMed Central, SciELO, and many
others. Conversion vendors all know how to handle JATS; numerous web-hosting and service
vendors require or support JATS; and there are more tools and products written every
day for authoring in JATS and conversion from Microsoft Word to JATS.
As Jeff Beck of NCBI said at JATS-Con (the JATS user conference) in 2017:
JATS is no longer one of the cool kids;
it’s just what you do if you have journal articles.
2. Customizing JATS
Why is JATS Customizable?
The JATS/NLM customization mechanism was designed in the 1980s to synchronize the
maintenance of tag sets for organizations that needed to maintain multiple highly-interrelated
tag sets. The idea was that most structural models (figures, tables, lists, sidebars)
and most inline elements should be the same across all tag sets, with only a few differences
where necessary. The top-level elements and their direct descendants would be different.
As an example, consider the following situation. A single organization needs to develop
and maintain 25 related tag sets:
6 for journals (all different),
two for magazines (one online, one print),
one for newsletters and email,
two for standards,
one for conference proceedings, and
13 more, largely for various types of books and pamphlets.
Each of the 25 document types is defined in a separate DTD; all DTDs
share the modular library of components. The customization mechanism must allow the
organization to change models whenever they need to but only when they want to.
A modular system with modular overrides allows the organization to localize changes
in a very few files. Then a simple find-in-files mechanism, from an editing tool or
the operating system, can tell them that, for example: amongst these 25 related vocabularies, there are 6 variations on this element and
here are the exact Parameter Entities where that change can be examined or changed.
Who Customizes JATS?
While it does take some XML knowledge and an understanding of DTD structure and Parameter
Entities to maintain a JATS-based tag set, it does not require XML experts. The mechanism
is simple to learn. Organizational customizations are typically maintained by XML-aware
programmers in the journal departments of the organization, but some JATS users have
trained their senior editorial staff to do it, and some hire consultants, particularly
if they do not modify their tag sets very often.
The 10 JATS Tag Sets and two BITS Tag Sets are maintained in a modular library using
the customization mechanism described in this paper. Schema maintenance, documentation,
and testing for the 12 are performed by Mulberry Technologies, Inc. and sponsored
by the National Center for Biotechnology Information of the
US National Library of Medicine.
How to Customize JATS
JATS is distributed in XSD, DTD, and RELAX NG format, but is maintained as a DTD.
The customization mechanism for DTDs is modularity and Parameter Entity overrides.
First, a brief syntax reminder on how Parameter Entities look and work. A Parameter
Entity reference:
begins with a percent sign (%),
then has an XML name, and
ends with a semicolon (;).
Some examples include: %list.class;, %emphasis.class;, %abstract-model;,
%title-elements;, and
%abstract-atts;, where the name between the percent sign and the semicolon is the Parameter Entity name. Parameter Entities (like programming
language parameters) are established to be overridden. Internal Parameter Entities
are for string replacement. First you define a Parameter Entity, then you may use
it for string substitution as many times as you want. For example:
Define a Parameter Entity list.class
<!ENTITY % list.class "def-list | list" >
Now use the Parameter Entity %list.class; in a content model
<!ELEMENT term-list (title?, (%list.class;)+ ) >
and the parser or other XML processor will see the Parameter Entity as though it were
replaced by its string:
<!ELEMENT term-list (title?, (def-list | list)+ ) >
Precedence is Paramount in Parameter Entities
If you define two Parameter Entities with the same name,
the first definition encountered is used and
all other definitions are ignored.
This is how DTD customization overrides work. External Parameter Entities allow DTDs
to be modularized. You define Parameter Entities in your-tag-set-specific customization
modules that can be called in first and override the same-named Parameter Entities
in the JATS default modules. For example:
In your own DTD’s customization module (called in first), list.class is defined as
<!ENTITY % list.class 'def-list | list | var-list | term-list' >
In a JATS default module (called in second), list.class is defined as
<!ENTITY % list.class 'def-list | list' >
So the operational value of %list.class; in your customized tag set is
'def-list | list | var-list | term-list'
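As a minimal sketch of this first-definition-wins behavior, the fragment below puts both declarations into a single external DTD file so that the effect can be seen in one place. The file name demo.dtd and the stub #PCDATA models are assumptions made only for illustration; in a real customization the two declarations would sit in separate modules.

```xml
<!-- demo.dtd (hypothetical file name): an external DTD subset.
     Parameter Entity references inside declarations are permitted
     only in the external subset or in external modules. -->

<!-- Customization, declared first: this definition is used. -->
<!ENTITY % list.class "def-list | list | var-list | term-list">

<!-- Default, declared second: ignored, because list.class
     has already been defined. -->
<!ENTITY % list.class "def-list | list">

<!-- Effective model:
     (title?, (def-list | list | var-list | term-list)+) -->
<!ELEMENT term-list (title?, (%list.class;)+)>

<!-- Stub declarations so that the sketch is complete. -->
<!ELEMENT title    (#PCDATA)>
<!ELEMENT def-list (#PCDATA)>
<!ELEMENT list     (#PCDATA)>
<!ELEMENT var-list (#PCDATA)>
```

Under these assumptions, a document such as <term-list><var-list>x</var-list></term-list> is valid against demo.dtd only because the first declaration of list.class is the one in effect.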
Parameter Entities as Far as the Eye Can See!
JATS DTDs define parameter entities for almost everything defined in the DTD modules:
Almost all content models
%element-name-model; for element content (%abstract-model;)
%element-name-elements; to be combined with #PCDATA for elements with mixed content models (%article-title-elements;)
All attribute lists (%element-name-atts;)
Lots of element classes (logical groupings of elements, such as
%list.class;, %citation.class;, %person-name.class;)
Many element mixes, commonly occurring functional or structural groupings of elements
(%para-level;)
Lists of attribute values
All those Parameter Entities allow almost anything in a JATS Tag Set to be changed very easily! Why not everything? Why
aren’t all content models represented as Parameter Entities? There are only a very
few of the #PCDATA-only models that are not Parameter Entities, and these are typically models for identifiers.
The lack of a Parameter Entity is our subtle way of saying “Please don’t redefine this model.” Not providing the Parameter Entity makes it much more difficult to modify that portion
of the tag set. An expert could probably make that modification; a beginner probably
could not. And that is fine!
As a result of all these Parameter Entities, most JATS Element Declarations look like
the one below, with the basic Element Declaration only showing the Parameter Entities,
and all actual models and attribute lists located somewhere above the Element Declaration
in the document.
<!ENTITY % ref-list-model "some model or other" >
<!ENTITY % ref-list-atts "a list of attributes" >
<!ELEMENT ref-list %ref-list-model; >
<!ATTLIST ref-list
%ref-list-atts; >
While this indirection-on-every-content-model-and-attribute-list style may be slightly
harder to learn to read, once you concentrate on the Parameter Entities rather than the Element Declarations,
the expressive power of this approach becomes clear. All these Parameter Entities
make possible nearly infinite customization. They also make it very simple to compare
customizations across related families of tag sets. Have any of our 27 DTDs changed
the model of the abstract element? Check out all the %abstract-model; Parameter Entities
and find out which ones and what they have changed.
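To illustrate what such an override might look like, the sketch below redefines the model and the attribute list for ref-list in a tag-set-specific module that is read before the JATS module declaring the element. The module name my-custom-models.ent and both replacement values are hypothetical; they are not the JATS defaults, which should be copied from the JATS module being overridden.

```xml
<!-- my-custom-models.ent (hypothetical module), called in BEFORE
     the JATS module that declares ref-list. -->

<!-- Illustrative override: an optional title followed by one or
     more references. This is not the JATS default model. -->
<!ENTITY % ref-list-model "(title?, ref+)">

<!-- Illustrative override of the attribute list. -->
<!ENTITY % ref-list-atts
  "id            ID      #IMPLIED
   content-type  CDATA   #IMPLIED">

<!-- The JATS module itself is left untouched. Its ELEMENT and
     ATTLIST declarations for ref-list refer to these two entities
     and so pick up the values above; the default definitions that
     appear later in the JATS module are ignored. -->
```

Because every override is a Parameter Entity with a predictable name, searching a family of DTDs for ref-list-model would list each tag set that has changed this model.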
JATS Customization Conventions
Your Tag Set will be easier to construct, maintain, and keep in sync with JATS as
the Suite is revised if you follow a few simple Parameter Entity construction conventions.
These conventions make it easier for someone else to read and understand your JATS
customization, and they also make it easier for you to find and make single-circumstance
changes. In brief:
Content model choices (OR bar) should never contain element names:
<!ELEMENT term-list (list | def-list) > <!-- BAD EXAMPLE -->
Instead, content models should name Parameter Entity classes or mixes of elements:
<!ELEMENT term-list ((%list.class;)+ ) >
<!ELEMENT title (#PCDATA | %my-inline.mix;)* >
Even choices inside sequences should use Parameter Entities instead of element names:
<!ELEMENT body ((%para-level;)*, (%sec-level;)*, sig-block?) >
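The sketch below shows one way these conventions can fit together: classes list element names, mixes combine classes, and content models refer only to classes and mixes. The entity names my-emphasis.class, my-link.class, and my-inline.mix, and their contents, are hypothetical. The declarations are given in this order because a Parameter Entity referenced inside another entity's value has to be defined first.

```xml
<!-- Classes: the only place where element names are listed. -->
<!ENTITY % list.class        "def-list | list">
<!ENTITY % my-emphasis.class "bold | italic">
<!ENTITY % my-link.class     "ext-link | xref">

<!-- Mix: built from classes, not from element names. -->
<!ENTITY % my-inline.mix
  "%my-emphasis.class; | %my-link.class;">

<!-- Models: refer to classes and mixes only. -->
<!ELEMENT term-list ((%list.class;)+)>
<!ELEMENT title     (#PCDATA | %my-inline.mix;)*>
```

With this layering, adding an element to my-emphasis.class would change every model that uses the class or the mix, without any content model being edited by hand.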
JATS is a Modular DTD System
Each JATS DTD is built using a library of DTD-fragment modules. Each module contains
definitions for a group of related declarations. Each DTD is free to call in only
the library modules it needs. If your target tag set has no bibliographic references,
then leave out the modules defining those elements. That said, most JATS-based DTDs
use most of the base JATS modules, adding a few modules of their own.
Three non-JATS exemplar models are shown below:
Modules are called into a DTD using External Parameter Entities, one external entity
per module called. This is string substitution on a grand scale, where the entire
contents of a module file can be called into the DTD at once.
Best practice in using a modular DTD library is to set up a system catalog and use
it to provide access to the modules of the Suite. Catalogs provide an indirection
mechanism that associates an established identifier with a URI (typically a file name).
In this example, I will be using a catalog that follows the OASIS XML Catalogs specification, but any
method of catalog indirection would work.
OASIS XML catalogs use formal public identifiers (FPIs) and establish one FPI and
its filename (URI) equivalent for each module used in the tag set. So my customization
process would be:
Assign each DTD-specific module file a formal public identifier (FPI).
Name each DTD-specific file in a catalog entry (for example, in an OASIS XML catalog)
in the same catalog that already includes all the JATS modules.
Then reference (call) your module using a Parameter Entity in your DTD. This places
the entire file logically into the DTD at the point of reference.
%the-custom-models.ent;
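A minimal sketch of the catalog step, assuming an OASIS XML catalog file. The formal public identifier, its owner string EXAMPLE, and the file name are all hypothetical, and the entries for the JATS modules themselves are not shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- catalog.xml (hypothetical): maps a formal public identifier
     to the local file that holds the module. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

  <!-- One entry per DTD-specific module. -->
  <public publicId="-//EXAMPLE//DTD Report Custom Models v1.0//EN"
          uri="report-custom-models.ent"/>

  <!-- Entries for the JATS modules would follow in this same
       catalog; copy them from the catalog distributed with JATS. -->

</catalog>
```

With such an entry in place, the DTD would declare the module as <!ENTITY % the-custom-models.ent PUBLIC "-//EXAMPLE//DTD Report Custom Models v1.0//EN" "report-custom-models.ent"> before referencing it, and a catalog-aware parser would resolve the public identifier to the local file.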
JATS-based Custom DTD Assembled from Modules
There is one very important rule when making new customizations using this method.
Never, ever copy a JATS module and edit it. Or, more succinctly:
Never change the published JATS modules — Override them.
Each new document type is a new DTD that defines, in its DTD-specific modules, the
Parameter Entities that override the JATS default Parameter Entities. Each DTD may
use as many of the JATS base modules (or as few) as necessary. The DTD module typically
defines only the top-level element (document element)
and (maybe) its immediate children, and calls in all the Suite modules it needs to
define the rest of the elements. New (non-JATS) elements (Learning Objectives, parts
lists, product codes, taxonomic descriptions) are typically defined in their own modules
or in the DTD module. In this way, all customizations are isolated in a few DTD-specific modules. When a new version of JATS is issued, the user can just plug in the new
JATS modules and use the new version, unless they have overridden something that
has changed in the new version and the organization wants those new JATS changes.
In such a case, add the contents of the new JATS Parameter Entity (or Entities) to your custom
override Parameter Entity (or Entities).
Typical Scenario for a New JATS Customization
A JATS user wanting to make a new document type (for Reports, for example) typically
creates at least five new modules and adds them to a local JATS library, and, of course,
adds the names of all these modules to a catalog.
Table I
Report DTD: Names and models the new top-level element (<report>) and calls in all the needed modules, in order: first Report-DTD-specific modules and then JATS modules.
Report-custom-modules: Names any new Report-DTD-specific modules created just for this DTD.
Report-custom-classes: Overrides for JATS element collections (classes).
Report-custom-mixes: Overrides for JATS structures (mixes).
Report-custom-models: Overrides for JATS content models and attribute lists.
Any new element modules: As many modules as necessary to define Report-DTD-specific new elements (taxonomic material, parts lists, whatever is entirely new and cannot be found in ordinary JATS).
Since the DTD module must perform specific functions in a specific sequence, the structure
of a customized JATS-based DTD typically looks like the following. This sequence
is important because Parameter Entities must be defined before they are used, and
the first declaration found is the definitive definition.
The JATS-based DTD module:
names and describes itself and its purpose in an initial comment,
names the DTD-specific Module of Modules, then invokes it,
names the JATS Suite Module of Modules, then invokes it, and then
calls in the rest of the modules:
invokes the DTD-specific class and mix override modules and the default classes and
mixes they override,
invokes the Model customization module that overrides the element modules,
invokes all the necessary element modules, and then
defines the top-level element and its components (as needed).
As an example, here is a Report-DTD customized DTD fragment.
Modules have been named with external Parameter Entities.
The DTD fragment calls in all Report-DTD-specific modules, followed by the JATS modules
being overridden.
Report-DTD-specific Parameter Entities override JATS Parameter Entities. Anything
the new Report DTD did not change is left alone; it is standard JATS.
%Report-custom-modules.ent;
%JATS-modules.ent;
%Report-custom-classes.ent;
%JATS-default-classes.ent;
%Report-custom-mixes.ent;
%JATS-default-mixes.ent;
%Report-custom-models.ent;
%JATS-common.ent;
%JATS-articlemeta.ent;
%JATS-backmatter.ent;
%Report-new-stuff.ent;
and so on for the other modules the Report DTD needs
Now, with all the component modules in place, the DTD can define the new document
element (<report>) and maybe new children such as <report-metadata> and <report-body>.
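Putting the sequence together, a driver file for the Report DTD might be sketched as follows. Every public identifier and file name here is a placeholder, the declaration for the JATS Module of Modules should be copied from the JATS DTD being extended, and the element models are hypothetical. The sketch assumes that each Module of Modules declares (but does not invoke) the modules listed after it, and that <report-metadata> is declared in Report-new-stuff.ent.

```xml
<!-- report.dtd (hypothetical): sketch of a JATS-based driver file.
     All identifiers and file names are placeholders. -->

<!-- 1. Name the Report-specific Module of Modules, then invoke it. -->
<!ENTITY % Report-custom-modules.ent PUBLIC
  "-//EXAMPLE//DTD Report Custom Module of Modules v1.0//EN"
  "Report-custom-modules.ent">
%Report-custom-modules.ent;

<!-- 2. Name the JATS Suite Module of Modules, then invoke it. -->
<!ENTITY % JATS-modules.ent PUBLIC
  "-//EXAMPLE//PLACEHOLDER for the JATS Module of Modules//EN"
  "JATS-modules.ent">
%JATS-modules.ent;

<!-- 3. Overrides first, then the defaults they override. -->
%Report-custom-classes.ent;
%JATS-default-classes.ent;
%Report-custom-mixes.ent;
%JATS-default-mixes.ent;
%Report-custom-models.ent;

<!-- 4. Element modules (only those this tag set needs). -->
%JATS-common.ent;
%Report-new-stuff.ent;

<!-- 5. Top-level element and its immediate children. -->
<!ENTITY % report-model "(report-metadata, report-body)">
<!ELEMENT report %report-model;>

<!ENTITY % report-body-model "((%para-level;)*, (%sec-level;)*)">
<!ELEMENT report-body %report-body-model;>
```

Defining the new elements through their own model entities keeps the Report DTD itself customizable by the same mechanism it uses to customize JATS.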
Caution: you may define a lot you never use by including a module. If you include
a whole module (it defines 20 elements) and your tag set only uses one element from
the module, you have just defined a lot of elements that you will not use. So what?
This is not a problem, so do not be concerned that some tools will warn you about
these orphan declarations. The condition “defined but not used” is legal in XML and very handy in modular DTDs!
3. JATS Interchange and Interoperability
JATS was written for the interchange of articles. The expectation in the early years was that each publisher/archive/library
would use their own schema (DTD/XSD/RNG) to produce or store journal articles, but
then they needed to get their articles into the same form of XML:
to put information into a single repository,
to exchange information with each other,
to sell/display items on the same hosting platform,
so vendors do not need to learn another unique tag set, and
to share tools and resources.
Therefore, JATS was written as a conversion target and storage format, designed to
maximize the number of journal styles and formats that could be usefully tagged as
JATS XML. The idea, particularly behind the Archiving Tag Set, was to enable translation
into JATS from as many XML journal tag sets and precursor word-processing formats
as possible, without semantic loss and with minimal structural impact (rearrangement).
The result is journal article tag sets that are very functional for archives and libraries
and (with JATS Publishing) for publishing production, vendors, and web-hosters.
By design, JATS is descriptive rather than prescriptive, enabling rather than enforcing.
JATS allows multiple ways to tag the same structure; how-to-tag and how-much-to-tag
are assumed to be editorial decisions, not JATS-level requirements. In the default
JATS Tag Sets, little is required, but much is possible. For example, JATS allows all
of the following for bibliographic reference tagging:
very granular markup inside references, e.g., tagging over 40 mostly semantic elements,
very little markup inside references, e.g., tagging only face markup,
no markup at all inside references, just a reference start and reference end, and
end notes mixed in with references in a bibliography (or prohibited from being intermixed).
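As a rough illustration of the first three options, the fragment below tags the same reference three ways. The citation text is taken from this paper's own citation line; the element and attribute names follow my reading of the JATS Tag Library and should be checked against the tag set actually in use.

```xml
<ref-list>
  <!-- Granular: each part of the reference is tagged. -->
  <ref id="r1">
    <mixed-citation publication-type="confproc">
      <person-group person-group-type="author">
        <name><surname>Lapeyre</surname>
          <given-names>Deborah A.</given-names></name>
      </person-group>
      <article-title>Customizing JATS (Journal Article Tag Suite)</article-title>.
      In <source>Proceedings of the Symposium on Markup Vocabulary Customization</source>.
      <series>Balisage Series on Markup Technologies</series>,
      vol. <volume>24</volume> (<year>2019</year>).
      <pub-id pub-id-type="doi">10.4242/BalisageVol24.Lapeyre01</pub-id>
    </mixed-citation>
  </ref>

  <!-- Face markup only. -->
  <ref id="r2">
    <mixed-citation>Lapeyre, Deborah A. Customizing JATS (Journal Article Tag Suite). In <italic>Proceedings of the Symposium on Markup Vocabulary Customization</italic>. Balisage Series on Markup Technologies, vol. 24 (2019).</mixed-citation>
  </ref>

  <!-- No markup inside the reference at all. -->
  <ref id="r3">
    <mixed-citation>Lapeyre, Deborah A. Customizing JATS (Journal Article Tag Suite). In Proceedings of the Symposium on Markup Vocabulary Customization. Balisage Series on Markup Technologies, vol. 24 (2019).</mixed-citation>
  </ref>
</ref-list>
```

A customization that wants only one of these styles could narrow the relevant model through its Parameter Entity rather than rely on editorial convention alone.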
JATS-based tag sets can record very detailed metadata, but are not required to do
so. For example, a JATS Tag Set could record:
unique identifiers for authors (ORCID),
unique identifiers for institutions (Ringgold, Crossref Open Funder Registry),
IDs on all elements (not required, except for internal link targets),
detailed publication metadata (e.g., events and publishing history),
detailed funding reporting (with ability to map to Crossref Open Funder Registry),
linking terminology to tie terms in the text to ontologies/taxonomies, and
numeration (i.e., numbers for list items or sections) present in the XML (or not).
Adding Rules for Interchange
As the preceding paragraphs illustrate, the JATS-XML produced by one organization
may be significantly different from the JATS-XML produced by a different organization, even when they are using
the same flavor of the same base tag set. XML-to-XML transformation may be necessary
for seamless interchange and integration.
Specific best practice recommendations for JATS are outside the scope of the ANSI/NISO
standard and even outside the scope of the non-normative DTD, XSD, and RELAX NG schemas,
and the Tag Library documentation. Basic interoperability lies in being a related
family of specifications, with changes isolated so differences can be easily determined
and resolved if possible.
Detailed recommendations for interchange, how to use JATS in the same way, and what
is best tagging practice are being developed by groups such as PubMed Central and
JATS4R (JATS for Reuse).
PubMed Central Tagging Guidelines
The PMC/NLM/NIH Guidelines describe PubMed Central’s preferred XML tagging style for
submitting articles to PubMed Central. PMC accepts article submissions in the NLM Journal Publishing DTD or the NISO JATS Journal Publishing DTD. This site includes links to tools and resources (such as a style checker, fully-tagged
samples, fully-tagged citations, etc.) as well as an email distribution subscription
list for updates to the guidelines.
JATS4R (JATS for Reuse) Best Practice Recommendations
JATS4R is a NISO-sponsored industry consortium of JATS users who develop Best Practice
Recommendations for JATS. In their own words, “JATS4R is an inclusive group of publishers, vendors, and other interested organizations who
use the NISO Journal Article Tag Suite (JATS) XML standard.” The organization is based on the principle that JATS is broad and inclusive, but
reuse and interchange of JATS-tagged documents would be facilitated if JATS users
agreed on a single best practice for tagging (or at worst a small number of variations).
Therefore the JATS4R active working subgroups are devoted to optimizing the reusability of scholarly content by developing best-practice recommendations
for tagging content in JATS XML.
4. Conclusion: JATS (The Suite) is a “Build-a-Model” Kit
The JATS modules can be thought of as a giant "build-your-own-JATS-model" kit. The
inputs to the process are the JATS Suite library, your new customized DTD, and its customized
supporting DTD-fragment modules. The JATS Suite modules will provide:
most of the structural components for your new model (paragraphs, lists, footnotes,
sections, figures, tables, etc.),
most of the inline components for your new model (face markup, inline math, abbreviation,
custom-styling, etc.), and
publishing metadata objects (authors and affiliations, ISBN, copyright and licenses,
funding information, etc.)
Parameter Entities are used to make new tag sets. Your custom DTD file and its modules
provide:
the top-level structure (your document element),
all user-specific metadata and semantic elements,
any all-new structures, and
any changes you want to JATS default models or attribute lists.
The output of the process is a customized semantically fit-to-purpose DTD or several
(one document type per tag set).
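The assembly described above can be sketched as a driver DTD. This is a minimal illustration, not a working JATS customization: the file names (report-classes.ent, jats-default-classes.ent, and so on) and the report document element are hypothetical placeholders, the mixes modules are omitted for brevity, and the real module names and public identifiers should be taken from the JATS release in use. The sketch relies on one XML rule: when an entity is declared more than once, the first declaration is binding, so overrides must be read before the defaults.

```dtd
<!-- report.dtd: hypothetical driver for a JATS-based "report" tag set.
     All file names below are placeholders, not actual JATS module names. -->

<!-- 1. Custom class overrides are read first. The first declaration of
        a Parameter Entity is binding, so these pre-empt the defaults. -->
<!ENTITY % report-classes.ent SYSTEM "report-classes.ent">
%report-classes.ent;

<!-- 2. JATS default classes (used for anything not overridden above). -->
<!ENTITY % default-classes.ent SYSTEM "jats-default-classes.ent">
%default-classes.ent;

<!-- 3. Custom content-model and attribute-list overrides. -->
<!ENTITY % report-models.ent SYSTEM "report-models.ent">
%report-models.ent;

<!-- 4. JATS element modules (paragraphs, lists, sections, and so on). -->
<!ENTITY % common.ent SYSTEM "jats-common.ent">
%common.ent;

<!-- 5. The new top-level structure: the document element. The elements
        front, body, and back are assumed to be declared by the modules. -->
<!ELEMENT report (front, body, back?)>
```

Keeping every override in the small custom modules, rather than editing the JATS modules directly, is what isolates the changes and makes the differences from the base tag set easy to find.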
Once your tag-set-specific DTD has been created, it can be easily transformed into
an XSD schema or a RELAX NG schema as needed, to make a JATS Tag Set all tools
can use.
Appendix A. The JATS Compatibility Meta-Model
Many people who create vocabularies based on JATS assume that documents tagged according
to their new JATS-based models will be compatible with existing JATS documents and
the systems that manipulate them. This is not necessarily the case.
Building JATS-Compatible Vocabularies* provides guidance for customizing the JATS Tag Suite in ways that are:
predictable (know where to find information),
consistent (no semantic surprises), and
generally non-destructive (purpose of the individual elements is not compromised).
The goal of this document is to enable creators and maintainers of JATS-based document
models to know when the extensions they make to JATS models are JATS-compatible, and
to suggest ways in which they can achieve their modeling goals in a JATS-compatible
way.
Tagging consistency and best practices in document creation are outside the scope
of this document.
JATS compatibility is evaluated on the element/attribute and tag set levels. A structure
in a JATS-based model that uses an existing JATS name must have the same semantic
meaning as in JATS. Additionally, there are a number of “Properties” that a structure
might or might not have. For example: an element might or might not be allowed to
contain character data; an attribute might or might not be an XML ID or an XML IDREF;
a structure might or might not have a recursive section-like model.
An element or attribute defined by a JATS extension is “JATS-compatible” if it has
the same semantic meaning as the object of the same name in JATS and the object matches
the corresponding JATS object on all of the Compatibility Properties identified in
this document. A tag set that is an extension of JATS is “JATS-compatible” if all
of the shared elements are JATS-compatible.
This document is intended to help developers of new JATS-related XML vocabularies
create those vocabularies in ways that usefully extend the reach of the JATS vocabularies
without conflicting with current JATS vocabularies. It describes those things that
must not change about a model for it to be consistent with the JATS models and some
best practices to follow when extending JATS.
The high points of the compatibility model are:
Table II
Rule
Implications of the Rule
Respect the Semantic
The first and most important rule of customizing JATS is to respect the semantics
of the existing elements and attributes. Use a named structure to mean the same thing
JATS means by that named structure. If you change the structure’s meaning, give it
a new name.
Linking Direction
Links in JATS go from the many to the one, not the other way around. So, a reference
to a section, table, figure, or equation uses an IDREF to point to the ID of the section,
table, figure, or equation. [Note: Links in both directions in a user interface can
be built from one-way ID/IDREF attributes in the XML files.]
Use Recursive Section Models
JATS uses a recursive Section Model. Sections contain titles, paragraph-level things,
and (optionally) sections. So Section levels are computed by context, not indicated
in the XML.
Subsetting
A proper subset of any content model or model of attribute values is always allowed.
Elements may be removed from an “or” group with many elements. Elements that are required
in JATS may be removed or made optional. Values may be removed from the list of specified
values of an attribute. Attributes may be removed from elements.
Model as Element or Attribute?
Model it the way JATS does, or use a different name (make your own).
Whitespace Handling
A compatible tag set extension must not change the whitespace handling type for any existing element:
Element-like whitespace (element contains only elements)
Data-like whitespace (element contains characters and/or mixed content)
Preserved whitespace (model specifies whitespace should be preserved)
Alternatives Elements
An alternatives element is a wrapper that says all of these things are equivalent (name-alternatives, aff-alternatives, etc.). For display or counting, you typically
want to use only one of the (possibly many) supplied versions, or you may want to
treat one as the preferred version, while any others are synonyms.
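Three of the rules in the table (recursive sections, linking direction, and subsetting) can be illustrated with a few toy declarations. These are deliberately simplified and are not copied from the JATS modules; the content models and the choice of required attributes are assumptions made only to keep the sketch short.

```dtd
<!-- Toy declarations for illustration; not the actual JATS models. -->

<!-- Recursive section model: a section holds a title, paragraph-level
     content, and optionally more sections. No "level" attribute is
     declared; the depth of a section is implied by its nesting. -->
<!ELEMENT sec    (title, (p | fig)*, sec*)>
<!ELEMENT title  (#PCDATA)>
<!ELEMENT p      (#PCDATA | xref)*>

<!-- Linking direction: the many references point at the one target.
     The target carries the ID; each reference carries an IDREF. -->
<!ELEMENT fig    (title?, p*)>
<!ATTLIST fig    id   ID     #REQUIRED>
<!ELEMENT xref   (#PCDATA)>
<!ATTLIST xref   rid  IDREF  #REQUIRED>

<!-- Subsetting: a customization could narrow the model for sec to
     (title, p*, sec*). Every document valid against the narrower
     model is still valid against the broader one declared above. -->
```

The sketch also shows two of the whitespace types from the table: sec has element content (element-like whitespace), while p has mixed content (data-like whitespace). A compatible extension would leave those types unchanged.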
Appendix B. Sample JATS Customizations
The next few sections show simple customizations of
JATS, illustrating how to:
remove a block element,
remove an inline element,
add a new inline element,
add a new block element,
constrain an attribute value,
constrain the data type of an element,
constrain the content model of a block element, and
define a new top-level document type.
How to Remove a Block Element in a Class (choice)
Here the JATS user decides there are no poems in their corpus, so they want to delete
all mentions of the <verse-group> element.
Find-in-files shows all places verse-group is used
In the user’s own customization modules, they redefine each class, mix, or model Parameter
Entity that includes verse-group
Here is the Report-DTD Parameter Entity (defined first) that will override the JATS default:
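A minimal sketch of such an override follows. It assumes that one of the search hits is the JATS class %rest-of-para.class; and that its default value is the list shown in the comment; both should be checked against the JATS modules for the version being customized. The same pattern repeats for every other class, mix, or model that the search turns up.

```dtd
<!-- In the Report-DTD custom classes module, which is read BEFORE the
     JATS default classes module.
     Assumed JATS default value (verify in your version):
       "ack | disp-quote | speech | statement | verse-group"          -->
<!ENTITY % rest-of-para.class
                        "ack | disp-quote | speech | statement"      >
```

Because the first declaration of a Parameter Entity is binding, the JATS default that is read later is ignored. The declaration of verse-group itself can stay in the JATS modules untouched; once no class, mix, or model refers to it, it is no longer reachable from the document element.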
Each attribute list is defined by Parameter Entity (Inside that Parameter Entity,
each attribute and its values are defined)
The attribute values are also defined in a Parameter Entity
<!ENTITY % person-group-types
"author | compiler | curator | director |
editor | inventor" >
... used later in the attribute lists:
person-group-type (%person-group-types;) #IMPLIED
Redefine the value-defining Parameter Entity and call it (before the default)
<!ENTITY % person-group-types
"author | compiler | curator | director |
editor | illustrator | inventor" >
... used later in the attribute lists:
person-group-type (%person-group-types;) #IMPLIED
Notice no attribute list needs to change
How to Constrain an Attribute Value
(Case 2: Attribute is a type like CDATA, not a defined list)
Each attribute list is defined by a Parameter Entity. (Inside that Parameter Entity, each
attribute and its values are defined.)
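One way to handle this case, sketched below, is to override the attribute-list Parameter Entity so that the CDATA attribute becomes an enumerated list. The entity name %list-atts;, the three attributes shown, and the permitted values are illustrative assumptions. In practice the full default attribute list must be copied from the JATS module and then edited, because the override replaces the whole list, not a single attribute.

```dtd
<!-- In the custom models module, read BEFORE the JATS module that
     declares the default attribute list.
     Assumed, abbreviated default (verify in your version): the same
     three attributes, with list-type declared as CDATA.             -->
<!ENTITY % list-atts
            "id            ID                         #IMPLIED
             list-type     (bullet | order | simple)  #IMPLIED
             specific-use  CDATA                      #IMPLIED"      >
```

Narrowing CDATA to an enumerated list is a form of subsetting: any document that is valid against the constrained attribute is also valid against the original. Each enumerated value must be a single name token, so values containing spaces cannot be expressed this way and would need a Schematron rule instead.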
Constrain the data type of an element using Schematron (Using XSLT 2.0+)
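<rule context="event-desc[uri]">
  <!-- The element name in the message is escaped so that the
       Schematron file itself stays well-formed. -->
  <assert test="uri castable as xs:anyURI">Element &lt;uri&gt; is not
    of type xs:anyURI</assert>
</rule>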
Constrain the data type of an attribute using Schematron (Using XSLT 2.0+)
<rule context="pub-date[@iso-8601-date]">
  <!-- Both assertions share one rule: within a pattern, a node is
       tested only by the first rule whose context matches it. -->
  <assert
    test="normalize-space(@iso-8601-date)">Empty @iso-8601-date
    attribute</assert>
  <assert
    test="@iso-8601-date castable as xs:date">The attribute
    @iso-8601-date is not in ISO date format</assert>
</rule>
How to Constrain the Content of a Block Element
Block element models come in three types:
Choice groups of other elements
Sequences of other elements (include ordered choices and elements)
Data characters, with or without intermixed elements
How to Constrain the Content of a Block Element (sequence)
Parameter Entity for content-model named %element-name-model;
To take elements out, edit the class Parameter Entities or the -elements Parameter Entity
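As a sketch of the sequence case, the override below tightens a content-model Parameter Entity. The entity name %kwd-group-model; follows the %element-name-model; naming pattern, but the default value shown in the comment is an assumption made for illustration and should be checked against the JATS modules in use.

```dtd
<!-- In the custom models module, read after the class modules (so that
     %kwd.class; is already declared) and before the JATS module that
     declares the default model.
     Assumed default (verify in your version):
       "(label?, title?, (%kwd.class;)+)"                             -->
<!ENTITY % kwd-group-model
                        "(title, (%kwd.class;)+)"                    >
```

Here the label is removed and the title becomes required. The result is a proper subset of the assumed default: every keyword group that is valid against the constrained model is also valid against the original one.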
How to Constrain the Content of an Inline Element
Most inline elements are data characters or mixed content
Parameter Entity for mixed elements named %element-name-elements;
%element-name-elements; may be empty ("")
Model is #PCDATA plus any classes naming the mixed-in elements
<!-- Within a citation, the title of a
cited data source such as a dataset or spreadsheet. -->
<!ENTITY % data-title-elements
"| %address-link.class; | %emphasis.class; |
%phrase-content.class;" >
<!ELEMENT data-title
(#PCDATA %data-title-elements;)* >
In your modules, write an override Parameter Entity
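A minimal sketch of such an override, assuming the goal is to allow only emphasis inside the data title, is shown below. The choice of which classes to keep is illustrative; the entity and class names are the ones used in the default declaration above.

```dtd
<!-- In the custom models module: read after the class modules (so that
     %emphasis.class; is already declared) and before the JATS module
     that holds the default %data-title-elements; declaration.        -->
<!ENTITY % data-title-elements
                        "| %emphasis.class;"                         >
```

The leading vertical bar matters, because the entity is spliced in directly after #PCDATA in the element declaration. To allow character data only, the override value would be the empty string (""), which leaves the model as #PCDATA alone.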
National Center for Biotechnology Information (NCBI), National Library of Medicine
(NLM). Article Authoring Tag Library, NISO JATS Version 1.2 (ANSI/NISO Z39.96-2019). February 2019. https://jats.nlm.nih.gov/articleauthoring/tag-library/1.2/.
National Center for Biotechnology Information (NCBI), National Library of Medicine
(NLM). Journal Archiving and Interchange Tag Library, NISO JATS Version 1.2 (ANSI/NISO Z39.96-2019). February 2019. https://jats.nlm.nih.gov/archiving/tag-library/1.2/.
National Center for Biotechnology Information (NCBI), National Library of Medicine
(NLM). Journal Article Tag Suite. https://jats.nlm.nih.gov/. Splash page.
National Center for Biotechnology Information (NCBI), National Library of Medicine
(NLM). Journal Publishing Tag Library, NISO JATS Version 1.2 (ANSI/NISO Z39.96-2019). February 2019. https://jats.nlm.nih.gov/publishing/tag-library/1.2/.
Rosenblum, Bruce, and Irina Golfman. E-Journal Archival DTD Feasibility Study. Prepared for the Harvard University Library Office
for Information Systems E-Journal Archiving Project. Newton, MA: Inera Incorporated, December 5, 2001. https://old.diglib.org/preserve/hadtdfs.pdf.
Schwarzman, Alexander B. JATS Subset and Schematron: Achieving the Right Balance. Presented at JATS-Con 2017, Bethesda, MD, April 25-26, 2017. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2017. Bethesda (MD): National Center for Biotechnology Information (US), 2017. https://www.ncbi.nlm.nih.gov/books/NBK425543/.
Schwarzman, Alexander B. Superset Me—Not: Why the Journal Publishing Tag Set Is Sufficient if You Use Appropriate
Layer Validation. Presented at JATS-Con 2010, Bethesda, MD, November 1-2, 2010. In Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010. Bethesda (MD): National Center for Biotechnology Information (US), 2010. https://www.ncbi.nlm.nih.gov/books/NBK47084/.
Usdin, B. Tommie, and Deborah Aleyne Lapeyre. JATS/BITS/NISO STS. Presented at Symposium on Markup Vocabulary Ecosystems, Washington, DC, July 30,
2018. In Proceedings of the Symposium on Markup Vocabulary Ecosystems. Balisage Series on Markup Technologies, vol. 22 (2018). https://doi.org/10.4242/BalisageVol22.Usdin01.
JATS; Journal Article Tag Suite; Journal Archiving and Interchange Tag Set; Journal Publishing Tag Set; Journal Authoring Tag Set; BITS; Book Interchange Tag Suite; BITS; NISO STS; NISO Standards Tag Suite; Parameter Entities; modular DTD; Customization modules; classes and mixes customization, JATS; ANSI NISO Z39.96-2015 Journal Article Tag Suite; ANSI/NISO Z39.102-2017