Beck, Jeffrey. “JATS Superhighway: Onramp to a Backward-incompatible Version.” Presented at Balisage: The Markup Conference 2022, Washington, DC, August 1 - 5, 2022. In Proceedings of Balisage: The Markup Conference 2022. Balisage Series on Markup Technologies, vol. 27 (2022). https://doi.org/10.4242/BalisageVol27.Beck01.
Balisage: The Markup Conference 2022 August 1 - 5, 2022
Balisage Paper: JATS Superhighway
Onramp to a Backward-incompatible Version
Jeffrey Beck
National Center for Biotechnology Information, National Library of
Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Jeff Beck is the Program Head for Literature at the National Center for
Biotechnology Information at the US National Library of Medicine. He has been
involved in the PubMed Central project since it began in 2000. He has been
working in print and then electronic journal publishing since the early 1990s.
Currently he is co-chair of the NISO Z39.96 JATS Standing Committee and is a
BELS-certified Editor in the Life Sciences.
Author's contribution to the Work was done as part of the Author's official duties
as an NIH employee and is a Work of the United States Government. Therefore, copyright
may not be established in the United States. 17 U.S.C. § 105. If Publisher intends
to disseminate the Work outside the U.S., Publisher may secure copyright to the extent
authorized under the domestic laws of the relevant country, subject to a paid-up,
nonexclusive, irrevocable worldwide license to the United States in such copyrighted
work to reproduce, prepare derivative works, distribute copies to the public and perform
publicly and display publicly the work, and to permit others to do so.
Abstract
Tag sets change over time. Tag set designers manage a complex system where
everything is connected to everything else and new user requirements continue to
surface. Tag set users manage complex systems to create, manage, and archive
documents. Users strongly resist backward-incompatible change, so as JATS has grown
we have made compromises in the design to meet new requirements while maintaining
backward compatibility. We think it is time to consolidate redundant models, remove
deprecated items, and generally reduce confusion. Can we guide users towards a new,
backward-incompatible version in a way that they'll find palatable?
We have a plan. We're going to extend the JATS 1.x schema so that it contains the
new, 2.0 models in addition to the old models. Then we'll make an "Onramp" subset
of 1.x that has the deprecated items removed. Documents valid against the Onramp
subset of 1.x will also be valid against 2.0.
JATS is a NISO Standard that describes XML elements and attributes and three models
for defining journal articles. The work started in 2002 as an extension of the PubMed
Central DTD that became the "NLM DTDs".
Backward Compatibility
In 2007, as we were planning to move the NLM DTDs to NISO to become NISO JATS, the
NLM DTD Steering Committee decided to do a "cleanup" version of the article models
to fix all of the infelicities we had introduced since 2003 by keeping the models
backward-compatible.
For example, we had introduced the <permissions> element in version 2.1 (2005) as a
container for copyright and license information. In earlier versions,
<copyright-statement> and <license> were available in <article-meta>. In
version 2.1, they were available within the new <permissions> wrapper, but we could
not remove them from <article-meta> without breaking backward compatibility. When we
released NLM DTD 3.0 in 2008, all license and copyright elements had to be enclosed
in a <permissions> element.
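To make the <permissions> change concrete, the following is a minimal sketch in
Python of how 2.x-style metadata could be moved into the 3.0-style wrapper. It is
an illustration only, not a tool used by NLM or PMC. The function name
wrap_permissions, the list of elements to move, and the sample metadata are all
assumptions made for this example, and the sketch does not check the result
against any DTD.

```python
import xml.etree.ElementTree as ET

# Elements to move into <permissions>. This list is an assumption made for
# illustration; consult the tag library for the real content model.
PERMISSION_CHILDREN = ("copyright-statement", "copyright-year", "license")


def wrap_permissions(article_meta):
    """Move loose copyright/license children of <article-meta> into <permissions>."""
    loose = [child for child in list(article_meta) if child.tag in PERMISSION_CHILDREN]
    if not loose:
        return article_meta
    permissions = article_meta.find("permissions")
    if permissions is None:
        permissions = ET.Element("permissions")
        # Put the new wrapper where the first loose element used to be.
        index = list(article_meta).index(loose[0])
        article_meta.insert(index, permissions)
    for child in loose:
        article_meta.remove(child)
        permissions.append(child)
    return article_meta


# Hypothetical 2.x-style metadata with copyright and license directly in <article-meta>.
old_style = ET.fromstring(
    "<article-meta>"
    "<title-group><article-title>Example</article-title></title-group>"
    "<copyright-statement>Copyright 2005 Example Press</copyright-statement>"
    "<license><p>Open access.</p></license>"
    "</article-meta>"
)
new_style = wrap_permissions(old_style)
print(ET.tostring(new_style, encoding="unicode"))
```

A conversion like this is mechanical for a single element, which is part of why the
cost of a backward-incompatible release lies less in any one change than in
updating every system that produces or consumes the documents.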
The backward-incompatible release of NLM 3.0 was a game-changer for many users, and
some PMC submitters are still using NLM DTD version 2.3 or earlier to submit their
articles. NISO Z39.96-2012 version 1 became official in August 2012. It and all
subsequent JATS versions have been backward-compatible with NLM DTD 3.0.
Since 2012, as the JATS Standing Committee has considered requests for enhancements, we
have sometimes found that design decisions that were appropriate in the past are
non-optimal given current requirements. In the normal maintenance mode, we find
ourselves making compromises in the tag set in order to meet new requirements while
maintaining backward compatibility.
By 2019, we believed that there were enough of these compromises in JATS 1.2 that it
was time to consider a backward-incompatible version in which, for example,
structures that have been deprecated are removed, redundant models are consolidated,
some confusion is eliminated, and some structures are tidied up to strengthen JATS
for the future.
Given our experience of the strong reaction to the earlier backward-incompatible
release, the JATS SC is trying to smooth the way to a JATS 2.0 version. We have been
updating and maintaining the JATS 1.x line while we address the larger "2.0
issues"; we released JATS version 1.3 in 2019 and are working on addressing
public comments for JATS 1.4. Many of the decisions that are being made on the "2.0
issues" can be applied in the 1.x line as long as the elements and structures they
replace are deprecated rather than removed.
Subsetting and the "JATS Lite" Movement
The article models described in JATS have always been complex, with multiple ways to
tag structures even within the more prescriptive "Journal Publishing" model. Many users
develop subsets of the models by writing their own more restrictive schemas to control
usage within their organization (lapeyre). Others effectively subset the models by
controlling usage with a validation layer on top of DTD validation, using something
like Schematron or XSLT. For PubMed Central, we use a subset of JATS defined by the
PMC Tagging Guidelines (pmc1) that we control with the PMC Style Checker (pmc2, beck1).
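The idea of a validation layer on top of DTD validation can be sketched in a few
lines of Python. This is not how the PMC Style Checker or any Schematron rule set
works; it is a toy check under stated assumptions. The ALLOWED set, the function
name find_disallowed, and the sample document are hypothetical, and the check
looks only at element names, not at content models or attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical house subset: element names an organization chooses to allow.
ALLOWED = {
    "article", "front", "article-meta", "title-group", "article-title",
    "body", "sec", "title", "p",
}


def find_disallowed(xml_text, allowed=ALLOWED):
    """Return sorted names of elements used in the document but absent from the subset."""
    root = ET.fromstring(xml_text)
    return sorted({element.tag for element in root.iter()} - set(allowed))


# A small document that is assumed to be DTD-valid but uses one element
# outside the house subset.
sample = (
    "<article><front><article-meta><title-group>"
    "<article-title>Example</article-title></title-group></article-meta></front>"
    "<body><sec><title>Intro</title><p>Text with <bold>emphasis</bold>.</p></sec></body>"
    "</article>"
)
print(find_disallowed(sample))  # ['bold']
```

A layer like this leaves the public schema untouched, so documents remain valid
JATS while local usage stays narrower than the full tag set.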
We hear discussion of an official JATS Lite version, similar to the TEI Lite version
that is a "customization of the TEI tagset, designed to meet '90% of the needs of
90% of the TEI user community'". So far, the JATS Standing Committee has not worked on
developing a JATS Lite version, but there is a lot of interest in the community,
including a presentation at Balisage 2021 that tried to define the elements that
belong in a Lite Subset (imsieke). So far we can get agreement that an official Lite
Subset would be a good idea, but we can't get agreement on what should be excluded.
The Swelling Tagset
Imagine a Tag Set in version 1.0 (see Fig. 1). It has the elements and attributes
defined in it. As time goes by and additions are requested, if the updates are made to
be backward-compatible, the Tag Set will grow (see Fig. 2). New elements, attributes,
and structures are added, but nothing ever gets taken away.
Nothing gets taken away, not because the old structures should still be used, but
because of the fear of breaking backward compatibility and the effect that would have
on the uptake of a new version.
An Interesting Way Forward
Currently the JATS Standing Committee is working through the "2.0 list": revisiting
existing structures and coming up with solutions, such as a reasonable way to
represent multi-language articles. We are also working through user comments that are
submitted through the NISO Public Comment form to keep the JATS 1.x line meeting the
users' needs.
With the experience we have had with the slow uptake by some users of the last
backward-incompatible version over a decade ago, we have been wary of making such deep
changes to the article models, although we are sure that the 2.0 structures will be
"better."
Tommie Usdin made an interesting proposal on one of our JATS Standing Committee calls
as a way to make the new 2.0 structures less frightening to users. Most of the new
things that we are adding, like all of the new elements for multi-language articles,
can be added into the 1.x line to get users familiar with them.
This will be just like all previous JATS updates: adding new structures to the
existing, swelling Tag Set. As a way to ease the community into a JATS 2.0 version
using the newer structures, once the 2.0 models are in place in the 1.x line, we will
make a Subset of the 1.x schema with all deprecated elements removed. This means that
any instances valid against this Subset are valid against 1.x but will also be working
toward the 2.0 models.
The JATS 1.x release will be backward-compatible with the current JATS line. If users
start using this "Onramp" subset, they will get the benefit of the newer structures.
The "Onramp" schema will not be backward-compatible with the current JATS line, but
any instance written against it will still be valid against 1.x, because the Onramp
schema contains no elements or attributes that are not in 1.x.
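The relationship between the three tag sets can be pictured with plain sets. The
sketch below is a toy model under loud assumptions: every element name in it is a
made-up placeholder, not taken from any JATS schema, "validity" is reduced to using
only names in a tag set (real validation also involves content models and
attributes), and 2.0 is modelled as equal to the Onramp subset purely for
simplicity.

```python
# Hypothetical element names; none of these come from the JATS schemas.
JATS_1X = {"article", "old-wrapper", "new-wrapper", "old-lang-hack", "lang-group"}
DEPRECATED = {"old-wrapper", "old-lang-hack"}

# The Onramp subset is 1.x with every deprecated name removed.
ONRAMP = JATS_1X - DEPRECATED

# Assumption for this sketch: 2.0 contains exactly the Onramp names. The only
# property relied on is that Onramp documents are also valid against 2.0.
JATS_2_0 = set(ONRAMP)


def valid_against(used_elements, tag_set):
    """In this toy model a document is valid if it uses only names in the tag set."""
    return set(used_elements) <= tag_set


onramp_doc = {"article", "new-wrapper", "lang-group"}
legacy_doc = {"article", "old-wrapper"}

# An Onramp document is valid against both 1.x and 2.0.
print(valid_against(onramp_doc, JATS_1X), valid_against(onramp_doc, JATS_2_0))  # True True
# A legacy document stays valid against 1.x but not against the Onramp subset.
print(valid_against(legacy_doc, JATS_1X), valid_against(legacy_doc, ONRAMP))    # True False
```

In this picture the 1.x schema keeps accepting everything it always has, while the
Onramp subset rejects older documents and accepts only those that are already on
the way to 2.0.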
Conclusion
This seems like a reasonable way to get users familiar with newer structures that are
being added in 2.0 without forcing a wholesale upgrade of systems. As Wendell Piez
pointed out when this was presented at JATS-Con this year (usdin), the difficult thing
is going to be how to name it. It is not just another schema in the 1.x line; that one
is as bloated with old and new structures as you would expect.
Acknowledgments
This work was supported by the National Center for Biotechnology Information of the
National Library of Medicine (NLM), National Institutes of Health.
References
[lapeyre] Lapeyre DA. “Why Create a Subset of a Public Tag Set.” In:
Journal Article Tag Suite Conference (JATS-Con) Proceedings 2010 [Internet]. Bethesda
(MD): National Center for Biotechnology Information (US); 2010. Available from:
https://www.ncbi.nlm.nih.gov/books/NBK47099/
[beck1] Beck, Jeffrey D. “How many hamsters does it take? Under the hood
at PMC.” Presented at Balisage: The Markup Conference 2017, Washington, DC, August
1 - 4, 2017. In Proceedings of Balisage: The Markup Conference 2017. Balisage Series on Markup Technologies, vol. 19 (2017).
https://doi.org/10.4242/BalisageVol19.Beck01.
[imsieke] Imsieke, Gerrit, and Nina Linn Reinhardt. “JATS Blue Lite: The
Quest for a Compact Consensus Customization.” Presented at Balisage: The Markup
Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021).
https://doi.org/10.4242/BalisageVol26.Imsieke01.
[usdin] Usdin, Tommie. “Thinking about a Convenience Subset of JATS.”
JATS-Con Open Session. In: Journal Article Tag Suite Conference (JATS-Con) Proceedings
2022 [Internet]. Bethesda (MD): National Center for Biotechnology Information (US);
2022. Available from: https://www.ncbi.nlm.nih.gov/books/NBK580693/