How to cite this paper
Beck, Jeffrey. “Rules for the Rulemakers: JATS4R’s Self Guidance on Attributes.” Presented at Balisage: The Markup Conference 2019, Washington, DC, July 30 - August 2, 2019. In Proceedings of Balisage: The Markup Conference 2019. Balisage Series on Markup Technologies, vol. 23 (2019). https://doi.org/10.4242/BalisageVol23.Beck01.
Balisage: The Markup Conference 2019
July 30 - August 2, 2019
Balisage Paper: Rules for the Rulemakers
JATS4R’s Self Guidance on Attributes
Jeffrey Beck
Jeff is a Technical Information Specialist at the National Center for
Biotechnology Information at the US National Library of Medicine. He has been
involved in the PubMed Central project since it began in 2000. He has been
working in print and then electronic journal publishing since the early 1990s.
Currently he is co-chair of the NISO Z39.96 JATS Standing Committee and is a
BELS-certified Editor in the Life Sciences.
Author’s contribution to the Work was done as part of the Author’s official duties
as an NIH employee and is a Work of the United States Government. Therefore, copyright
may not be established in the United States. 17 U.S.C. § 105. If Publisher intends
to disseminate the Work outside the U.S., Publisher may secure copyright to the extent
authorized under the domestic laws of the relevant country, subject to a paid-up,
nonexclusive, irrevocable worldwide license to the United States in such copyrighted
work to reproduce, prepare derivative works, distribute copies to the public and perform
publicly and display publicly the work, and to permit others to do so.
Abstract
Maximal flexibility of rules, or ease of reuse — choose one. The tighter the
rules, the more consistent documents will be and the easier it will be to reuse
them, but only if the rules are reasonable enough to be adopted. (If all the data
creators ignore the rules, reuse doesn’t get easier.) JATS4R (JATS for Reuse) is a
NISO working group devoted to optimizing the reusability of scholarly content by
developing best-practice recommendations for tagging content in JATS XML. The group
has devoted particular attention to the flexibility/reuse tradeoff for rules on
attribute use and controlled values, and we eventually decided that we needed some
rules for ourselves, on how to write rules for attributes in our recommendations.
In
the process of developing our guidance document for writing rules for attribute
values in our recommendations, we learned (or at least articulated) some things
along the way.
Table of Contents
- Research
- Types of Attribute Recommendations
  - Identification
  - Prescribed Usage
  - Controlled Values
  - Suggested Values
  - Early Version Workaround
- Testing the Recommendations
  - Identification
  - Prescribed Usage
  - Controlled and Suggested Values
  - Clean Value Test
- JATS4R Will
- Appendix A. Survey of Attributes in Recommendations
  - Article Publication and History Dates
  - Authors and Affiliations
  - General Recommendations
  - Citations
  - Clinical Trials
  - Disclosure of conflicts of interest/competing interests
  - Data citations
  - Data availability statements
  - Display Objects
  - Math Recommendations
  - Permissions
At the end of 2018, the JATS4R Steering Committee was reviewing the Roadmap to plan
our
work when we realized that several of the topics on the list to be discussed were
related to
defining and controlling a list of values for attributes.
One example was to define a list of approved values for @fn-type. We knew that we
had been
making recommendations for attribute usage and had defined some values, but we had
no
understanding of how and when these recommendations were created. Restricting attribute
values to a controlled list greatly improves the reusability of XML
because
future users know what to expect and have an understanding of what the values mean.
But controlled lists of attribute values can have a negative effect on adoption. A
user
who has a need for a value that is not on the controlled list has two choices: petition
the
restricting agent to add her value to the list of acceptable values or ignore the
recommendation. Restricting agents think that they can respond to users’ requests
to keep
the controlled values lists up to date, but in reality no restricting
agency can work fast enough to keep up with the needs of XML users who are under real publication deadlines.
This does not mean that we should not write any controlled value lists into JATS4R
recommendations, but we should be aware of the costs and tradeoffs of doing so.
Research
By early 2019, JATS4R had published eleven recommendations. We reviewed these published
recommendations to get an idea of what guidance we had already given. The results
of
this research are presented in the Appendix. Reviewing the current recommendations,
we
were able to classify the attribute rules we had written into:
-
Identification: Define the
attribute:value combination as an Identifier for an object type.
-
Prescribed Usage: Prescribe how an
attribute should be used in a given circumstance.
-
Prescribe which values may/must be used:
-
Controlled Values: Values
must be from a list, either defined in the Recommendation or
from an Outside Authority.
-
Suggested Values: Values
should be from a list, either defined in the Recommendation or
from an Outside Authority.
Types of Attribute Recommendations
Identification
This is an attribute:value pair that will identify the object as a specific
thing. This is important to JATS4R because we can apply specific tests to general
JATS objects if we know what the object is.
For example: If we can identify a citation as a
data citation (with @publication-type="data"), then we can apply tests for rules
specific to data citations to the citation element and all of its descendants if
necessary. Is there an <article-title> but no <data-title>?
Testing: There is no way to test whether an
Identification attribute is set appropriately. We will have to assume that the
identification is correct. Then we can make other tests related to that specific
object. We can also set values of other attributes related to that object, given the
knowledge of that object’s identity. For example, if a <contrib> is an author
(@contrib-type="author"), we should be able to control values in another attribute
on that <contrib>, but not other <contrib>s.
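As a minimal sketch of how an Identification attribute can unlock an object-specific test, the Python below trusts @publication-type="data" and then asks the question posed above about <article-title> and <data-title>. The function name, the message wording, and the sample XML are hypothetical and are not part of any JATS4R validator.

```python
import xml.etree.ElementTree as ET

CITATION_TAGS = ("mixed-citation", "element-citation")


def check_data_citations(root):
    """Flag data citations that carry <article-title> but no <data-title>."""
    messages = []
    for tag in CITATION_TAGS:
        for citation in root.iter(tag):
            # Identification: the attribute value is trusted, not tested.
            if citation.get("publication-type") != "data":
                continue
            has_article_title = citation.find(".//article-title") is not None
            has_data_title = citation.find(".//data-title") is not None
            if has_article_title and not has_data_title:
                messages.append(
                    f"<{tag}> identified as data has <article-title> but no <data-title>"
                )
    return messages


# Hypothetical sample: only the first citation is identified as data.
SAMPLE = """<ref-list>
  <ref><element-citation publication-type="data">
    <article-title>Survey results</article-title>
  </element-citation></ref>
  <ref><element-citation publication-type="journal">
    <article-title>A journal article</article-title>
  </element-citation></ref>
</ref-list>"""

for message in check_data_citations(ET.fromstring(SAMPLE)):
    print(message)
```

The journal citation is never examined, which is the point: the data-citation rule applies only where the Identification value says it should.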
Prescribed Usage
This is a recommendation that describes how, when, and for what an attribute may,
should, or must be used. The recommendation for usage of the attribute is separate
from any values that we suggest or require be used. Prescribed Usage should not be
conflated with value lists, although most of the time Prescribed Usage is paired
with either a Controlled Values or Suggested Values list.
For example: The recommendation for
<institution-id> says to use “@institution-id-type to indicate the type of ID;
e.g., ‘grid’ or ‘ringgold’”.
Controlled Values
A controlled value list describes an attribute whose content must be from a list,
either defined in the Recommendation or from an Outside Authority. An attribute that
has a value not in the list would be an ERROR. We can write rules and apply
validations to controlled value lists on general attributes for given circumstances
once we have identified the element with an Identification attribute.
For example: @date-type on <pub-date>: Use
value “original-publication” to indicate that the date is the original date of
publication, and “update” for dates that represent published updates to the
publication.
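A Controlled Values check of this kind might be sketched as follows. The two allowed values come from the recommendation quoted above; the function name, the message text, and the sample XML are illustrative assumptions, as is the choice to skip elements that lack the attribute.

```python
import xml.etree.ElementTree as ET

# Controlled list quoted in the recommendation for <pub-date>/@date-type.
PUB_DATE_TYPES = {"original-publication", "update"}


def check_pub_date_types(root):
    """Return an ERROR message for each off-list @date-type on <pub-date>."""
    errors = []
    for pub_date in root.iter("pub-date"):
        value = pub_date.get("date-type")
        # Assumption: a missing attribute is a Prescribed Usage question,
        # so this value check skips it.
        if value is not None and value not in PUB_DATE_TYPES:
            errors.append(
                f"ERROR: pub-date/@date-type='{value}' is not in the controlled list"
            )
    return errors


# Hypothetical sample with one on-list and one off-list value.
SAMPLE = """<article-meta>
  <pub-date date-type="original-publication"><year>2019</year></pub-date>
  <pub-date date-type="pub"><year>2019</year></pub-date>
</article-meta>"""

print(check_pub_date_types(ET.fromstring(SAMPLE)))
```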
Suggested Values
A suggested value list describes an attribute whose content should be from a list,
either defined in the Recommendation or from an Outside Authority.
For example: Use “journal” or “book” as the
value of @publication-type to indicate that the citation is to a journal or book,
respectively. Other examples are: “letter”, “review”, “patent”, “report”,
“standard”, “data”, “working-paper”. This list is not exhaustive and is sourced from
the JATS guidelines. This is not a limited field so others can be used as
appropriate, for example, “website”. However, in the interests of standardisation,
JATS4R requests publishers to contact JATS4R if using additional values so we can
create a definitive list and reduce variation across XML sources. “Other” is not a
preferred value.
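For contrast with a controlled list, a Suggested Values check could look like the sketch below, which reports a WARNING rather than an ERROR for an off-list value, the severity this paper assigns to Suggested Value lists. The set holds only the values named in the recommendation; the function name and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values named in the recommendation; the list is explicitly not exhaustive.
SUGGESTED_PUBLICATION_TYPES = {
    "journal", "book", "letter", "review", "patent",
    "report", "standard", "data", "working-paper",
}


def check_publication_types(root):
    """Return a WARNING message for each off-list @publication-type."""
    warnings = []
    for tag in ("mixed-citation", "element-citation"):
        for citation in root.iter(tag):
            value = citation.get("publication-type")
            if value is not None and value not in SUGGESTED_PUBLICATION_TYPES:
                warnings.append(
                    f"WARNING: @publication-type='{value}' is not on the suggested list"
                )
    return warnings


# Hypothetical sample: "website" is permitted but is not on the list.
SAMPLE = """<ref-list>
  <ref><mixed-citation publication-type="journal">A journal citation.</mixed-citation></ref>
  <ref><mixed-citation publication-type="website">A website citation.</mixed-citation></ref>
</ref-list>"""

print(check_publication_types(ET.fromstring(SAMPLE)))
```

In this sketch “website” draws a warning even though the recommendation permits it, so the message serves as a prompt to contact JATS4R, not as a rejection.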
Early Version Workaround
This is not a type of attribute recommendation, but it is worth mentioning here.
This is the “Early Version Workaround” where we describe a way to meet the
Recommendation with an earlier version of the DTD that does not have the attribute
that we recommend as the best solution. An example of this is from Article and
Publication Dates: “use article/@specific-use for article version values from NISO
JAV for JATS 1.1 and earlier schemas”.
Testing the Recommendations
Identification
Attributes used for Identification are not tested. Instead, they inform the
validator WHAT an object is so that tests specific to that object type can be
applied. But we can run a Clean Value Test on the values of attributes that we have
defined as Identification.
Example: We can’t test to see whether every
<sec> in a document has a type that conforms to a list of values, but if a user
has <sec sec-type="data_availability_statement">, we could have an error. In
testing, we need to anticipate possible things that people would do which are close
to the value we want.
Prescribed Usage
Tests for attribute usage will be situational and may be based on identifying the
object that we are in. Attributes will be tested:
-
That they exist if they are defined as REQUIRED (this may be
situational, based on any property that can be found in the article XML:
type of object, existence of other attributes, attribute values);
-
That they don’t exist if they are declared DO NOT USE (I don’t think
we have any of these);
-
Values can be tested if the Prescribed Usage is paired with a
Controlled Values or Suggested Values list.
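An existence test of the first kind might be sketched as below, using the Authors and Affiliations rule that @country must be present whenever <country> is used. The function name, the ERROR severity, and the sample XML are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def check_country_attribute(root):
    """Return a message for each <country> element that lacks @country."""
    errors = []
    for country in root.iter("country"):
        # Situational requirement: it only applies where <country> is used.
        if country.get("country") is None:
            name = (country.text or "").strip()
            errors.append(f"ERROR: <country>{name}</country> is missing @country")
    return errors


# Hypothetical sample: the second affiliation omits the attribute.
SAMPLE = """<contrib-group>
  <aff id="aff1"><country country="US">United States</country></aff>
  <aff id="aff2"><country>Canada</country></aff>
</contrib-group>"""

print(check_country_attribute(ET.fromstring(SAMPLE)))
```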
Controlled and Suggested Values
At first it seems that the difference in tests for a Controlled Value list and a
Suggested Value list would be ERROR vs. WARNING. But it is not that simple. We have
already defined Controlled Value lists and Suggested Value lists in our
Recommendations. We have examples that have been defined in the Recommendation
itself and examples that refer to an outside controlled list like the JAV. The
proper values will be circumstantial in many cases, so the tests are not as simple
as “on the good list” or “not on the good list”.
We need to be aware of how our attribute recommendations can be tested so that
they can be most effective.
Should out-of-list values be errors or warnings? That is: Do we allow things in a
JATS4R-compliant article that we have not defined?
We prescribe @sec-type="data-availability" for Data Availability statements, but
do we want to exclude all other possible values of @sec-type?
We will either need to:
-
Define all values of @sec-type that we expect to see in JATS4R-conforming
articles (the Maloney proposal,
https://github.com/JATS4R/JATS4R-Participant-Hub/issues/118), or
-
Test only “circumstantial” values (it is possible that
@person-group-type may be allowed to have different values if it is in a
Data Citation than if it is in some other citation) or
-
come up with a “Forbidden list” of values related to the ones that we
define. This is the “Clean Value Test”.
Clean Value Test
A Clean Value Test is a way to try to control an attribute value without
restricting that attribute to a controlled set of allowed values. The test involves
thinking up as many ways as possible that the value you want could be miswritten and then explicitly
excluding those. This can be frustrating, and the list requires ongoing maintenance.
A good example of this is when we are defining values for Identification. We
prescribe @sec-type="data-availability" to identify Data Availability Statements.
We cannot exclude all values for @sec-type except for “data-availability”. Nor can
we provide a list of all “approved” values for @sec-type. Instead we must write an
error for any value that is approaching but not equal to “data-availability”. The
list of values that would generate an ERROR would include but not be limited to:
data_availability, data availability, Data-Availability, data-statement,
dataavailbility.
This is a list that will grow as people find new ways to misrepresent
“data-availability”. The question may come up about “normalizing” the values before
testing, but this would weaken our recommendations because any normalizing we do in
the validator must then be done for any future application that is looking for
“data-availability”.
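A Clean Value Test along these lines could be sketched as follows. The forbidden values are the five listed above; the function name, message text, and sample XML are hypothetical. The comparison is deliberately exact, with no case folding or other normalizing, in keeping with the argument against normalizing in the validator.

```python
import xml.etree.ElementTree as ET

PRESCRIBED = "data-availability"

# Near-miss values listed in the text; the list is expected to grow.
FORBIDDEN_SEC_TYPES = {
    "data_availability",
    "data availability",
    "Data-Availability",
    "data-statement",
    "dataavailbility",
}


def clean_value_test(root):
    """Return an ERROR for each @sec-type that is a known near miss."""
    errors = []
    for sec in root.iter("sec"):
        value = sec.get("sec-type")
        # Exact comparison on purpose: unrelated values such as "methods" pass.
        if value in FORBIDDEN_SEC_TYPES:
            errors.append(
                f"ERROR: sec/@sec-type='{value}' should be '{PRESCRIBED}'"
            )
    return errors


# Hypothetical sample: only the second section uses a forbidden value.
SAMPLE = """<body>
  <sec sec-type="methods"><title>Methods</title></sec>
  <sec sec-type="data_availability"><title>Data</title></sec>
  <sec sec-type="data-availability"><title>Data</title></sec>
</body>"""

print(clean_value_test(ET.fromstring(SAMPLE)))
```

A new misspelling passes silently until someone adds it to the set, which is the maintenance cost described above.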
JATS4R Will
-
Define as many Identification
attribute:values as we can. This allows us to identify a given object in the
XML for testing or for later text mining. Usually these are set using
general attributes like @content-type or @fn-type. We can define the value
we want used, but we cannot exclude any other value. We can check
Identification values with a Clean Value Test.
-
Define Prescribed Usage for attributes.
The existence of attributes under certain circumstances can be tested (with
the aid of Identification attributes) easily. These can be ERRORS or
WARNINGS depending on the cases.
-
Be careful about defining Controlled
Value lists; we should not use them generally. So no rules
like “@fn-type must be one of (corresp | reference | suppdata)”. But we
could make up a rule something like “a footnote referenced from a contrib
with a role of ‘illustrator’ must have a @fn-type from (media | style |
corresp).” Not matching a value in a Controlled Value list (under the
appropriate circumstances) will be an ERROR.
-
Use Suggested Value lists, which are a
little more forgiving. Not matching a value in a Suggested Value list will
be a WARNING. These will be tempting to use when we don’t want to commit to
a decision. I think we should avoid this and define them circumstantially
like the Controlled Value lists. Because there is no ERROR, Suggested Value
lists have no teeth. I suggest that we use them to do two things: strongly
encourage usage to move in a certain way and/or test out values that we may
want to control in a future version of the Recommendation. If this is our
intent, we should list this in the recommendation.
-
Use the following style for attribute recommendations:
-
All attribute values we define should follow a style: All letters are lowercase
and where a space to differentiate words would be used in text, a hyphen
(U+002D) is added, for example, “original-publication”.
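The style rule can be expressed as a pattern check. The sketch below is one possible reading of the rule: lowercase words joined by single hyphens. Allowing digits is an assumption, since the rule speaks only about letters, and the names `STYLE_PATTERN` and `follows_style` are hypothetical.

```python
import re

# Lowercase words joined by single hyphens (U+002D).
# Assumption: digits are tolerated; the rule only speaks about letters.
STYLE_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def follows_style(value):
    """Return True if the whole value matches the lowercase-hyphen style."""
    return STYLE_PATTERN.fullmatch(value) is not None


for candidate in (
    "original-publication",
    "Data-Availability",
    "data_availability",
    "data availability",
):
    print(candidate, follows_style(candidate))
```

Of the four candidates, only “original-publication” satisfies the pattern.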
Appendix A. Survey of Attributes in Recommendations
ID | Identification
PU | Prescribed Usage
CV | Controlled Values
SV | Suggested Values
Article Publication and History Dates
https://jats4r.org/article-publication-and-history-dates
-
PU/CV - use /article/@specific-use for article version values from
NISO JAV for JATS 1.1 and earlier schemas.
-
CV - @date-type on <pub-date>: Use value “original-publication” to
indicate that the date is the original date of publication, and “update”
for dates that represent published updates to the publication.
-
PU/CV - Use @date-type on <date> in <event> to indicate what
stage of publication this version was in. Use NISO JAV values.
Authors and Affiliations
https://jats4r.org/authors-and-affiliations
-
CV - @ref-type (on <xref>). When linking a <contrib> to its <aff> use @ref-type=“aff”
on <xref> [[Validator tool result: error if @ref-type on <xref> != “aff” if @rid references
an <aff> element ]]
-
PU - <institution-id>. Capturing the institutional ID is not mandatory at this time.
However, if the publisher does capture it, they should make every effort to ensure
that it is accurate. Use <institution-id> to contain the ID, and @institution-id-type
to indicate the type of ID; e.g., “grid” or “ringgold”. Contain the <institution-id> and <institution>
elements within <institution-wrap>.
-
PU/CV - @country. If <country> is used, then @country must also be used and must be
set to the 2-letter country code, as specified in ISO 3166-1 (recommended in the JATS
tag library). binary
-
ID - <contrib>, @contrib-type. Contain each author within a <contrib> element. If
a <contrib> contains an author, then @contrib-type must be set to “author”
-
PU - @corresp. Use the corresp attribute on <contrib>, set to value “yes”, to identify
the corresponding author(s). binary
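The validator result quoted for @ref-type in this section (error if @ref-type is not “aff” when @rid references an <aff>) could be sketched in Python as below. The function name, message text, and sample XML are hypothetical; this is not the JATS4R validator's own implementation.

```python
import xml.etree.ElementTree as ET


def check_aff_xrefs(root):
    """Return an ERROR for each <xref> that targets an <aff> without ref-type='aff'."""
    aff_ids = {aff.get("id") for aff in root.iter("aff") if aff.get("id")}
    errors = []
    for xref in root.iter("xref"):
        rid = xref.get("rid") or ""
        # @rid may hold several space-separated IDs, so test each one.
        points_to_aff = any(target in aff_ids for target in rid.split())
        if points_to_aff and xref.get("ref-type") != "aff":
            errors.append(
                f"ERROR: xref rid='{rid}' references an <aff> but @ref-type is not 'aff'"
            )
    return errors


# Hypothetical sample: the second contributor's xref has the wrong @ref-type.
SAMPLE = """<contrib-group>
  <contrib contrib-type="author"><xref ref-type="aff" rid="aff1"/></contrib>
  <contrib contrib-type="author"><xref ref-type="other" rid="aff2"/></contrib>
  <aff id="aff1">First University</aff>
  <aff id="aff2">Second University</aff>
</contrib-group>"""

print(check_aff_xrefs(ET.fromstring(SAMPLE)))
```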
Citations
https://jats4r.org/citations
-
PU/SV - @publication-type=“…” on <mixed-citation> or <element-citation>. Use “journal”
or “book” as the value of @publication-type to indicate that the citation is to a
journal or book, respectively. Other examples are: “letter”, “review”, “patent”, “report”,
“standard”, “data”, “working-paper”. This list is not exhaustive and is sourced from
the JATS guidelines. This is not a limited field so others can be used as appropriate,
for example, “website”. However, in the interests of standardisation, JATS4R requests
publishers to contact JATS4R if using additional values so we can create a definitive
list and reduce variation across XML sources. “Other” is not a preferred value.
-
PU/CV - <person-group> and @person-group-type. Use the <person-group> element to specify
authors and other contributors in a citation. Use the @person-group-type attribute
to specify the role of a contributor, when it is possible to identify them with a
role. A separate <person-group> element should be used for each role. This attribute
has a fixed list of allowed values in the Journal Publishing tag set: all-authors;
assignee; author; compiler; curator; director; editor; guest-editor; inventor; transed;
translator.
-
PU/CV - @pub-id-type on <pub-id>. Use this attribute to specify the type of the identifier.
For example, a DOI would have the @pub-id-type value of “doi”. The value should be
one of the valid values from the list in the Tag Library.
Clinical Trials
https://jats4r.org/clinical-trials
-
PU/CV (Should be ID) - related-object/@content-type. Use the content-type attribute
optionally to indicate which stage of the trial the publication is reporting on. Since
this information is intended for content providers submitting linked clinical trial
information to Crossref, if @content-type is used, its value must be “pre-results”,
“results”, or “post-results”, as defined in the Crossref schema. [[Validator result:
If absent, no message. If present, ERROR if values not equal “pre-results” or “results”
or “post-results” ]]
-
PU - related-object/@source-id. The source-id attribute must be used to identify the
clinical trial registry. Crossref curates a list of WHO-approved registries and assigns
them a DOI. Content providers are encouraged to select an appropriate registry from
this list and supply the registry DOI or the WHO registry name as the source-id value.
-
PU/CV - related-object/@source-id-type. The source-id-type attribute must be used
to identify the type of ID provided in @source-id. The value of @source-id-type should
be “crossref-doi” or “registry-name”, as appropriate (see Recommendation 3.)
-
PU/CV - related-object/@document-id-type: The document-id-type attribute is required
and must identify the kind of @document-id. The value must be either “clinical-trial-number”
or “doi”.
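The three value rules in this section could be collected into one table-driven check, sketched below. The allowed values and the required/optional split are transcribed from the entries above. The sketch assumes every <related-object> it sees is a clinical trial link, which is the missing Identification step noted in the first entry. The rule table name, function name, messages, and sample attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

# attribute -> (required?, controlled values), transcribed from the entries above.
RELATED_OBJECT_RULES = {
    "content-type": (False, {"pre-results", "results", "post-results"}),
    "source-id-type": (True, {"crossref-doi", "registry-name"}),
    "document-id-type": (True, {"clinical-trial-number", "doi"}),
}


def check_related_objects(root):
    """Return ERROR messages for missing or off-list attributes."""
    errors = []
    # Assumption: every <related-object> here is a clinical trial link.
    for obj in root.iter("related-object"):
        for attr, (required, allowed) in RELATED_OBJECT_RULES.items():
            value = obj.get(attr)
            if value is None:
                # An absent optional attribute produces no message.
                if required:
                    errors.append(f"ERROR: related-object is missing @{attr}")
            elif value not in allowed:
                errors.append(
                    f"ERROR: related-object/@{attr}='{value}' is not an allowed value"
                )
    return errors


# Hypothetical sample: the second element has a bad @content-type
# and lacks @document-id-type.
SAMPLE = """<article-meta>
  <related-object content-type="results" source-id-type="registry-name"
                  document-id-type="clinical-trial-number"/>
  <related-object content-type="final" source-id-type="registry-name"/>
</article-meta>"""

for error in check_related_objects(ET.fromstring(SAMPLE)):
    print(error)
```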
Data citations
https://jats4r.org/data-citations
-
ID - @publication-type=“data” on <mixed-citation> or <element-citation>. Use “data”
as the value of @publication-type to indicate that the citation is to a data set,
even if that data set is the entire data repository. binary
Data availability statements
https://jats4r.org/data-availability-statements
-
ID - @sec-type=“data-availability”. Use this attribute on the <sec> containing the
DAS. binary
-
ID - <element-citation> or <mixed-citation>, @publication-type=“data”. Use this attribute
on all <element-citation> or <mixed-citation> elements that contain references to
data. binary
-
PU/CV - @specific-use on <element-citation> or <mixed-citation>. For publishers who
elect to collect such granularity in their workflow, see the table below for four
@specific-use attributes recommended for JATS XML.