How to cite this paper

Beck, Jeffrey. “Rules for the Rulemakers: JATS4R’s Self Guidance on Attributes.” Presented at Balisage: The Markup Conference 2019, Washington, DC, July 30 - August 2, 2019. In Proceedings of Balisage: The Markup Conference 2019. Balisage Series on Markup Technologies, vol. 23 (2019). https://doi.org/10.4242/BalisageVol23.Beck01.

Balisage: The Markup Conference 2019
July 30 - August 2, 2019

Balisage Paper: Rules for the Rulemakers

JATS4R’s Self Guidance on Attributes

Jeffrey Beck

NCBI/NLM/NIH

Jeff is a Technical Information Specialist at the National Center for Biotechnology Information at the US National Library of Medicine. He has been involved in the PubMed Central project since it began in 2000. He has been working in print and then electronic journal publishing since the early 1990s. Currently he is co-chair of the NISO Z39.96 JATS Standing Committee and is a BELS-certified Editor in the Life Sciences.

Author’s contribution to the Work was done as part of the Author’s official duties as an NIH employee and is a Work of the United States Government. Therefore, copyright may not be established in the United States. 17 U.S.C. § 105. If Publisher intends to disseminate the Work outside the U.S., Publisher may secure copyright to the extent authorized under the domestic laws of the relevant country, subject to a paid-up, nonexclusive, irrevocable worldwide license to the United States in such copyrighted work to reproduce, prepare derivative works, distribute copies to the public and perform publicly and display publicly the work, and to permit others to do so.

Abstract

Maximal flexibility of rules, or ease of reuse — choose one. The tighter the rules, the more consistent documents will be and the easier it will be to reuse them, but only if the rules are reasonable enough to be adopted. (If all the data creators ignore the rules, reuse doesn’t get easier.) JATS4R (JATS for Reuse) is a NISO working group devoted to optimizing the reusability of scholarly content by developing best-practice recommendations for tagging content in JATS XML. The group has devoted particular attention to the flexibility/reuse tradeoff for rules on attribute use and controlled values, and we eventually decided that we needed some rules for ourselves, on how to write rules for attributes in our recommendations. In the process of developing our guidance document for writing rules for attribute values in our recommendations, we learned (or at least articulated) some things along the way.

Research

Types of Attribute Recommendations

Identification
Prescribed Usage
Controlled Values
Suggested Values
Early Version Workaround

Testing the Recommendations

Identification
Prescribed Usage
Controlled and Suggested Values
Clean Value Test

JATS4R Will

Appendix A. Survey of Attributes in Recommendations

Article Publication and History Dates
Authors and Affiliations
General Recommendations
Citations
Clinical Trials
Disclosure of conflicts of interest/competing interests
Data citations
Data availability statements
Display Objects
Math Recommendations
Permissions

At the end of 2018, the JATS4R Steering Committee was reviewing the Roadmap to plan our work when we realized that several of the topics on the list to be discussed were related to defining and controlling a list of values for attributes.

One example was to define a list of approved values for @fn-type. We knew that we had been making recommendations for attribute usage and had defined some values, but we had no clear understanding of how and when these recommendations were created. Restricting attribute values to a controlled list has a great positive effect on the reusability of XML because future users know what to expect and have an understanding of what the values mean.

But controlled lists of attribute values can have a negative effect on adoption. A user who has a need for a value that is not on the controlled list has two choices: petition the restricting agent to add her value to the list of acceptable values or ignore the recommendation. Restricting agents think that they can respond to users’ requests to keep the controlled value lists up to date, but in reality the pace of work of any restricting agency cannot keep up with the needs of XML users who are under real publication deadlines.

This does not mean that we should not write any controlled value lists into JATS4R recommendations, but we should be aware of the costs and tradeoffs of doing so.

Research

By early 2019, JATS4R had published eleven recommendations. We reviewed these published recommendations to get an idea of what guidance we had already given. The results of this research are presented in the Appendix. Reviewing the current recommendations, we were able to classify the attribute rules we had written into:

Identification: Define the attribute:value combination as an Identifier for an object type.
Prescribed Usage: Prescribe how an attribute should be used in a given circumstance.
Prescribe which values may/must be used:
1. Controlled Values: Values must be from a list, either defined in the Recommendation or from an Outside Authority.
2. Suggested Values: Values should be from a list, either defined in the Recommendation or from an Outside Authority.

Types of Attribute Recommendations

Identification

This is an attribute:value pair that will identify the object as a specific thing. This is important to JATS4R because we can apply specific tests to general JATS objects if we know what the object is.

For example: If we can identify a citation as a data citation (with @publication-type="data"), then we can apply tests for rules specific to data citations to the citation element and all of its descendants if necessary. Is there an <article-title> but no <data-title>?

Testing: There is no way to test whether an Identification attribute is set appropriately. We will have to assume that the identification is correct. Then we can make other tests related to that specific object. We can also set values of other attributes related to that object, given the knowledge of that object’s identity. For example, if a <contrib> is an author (@contrib-type="author"), we should be able to control values in another attribute on that <contrib>, but not other <contrib>s.
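
As a rough sketch of how an Identification value lets a validator apply object-specific tests, the Python fragment below selects citations identified as data citations and asks the question posed above. The sample XML, the function name check_data_citations, and the message wording are illustrative assumptions and are not taken from any JATS4R validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference list fragment used only for this illustration.
SAMPLE = """
<ref-list>
  <ref id="r1">
    <element-citation publication-type="data">
      <article-title>Ocean temperature records</article-title>
    </element-citation>
  </ref>
  <ref id="r2">
    <element-citation publication-type="journal">
      <article-title>A journal article</article-title>
    </element-citation>
  </ref>
</ref-list>
"""

CITATION_ELEMENTS = ("element-citation", "mixed-citation")


def check_data_citations(root):
    """List data citations that have <article-title> but no <data-title>."""
    messages = []
    for tag in CITATION_ELEMENTS:
        for citation in root.iter(tag):
            # Identification: the value is trusted, not tested.
            if citation.get("publication-type") != "data":
                continue
            has_article_title = citation.find(".//article-title") is not None
            has_data_title = citation.find(".//data-title") is not None
            if has_article_title and not has_data_title:
                messages.append(
                    f"<{tag}> is a data citation with <article-title> "
                    "but no <data-title>"
                )
    return messages


messages = check_data_citations(ET.fromstring(SAMPLE))
print(messages)
```

The journal citation is never examined by the data-citation test, which is the point of Identification: the attribute value decides which rules apply.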

Prescribed Usage

This is a recommendation that describes how, when, and for what an attribute may, should, or must be used. The recommendation for usage of the attribute is separate from any values that we suggest or require be used. Prescribed Usage should not be conflated with value lists, although most of the time Prescribed Usage is paired with either a Controlled Values or Suggested Values list.

For example: The recommendation for <institution-id> says to use “@institution-id-type to indicate the type of ID; e.g., ‘grid’ or ‘ringgold’”.

Controlled Values

A controlled value list describes an attribute whose content must be from a list, either defined in the Recommendation or from an Outside Authority. An attribute that has a value not in the list would be an ERROR. We can write rules and apply validations to controlled value lists on general attributes for given circumstances once we have identified the element with an Identification attribute.

For example: @date-type on <pub-date>: Use value “original-publication” to indicate that the date is the original date of publication, and “update” for dates that represent published updates to the publication.

Suggested Values

A suggested value list describes an attribute whose content should be from a list, either defined in the Recommendation or from an Outside Authority.

For example: Use “journal” or “book” as the value of @publication-type to indicate that the citation is to a journal or book, respectively. Other examples are: “letter”, “review”, “patent”, “report”, “standard”, “data”, “working-paper”. This list is not exhaustive and is sourced from the JATS guidelines. This is not a limited field so others can be used as appropriate, for example, “website”. However, in the interests of standardisation, JATS4R requests publishers to contact JATS4R if using additional values so we can create a definitive list and reduce variation across XML sources. “Other” is not a preferred value.
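
One simple reading of the two definitions is that an out-of-list value is an ERROR for a controlled list and a WARNING for a suggested list. The sketch below encodes that reading in Python. The rule table, the function name check_values, and the sample fragment (which is not a complete JATS article) are assumptions for illustration; in particular, treating the two @date-type values quoted above as the complete list is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# (element, attribute) -> (kind of list, allowed values).
# Value sets are copied from the examples in the text.
RULES = {
    ("pub-date", "date-type"): (
        "controlled",
        {"original-publication", "update"},
    ),
    ("element-citation", "publication-type"): (
        "suggested",
        {"journal", "book", "letter", "review", "patent", "report",
         "standard", "data", "working-paper"},
    ),
}


def check_values(root):
    """Return (severity, element, attribute, value) for out-of-list values."""
    findings = []
    for (tag, attr), (kind, allowed) in RULES.items():
        for element in root.iter(tag):
            value = element.get(attr)
            if value is None or value in allowed:
                continue
            severity = "ERROR" if kind == "controlled" else "WARNING"
            findings.append((severity, tag, attr, value))
    return findings


SAMPLE = """
<article>
  <pub-date date-type="pub"/>
  <pub-date date-type="update"/>
  <element-citation publication-type="website"/>
  <element-citation publication-type="journal"/>
</article>
"""

findings = check_values(ET.fromstring(SAMPLE))
for finding in findings:
    print(finding)
```

Here “website” only draws a warning, which matches the statement that @publication-type is not a limited field.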

Early Version Workaround

This is not a type of attribute recommendation, but it is worth mentioning here. This is the “Early Version Workaround” where we describe a way to meet the Recommendation with an earlier version of the DTD that does not have the attribute that we recommend as the best solution. An example of this is from Article and Publication Dates: “use article/@specific-use for article version values from NISO JAV for JATS 1.1 and earlier schemas”.

Testing the Recommendations

Identification

Attributes used for Identification are not tested. Instead, they inform the validator WHAT an object is so that tests specific to that object type can be applied. But we can run a Clean Value Test on the values of attributes that we have defined as Identification.

Example: We can’t test to see whether every <sec> in a document has a type that conforms to a list of values, but if a user has <sec sec-type="data_availability_statement">, we could report an error because the value is close to, but not the same as, the prescribed “data-availability”. In testing, we need to anticipate the near-miss values that people are likely to use in place of the value we want.

Prescribed Usage

Tests for attribute usage will be situational and may be based on identifying the object that we are in. Attributes will be tested:

That they exist if they are defined as REQUIRED (these may be situational, based on any property that can be found in the article XML: type of object, existence of other attributes, attribute values);
That they don’t exist if they are declared DO NOT USE (I don’t think we have any of these);
Values can be tested if the Prescribed Usage is paired with a Controlled Values or Suggested Values list.
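
The existence test in the list above can be sketched as follows, using the rule from the Authors and Affiliations recommendation that @country must be present whenever <country> is used. The function name check_required, its applies and severity parameters, and the sample fragment are hypothetical; the applies predicate stands in for any situational condition.

```python
import xml.etree.ElementTree as ET


def check_required(root, tag, attr, applies=lambda el: True, severity="ERROR"):
    """Report each <tag> that the rule applies to and that lacks @attr."""
    return [
        f"{severity}: <{tag}> is missing @{attr}"
        for element in root.iter(tag)
        if applies(element) and element.get(attr) is None
    ]


SAMPLE = """
<contrib-group>
  <aff id="aff1"><country country="GB">United Kingdom</country></aff>
  <aff id="aff2"><country>France</country></aff>
</contrib-group>
"""

missing = check_required(ET.fromstring(SAMPLE), "country", "country")
print(missing)
```

Whether a missing attribute is reported as an ERROR or a WARNING is left as a parameter because the text allows either, depending on the case.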

Controlled and Suggested Values

At first it seems that the difference in tests for a Controlled Value list and a Suggested Value list would be ERROR vs. WARNING. But it is not that simple. We have already defined Controlled Value lists and Suggested Value lists in our Recommendations. We have examples that have been defined in the Recommendation itself and examples that refer to an outside controlled list like the JAV. The proper values will be circumstantial in many cases, so the tests are not as simple as “on the good list” or “not on the good list”.

We need to be aware of how our attribute recommendations can be tested so that they can be most effective.

Should out-of-list values be errors or warnings? That is: Do we allow things in a JATS4R-compliant article that we have not defined?

We prescribe @sec-type="data-availability" for Data Availability statements, but do we want to exclude all other possible values of @sec-type?

We will either need to:

Define all values of @sec-type that we expect to see in JATS4R-conforming articles (the Maloney proposal, https://github.com/JATS4R/JATS4R-Participant-Hub/issues/118), or
Test only “circumstantial” values (it is possible that @person-group-type may be allowed to have different values if it is in a Data Citation than if it is in some other citation), or
Come up with a “Forbidden list” of values related to the ones that we define. This is the “Clean Value Test”.

Clean Value Test

A Clean Value Test is a way to try to control an attribute value without restricting that attribute to a controlled set of allowed values. The test involves thinking up as many ways as possible that the value you want could be miswritten and then explicitly excluding those. This can be frustrating, and the list requires ongoing maintenance.

A good example of this is when we are defining values for Identification. We prescribe @sec-type="data-availability" to identify Data Availability Statements. We cannot exclude all values for @sec-type except for “data-availability”. Nor can we provide a list of all “approved” values for @sec-type. Instead we must write an error for any value that is approaching but not equal to “data-availability”. The list of values that would generate an ERROR would include but not be limited to: data_availability, data availability, Data-Availability, data-statement, dataavailbility.

This is a list that will grow as people find new ways to misrepresent “data-availability”. The question may come up about “normalizing” the values before testing, but this would weaken our recommendations because any normalizing we do in the validator must then be done for any future application that is looking for “data-availability”.
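
A minimal Python sketch of a Clean Value Test for “data-availability” follows. The explicit forbidden list is the one given above. The similarity fallback built on difflib.SequenceMatcher, the 0.85 threshold, and the names squash and clean_value_test are assumptions added for illustration; they are one possible way to catch near misses that nobody has listed yet, not a method described by JATS4R. The squashed form is used only to detect a near miss; the value in the document must still equal the prescribed string exactly.

```python
import difflib
import re

WANTED = "data-availability"

# Near-miss values listed in the text; the list is expected to grow.
FORBIDDEN = {
    "data_availability",
    "data availability",
    "Data-Availability",
    "data-statement",
    "dataavailbility",
}


def squash(value):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def clean_value_test(value, wanted=WANTED, forbidden=FORBIDDEN, threshold=0.85):
    """Return an ERROR message for a near miss, or None if the value is fine."""
    if value == wanted:
        return None
    if value in forbidden:
        return f'ERROR: "{value}" is a known variant of "{wanted}"'
    # Assumed heuristic: flag unlisted values that look very similar.
    ratio = difflib.SequenceMatcher(None, squash(value), squash(wanted)).ratio()
    if ratio >= threshold:
        return f'ERROR: "{value}" looks like a variant of "{wanted}"'
    return None


for candidate in ["data-availability", "data_availability",
                  "Data_Availability", "methods"]:
    print(candidate, "->", clean_value_test(candidate))
```

Unrelated values such as “methods” pass untouched, so the attribute stays open to values that the Recommendation has not defined.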

JATS4R Will

Define as many Identification attribute:values as we can. This allows us to identify a given object in the XML for testing or for later text mining. Usually these are set using general attributes like @content-type or @fn-type. We can define the value we want used, but we cannot exclude any other value. We can check Identification values with a Clean Value Test.
Define Prescribed Usage for attributes. The existence of attributes under certain circumstances can be tested (with the aid of Identification attributes) easily. These can be ERRORS or WARNINGS depending on the cases.
Be careful about defining Controlled Value lists; we should not use them generally. So no rules like “@fn-type must be one of (corresp | reference | suppdata)”. But we could make up a rule something like “a footnote referenced from a contrib with a role of ‘illustrator’ must have a @fn-type from (media | style | corresp).” Not matching a value in a Controlled Value list (under the appropriate circumstances) will be an ERROR.
Use Suggested Value lists, which are a little more forgiving. Not matching a value in a Suggested Value list will be a WARNING. These will be tempting to use when we don’t want to commit to a decision. I think we should avoid this and define them circumstantially like the Controlled Value lists. Because there is no ERROR, Suggested Value lists have no teeth. I suggest that we use them to do two things: strongly encourage usage to move in a certain way and/or test out values that we may want to control in a future version of the Recommendation. If this is our intent, we should list this in the recommendation.
Use the following style for attribute recommendations:
- All attribute values we define should follow one style: all letters are lowercase, and where a space would separate words in running text, a hyphen (U+002D) is used instead, for example, “original-publication”.
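
The style rule for values can be checked mechanically. The regular expression below is one possible reading of the rule; allowing digits alongside lowercase letters is an assumption, and the name follows_style is hypothetical.

```python
import re

# Lowercase words (digits allowed by assumption) joined by single hyphens.
STYLE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def follows_style(value):
    """True if the value is all lowercase with hyphens between words."""
    return STYLE.fullmatch(value) is not None


for value in ["original-publication", "data-availability",
              "Data-Availability", "data_availability", "data availability"]:
    print(value, follows_style(value))
```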

Appendix A. Survey of Attributes in Recommendations

ID	Identification
PU	Prescribed Usage
CV	Controlled Values
SV	Suggested Values

Article Publication and History Dates

https://jats4r.org/article-publication-and-history-dates

PU/CV - use /article/@specific-use for article version values from NISO JAV for JATS 1.1 and earlier schemas.
CV - @date-type on <pub-date>: Use value “original-publication” to indicate that the date is the original date of publication, and “update” for dates that represent published updates to the publication.
PU/CV - Use @date-type on <date> in <event> to indicate what stage of publication this version was in. Use NISO JAV values.

Authors and Affiliations

https://jats4r.org/authors-and-affiliations

CV - @ref-type (on <xref>). When linking a <contrib> to its <aff>, use @ref-type="aff" on <xref>. [[Validator tool result: error if @ref-type on <xref> != “aff” if @rid references an <aff> element ]]
PU - <institution-id>. Capturing the institutional ID is not mandatory at this time. However, if the publisher does capture it, they should make every effort to ensure that it is accurate. Use <institution-id> to contain the ID, and @institution-id-type to indicate the type of ID; e.g., “grid” or “ringgold”. Place the <institution-id> and <institution> elements within <institution-wrap>.
PU/CV - @country. If <country> is used, then @country must also be used and must be set to the 2-letter country code, as specified in ISO 3166-1 (recommended in the JATS tag library). binary
ID - <contrib>, @contrib-type. Contain each author within a <contrib> element. If a <contrib> contains an author, then @contrib-type must be set to “author”
PU - @corresp. Use the corresp attribute on <contrib>, set to value “yes”, to identify the corresponding author(s). binary
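
The validator result noted for @ref-type in the list above can be sketched in Python as follows. The function name check_aff_xrefs, the message wording, and the sample fragment are illustrative assumptions. The sketch treats @rid as a whitespace-separated list of IDs.

```python
import xml.etree.ElementTree as ET


def check_aff_xrefs(root):
    """ERROR for each <xref> that points to an <aff> without @ref-type="aff"."""
    # Map every @id in the document to the name of the element carrying it.
    targets = {el.get("id"): el.tag for el in root.iter()
               if el.get("id") is not None}
    errors = []
    for xref in root.iter("xref"):
        ref_type = xref.get("ref-type")
        for rid in (xref.get("rid") or "").split():
            if targets.get(rid) == "aff" and ref_type != "aff":
                errors.append(
                    f'ERROR: <xref rid="{rid}"> points to an <aff> '
                    f'but has @ref-type="{ref_type}"'
                )
    return errors


SAMPLE = """
<contrib-group>
  <contrib contrib-type="author">
    <xref ref-type="aff" rid="aff1"/>
    <xref ref-type="fn" rid="aff2"/>
    <xref ref-type="fn" rid="fn1"/>
  </contrib>
  <aff id="aff1">First institution</aff>
  <aff id="aff2">Second institution</aff>
</contrib-group>
"""

errors = check_aff_xrefs(ET.fromstring(SAMPLE))
print(errors)
```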

General Recommendations

https://jats4r.org/general-recommendations

NONE

Citations

https://jats4r.org/citations

PU/SV - @publication-type="…" on <mixed-citation> or <element-citation>. Use “journal” or “book” as the value of @publication-type to indicate that the citation is to a journal or book, respectively. Other examples are: “letter”, “review”, “patent”, “report”, “standard”, “data”, “working-paper”. This list is not exhaustive and is sourced from the JATS guidelines. This is not a limited field so others can be used as appropriate, for example, “website”. However, in the interests of standardisation, JATS4R requests publishers to contact JATS4R if using additional values so we can create a definitive list and reduce variation across XML sources. “Other” is not a preferred value.
PU/CV - <person-group> and @person-group-type. Use the <person-group> element to specify authors and other contributors in a citation. Use the @person-group-type attribute to specify the role of a contributor, when it is possible to identify them with a role. A separate <person-group> element should be used for each role. This attribute has a fixed list of allowed values in the Journal Publishing tag set: all-authors; assignee; author; compiler; curator; director; editor; guest-editor; inventor; transed; translator.
PU/CV - @pub-id-type on <pub-id>. Use this attribute to specify the type of the identifier. For example, a DOI would have the @pub-id-type value of “doi”. The value should be one of the valid values from the list in the Tag Library.

Clinical Trials

https://jats4r.org/clinical-trials

PU/CV (Should be ID) - related-object/@content-type. Use the content-type attribute optionally to indicate which stage of the trial the publication is reporting on. Since this information is intended for content providers submitting linked clinical trial information to Crossref, if @content-type is used, its value must be “pre-results”, “results”, or “post-results”, as defined in the Crossref schema. [[Validator result: If absent, no message. If present, ERROR if values not equal “pre-results” or “results” or “post-results” ]]
PU - related-object/@source-id. The source-id attribute must be used to identify the clinical trial registry. Crossref curates a list of WHO-approved registries and assigns them a DOI. Content providers are encouraged to select an appropriate registry from this list and supply the registry DOI or the WHO registry name as the source-id value.
PU/CV - related-object/@source-id-type. The source-id-type attribute must be used to identify the type of ID provided in @source-id. The value of @source-id-type should be “crossref-doi” or “registry-name”, as appropriate (see Recommendation 3.)
PU/CV - related-object/@document-id-type: The document-id-type attribute is required and must identify the kind of @document-id. The value must be either “clinical-trial-number” or “doi”.
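
Two of the clinical trial rules above translate directly into checks: @content-type is optional but controlled when present, and @document-id-type is both required and controlled. The sketch below assumes that the <related-object> elements passed in have already been identified as clinical trial links; how that identification is made is outside this sketch. The function name check_trial_link and the message wording are hypothetical.

```python
import xml.etree.ElementTree as ET

CONTENT_TYPES = {"pre-results", "results", "post-results"}
DOCUMENT_ID_TYPES = {"clinical-trial-number", "doi"}


def check_trial_link(related_object):
    """Return ERROR messages for one clinical trial <related-object>."""
    errors = []

    # Optional attribute: no message if absent, ERROR if out of list.
    content_type = related_object.get("content-type")
    if content_type is not None and content_type not in CONTENT_TYPES:
        errors.append(f'ERROR: @content-type="{content_type}" is not allowed')

    # Required attribute: ERROR if absent or out of list.
    id_type = related_object.get("document-id-type")
    if id_type is None:
        errors.append("ERROR: @document-id-type is required")
    elif id_type not in DOCUMENT_ID_TYPES:
        errors.append(f'ERROR: @document-id-type="{id_type}" is not allowed')

    return errors


good = ET.fromstring('<related-object content-type="results" document-id-type="doi"/>')
bad = ET.fromstring('<related-object content-type="final" document-id-type="isrctn"/>')
bare = ET.fromstring('<related-object/>')

for element in (good, bad, bare):
    print(check_trial_link(element))
```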

Disclosure of conflicts of interest/competing interests

https://jats4r.org/conflict-of-interest-statements

ID - @fn-type on <fn>. Use @fn-type="COI-statement" to identify the footnote as a COI statement. binary

Data citations

https://jats4r.org/data-citations

ID - @publication-type="data" on <mixed-citation> or <element-citation>. Use “data” as the value of @publication-type to indicate that the citation is to a data set, even if that data set is the entire data repository. binary

Data availability statements

https://jats4r.org/data-availability-statements

ID - @sec-type="data-availability". Use this attribute on the <sec> containing the DAS. binary
ID - <element-citation> or <mixed-citation>, @publication-type="data". Use this attribute on all <element-citation> or <mixed-citation> elements that contain references to data. binary
PU/CV - @specific-use on <element-citation> or <mixed-citation>. For publishers who elect to collect such granularity in their workflow, see the table below for four @specific-use attributes recommended for JATS XML.

Display Objects

https://jats4r.org/display-objects

NONE

Math Recommendations

https://jats4r.org/math

NONE

Permissions

https://jats4r.org/permissions

NONE
