Introduction

Scholars Portal (SP) was founded in 2002 to provide the shared technology infrastructure for the Ontario Council of University Libraries (OCUL). A main service is the Journals platform, an XML-based digital repository hosted in a MarkLogic database which contains over 66 million articles from 26 thousand journals. The platform provides access to these articles for faculty and students from universities across Ontario. Additionally, in 2013 SP became the first Canadian Trustworthy Digital Repository (TDR) certified by the Center for Research Libraries, ensuring the long-term preservation of the journals purchased by OCUL libraries. As a province-wide access point and preservation service, SP has a significant responsibility to ensure the accuracy and clarity of our content.

To meet this responsibility and to fulfill OCUL's mission of advancing research by seeking out innovative strategies for preserving and curating research resources (OCUL), we have implemented a new process of utilizing the <related-article> element of the Journal Article Tag Suite (JATS) metadata schema to connect articles with their retractions and corrections. This paper presents the results of the implementation of this process as well as challenges and future opportunities. It will also discuss how our project fits into other standards and communities of practice, introduce some history of handling retractions at SP, and present four case studies of the process in our database.

Background

When a published article is found to have an error, publishers will issue a correction. This is commonly in the form of a published statement in the journal that published the original article. When an error is found that calls into question the validity of the research results, publishers will instead issue a retraction. Retraction is the process through which a published article is removed from a journal, and thus the scholarly record, and involves publishing a retraction notice. Other reasons for retractions include cases of duplicate publishing or cases of academic misconduct.

Much of the existing research in this area focuses more on the decision to retract an article, and on how the publisher carries it out, than on how to handle the retraction in distribution, access, and preservation infrastructures. This focus on publisher responsibilities is seen in the retraction guidelines created and maintained by the Committee on Publication Ethics (COPE) (COPE).

In the past few years, a number of communities have begun to address the role that aggregators such as SP can have in this space. A detailed report titled Recommendations from the Reducing the Inadvertent Spread of Retracted Science: Shaping a Research and Implementation Agenda Project (Schneider et al., 2021) outlines the importance of continued work in this area and provides a number of recommendations for future directions. Additionally, NISO created the Communication of Retractions, Removals, and Expressions of Concern (CREC) working group (CREC) in the summer of 2022 to guide the creation of standards and best practices for what to do after the decision to retract an article has been made. Our project has aimed to align with the early findings and suggestions of these groups by indicating the status of retracted and corrected articles in our database and displaying that status for human users. As the development of standards in this area is ongoing, we have also aimed to keep our project flexible and adaptable to accommodate future changes in standards and partner workflows.

Scholars Portal either receives data from publishers via FTP or uses scripts to pull content from publishers' FTP sites. Most commonly this content is received in the JATS metadata format. JATS, the Journal Article Tag Suite, is a NISO standard (ANSI/NISO Z39.96-2021) that provides a common XML format in which publishers and archives can exchange journal content (NISO). The SP team then uses programs referred to as "loaders" to ingest the content into the database. An average of 12,000 articles are loaded each day, often as soon as they are published. Upon initial loading, each article is assigned a collection to determine user entitlements, and a corresponding record is created in a secondary database as part of the Trustworthy Digital Repository (TDR). This TDR record contains basic bibliographic information about the article as well as information about its preservation history. The TDR ensures content is preserved and reliable and also serves as a method by which an article's history in the database can be investigated.

Through this loading process, SP has adopted a number of strategies to deal with retracted content. A small number of publishers provide occasional lists of retracted articles, which we then delete from the database or replace with metadata indicating the retraction. We also have a DOI verification process to prevent duplicates within the database. Under this process, if a publisher issues a correction or retraction notice with the same DOI as the original article, the notice overwrites the original article in our database. All other cases were handled by periodic database cleanup projects, which involved searching text fields such as Title or Abstract for strings that could indicate a retraction, such as "retraction:" or "retracted:". These strategies are labour-heavy, do not produce clear results for SP users, and are limited to retractions. Any retractions missed during this process, and all corrections, remained in our database with no indication of the changes.

To improve this process, SP has begun utilizing the <related-article> JATS metadata element, which is meant to be used for the description of a journal article related to the content but published separately (JATS). For this process, the <related-article> element must also include the ext-link-type attribute with the value "doi" and the xlink:href attribute to create the link between articles, and the related-article-type attribute to identify the type of relationship. JATS describes some suggested usage of the related-article-type attribute, but publishers' inconsistent use of this attribute created some challenges.
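The attribute requirements above can be expressed as a small check. The following is a minimal sketch in Python, not SP's actual code (the paper does not show the implementation); the function name usable_link is hypothetical, and the sample DOI is taken from the Case One example.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"  # ElementTree's qualified name for xlink:href


def usable_link(element):
    """Return (relationship type, DOI) when the element carries all three
    attributes needed to build a link, otherwise None.
    Hypothetical helper, for illustration only."""
    rel_type = element.get("related-article-type")
    doi = element.get(HREF)
    if element.get("ext-link-type") != "doi" or not rel_type or not doi:
        return None
    return rel_type, doi


snippet = (
    f'<related-article xmlns:xlink="{XLINK_NS}" '
    'related-article-type="retracted-article" '
    'ext-link-type="doi" xlink:href="10.4103/0019-5413.139860"/>'
)
print(usable_link(ET.fromstring(snippet)))
```

An element that lacks any of the three attributes, or whose ext-link-type is not "doi", is simply skipped by this check rather than guessed at.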

Another possible approach would have been to cross-reference SP holdings against retraction databases such as RetractionWatch (RetractionWatch) or open-retractions (open-retractions). However, to ensure reliability, save processing time, and conserve server space, it is preferable to work with data already in the SP database. Additionally, these external retraction databases identify retractions through many of the same methods that SP utilized prior to this project (Cheng 2019). Not only would this approach exclude other types of related articles such as corrections, it would also miss anything that is not labeled explicitly as a retraction by the publisher.

Case One

The most straightforward case is when both of the paired articles are sent to SP with a <related-article> element in the metadata. This is a common case when publishers retract an article and then send SP a retraction notice along with a revised PDF with a "retracted" watermark. In this case, the data needed to create the link between the articles is already present in the metadata and only needs to be displayed by the website. The website code reads from an XML mapping file to determine what to display based on the related-article-type and xlink:href attributes. Additionally, all display text has been translated into French to allow for bilingual service.

For example:

A retraction notice is sent with:

                     
                <related-article related-article-type="retracted-article" 
                    ext-link-type="doi" 
                    xlink:href="10.4103/0019-5413.139860"/>

And a retracted article with watermarked PDF is sent with:

                     
                <related-article related-article-type="retraction-forward" 
                    ext-link-type="doi" 
                    xlink:href="10.4103/0019-5413.189615"/>

The mapping file for the HTML display indicates the English and French text that will be displayed for each related-article-type attribute value found in a <related-article> element:

                     
                <dataset-type name="PublisherA">
                    <related-article type="retracted-article">
                        <DisplayText lang="en">This article is a retraction notice for:</DisplayText>
                        <DisplayText lang="fr">Cet article est un avis de rétraction pour:</DisplayText>
                    </related-article>
                    <related-article type="retraction-forward">
                        <DisplayText lang="en">This article has been retracted:</DisplayText>
                        <DisplayText lang="fr">Cet article a été retiré:</DisplayText>
                    </related-article>
                </dataset-type>

The DOI in each <related-article> element indicates which article in our database it will be linked to.
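The lookup that the website performs against the mapping file can be sketched as follows. This is an illustrative Python sketch rather than the SP website code: the <mapping> root element and the function name display_text are assumptions, while the <dataset-type>, <related-article>, and <DisplayText> content is copied from the listing above.

```python
import xml.etree.ElementTree as ET

# Mapping content copied from the example above; the <mapping> wrapper
# element is an assumption made so the fragment parses as one document.
MAPPING_XML = """
<mapping>
  <dataset-type name="PublisherA">
    <related-article type="retracted-article">
      <DisplayText lang="en">This article is a retraction notice for:</DisplayText>
      <DisplayText lang="fr">Cet article est un avis de rétraction pour:</DisplayText>
    </related-article>
    <related-article type="retraction-forward">
      <DisplayText lang="en">This article has been retracted:</DisplayText>
      <DisplayText lang="fr">Cet article a été retiré:</DisplayText>
    </related-article>
  </dataset-type>
</mapping>
""".strip()


def display_text(mapping_root, publisher, rel_type, lang):
    """Return the DisplayText for a publisher, relationship type, and
    language, or None when the mapping has no such entry."""
    for dataset in mapping_root.findall("dataset-type"):
        if dataset.get("name") != publisher:
            continue
        for entry in dataset.findall("related-article"):
            if entry.get("type") != rel_type:
                continue
            for text in entry.findall("DisplayText"):
                if text.get("lang") == lang:
                    return text.text
    return None


mapping_root = ET.fromstring(MAPPING_XML)
print(display_text(mapping_root, "PublisherA", "retraction-forward", "fr"))
```

Keeping the text in a data file means a new label or translation is a mapping change rather than a code change.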

On the platform this is displayed as:

Figure 1: Display of a Retraction Notice

Figure 2: Display of a Retracted Article

A limitation found in this case is that, because publishers use values for the related-article-type attribute inconsistently, it was impossible to process all <related-article> elements with a single set of rules. Articles were checked manually to determine how the attribute was used by each publisher. A <dataset-type> element was then added to the mapping file to identify a specific publisher, and the <DisplayText> for each related-article-type attribute value was nested under <dataset-type> so that different display texts could be specified based on each publisher's use of the attribute value. Additionally, creating the link between articles is only possible if both articles have a DOI. Possible next steps for this project could investigate how to recreate this process with other publisher-specific IDs.

Case Two

Case Two occurs when only one article of the pair includes the <related-article> element. In order to create the link between articles, the <related-article> element must be inserted into the record of the corresponding article. This is a common case when publishers retract an article and send a retraction notice without making any changes to the original article.

For example, an original article is loaded to the database with no <related-article> element. The publisher then retracts this article and sends SP a retraction notice that includes the <related-article> element to indicate the relationship to the original article which it is retracting.

Element in retraction notice:

                     
                <related-article related-article-type="retracted-article" 
                    id="d24e93" 
                    ext-link-type="doi" 
                    xlink:href="10.7759/cureus.6741">
                    <article-title>Comparison of Oral versus Intravenous Proton Pump Inhibitors
                        in Preventing Re-bleeding from Peptic Ulcer after Successful Endoscopic Therapy
                    </article-title>
                </related-article>

To create the link between this retraction notice and the retracted article, SP has created a program to insert the <related-article> element into the article that is indicated by the DOI in the <related-article> element of the retraction notice.

                     
                <related-article related-article-type="retraction-forward" 
                    ext-link-type="doi" 
                    xlink:href="10.7759/cureus.r33" 
                    xmlns:xlink="http://www.w3.org/1999/xlink">
                </related-article>

In order to determine the related-article-type attribute to use in the inserted <related-article> element, a <matching-article-type> element was added to the mapping file:

                     
                <dataset-type name="PublisherA">
                    <related-article type="retracted-article">
                        <DisplayText lang="en">This article is a retraction notice for:</DisplayText>
                        <DisplayText lang="fr">Cet article est un avis de rétraction pour:</DisplayText>
                        <matching-article-type>retraction-forward</matching-article-type>
                    </related-article>
                    <related-article type="retraction-forward">
                        <DisplayText lang="en">This article has been retracted:</DisplayText>
                        <DisplayText lang="fr">Cet article a été retiré:</DisplayText>
                        <matching-article-type>retracted-article</matching-article-type>
                    </related-article>
                </dataset-type>
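The insertion step for this case can be sketched as follows. This is a hedged illustration in Python, not the program SP runs: the function names, the in-memory MATCHING_TYPE dictionary (standing in for the <matching-article-type> values), and the choice to append the new element at the end of <article-meta> are all assumptions. A production version would also need to respect the element order required by the JATS content model.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"
ET.register_namespace("xlink", XLINK_NS)  # serialize with the xlink: prefix

# Hypothetical in-memory form of one publisher's <matching-article-type> values.
MATCHING_TYPE = {
    "retracted-article": "retraction-forward",
    "retraction-forward": "retracted-article",
}


def own_doi(article):
    """Return the article's own DOI from <article-id pub-id-type="doi">."""
    node = article.find(".//article-meta/article-id[@pub-id-type='doi']")
    return node.text.strip() if node is not None and node.text else None


def add_reciprocal_links(notice, articles_by_doi):
    """Insert the matching <related-article> into each article that the
    notice points at. Returns the number of elements inserted."""
    notice_doi = own_doi(notice)
    inserted = 0
    for link in notice.iter("related-article"):
        target = articles_by_doi.get(link.get(HREF))
        forward_type = MATCHING_TYPE.get(link.get("related-article-type"))
        if notice_doi is None or target is None or forward_type is None:
            continue
        meta = target.find(".//article-meta")
        if meta is None:
            continue
        # Do not add a second copy if the link is already present.
        if any(e.get(HREF) == notice_doi for e in meta.iter("related-article")):
            continue
        ET.SubElement(meta, "related-article", {
            "related-article-type": forward_type,
            "ext-link-type": "doi",
            HREF: notice_doi,
        })
        inserted += 1
    return inserted


notice = ET.fromstring(
    f'<article xmlns:xlink="{XLINK_NS}"><front><article-meta>'
    '<article-id pub-id-type="doi">10.7759/cureus.r33</article-id>'
    '<related-article related-article-type="retracted-article" '
    'ext-link-type="doi" xlink:href="10.7759/cureus.6741"/>'
    '</article-meta></front></article>'
)
original = ET.fromstring(
    '<article><front><article-meta>'
    '<article-id pub-id-type="doi">10.7759/cureus.6741</article-id>'
    '</article-meta></front></article>'
)
count = add_reciprocal_links(notice, {"10.7759/cureus.6741": original})
print(count, ET.tostring(original, encoding="unicode"))
```

The duplicate check makes the sketch safe to run repeatedly over the same content, which matters for a job that is run on a schedule.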

A limitation found in this case is that the use of each related-article-type attribute value is sometimes inconsistent within a single publisher. This limits the specificity and accuracy that are possible with our labels. For example, if a publisher uses the related-article-type value "retracted-article" for all retraction notices, correction notices, and retracted and corrected original articles, it is impossible to differentiate these in the display. In this case, the general display text "Additional materials:" is used.

Case Three

Because the data in the mapping file is separated by publisher, it was also necessary to create a default case as a catch-all for publishers that begin to use this element, or start using related-article-type attributes that are not yet added to the mapping file.

                     
                <dataset-type name="default">
                    <related-article type="default">
                        <DisplayText lang="en">Additional materials:</DisplayText>
                        <DisplayText lang="fr">Matériaux additionnels:</DisplayText>
                        <matching-article-type>default-forward</matching-article-type>
                    </related-article>
                    <related-article type="default-forward">
                        <DisplayText lang="en">Additional materials:</DisplayText>
                        <DisplayText lang="fr">Matériaux additionnels:</DisplayText>
                        <matching-article-type>default</matching-article-type>
                    </related-article>
                </dataset-type>

After the program is run, these default cases can be located in the log file, analyzed, and manually added to the mapping file.
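The fallback-and-log behaviour described in this case can be sketched as follows. This is an assumption-laden Python illustration: the MAPPINGS dictionary, the logger name, and the function resolve_display_text are hypothetical, and the paper does not specify the format of SP's log file.

```python
import logging

logger = logging.getLogger("related_article_mapping")  # hypothetical logger name

# Default display text, as given in the default <dataset-type> above.
DEFAULT_TEXT = {"en": "Additional materials:", "fr": "Matériaux additionnels:"}

# Hypothetical in-memory view of the mapping file:
# (publisher, related-article-type) -> {language: display text}
MAPPINGS = {
    ("PublisherA", "retracted-article"): {
        "en": "This article is a retraction notice for:",
        "fr": "Cet article est un avis de rétraction pour:",
    },
}


def resolve_display_text(publisher, rel_type, lang):
    """Return the mapped display text, falling back to the default entry
    and logging the unmapped combination for later manual review."""
    entry = MAPPINGS.get((publisher, rel_type))
    if entry is None:
        logger.warning(
            "unmapped related-article-type %r for publisher %r",
            rel_type, publisher,
        )
        entry = DEFAULT_TEXT
    return entry[lang]


print(resolve_display_text("PublisherB", "erratum", "fr"))
```

Each logged combination is a candidate for a new entry in the mapping file once its meaning has been checked by hand.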

A limitation found in this case is that it does not account for changes to a publisher's usage of the related-article-type attribute after that publisher and attribute value have already been added to the mapping file. In this case, we rely on user feedback to identify these errors.

Case Four

Unlike the first three cases, which involve only two articles, this case describes a connection between three articles. In this example, an original article was delivered to SP as usual. The publisher then issued an expression of concern to indicate that the article was under review for retraction, and later issued a retraction notice to indicate that a decision had been made to retract the article. The <related-article> element in the expression of concern creates a link between it and the original article, and the <related-article> element in the retraction notice then creates a second link to the original article.

Example:

Original article is received with no <related-article> element and is loaded to the database as usual.

SP then receives an expression of concern that includes a <related-article> element with an xlink:href attribute with the DOI of the original article.

                     
                <related-article related-article-type="object-of-concern" 
                    id="d38e76" 
                    ext-link-type="doi" 
                    xlink:href="10.1042/BSR20200225">10.1042/BSR20200225</related-article>

A matching <related-article> element is then added to the original article with the related-article-type attribute of "object-of-concern-forward" from the mapping and the DOI from the matching <related-article> element in the expression of concern, creating a link between the two articles.

                     
                <related-article related-article-type="object-of-concern-forward" 
                    ext-link-type="doi" 
                    xlink:href="10.1042/BSR-20200225_EOC"></related-article>

SP then receives a retraction notice that includes a <related-article> element with an xlink:href attribute with the DOI of the original article.

A second <related-article> element is then added to the original article with the related-article-type attribute of "retraction-forward" from the mapping and the DOI from the matching <related-article> element in the retraction notice, creating a second link.

                     
                <related-article related-article-type="retraction-forward" 
                    ext-link-type="doi" 
                    xlink:href="10.1042/BSR-20200225_RET"></related-article>

On the platform this is displayed as:

Figure 3: Display
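Because the original article now carries two <related-article> elements, the display has to handle a list of links rather than a single one. The sketch below, in Python, collects every link on an article record. The pairing of attribute values with English display text is an assumption based on Appendix 2, and the function name and the bare <article-meta> wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
HREF = f"{{{XLINK_NS}}}href"

# Assumed pairing of attribute values with English DisplayText values.
DISPLAY_EN = {
    "object-of-concern-forward":
        "There has been an expression of concern published for this article:",
    "retraction-forward": "This article has been retracted:",
}

# The original article's metadata after both insertions described above.
original_meta = ET.fromstring(f"""
<article-meta xmlns:xlink="{XLINK_NS}">
  <related-article related-article-type="object-of-concern-forward"
      ext-link-type="doi" xlink:href="10.1042/BSR-20200225_EOC"/>
  <related-article related-article-type="retraction-forward"
      ext-link-type="doi" xlink:href="10.1042/BSR-20200225_RET"/>
</article-meta>
""".strip())


def collect_links(meta):
    """Return one (display text, DOI) pair per <related-article> element,
    in document order, using the default text for unmapped types."""
    links = []
    for link in meta.iter("related-article"):
        text = DISPLAY_EN.get(
            link.get("related-article-type"), "Additional materials:"
        )
        links.append((text, link.get(HREF)))
    return links


for text, doi in collect_links(original_meta):
    print(text, doi)
```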

Final Results

The final results of this project include an XML mapping file which is loaded to the MarkLogic database, a program to insert <related-article> elements into article XML records that is run once every three months, and some alterations to the code of the SP website to display the link and descriptive text based on the mapping file. Maintaining data in the mapping file instead of directly in the code of the program or the website allows for ease of changes and updates as well as permitting faster loading and processing.

The program to insert <related-article> elements is currently being run for six of the 24 publishers that include the element in their metadata. Nine publishers either do not send their data in the JATS format or do not include the <related-article> element and required attributes, and so are not included in this project.

The mapping file includes 26 unique related-article-type attribute values (Appendix 1), but due to varied usage, many are repeated under different publishers, resulting in 73 unique mappings. To display these, the mapping includes 10 unique DisplayTexts, which are included in both French and English (Appendix 2).
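The difference between unique attribute values and unique mappings comes from counting each value once overall versus once per publisher. A minimal Python sketch of that count is shown below; the two-publisher mapping, the <mapping> root element, and the function name are illustrative assumptions, not SP's actual file.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping with two hypothetical publishers that share one value.
MAPPING_XML = """
<mapping>
  <dataset-type name="PublisherA">
    <related-article type="retracted-article"/>
    <related-article type="retraction-forward"/>
  </dataset-type>
  <dataset-type name="PublisherB">
    <related-article type="retracted-article"/>
    <related-article type="correction-forward"/>
  </dataset-type>
</mapping>
""".strip()


def mapping_counts(mapping_root):
    """Return (number of unique attribute values, number of unique
    publisher/value mappings) found in the mapping file."""
    pairs = {
        (dataset.get("name"), entry.get("type"))
        for dataset in mapping_root.findall("dataset-type")
        for entry in dataset.findall("related-article")
    }
    unique_values = {rel_type for _, rel_type in pairs}
    return len(unique_values), len(pairs)


print(mapping_counts(ET.fromstring(MAPPING_XML)))
```

In this toy file, "retracted-article" appears under both publishers, so it counts as one unique value but two mappings.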

Other attribute values for related-article-type that are used by publishers but are not included in the mapping file due to inconsistent usage or definitions outside the scope of this project include:

  • article

  • article-reference

  • author-rejoinder

  • author-response

  • continues

  • data-paper

  • editor-report

  • in-focus

  • in-this-issue

  • journal

  • letter

  • other-specified

  • patientsummary-article

  • point-of-view

  • preprint

  • refers-to

  • related

  • reply-article

  • see-also

  • subset-article

  • wiki

These values have not yet been fully evaluated, but they include relationships between articles such as peer review, related datasets, letters to the editor, and companion articles. Until these can be further explored, they are set to display as "Additional materials:" / "Matériaux additionnels:".

Conclusion

Scholars Portal loads an average of 12 thousand articles each day and serves over 500 thousand users across the province of Ontario, and so has a responsibility to ensure accurate and reliable content. Improving the clarity of the connection between retractions, corrections, and original articles not only ensures SP users are receiving the correct information but also offers users transparency into the process of scholarly publishing. Utilizing the <related-article> JATS metadata element was an effective approach to this project because it already exists in the metadata of many of the corrections and retractions that are received from publishers. The challenges included automating the process so that it could be implemented within existing workflows for the high volume of data that SP handles daily, and dealing with the inconsistency in attribute usage. Increased use of this JATS element and improved consistency in the usage of attribute values would allow for increased accuracy in this project, as well as expansion to include other types of related content such as peer review information, letters to the editor, and comments.

To improve results and further align with CREC and RISRS recommendations, future steps to this project could include adjusting the search function to filter out retracted articles and investigating how this work applies to other SP domains such as datasets and other supplementary materials.

Acknowledgements

We would like to thank Sabina Pagoto and Jonathan Dorey for their French translations, and Wei Zhao for her priceless historical knowledge of the SP database.

Appendix 1. Unique related-article-type Attribute Values

Note

This Appendix contains a list of the unique related-article-type attribute values that are included in the SP mapping file

  • addended-article

  • addended-article-forward

  • addendum

  • addendum-article

  • companion

  • companion-forward

  • concerning-article

  • concerning-article-forward

  • corrected-article

  • correction

  • correction-forward

  • default

  • default-forward

  • expression-of-concern

  • expression-of-concern-article

  • object-of-concern

  • object-of-concern-forward

  • original

  • republished-article

  • retracted-article

  • retraction

  • retraction-article

  • retraction-forward

  • update-to-article

  • withdrawn-article

  • withdrawn-article-forward

Appendix 2. Display Text

Note

This appendix contains a description and English and French DisplayText values of each of the 10 unique DisplayTexts included in the SP mapping file

Table I

Unique DisplayText values

Description | DisplayText lang="en" | DisplayText lang="fr"
Default value for any related-article-type that has not been mapped - temporary until evaluated and added to the mapping | Additional materials: | Matériaux additionnels :
An original article for which an addendum has been published - link leads to notice of the addendum | There has been an addendum published for this article: | Un addendum a été publié pour cet article :
Published notices of addendum - link leads to the original article | This article is an addendum to: | Cet article est un addendum à :
Notices of correction or published comments - link leads to the original article | This article is a correction notice or comment for: | Cet article est un avis de correction ou un commentaire pour :
An original article for which comments or corrections have been published - link leads to the comment or notice of correction | There have been published comments or corrections to this article: | Des commentaires ou des corrections ont été publiés pour cet article :
Published corrections - link leads to original article that is being corrected | This article is a correction notice for: | Cet article est un avis de correction pour :
An original article for which a correction has been published - link leads to notice of the correction | There has been a correction published for this article: | Une correction a été publiée pour cet article :
Published notice that an article is under review for potential correction or retraction - link leads to the original article | This article is an expression of concern for: | Cet article est une manifestation de préoccupations pour :
Published notices of retraction - link leads to the original article that has been retracted | This article is a retraction notice for: | Cet article est un avis de rétraction pour :
An original article that is currently under review for potential correction or retraction - link leads to the expression of concern | There has been an expression of concern published for this article: | Une manifestation de préoccupations a été publiée pour cet article :
An original article that has been retracted - link leads to the published notice of retraction | This article has been retracted: | Cet article a été retiré :
An article that was retracted and then corrected and re-published - link leads to the original version that had been retracted | This article is a corrected and re-published version of a previously retracted article: | Cet article est une version corrigée et republiée d'un article précédemment rétracté

References

[Cheng 2019] Cheng, Y., Parulian, N., Hsiao, T., Dinh, L., Sarol, J., Schneider, J. (19-23 October 2019). ReTracker: Actively and Automatically Matching Retraction Metadata in Zotero. 82nd Annual Meeting of the Association for Information Science and Technology, Melbourne Australia. doi:https://doi.org/10.1002/pra2.32

[JATS] JATS. <related-article> Related Article Information. [online] [cited 5 April 2023]. https://jats.nlm.nih.gov/archiving/tag-library/1.2/element/related-article.html

[MarkLogic] MarkLogic. Administrator's Guide - Chapter 23. Forests. [online] [cited 5 April 2023]. https://docs.marklogic.com/guide/admin/forests

[NISO] NISO. ANSI/NISO Z39.96-2021, JATS: Journal Article Tag Suite, version 1.3. [online] [cited 5 April 2023]. https://www.niso.org/publications/z3996-2021-jats

[OCUL] OCUL. Strategic Plan 2022-25. [online] [cited 5 April 2023]. https://ocul.on.ca/strategic-plan

[open-retractions] open-retractions Github repository. [online] [cited 5 April 2023]. https://github.com/open-retractions/open-retractions

[RetractionWatch] RetractionWatch. The Retraction Watch Database version 1.0.6.0 [online] [cited 5 April 2023]. http://retractiondatabase.org/RetractionSearch.aspx?

[CREC] NISO. CREC (Communication of Retractions, Removals, and Expressions of Concern) Working Group [online] [cited 13 July 2023]. https://www.niso.org/standards-committees/crec

[COPE] COPE. Retraction Guidelines [online] [cited 13 July 2023]. https://publicationethics.org/retraction-guidelines

[Schneider et al., 2021] Schneider, J., Woods, N.D., Proescholdt, R., Fu, Y., & RISRS Team. (2021, July 29). Recommendations from the Reducing the Inadvertent Spread of Retracted Science: Shaping a Research and Implementation Agenda Project [online] [cited 13 July 2023]. doi:https://doi.org/10.31222/osf.io/ms579

Author's keywords for this paper:
Scholars Portal; scholarly communication; article retractions; article corrections; JATS; XML