Note: Acknowledgement
This paper builds on the work done by the OCUL feedback group. I am deeply grateful to all the members of the group for the opportunity to work with them and especially for their insightful analysis of metadata for gov info: Frank van Kalmthout, Archives of Ontario; Graeme Campbell, Queen’s U; Helene LeBlanc, Wilfrid Laurier U; Martha Murphy, Ontario Workplace Tribunals Library; Sandra Craig, Legislative Assembly of Ontario; Simone O’Byrne, Ministry of the Environment, Conservation and Parks.
Introduction
Scholars Portal (SP) was established in 2007 and is funded by the Ontario Council of University Libraries (OCUL) consortium. It is the technological body of OCUL. Among its primary services is an Ebook platform that provides a single interface for accessing and preserving digital texts (licensed and digitized public domain materials) from the world’s most important scholarly publishers. Publishers deliver full-text content to SP, and the SP Ebook team uses programs, referred to as “loaders,” to ingest the content into the SP platform. This process, which we refer to as “Ebook local loading,” is designed to meet the agreements made on behalf of the twenty-one university members of the consortium, including managing various levels of access to the content. The SP Ebook platform is similar to a federated provider or aggregator, such as OhioLINK and HathiTrust.
In 2018 we finished redesigning our Ebook platform, and one of the critical decisions in the redesign was which metadata schema to use as our target format: to what metadata standard would we normalize or transform all the different source data we get from publishers and aggregators? Searching for a standard Ebook format made it clear that no book DTD/schema is dominant in the Ebook publishing industry. The SP Ebook platform loads Ebooks from around 30 publishers. Each publisher delivers the content in its own format and packaging system. The metadata can be MARC, ONIX, Excel, MARC XML, TEI, or various other DTDs/schemas, some unique to and developed in-house by the publisher. The full text can be XML or PDF. Some publishers deliver the Ebooks as individual chapters, and some provide the book’s full text in one PDF or XML file.
Since our Ejournal platform maps all publishers’ data to JATS, we became interested in its new sibling, BITS. For those who don’t know, the Book Interchange Tag Suite (BITS) is an XML document model for STEM books based on JATS (the Journal Article Tag Suite, ANSI/NISO Z39.96-2015). BITS is a named collection of XML elements and attributes for describing the structural and semantic content of books and book components, as well as a packaging element for interchange of book parts. BITS provides a robust book model that is compatible with JATS, making it easy for publishers of both journals and books to publish them using the same system. [Lapeyre 2019]
Due to the similarities between JATS and BITS, if you already have expertise in JATS, getting into BITS is easy. You could easily add BITS books if your display system were built for JATS articles. If your search system were built for JATS articles, it would search BITS books with minor adjustments.
As part of redesigning the Ebook service, we developed an SP BITS profile as the destination format for all the publishers’ data we get. Our BITS format was created with scholarly publishers, collection development departments of academic libraries, and E-resources workflows in mind. We didn’t want to create tags for uncommon values, nor to miss information that serves academic librarians in reviews of inventory or assessments. Our profile took advantage of BITS being a flexible XML vocabulary. It aimed to capture rich descriptive metadata at the book-meta level and minimal metadata at the book-part, i.e., chapter level.
A word on metadata: quality and gaps
Although there are many definitions of quality metadata, I would like to focus on one. Marieke Guy, Andy Powell and Michael Day state that “quality is about fitness for purpose,” and this purpose may be internal and external. [Guy, Powell, and Day 2004]
This definition leads us to metadata gaps: differences between standards that arise because each standard was created for a different purpose.
By gaps, we mean areas in bibliographic metadata that are not transferable across the various standards. For example, ONIX product types allow distinctions between 10- and 13-digit ISBNs but not between paperback and online ISBNs. BITS enables you to define the type of identifiers, and in SP, it was essential for us to capture and differentiate between print, online and Epub ISBNs.
However, this doesn’t mean that when we transfer data from ONIX to our custom BITS profile, we can know which 13-digit ISBN is a print ISBN and which is an online ISBN. This gap between the two metadata standards has to be considered when we map data, but it has nothing to do with our choice of BITS as the service’s metadata format. It’s essential to remember that gaps between metadata standards are common and do not in any way testify to the quality of the metadata or the appropriateness of the selected metadata standard.
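The mapping gap described above can be sketched in a few lines of Python. This is only an illustration, not the SP loader code: the function name and the sample values are hypothetical, the ISBNs are placeholders, and the input is assumed to be a list of (ProductIDType, IDValue) pairs already extracted from an ONIX record. The codes "02" (ISBN-10), "15" (ISBN-13) and "06" (DOI) come from ONIX code list 5; the publication-format values and the rule about removing hyphens come from the SP profile.

```python
import xml.etree.ElementTree as ET

# ONIX code list 5 (ProductIDType): "02" = ISBN-10, "15" = ISBN-13.
ONIX_ISBN_TYPES = {"02", "15"}


def onix_ids_to_bits_isbns(product_ids):
    """Map (ProductIDType, IDValue) pairs to BITS <isbn> elements.

    The ONIX identifier type only says whether the ISBN has 10 or 13
    digits, so the print/online distinction cannot be recovered here:
    every ISBN is emitted with publication-format="unknown".
    """
    elements = []
    for id_type, value in product_ids:
        if id_type not in ONIX_ISBN_TYPES:
            continue  # not an ISBN (for example a DOI), handled elsewhere
        isbn = ET.Element("isbn", {"publication-format": "unknown"})
        isbn.text = value.replace("-", "")  # the profile asks for no hyphens
        elements.append(isbn)
    return elements


# Placeholder identifiers, for illustration only.
sample = [
    ("15", "978-0-00-000000-2"),
    ("02", "0-00-000000-0"),
    ("06", "10.0000/example"),
]
for element in onix_ids_to_bits_isbns(sample):
    print(ET.tostring(element, encoding="unicode"))
```

The point of the sketch is that "unknown" is the honest value: the information is absent from the source, so no mapping rule can supply it.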
SP BITS Profile: uses and challenges
SP BITS profile includes collection-meta, book-meta and book-part-meta sections. Still, I will focus on the first two because gov info content usually doesn’t have the book-part sections that scholarly monographs have. And already here, you may see that we have a problem with our BITS profile: if we think about quality metadata as fitness for purpose, why would we have book-parts in our target format if none of the source formats contains chapters?
However, Debbie Lapeyre reminds us that “The BITS book models are not intended to describe trade books, cookbooks, grade-school textbooks, legal works, historical editions, or any of the wide variety of books outside the current scientific, technical, engineering, and medical realms in which JATS is used for journals.” [Lapeyre 2019]
With that in mind, we can say that BITS is a perfect fit for our scholarly content and Ebook service, and that the model was never intended to be helpful for every type of content. And yet, our Ebook service is a home for more than just scholarly publications in the realm of JATS coverage and academic Ebooks. Scholars Portal has a long history of collecting government documents through several channels. And since most of these documents are in PDF format and are stored and accessible through the Scholars Portal Books platform, the question of whether we could use BITS for these collections came up from the inception of the SP profile. If you look at the profile declaration of content, you can see that we tried to get ready for other types of content:
<book xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:sp="http://scholarsportal.info/metadata" xml:lang="en" book-type="{monograph|govdoc|thesis|protocol|proceeding|encyclopedia|dictionary|atlas|score}">
When we first created the profile, we didn’t have scores but later added them since the music librarian purchased scores that needed a local loading space. Did we map the scores metadata to BITS? We did.
As for govdocs, this content lived on our Ebook platform long before we switched to BITS. We have an agreement with the Ontario Legislative Library and regularly load their content. We have a gov info OCUL community with a small annual budget to digitize at-risk government info and load it on our platform. Then, there are specific requests from Ontario institutions. Since the Ontario government doesn’t invest in libraries or librarians, the remaining few found Scholars Portal a good ally and partnered with us to load content from specific ministries or bodies, again, either at-risk content or content that they thought needed to be preserved and accessible.
The unique nature of Government Information
Most of the gov info collections had some level of metadata, usually MARC records. Since the Ebook industry also used MARC records for a long time, there wasn’t any problem in mapping the MARC records for gov info into the new SP BITS profile. At least not until we took a close look at the data. It started with one of the most popular govdocs collections we had, a public and health policy collection that most of the universities in OCUL purchased. One year, during renegotiation before the renewal of this collection, it became clear that there were many challenges on the way to renewal. OCUL wanted to evaluate the impact of losing this collection. And how do you assess a specific collection? You look at usage statistics, of course. Which public policy documents were in high demand? Which health organizations were covered by the collection, and could a different licence recover this content?
So how do you learn such details? You go to the metadata. You hope that when you look at all the values under the publisher field, you’ll discover which providers were covered in the collection. If you check usage, you want to have parameters to look at. Publication year, content from specific provinces? Do users prioritize provincial or federal content in their searches?
At this point, it became clear that the regular bibliographic fields we consult to assess or discover scholarly publications could become useless when we want to analyze gov info content. It also became clear that metadata for govdocs needs to be more than book-type="govdoc". At the same time, OCUL’s strategic decision to focus on gov info collections required high-quality metadata that could tell us a complete story about our gov info collections. If we have content from Statistics Canada, what portion of their publications do we have? A specific time slice? Unlike scholarly Ebooks, gov info doesn’t come with ONIX, catalogues or title lists. The only way to know what’s in a collection is to have robust metadata to count on, so it can be queried and give us the details we are looking for.
To take a closer look at the challenges mentioned above, in October 2020 Scholars Portal asked for feedback on how best to present and arrange government information and related content on our Ebook platform. The call for participation went out to the OCUL-GIC mailing list and other OCUL forums. A small group of government information librarians from OCUL and the Ontario government met regularly to discuss best practices to describe government information content.
Since SP Ebook service uses BITS, the conversation revolved mainly around possibilities in the BITS standard. Still, the participants reviewed examples from MARC records and consulted with essential documents such as the Ontario metadata guidelines.
BITS for Government Information?
Metadata is one of the main issues with gov info content. When partnering with Ontario Government or federal departments and other bodies and organizations to digitize at-risk content, it is usually hard to find the resources to create metadata for the digitized content. The Ebook service requires high-quality metadata to add collections. Still, unlike content from scholarly publishers, where high-quality metadata is written into the licence agreement, metadata for government information comes in various forms and levels. The feedback group identified several metadata areas they thought would be significant for the discovery of government information:
-
Book Type attribute: book-type is an attribute of <book>, the top-level BITS tag. Are govdocs publications or information? Maybe both? The feedback group thought government information would encompass all types of documents and publications in this area. However, their idea of defining the book type was a bit more complex than what is allowed by the book-type attribute in the image. Since the gov info collections live in the same service as the scholarly monographs, the first level should indeed flag the record as book-type="gov info".
However, there seems to be a need to be more specific and have a second level of book type that will allow users to filter their searches on particular government publications. These GovInfo Types are based on a list initially prepared by the Ontario Government Libraries Council Working Group on Government Publications for describing types of information posted on Ontario.ca. Examples from this list include Annual Reports, Backgrounders, Budgets, Expenditure Estimates and Public Accounts, Bulletins and Notices, Mandate Letters and News Releases.
This way, the first level separates gov info from other types of content loaded on the platform. The second level helps the user identify what content they are dealing with: at what level of authority was the document written? Is it a new obligatory policy or a preliminary discussion with stakeholders? Users need this level of information when it comes to gov info.
-
Publisher vs. Corporate Author: As seen in the following image, BITS defines the publisher as the entity responsible for the work. In government information, it is often the corporate author.
Sometimes, the publisher will legitimately be the same as the corporate author. And if there are different values for publisher and contributor, BITS has the flexibility to choose the contributor type, for example contrib-type="corporate author". Neither case is a problem, since BITS has both fields.
But some examples could complicate things. One is “The case for Kyoto: the failure of voluntary corporate action,” by Matthew Bramley; prepared by Pembina Institute; in cooperation with David Suzuki Foundation. Another is “Metro Toronto remedial action plan: environmental conditions and problem definition.” Author: Canada-Ontario Agreement on Great Lakes Water Quality. Other Author(s): International Joint Commission. Ontario. Ministry of Natural Resources.
Add a sponsoring body or two to the above examples, as often seen in think tank and NGO publications. How do all the involved organizations fit into the contributor and publisher elements in BITS? BITS has conference sponsors but not publication sponsors. “Prepared by” and “in cooperation with” are not regular contributor roles. The closest, perhaps, is the option to use <collab> in <contrib>, as BITS allows for an organization as contributor [1]:
<contrib-group> <contrib contrib-type="author"> <collab collab-type="committee">Accredited Standards Committee S3, Bioacoustics</collab> </contrib> </contrib-group>
But the more robust option for <collab> is in the reference or bibliography sections; thus, it might be suitable for the flexibility needed around sponsors and other contributors.
In many government documents, the value for the publisher could be “Queen’s Printer,” typically a bureau of the national, state, or provincial government that is responsible for producing official documents issued by the Queen-in-Council, Ministers of the Crown, or other departments, and that identifies the responsible body under the Crown.
This, however, tells us very little about the ministry or department that created the document or publication and relates more to copyright statements. If we want to query the publisher field for a gov info collection, “Queen’s Printer” won’t give us the information we want. Suppose we wanted to learn what portion of a given collection was created by the Ontario Ministry of Education. While we could cross-search the corporate author field, we could also allow more than one publisher in the BITS record, thus having “Queen’s Printer” as one value and “Ministry of Education, Ontario” as the second value. Either way, what “publisher” means in the world of Ebook publishing is very different from what it means for gov info.
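To illustrate why the publisher field alone cannot answer the “what portion” question, here is a minimal sketch that counts values in both fields. It is not the SP query code: the <records> wrapper element, the function name and the three sample records are assumptions made for the example, while the element names (<publisher-name>, <contrib>, <collab>) and the contrib-type value are those discussed above.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: three govdoc records wrapped in an invented
# <records> element so they can be parsed as one document.
RECORDS = """<records>
  <book-meta>
    <contrib-group>
      <contrib contrib-type="corporate author">
        <collab>Ministry of Education, Ontario</collab>
      </contrib>
    </contrib-group>
    <publisher><publisher-name>Queen's Printer</publisher-name></publisher>
  </book-meta>
  <book-meta>
    <contrib-group>
      <contrib contrib-type="corporate author">
        <collab>Ministry of Education, Ontario</collab>
      </contrib>
    </contrib-group>
    <publisher><publisher-name>Queen's Printer</publisher-name></publisher>
  </book-meta>
  <book-meta>
    <contrib-group>
      <contrib contrib-type="corporate author">
        <collab>Ministry of Natural Resources, Ontario</collab>
      </contrib>
    </contrib-group>
    <publisher><publisher-name>Queen's Printer</publisher-name></publisher>
  </book-meta>
</records>"""


def count_values(root, path):
    """Count the text values of every element matched by an ElementTree path."""
    return Counter((el.text or "").strip() for el in root.iterfind(path))


root = ET.fromstring(RECORDS)
by_publisher = count_values(root, ".//publisher/publisher-name")
by_corporate_author = count_values(
    root, ".//contrib[@contrib-type='corporate author']/collab"
)

print("By publisher:", dict(by_publisher))
print("By corporate author:", dict(by_corporate_author))
```

In this sample the publisher count collapses into a single value, while the corporate author count separates the two ministries, which is the distinction a collection assessment needs.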
-
Identifiers: Identifiers are highly significant for Ebooks, and BITS allows one to include any identifier one wishes to capture:
The SP BITS profile captures identifiers that are commonly associated with Ebooks:
<!-- zero, one or many book-ids per document --> <!-- doi: from crossref, use format 10.XXXXX/XXXXXXXX; lcc: library of congress classification; docID: --> <!-- publisherID unique ID from source data --> <!-- ismn [print|online|obsolete]: these are used by the escore --> <book-id book-id-type="{doi|lcc|docid|publisherid|ismn_print|ismn_online|ismn_obsolete|oclc|utlcat}">{book id}</book-id>
And of course it allows for capturing ISBNs and ISSNs:
<!-- optional: use either issn or isbn, depending on publication --> <!-- remove all hyphens --> <issn publication-format="online|print|unknown">{series issn}</issn> <isbn publication-format="online|print|epub|unknown">{series isbn}</isbn>
However, when it comes to gov info, many documents don’t have an ISSN or ISBN, and a DOI is even rarer. At the same time, some identifiers are significant for this type of content. Since BITS leaves the type of book-id open, it is possible to add identifiers such as “Government Document Classification Number” or “The CODOC classification system” (both were explicitly developed for government documents).
A gap occurs when the identifier is meant to identify the body that created the metadata record. Unlike content that comes directly from publishers and vendors, identifying the origin of gov info is not always straightforward, particularly if it was harvested from the web. Bringing in the information of the record creator could provide helpful information for users who want to track back the origin of a given document. The name of the organisation(s) that created the original bibliographic record usually appears as a specific code. In MARC records, it is common for government librarians to add the code of their unit to the 040 field. Bringing this code in as part of the mentioned identifiers would be helpful for librarians who work with gov info.
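Because the book-id type is open, both kinds of identifier could be carried in <book-id> elements. The sketch below shows one possible shape. The type labels "codoc" and "record-source", the helper names and the values are all hypothetical placeholders; none of them is part of the SP profile or of BITS itself.

```python
import xml.etree.ElementTree as ET


def add_book_ids(book_meta, ids):
    """Append one <book-id> per (type, value) pair to a <book-meta> element."""
    for id_type, value in ids.items():
        book_id = ET.SubElement(book_meta, "book-id", {"book-id-type": id_type})
        book_id.text = value
    return book_meta


def ids_of_type(book_meta, id_type):
    """Return the values of all <book-id> children with the given type."""
    path = f"book-id[@book-id-type='{id_type}']"
    return [el.text for el in book_meta.findall(path)]


book_meta = ET.Element("book-meta")
add_book_ids(
    book_meta,
    {
        # Hypothetical type label for a CODOC number; placeholder value.
        "codoc": "EXAMPLE-CODOC-NUMBER",
        # Hypothetical type label for the cataloguing unit code taken
        # from MARC field 040; placeholder value.
        "record-source": "EXAMPLE-ORG-CODE",
    },
)

print(ET.tostring(book_meta, encoding="unicode"))
print(ids_of_type(book_meta, "record-source"))
```

Whether a record-creator code belongs among the book identifiers at all is a design question: it identifies the metadata record rather than the document, which is the gap noted above.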
-
Jurisdiction, level of jurisdiction and type of organization as attributes for the corporate author: The feedback group agreed that the corporate author is more significant than the publisher. So ideally, we would like to be able to describe the corporate author according to 3 levels:
-
The type of organizational author: governmental, intergovernmental, nongovernmental
-
Level (of government): the level(s) of government describing the organizational author: country, province, municipality (for counties, townships, cities, towns), nongovernmental (for organizational authors that are outside of government)
-
Jurisdiction: the geographic region represented by the organizational author (this is not necessarily the same as the Subject/Topic; this is also not the value under “Publisher-Location,” which is mainly understood as the physical location of the body responsible for the publication)
According to gov info librarians in the feedback group, users who visit the reference desk often ask about a specific government level or jurisdiction. Adding the above attributes could allow filtering on such parameters and get users the desired results. BITS <contrib-group> allows several attributes, though none of them is a perfect fit for describing the type of organization or organizations we are dealing with in the corporate author field.
To sum up, BITS lacks a model that separates publishers from the other entities that participate in the creation, presentation and sponsoring of government information. The term “corporate author,” while it had a designated field in the MARC system, requires some adaptation in BITS and specific attributes that are currently hard to fit under <contrib-group> but that, according to gov doc librarians, are very significant for searches.
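As a thought experiment, the three descriptors the feedback group asked for (type of organization, level, jurisdiction) could be carried as attributes in the profile’s own sp namespace and then used as search filters. The sketch below assumes exactly that. The attributes sp:org-type, sp:level and sp:jurisdiction are hypothetical: they exist neither in BITS nor in the current SP profile, and the <records> wrapper, the function name and the two sample records are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URI as declared in the SP profile; the attributes used with
# it below are hypothetical extensions.
SP = "http://scholarsportal.info/metadata"

RECORDS = f"""<records xmlns:sp="{SP}">
  <book-meta>
    <book-title-group><book-title>Example provincial report</book-title></book-title-group>
    <contrib-group>
      <contrib contrib-type="corporate author">
        <collab sp:org-type="governmental" sp:level="province"
                sp:jurisdiction="Ontario">Ministry of Education, Ontario</collab>
      </contrib>
    </contrib-group>
  </book-meta>
  <book-meta>
    <book-title-group><book-title>Example municipal plan</book-title></book-title-group>
    <contrib-group>
      <contrib contrib-type="corporate author">
        <collab sp:org-type="governmental" sp:level="municipality"
                sp:jurisdiction="Toronto">City of Toronto</collab>
      </contrib>
    </contrib-group>
  </book-meta>
</records>"""


def filter_titles(root, **criteria):
    """Return titles of records with a corporate author matching all criteria.

    Keyword names map to the hypothetical sp: attributes, with underscores
    standing in for hyphens (org_type -> sp:org-type).
    """
    titles = []
    for meta in root.iter("book-meta"):
        for collab in meta.iterfind(".//collab"):
            matches = all(
                collab.get(f"{{{SP}}}{name.replace('_', '-')}") == value
                for name, value in criteria.items()
            )
            if matches:
                titles.append(meta.findtext("book-title-group/book-title"))
                break  # one matching author is enough for this record
    return titles


root = ET.fromstring(RECORDS)
print(filter_titles(root, level="province"))
print(filter_titles(root, org_type="governmental", jurisdiction="Toronto"))
```

A namespaced extension keeps the records parseable by generic tools, but it would no longer validate against the standard BITS DTD, which is one reason a separate profile or custom schema may be the cleaner route.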
Final Discussion
While BITS was not created with gov info in mind, it is a highly flexible standard that can be used to describe this content. Yet different types of content bring different demands and challenges. If quality metadata is metadata that fulfills its purpose, users of government information require metadata that differs from the descriptive metadata we are accustomed to seeing for academic Ebooks. Hosting government information on the Ebooks service of Scholars Portal could undoubtedly build on the bibliographic metadata fields available in the SP BITS profile. And yet, if we wanted to serve users of government information, developing a new BITS profile or custom schema to reflect their needs might be the preferred solution.
References
[Lapeyre 2019] Debbie Lapeyre, Introduction to BITS (Book Interchange Tag Suite). January 18, 2019. https://www.xml.com/articles/2019/01/18/introduction-bits-book-interchange-tag-suite/ (Accessed July 11, 2022).
[Guy, Powell, and Day 2004] Marieke Guy, Andy Powell and Michael Day, “Improving the Quality of Metadata in Eprint Archives,” Ariadne 38 (2004), http://www.ariadne.ac.uk/issue/38/guy/ (Accessed July 14, 2022).
[1] https://jats.nlm.nih.gov/extensions/bits/tag-library/2.1/element/contrib.html, last accessed on July 14 2022.