How are dependent works realized?

Jacob Jett; David Dubin

Abstract

When a work of authorship is published in a new edition, what exactly is the relationship between the edition and the contribution of the author or authors? Specifications in the FRBR family offer contrasting accounts of how we should understand the relationships among an edition, its text, and the work of authorship they realize. Resolving these puzzles could improve the integration of digital resources across widely distributed and increasingly heterogeneous projects, but simple recognition that these puzzles exist may prove as useful as (or even more so than) arguments that one account is more correct than the others.

Introduction

When a work of authorship is published in a new edition, what exactly is the relationship between the edition and the contribution of the author or authors? The FRBR^[1] family of bibliographic models presents this as two relationships: a realization relationship that obtains between the work and one or more expressions, and an embodiment relationship that obtains between one or more expressions and one or more manifestations. The FRBR expression entity is understood to generalize the notion of a text: whatever arrangement or structure of symbols encodes authorial choices. The manifestation entity generalizes the edition concept: whatever physical pattern and properties are common to those copies of a work that exemplify the manifestation. Such copies are termed items, and the four abstraction levels (work, expression, manifestation, and item) are classified as the FRBR Group 1 Entities (commonly referred to as the WEMI entities in today's literature).

This division of levels presupposes that all (or at least many) works of authorship retain their identity across differences in typography, pagination, orthographic standard, and even language in the case of translated works. But each specification in the FRBR family has, in one way or another, recognized the enrichment that editorial attention brings to a work; clarifications of language, correction of errors, selection of illustrations, the addition of commentary and the preparation of glossaries and indexes are improvements not merely in their own right, but also in how their participation and level of detail serve the work's expected uses by its intended audience. An edition may therefore be understood as a derivative or dependent work, but its precise status with respect to its basis is characterized differently from one FRBR family standard to another.

Over the past fifteen years, a series of papers by Allen Renear and his colleagues has examined questions of digital documents' status with respect to the WEMI entities, such as whether XML documents are best understood as expressions, manifestations, or both (Renear et al., 2003), whether Group 1 attributes are really inherited across the WEMI levels (Renear and Choi, 2007), and what, if anything, corresponds to a FRBR item in the digital world (Floyd and Renear, 2008). Following the example of those studies, in this paper we use FRBR family models as high-middle-range theories (Renear and Dubin, 2008a), providing hypotheses to guide analysis with the goal of explaining current practice. Our approach to the problem is rationalized descriptive ontology, one that begins with practitioners' distinctions and assertions and proposes model revisions to resolve competing intuitions and logical inconsistencies (Renear and Dubin, 2008b; Renear et al., 2012). Digital editions prepared using XML-based languages offer evidence of relationships between authorial and editorial contributions, and the WEMI abstraction levels provide a starting point for framing questions and proposing explanations.

Digital editions: a case study

Our running example for this discussion is Molly O'Hagan Hardy's TEI^[2] transcription of Absalom Jones and Richard Allen's 1794 A Narrative of the Proceedings of the Black People, During the Late Awful Calamity in Philadelphia, in the Year 1793,^[3] published via Northeastern University Library's TAPAS^[4] Project. The narrative describes how the African American community of 1793 Philadelphia responded to an epidemic of yellow fever that gripped the city from August through November. Hardy's transcription is based on a reprint of the original pamphlet by the London publisher Darton and Harvey, and much of her encoding documents manifestation-level properties, such as the line breaks and typographic emphasis of that particular edition. But Hardy has also compiled a personography section, drawing on city directories of the period for records of the occupations and addresses of people mentioned in the narrative. Extracts from the digital edition showing an example from the personography and the name reference in the narrative text are shown below.

<person xml:id="sarah_bass">
    <persName>
        <surname>Bass</surname>
        <forename>Sarah</forename>
    </persName>
    <residence source="#directory_1794">
        <address>
            <street>13 Shippen Street</street>
        </address>
        <location><geo>39.940223 -75.145112</geo></location>
    </residence>
    <occupation source="#directory_1794">
        <rs type="role">Washerwoman</rs>
    </occupation>
</person>

<p><lb ed="ed1"/><persName ref="#sarah_bass">Sarah Bass</persName>,
a poor black widow, gave all the assist-
<lb ed="ed1"/>ance she could, in several families, for which she did
<lb ed="ed1"/>not receive any thing; and when any thing was offer-
<lb ed="ed1"/>ed her, she left it to the option of those she served.</p>
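
The link between the two extracts can be followed mechanically. The following is a minimal sketch, assuming the extracts are wrapped in a single hypothetical root element and that the TEI namespace declaration is omitted for brevity (a real TEI file would declare it); the function name resolve_person_refs is ours, not part of any TEI tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: abbreviated versions of the two extracts are placed
# under one root so they parse as a single well-formed document. The TEI
# namespace is omitted here, which a real edition would not do.
SAMPLE = """<doc>
<person xml:id="sarah_bass">
    <persName><surname>Bass</surname><forename>Sarah</forename></persName>
    <occupation source="#directory_1794"><rs type="role">Washerwoman</rs></occupation>
</person>
<p><persName ref="#sarah_bass">Sarah Bass</persName>, a poor black widow</p>
</doc>"""

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def resolve_person_refs(xml_text):
    """Pair each name reference in the text with the occupation recorded
    for the referenced person in the personography."""
    root = ET.fromstring(xml_text)
    people = {p.get(XML_ID): p for p in root.iter("person")}
    results = []
    for name in root.iter("persName"):
        ref = name.get("ref")
        if ref is None:
            # A persName inside a personography entry, not a reference.
            continue
        person = people.get(ref.lstrip("#"))
        role = person.findtext("occupation/rs") if person is not None else None
        results.append((name.text, role))
    return results


print(resolve_person_refs(SAMPLE))
```

The sketch treats the name in the narrative and the directory record as distinct things joined only by a pointer, which is the separation of content from contextual information that the discussion below turns on.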

Hardy's digital edition is (and is obviously meant to be) a realization of Jones and Allen's 1794 work, but it preserves a record of physical features from a particular print edition, and is augmented by demographic data that Hardy drew from other sources. More broadly, projects like Hardy's exemplify scholarly editions' evolution along trends predicted by van Zundert and Boot, 2011, such as:

drawing from widely distributed digital resources;
combining even more heterogeneous data types (including demographic and geographic data);
developing as versioned, rather than fixed-state documents;
involving more widely distributed teams of scholars, collaborating with each other remotely.

For a sense of how these trends complicate encoding and description standards for digital editions, consider some of the IDE reviewing criteria listed in Sahle and Vogeler, 2014; they represent information of interest to users that tagging should explicate:

relationships to other printed and digital resources;
specific contributor roles;
methodological focus on “work” vs. “document”;
identifiers for objects within the edition that specify the levels of content structure addressed;
links among objects within and outside the edition that distinguish content from contextual information.

Standards for making these levels, relationships, and distinctions available for machine processing ought to be based (in principle) on agreements about exactly what those things are. But, as discussed in the following sections, even those specifications within the FRBR family offer contrasting accounts for understanding relationships among editions, their texts, and the works of authorship they realize.

Aggregation and augmentation in group one models

Over the past 20 years the FRBR family of specifications has grown from just the original FRBR conceptual model published in 1998 (and hereafter referred to as FRBR-1998) to include supplements and responses addressing requirements for authority entities, such as FRAD^[5] and FRSAD^[6]. It has also been reconciled with the leading cultural heritage description standard, CIDOC-CRM (Crofts et al., 2008), via the Object-Oriented FRBR model (FRBR_OO). Recently FRBR-1998, FRAD, and FRSAD have all informed the production of IFLA's new Library Reference Model (IFLA-LRM) (IFLA, 1998; Riva et al., 2017; Bekiari et al., 2016).

FRBR-1998 is very clear that each of the Group 1 entities has its own part/whole relationships obtaining at its respective abstraction level. That is to say, aggregate works can be composed of dependent works, expressions can be composed of expressions, manifestations can have smaller manifestations as parts (such as a volume or soundtrack), and items can be physically concatenated, as in the example of journal issues bound together. According to section 3.3:

For the purposes of the model, entities at the aggregate or component level operate in the same way as entities at the integral unit level; they are defined in the same terms, they share the same characteristics, and they are related to one another in the same way as entities at the integral unit level.

Unfortunately, the FRBR specification does not lay out precisely how the whole/part relationships at one WEMI level support or determine the whole/part relationships at the others. Suppose, for example, we understand Hardy's TEI encoding of Jones and Allen's Narrative to realize a new aggregate work combining their original authorship, the supporting demographic data, and Hardy's editorial contributions. We would understand the XML document as an aggregate expression which has Hardy's tags as one part and Jones and Allen's text as another part. But although both the tags and the text are clearly parts of the XML document, it's not clear at all that each is a FRBR expression.

Our Figure 1 illustrates the ambiguity. W1 is an aggregate of works W1a and W1b, and this aggregate work is realized by the aggregate expression E1. But are we to understand that component expressions E1c and E1d realize W1a and W1b, respectively? Suppose that E1c is Jones and Allen's text, and E1d is Hardy's tagging. Can E1d be an entire expression, even though it's not a syntactically complete XML document? Perhaps it can, and we should imagine Hardy's tags as stand-off markup standing in some kind of intentional relationship to the 1794 text, the London reprint edition, and the directory contents used for the personography. But in that case, how are we to understand the distinction between works W1 and W1b? Wouldn't the tags in that case realize the new derivative work? If not, then what work, exactly, do they realize?

At the root of this infelicity is a tension present in all of the FRBR family specifications, specifically a tension between domain constraints on unary properties and participation constraints on relational properties (Wickett and Renear, 2009). On the one hand, only certain kinds of things can be WEMI entities: expressions must be abstract things, for example, and items must be physical things. But on the other hand, items, manifestations, and expressions are all essentially linked to entities at the higher levels; a component expression can't be just any arbitrary part of a representational structure, but must realize a work in its own right. So (as shown in Figure 1) although there may be unrealized works, there are no expressions that fail to realize some work. Unfortunately, FRBR-1998 offers us no guidance on how parthood relationships at one level relate to parthood relationships at the other levels.
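
The participation constraint can be made concrete with a small data-structure sketch. The labels W1, W1a, W1b, E1, E1c, and E1d come from the discussion of Figure 1; the class names, the fields, and the function unrealizing_expressions are hypothetical illustrations and are not drawn from FRBR-1998 or any implementation of it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Work:
    label: str
    parts: List["Work"] = field(default_factory=list)


@dataclass
class Expression:
    label: str
    realizes: Optional[Work] = None          # participation constraint target
    parts: List["Expression"] = field(default_factory=list)


def unrealizing_expressions(expr):
    """Return labels of expressions (whole or component) that realize no
    work, i.e. those that would violate the participation constraint."""
    found = [] if expr.realizes is not None else [expr.label]
    for part in expr.parts:
        found.extend(unrealizing_expressions(part))
    return found


# The aggregate work and its components.
w1a = Work("W1a")
w1b = Work("W1b")
w1 = Work("W1", parts=[w1a, w1b])

# The aggregate expression realizes W1, but the model as stated does not
# say which works (if any) its component expressions realize.
e1c = Expression("E1c")   # Jones and Allen's text
e1d = Expression("E1d")   # Hardy's tagging
e1 = Expression("E1", realizes=w1, parts=[e1c, e1d])

print(unrealizing_expressions(e1))  # ['E1c', 'E1d']
```

Parthood at the work level and parthood at the expression level are recorded independently here, so nothing forces E1c and E1d to line up with W1a and W1b. Filling in the two missing realizes links is exactly the stipulation that FRBR-1998 leaves open.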

FRBR_OO takes a different view of the situation described above. Under the FRBR_OO account, Hardy's derivative work would be modeled as an aggregate type that Bekiari et al. (2016) call a complex work. In this case Hardy's derivative work is based in both Hardy's tags and Jones and Allen's text via the realization relationship. As Figure 2 illustrates, the relationships among the different works involved are much clearer than in the FRBR-1998 model, but significantly, the dotted arrows do not represent a mereological relationship. In FRBR_OO aggregate works do not have the works they aggregate as parts. The complex work model offers our running example a clearer distinction between our work of authorship and the editorial work: they differ in conceptual content because Jones and Allen's work is about an epidemic and Hardy's work is about a work of authorship, its realization in a text, and its embodiment in an 18th-century edition. On this understanding, the demographic data encode assertions about allusions to persons in the text. Unfortunately, FRBR_OO offers little explanation for what the dotted arrows in Figure 2 represent. If aggregation at the work level isn't a part/whole relationship, then what kind of relationship is it?

Finally, the recent IFLA-LRM conceptual model proposes yet another interpretation. In this case neither works nor expressions participate in mereological or meronymic relationships. On the LRM account the only relationship from whole to parts is the embodiment relationship that obtains between manifestations and expressions. IFLA-LRM editors seem to have based this decision on two observations. First of all, embodiment is the only WEMI relationship that obtains across the manifestation and expression levels. FRBR-1998 lists a variety of distinct relationships that may obtain from Expression to Work (e.g., summarizing or adaptation) and from Manifestation to Item (reproduction). Secondly, embodiment is the only many-to-many relationship across the WEMI levels: an Item can exemplify only one Manifestation, and an Expression can realize only one Work, but according to FRBR-1998, a Manifestation can embody multiple Expressions. Those earlier modeling decisions leave the embodiment relationship as a candidate for overloading, and the IFLA Working Group on Aggregates decided that aggregates are manifestations that embody more than one expression (FRBR WGA, 2011). How this model accommodates our running example isn't entirely clear, but presumably some of the physical patterns governing electrical/magnetic energy in memory (or monitor light) in a computer processing an XML document would be interpreted as the embodiment of Hardy's tags, while other patterns correspond to Jones and Allen's text.
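
The Working Group's definition reduces to a simple cardinality test, sketched below under stated assumptions. The class names LrmExpression and LrmManifestation and the method is_aggregate are hypothetical and are not identifiers from the IFLA-LRM specification; the two expressions stand for the reading of the running example suggested above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LrmExpression:
    label: str


@dataclass
class LrmManifestation:
    label: str
    embodies: List[LrmExpression] = field(default_factory=list)

    def is_aggregate(self) -> bool:
        # On the Working Group's account, an aggregate is a manifestation
        # that embodies more than one expression.
        return len(self.embodies) > 1


text = LrmExpression("Jones and Allen's text")
tags = LrmExpression("Hardy's tags")

digital_edition = LrmManifestation("TEI document", embodies=[text, tags])
plain_reprint = LrmManifestation("Plain reprint", embodies=[text])

print(digital_edition.is_aggregate())  # True
print(plain_reprint.is_aggregate())    # False
```

Note that the expressions carry no parts of their own in this sketch: all aggregation is pushed down to the manifestation, which is what distinguishes this account from the FRBR-1998 reading.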

Mereology or dual aspect?

Renear et al. (2003) introduced a "dual aspect" theory for XML, in which, depending on application context, markup may either cue rendering effects such as creating line breaks and headings, or else encode assertions about expression-level content objects (e.g., claims that a block element is a paragraph or that a text span functions as a header). We can therefore understand Hardy's TEI document as both a work about a particular FRBR expression and a work about a particular FRBR manifestation. That would suggest Hardy's work is realized not by the entire XML document, but just the augmenting tags. If the text of A Narrative... were stored separately, we could imagine TEI tags directed into Jones and Allen's expression via XLink/XPointer or Web Annotation.
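
A minimal sketch of that separation follows, assuming plain character offsets in place of XPointer or Web Annotation selectors. The StandoffAnnotation class and both functions are hypothetical, and the base text is the quoted passage with its line-break hyphenation resolved.

```python
from dataclasses import dataclass

# The narrative text stored on its own, with no markup.
BASE_TEXT = "Sarah Bass, a poor black widow, gave all the assistance she could"


@dataclass(frozen=True)
class StandoffAnnotation:
    start: int   # offset of the first character covered
    end: int     # offset one past the last character covered
    tag: str     # element name the annotation asserts
    ref: str     # pointer into the personography


def target_text(text, ann):
    """Return the span of the base text that the annotation points into."""
    return text[ann.start:ann.end]


def to_inline(text, annotations):
    """Rebuild inline markup from non-overlapping stand-off annotations.
    Working from the end of the text keeps earlier offsets valid."""
    out = text
    for ann in sorted(annotations, key=lambda a: a.start, reverse=True):
        out = (out[:ann.start]
               + f'<{ann.tag} ref="{ann.ref}">'
               + out[ann.start:ann.end]
               + f'</{ann.tag}>'
               + out[ann.end:])
    return out


name_ref = StandoffAnnotation(0, 10, "persName", "#sarah_bass")
print(target_text(BASE_TEXT, name_ref))   # Sarah Bass
print(to_inline(BASE_TEXT, [name_ref]))
```

Under this arrangement the list of annotations can be stored, versioned, and attributed independently of the text it describes, which is what makes it a plausible candidate for an expression of the editorial work alone.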

But there's at least one other option for understanding the text of our running example with respect to Hardy's editorial contribution. It's possible that the very same text might have two distinct expression roles. Such a situation is most familiar in the case of disputed works, such as the question of whether the Greek texts of the pastoral epistles realize first-century or second-century works of authorship. But when modern scholars encode a text with tools like TEI we can understand the very same text as both realizing the earlier work of authorship and simultaneously expressing claims about earlier authors' language. On this view, Hardy's contribution isn't just in the TEI tags. Her transcription of the text represents a complex conjunctive assertion concerning everything she claims that Jones and Allen wrote. That role is most obvious when an encoder's tag expresses uncertainty about a word, but even where the text is unambiguous we can understand the transcription as encoding a confident claim, and so it may be the case that markup and core text are both essential parts of an editorial work's expression. Although dual expression roles are unlikely to contribute ordinary descriptive metadata, they should inform machine-readable provenance records for scholarship in which details of texts' transmission over time are essential for understanding variation emerging over that history.

Implications for encoding standards

Resolving questions such as whether existing works stand in part/whole relationships to their editions or serve instead as the subject or focus of editorial claims could help to make processing of digital resources more consistent and integratable across widely distributed and increasingly heterogeneous projects. However, in the long run simple recognition that these puzzles exist may prove as useful as (or even more so than) arguments that one account is more correct than the others. In an earlier paper, we proposed that data models play two distinct and often competing roles in information interchange: a descriptive role of modeling a domain and a prescriptive role of documenting stakeholder decisions for uniformity of practice (Dubin et al., 2013). We stated that although descriptive adequacy reduces the need for arbitrary choices, stipulative definitions are inevitably required to fill in a model's representational gaps.

That a group of standards as interrelated as the FRBR family should have such different accounts of aggregation suggests that the assumptions and use cases contributing to our intuitions for complex or dependent works may be difficult to reconcile in a single consistent model. For example, one's intuition that works maintain their identity across multiple editions poses challenges for a view that the editorial labor of producing a scholarly edition brings a distinct new work into existence. These specifications also illustrate how competing explanations of texts and works (such as the domain constraints vs. participation constraints mentioned above) complicate attempts to model relationships across abstraction levels like the WEMI/Group 1 entities. Although we hope that further analysis of these problems can contribute models accommodating many applications, in the medium term it may be best if prescriptive and stipulative approaches to standards development are informed by recognition of the descriptive complexities. If, for now, no single model can capture the full richness of how we understand bibliographic aggregates, among the first steps for choosing and documenting a reasonable scope for a standard are to recognize when our definitions are overloaded, and to craft language for mitigating that problem with more precise distinctions.

Acknowledgments

The authors thank their colleagues at the University of Illinois School of Information Sciences and the anonymous Balisage reviewers for discussions and feedback on earlier versions of this paper.

References

[Bekiari et al., 2016] Bekiari, Chryssoula, Martin Doerr, Patrick Le Boeuf, and Pat Riva. 2016. “FRBR Object-Oriented Definition and Mapping from FRBRER, FRAD and FRSAD (Version 2.4).” International Working Group on FRBR and CIDOC CRM Harmonisation. http://www.ifla.org/files/assets/cataloguing/FRBRoo/frbroo_v_2.4.pdf.

[Crofts et al., 2008] Crofts, Nick, Martin Doerr, Tony Gill, Stephen Stead, and Matthew Stiff. 2008. “Definition of the CIDOC Conceptual Reference Model.” ICOM/CIDOC Documentation Standards Group. CIDOC CRM Special Interest Group, 2008.

[Dubin et al., 2013] Dubin, David, Megan Senseney and Jacob Jett. 2013. “What it is vs. how we shall: complementary agendas for data models and architectures.” Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10. doi:https://doi.org/10.4242/BalisageVol10.Dubin01.

[FRBR WGA, 2011] “Final Report of the Working Group on Aggregates.” 2011. IFLA Working Group on Aggregates. https://www.ifla.org/files/assets/cataloguing/frbrrg/AggregatesFinalReport.pdf.

[Floyd and Renear, 2008] Floyd, Ingbert R., and Allen H. Renear. 2008. “What Exactly Is an Item in the Digital World?” Proceedings of the American Society for Information Science and Technology 44 (1): 1–7. doi:https://doi.org/10.1002/meet.1450440374.

[IFLA, 1998] IFLA Study Group on the Functional Requirements for Bibliographic Records. 1998. Functional Requirements for Bibliographic Records, Final Report. Berlin, Boston: De Gruyter Saur. doi:https://doi.org/10.1515/9783110962451.

[Renear et al., 2012] Renear, Allen H., Dave Dubin, Karen Wickett, Simone Sacchi, Richard Urban, and Yunseon Choi. 2012. “Taking Modeling Seriously.” Knowledge Organization and Data Modeling in the Humanities 47. https://datasymposium.wordpress.com/renear/.

[Renear and Choi, 2007] Renear, Allen H., and Yunseon Choi. 2007. “Modeling Our Understanding, Understanding Our Models - The Case of Inheritance in FRBR.” Proceedings of the American Society for Information Science and Technology 43 (1): 1–16. doi:https://doi.org/10.1002/meet.14504301179.

[Renear and Dubin, 2008a] Renear, Allen H., and David Dubin. 2008a. “FRBR as an Interdisciplinary High-Middle-Range Theory for Information Science—a Theoretical Perspective.” In Proceedings of iConference 2008, edited by Angela De Cenzo. Los Angeles. http://www.ischools.org/conference08/pc/PA7-3_iconf08.doc.

[Renear and Dubin, 2008b] ———. 2008b. “Three of the Four FRBR Group 1 Entity Types Are Roles, Not Types.” Proceedings of the American Society for Information Science and Technology 44 (1): 1–19. doi:https://doi.org/10.1002/meet.1450440248.

[Renear et al., 2003] Renear, Allen, Christopher Phillippe, Pat Lawton, and David Dubin. 2003. “An XML Document Corresponds to Which FRBR Group 1 Entity?” In Proceedings of Extreme Markup Languages 2003, edited by B. T. Usdin. Montreal, Quebec. http://hdl.handle.net/2142/11885.

[Riva et al., 2017] Riva, Pat, Patrick Le Boeuf, and Maja Žumer. 2017. “IFLA Library Reference Model: A Conceptual Model for Bibliographic Information.” The Hague, Netherlands: International Federation of Library Associations and Institutions.

[Sahle and Vogeler, 2014] Sahle, Patrick, and Georg Vogeler. 2014. “Criteria for Reviewing Scholarly Digital Editions, Version 1.1.” Institut für Dokumentologie und Editorik.

[van Zundert and Boot, 2011] van Zundert, Joris, and Peter Boot. 2011. “The Digital Edition 2.0 and the Digital Library: Services, Not Resources.” Digitale Edition und Forschungsbibliothek (Bibliothek und Wissenschaft) 44: 141–52.

[Wickett and Renear, 2009] Wickett, Karen, and Allen Renear. 2009. “A First Order Theory of Bibliographic Objects.” Proceedings of the American Society for Information Science and Technology 46 (1): 1–8. doi:https://doi.org/10.1002/meet.2009.1450460378.

^[1] Functional Requirements for Bibliographic Records

^[2] Text Encoding Initiative

^[3] http://tapasproject.org/proceedingsofblackpeople/files/narrative-proceedings-black-people-during-late-awful-calamity.

^[4] TEI Archiving, Publishing, and Access Service

^[5] Functional Requirements for Authority Data

^[6] Functional Requirements for Subject Authority Data

Jacob Jett

University of Illinois

`<jjet2@illinois.edu>`

Jacob is a PhD candidate at the University of Illinois School of Information Sciences. His research interests include the conceptual foundations of information access, organization, and retrieval, web and data semantics, knowledge representation, data modeling, ontology development and conceptual modeling.

David Dubin

University of Illinois

`<ddubin@illinois.edu>`

David Dubin is a Teaching Associate Professor at the University of Illinois School of Information Sciences. His research interests include the foundations of information representation and description and issues of expression and encoding in documents and digital information resources.

Balisage: The Markup Conference

Balisage Series on Markup Technologies