Jett, Jacob, and David Dubin. “How are dependent works realized?” Presented at Balisage: The Markup Conference 2018, Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21 (2018). https://doi.org/10.4242/BalisageVol21.Dubin01.
Jacob Jett is a PhD candidate at the University of Illinois
School of Information Sciences. His research interests include
the conceptual foundations of information access,
organization, and retrieval, web and data semantics, knowledge
representation, data modeling, ontology development and
conceptual modeling.
David Dubin is a Teaching Associate Professor at
the University of Illinois School of Information
Sciences. His research interests include the
foundations of information representation and
description and issues of expression and encoding in
documents and digital information resources.
When a work of authorship is published in a new edition,
what exactly is the relationship between the edition and the
contribution of the author or authors? Specifications in the
FRBR family offer contrasting accounts of how we should
understand the relationships among an edition, its text, and the
work of authorship they realize. Resolving these puzzles could
improve the integration of digital resources across widely
distributed and increasingly heterogeneous projects, but simple
recognition that these puzzles exist may prove as useful as (or
even more so than) arguments that one account is more correct
than the others.
When a work of authorship is published in a new edition, what
exactly is the relationship between the edition and the contribution
of the author or authors? The FRBR[1] family of bibliographic models presents this as two
relationships: a realization relationship that obtains between the work and one or more
expressions, and an embodiment relationship that obtains between one or more expressions
and one or more manifestations. The FRBR
expression entity is understood to generalize the notion of a text:
whatever arrangement or structure of symbols encodes authorial
choices. The manifestation entity generalizes the edition concept:
whatever physical pattern and properties are common to those copies
of a work that exemplify the manifestation.
Such copies are termed items, and
the four abstraction levels (work, expression, manifestation, and
item) are classified as the FRBR Group 1
Entities (commonly referred to as the WEMI entities in
today's literature).
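As a rough illustration of the four levels and the relationships linking them, the following Python sketch models the running example under simplifying assumptions. The class and field names (`realizes`, `embodies`, `exemplifies`, `works_of`) are our own labels for this sketch and are not taken from any FRBR specification; the cardinalities follow FRBR-1998, in which an expression realizes one work, an item exemplifies one manifestation, and a manifestation may embody several expressions.

```python
from dataclasses import dataclass, field


@dataclass
class Work:
    title: str


@dataclass
class Expression:
    label: str
    realizes: Work  # FRBR-1998: an expression realizes exactly one work


@dataclass
class Manifestation:
    label: str
    embodies: list = field(default_factory=list)  # one or more expressions


@dataclass
class Item:
    label: str
    exemplifies: Manifestation  # an item exemplifies exactly one manifestation


def works_of(item):
    """Walk item -> manifestation -> expressions -> works."""
    return [expr.realizes.title for expr in item.exemplifies.embodies]


# Illustrative instances only; the labels are assumptions for this sketch.
narrative = Work("A Narrative of the Proceedings of the Black People")
text = Expression("English text of the narrative", realizes=narrative)
reprint = Manifestation("London reprint", embodies=[text])
copy = Item("one printed copy of the reprint", exemplifies=reprint)
```

Calling `works_of(copy)` follows the chain from a physical copy up to the work it ultimately carries.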
This division of levels presupposes that all (or at least many)
works of authorship retain their identity across differences in
typography, pagination, orthographic standard, and even language in
the case of translated works. But each specification in the FRBR
family has, in one way or another, recognized the enrichment that
editorial attention brings to a work: clarifications of language,
correction of errors, selection of illustrations, the addition of
commentary and the preparation of glossaries and indexes are
improvements not merely in their own right, but also in how their
participation and level of detail serve the work's expected uses by
its intended audience. An edition may therefore be understood as a
derivative or dependent work, but its precise status with respect to
its basis is characterized differently from one FRBR family standard
to another.
Over the past fifteen years, a series of papers by Allen Renear and
his colleagues has examined questions of digital documents' status
with respect to the WEMI entities, such as whether XML documents are
best understood as expressions, manifestations, or both (Renear et al., 2003),
whether Group 1 attributes are really inherited across
the WEMI levels (Renear and Choi, 2007), and what, if anything,
corresponds to a FRBR item in the digital world (Floyd and Renear, 2008).
Following the example of those studies, in this paper we use
FRBR family models as high-middle-range theories (Renear and Dubin, 2008a),
providing hypotheses to guide analysis with the
goal of explaining current practice. Our approach to the problem is
rationalized descriptive ontology, one that
begins with practitioners' distinctions and assertions and proposes
model revisions to resolve competing intuitions and logical
inconsistencies (Renear and Dubin, 2008b; Renear et al., 2012).
Digital editions prepared using XML-based languages offer
evidence of relationships between authorial and editorial
contributions, and the WEMI abstraction levels provide a starting
point for framing questions and proposing explanations.
Digital editions: a case study
Our running example for this discussion is Molly O'Hagan Hardy's
TEI[2] transcription of Absalom Jones and Richard Allen's 1794
A Narrative of the Proceedings of the Black People, During
the Late Awful Calamity in Philadelphia, in the Year
1793[3] published via Northeastern University Library's
TAPAS[4] Project. The narrative describes how the African
American community of 1793 Philadelphia responded to an epidemic of
yellow fever that gripped the city from the month of August through
November. Hardy's transcription is based on a reprint of the
original pamphlet by the London publisher Darton and Harvey, and
much of her encoding documents manifestation level properties, such
as the line breaks and typographic emphasis of that particular
edition. But Hardy has also compiled a personography section,
drawing on city directories of the period for records of the
occupations and addresses of people mentioned in the narrative.
Extracts from the digital edition showing an example from the
personography and the name reference in the narrative text are shown
below.
<p><lb ed="ed1"/><persName ref="#sarah_bass">Sarah Bass</persName>,
a poor black widow, gave all the assist-
<lb ed="ed1"/>ance she could, in several families, for which she did
<lb ed="ed1"/>not receive any thing; and when any thing was offer-
<lb ed="ed1"/>ed her, she left it to the option of those she served.</p>
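To make the two kinds of encoding in this extract concrete, the following Python sketch separates the manifestation-level markup (the `lb` line breaks keyed to an edition) from the reference into the personography (the `persName` element). It is a minimal sketch that parses only the extract above; it assumes the fragment carries no namespace declaration, whereas a complete TEI file would normally declare the TEI namespace and require namespace-qualified element names.

```python
import xml.etree.ElementTree as ET

# The extract shown above, reproduced as a standalone fragment.
extract = """<p><lb ed="ed1"/><persName ref="#sarah_bass">Sarah Bass</persName>,
a poor black widow, gave all the assist-
<lb ed="ed1"/>ance she could, in several families, for which she did
<lb ed="ed1"/>not receive any thing; and when any thing was offer-
<lb ed="ed1"/>ed her, she left it to the option of those she served.</p>"""

paragraph = ET.fromstring(extract)

# Pointers into the personography: (target id, name as printed).
person_refs = [(el.get("ref"), el.text) for el in paragraph.iter("persName")]

# Manifestation-level record: the edition each line break belongs to.
line_break_editions = [el.get("ed") for el in paragraph.iter("lb")]

print(person_refs)
print(len(line_break_editions))
```

The two lists correspond to two different objects of description: the first points outward to demographic data, the second records the layout of one print edition.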
Hardy's digital edition is (and is obviously meant to be) a
realization of Jones and Allen's 1794 work, but it preserves a
record of physical features from a particular print edition, and is
augmented by demographic data that Hardy drew from other sources.
More broadly, projects like Hardy's exemplify scholarly editions'
evolution along trends predicted by
van Zundert and Boot, 2011, such as:
drawing from widely distributed digital resources;
combining even more heterogeneous data types (including demographic and geographic data);
developing as versioned, rather than fixed-state, documents;
involving more widely distributed teams of scholars, collaborating with each other remotely.
For a sense of how these trends complicate encoding and
description standards for digital editions, consider some of the IDE
reviewing criteria listed in Sahle and Vogeler, 2014;
they represent information of interest to users that
tagging should explicate:
relationships to other printed and digital resources;
specific contributor roles;
methodological focus on “work” vs. “document”;
identifiers for objects within the edition that specify the levels of content structure addressed;
links among objects within and outside the edition that distinguish content from contextual information.
Standards for making these levels, relationships, and
distinctions available for machine processing ought to be based
(in principle) on agreements about exactly what those things
are. But, as discussed in the following sections, even those
specifications within the FRBR family offer contrasting accounts
for understanding relationships among editions, their texts, and
the works of authorship they realize.
Aggregation and augmentation in group one models
Over the past 20 years the FRBR family of specifications has grown
from just the original FRBR conceptual model published in 1998 (and
hereafter referred to as FRBR-1998) to include supplements and
responses addressing requirements for
authority entities, such as FRAD[5]
and FRSAD[6].
It has also been reconciled with the leading
cultural heritage description standard, CIDOC-CRM (Crofts et al., 2008),
via the Object-Oriented FRBR model
(FRBRoo). Recently FRBR-1998, FRAD, and FRSAD
have all informed the production of IFLA's
new Library Reference Model (IFLA-LRM)
(IFLA, 1998; Riva et al., 2017; Bekiari et al., 2016).
FRBR-1998 is very clear that each of the Group 1 entities has its
own part/whole relationships obtaining at their respective
abstraction levels. That is to say, aggregate works can be composed
of dependent works, expressions can be composed of expressions, and
manifestations can have smaller manifestations as parts (such as a
volume or soundtrack) and items can be physically concatenated, as
in the example of journal issues bound together. According to
section 3.3:
For the purposes of the model, entities at the aggregate or
component level operate in the same way as entities at the
integral unit level; they are defined in the same terms, they
share the same characteristics, and they are related to one
another in the same way as entities at the integral unit level.
Unfortunately, the FRBR specification does not lay out precisely how
the whole/part relationships at one WEMI level support or determine
the whole/part relationships at the others. Suppose, for example, we
understand Hardy's TEI encoding of Jones and Allen's
Narrative to realize a new aggregate work
combining their original authorship, the supporting demographic data, and Hardy's
editorial contributions. We would understand the XML document as an
aggregate expression which has Hardy's tags as
one part and Jones and Allen's text as another part. But although
both the tags and the text are clearly parts of the XML document,
it's not clear at all that each is a FRBR expression.
Our Figure 1 illustrates the ambiguity. W1 is an aggregate of works W1a and W1b, and this
aggregate work is realized by the aggregate expression E1. But are we to understand that
component expressions E1c and E1d realize W1a and W1b, respectively? Suppose that E1c is
Jones and Allen's text, and E1d is Hardy's tagging. Can E1d be an entire expression, even
though it's not a syntactically complete XML document? Perhaps it can, and we should imagine
Hardy's tags as stand-off markup standing in some kind of intentional relationship to the
1794 text, the London reprint edition, and the directory contents used for the personography.
But in that case, how are we to understand the distinction between works W1 and W1b? Wouldn't
the tags in that case realize the new derivative work? If not, then what work, exactly, do
they realize?
At the root of this infelicity is a tension present in all of the
FRBR family specifications, specifically a tension between domain
constraints on unary properties and participation constraints on
relational properties (Wickett and Renear, 2009). On the one hand,
only certain kinds of things can be WEMI entities: expressions must
be abstract things, for example, and items must be physical things.
But on the other hand, items, manifestations, and expressions are
all essentially linked to entities at the higher levels; a component
expression can't be just any arbitrary part of a representational
structure, but must realize a work in its own right. So (as shown in
Figure 1) although there may be unrealized works, there are no
expressions that fail to realize some work. Unfortunately, FRBR-1998
offers us no guidance on how parthood relationships at one level
relate to parthood relationships at the other levels.
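The participation constraint described above can be sketched as a simple check over the Figure 1 scenario. The Python below is illustrative only: the class names and the `unanchored` helper are hypothetical, and the decision to leave E1d without a realized work is an assumption made to surface the open question, not a claim about how FRBR-1998 resolves it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Work:
    name: str
    parts: list = field(default_factory=list)  # component works


@dataclass
class Expression:
    name: str
    realizes: Optional[Work] = None  # None marks an unresolved case
    parts: list = field(default_factory=list)  # component expressions


def unanchored(expr):
    """Names of expr and its components that realize no work."""
    found = [expr.name] if expr.realizes is None else []
    for part in expr.parts:
        found.extend(unanchored(part))
    return found


# Figure 1 scenario: W1 aggregates W1a and W1b; E1 has parts E1c and E1d.
w1a, w1b = Work("W1a"), Work("W1b")
w1 = Work("W1", parts=[w1a, w1b])
e1c = Expression("E1c", realizes=w1a)  # Jones and Allen's text
e1d = Expression("E1d")                # Hardy's tagging: W1b, W1, or neither?
e1 = Expression("E1", realizes=w1, parts=[e1c, e1d])

print(unanchored(e1))
```

Any name this check reports is a component that, on the FRBR-1998 account, cannot count as an expression until some work is identified for it.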
FRBRoo takes a different view of the
situation described above. Under the FRBRoo
account, Hardy's derivative work would be modeled as an aggregate
type that Bekiari et al. (2016) call a
complex work. In this case Hardy's derivative
work is based on both Hardy's tags and Jones and Allen's text via
the realization relationship. As Figure 2 illustrates, the
relationships among the different works involved are much clearer
than in the FRBR-1998 model, but significantly, the dotted arrows do
not represent a mereological relationship. In
FRBRoo aggregate works do not have the works
they aggregate as parts. The complex work model offers our running
example a clearer distinction between our work of authorship and the
editorial work: they differ in conceptual content because Jones and
Allen's work is about an epidemic and Hardy's work is about a work
of authorship, its realization in a text, and its embodiment in an
18th century edition. On this understanding, the demographic data
encode assertions about allusions to persons in the text.
Unfortunately, FRBRoo offers
little explanation for what the dotted arrows in Figure 2 represent.
If aggregation at the work level isn't a part/whole relationship,
then what kind of relationship is it?
Finally, the recent IFLA-LRM conceptual model proposes yet another
interpretation. In this case neither works nor expressions
participate in mereological or meronymic relationships. On the LRM
account the only relationship from whole to parts is the
embodies relationship that obtains between FRBR
manifestations and expressions, respectively. IFLA-LRM editors seem
to have based this decision on two observations. First,
embodiment is the only WEMI relationship that obtains between the
manifestation and expression levels, whereas FRBR-1998 lists a variety of
distinct relationships that may obtain from Expression to Work
(e.g., summarizing or adaptation) and from Manifestation to Item
(reproduction). Second, embodiment is the only many-to-many
relationship across the WEMI levels: an Item can exemplify only one
Manifestation, and an Expression can realize only one Work, but
according to FRBR-1998, a Manifestation can embody multiple
expressions. Those earlier modeling decisions leave the embodiment
relationship as a candidate for overloading, and the IFLA Working
Group on Aggregates decided that aggregates are manifestations that
embody more than one expression (FRBR WGA, 2011).
How this model accommodates our running
example isn't entirely clear, but presumably some of the physical
patterns governing electrical/magnetic energy in memory (or monitor
light) in a computer processing an XML document would be interpreted
as the embodiment of Hardy's tags, while other patterns correspond
to Jones and Allen's text.
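The LRM rule as summarized above reduces to a count over the embodiment relationship, which the following Python sketch makes explicit. The manifestation and expression labels are hypothetical, and treating Hardy's tags as a separate expression is precisely the assumption under discussion rather than a settled reading.

```python
# Hypothetical data: each manifestation mapped to the expressions it embodies.
embodied = {
    "London reprint": {"Jones and Allen's text"},
    "TEI edition": {"Jones and Allen's text", "Hardy's tags"},
}


def is_aggregate(manifestation, embodied):
    """On the account summarized above, an aggregate is a manifestation
    that embodies more than one expression."""
    return len(embodied.get(manifestation, set())) > 1


print(is_aggregate("London reprint", embodied))
print(is_aggregate("TEI edition", embodied))
```

Note that the result depends entirely on how expressions are individuated: if the tags are not counted as an expression in their own right, the digital edition is no longer an aggregate under this test.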
Mereology or dual aspect?
Renear et al., 2003 introduced a "dual aspect" theory
for XML, in which, depending on application context, markup may either
cue rendering effects such as creating line breaks and headings, or else encode
assertions about expression-level content objects (e.g., claims that a block
element is a paragraph or that a text span functions as a header).
We can therefore understand Hardy's TEI document as both a work about a
particular FRBR expression and about a particular FRBR
manifestation. That would suggest Hardy's work is realized
not by the entire XML document, but just the augmenting tags. If the
text of A Narrative... were stored separately,
we could imagine TEI tags directed into Jones and Allen's expression
via XLink/XPointer or Web Annotation.
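The separation imagined here can be sketched with character offsets standing in for XPointer or Web Annotation selectors. The Python below is a minimal sketch under that assumption: the `Annotation` class and `target` helper are hypothetical, and the text string is a normalized fragment of the passage quoted earlier, with the line-end hyphenation removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int   # offset of the first annotated character
    end: int     # offset just past the last annotated character
    tag: str     # element name the annotation stands in for
    attrs: tuple = ()  # attribute name/value pairs


# Base text stored on its own, with no embedded markup.
text = "Sarah Bass, a poor black widow, gave all the assistance she could"

# Editorial layer stored separately and pointing into the base text.
annotations = [
    Annotation(0, 10, "persName", (("ref", "#sarah_bass"),)),
]


def target(text, annotation):
    """Return the span of base text that an annotation points at."""
    return text[annotation.start:annotation.end]


for ann in annotations:
    print(ann.tag, dict(ann.attrs), repr(target(text, ann)))
```

On this arrangement the base text and the annotation layer are distinct objects, which is what allows them to be assigned to different works.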
But there's at least one other option for understanding the text of
our running example with respect to Hardy's editorial contribution.
It's possible that the very same text might have two distinct expression
roles. Such a situation is most familiar in the case of disputed
works, such as the question of whether the Greek texts of pastoral
epistles realize first century or second century works of
authorship. But when modern scholars encode a text with tools like
TEI we can understand the very same text as both realizing the earlier work of
authorship and simultaneously expressing claims about earlier
authors' language. On this view, Hardy's contribution isn't just in
the TEI tags. Her transcription of the text represents a complex
conjunctive assertion concerning everything she claims that Jones
and Allen wrote. That role is most obvious when an encoder's tag
expresses uncertainty about a word, but even where the text is
unambiguous we can understand the transcription as encoding a
confident claim, and so it may be the case that markup and core text
are both essential parts of an editorial work's expression. Although
dual expression roles are unlikely to contribute ordinary
descriptive metadata, they should inform machine-readable provenance
records for scholarship in which details of texts' transmission over
time are essential for understanding variation emerging over that
history.
Implications for encoding standards
Resolving questions such as whether existing works stand in
part/whole relationships to their editions or serve instead as the
subject or focus of editorial claims could help to make processing
of digital resources more consistent and integratable across widely
distributed and increasingly heterogeneous projects. However, in the
long run simple recognition that these puzzles exist may prove as
useful as (or even more so than) arguments that one account is more
correct than the others. In an earlier paper, we proposed that data
models play two distinct and often competing roles in information
interchange: a descriptive role of modeling a domain and a
prescriptive role of documenting stakeholder decisions for
uniformity of practice (Dubin et al., 2013). We
stated that although descriptive adequacy reduces the need for
arbitrary choices, stipulative definitions are inevitably required
to fill in a model's representational gaps.
That a group of standards as interrelated as the FRBR family
should have such different accounts of aggregation suggests that the
assumptions and use cases contributing to our intuitions for complex
or dependent works may be difficult to reconcile in a single
consistent model. For example, one's intuition that works maintain
their identity across multiple editions poses challenges for a view
that the editorial labor of producing a scholarly edition brings a
distinct new work into existence. These specifications also
illustrate how competing explanations of texts and works
(such as the domain constraints vs. participation constraints
mentioned above) complicate attempts to model relationships across
abstraction levels like the WEMI/Group 1 entities. Although we hope
that further analysis of these problems can contribute models
accommodating many applications, in the medium term it may
be best if prescriptive and stipulative approaches to standards
development are informed by recognition of the descriptive
complexities. If, for now, no single model can capture the full
richness of how we understand bibliographic aggregates, among the
first steps for choosing and documenting a reasonable scope for a standard
are to recognize when our definitions are overloaded, and to craft language
for mitigating that problem with more precise distinctions.
Acknowledgments
The authors thank their colleagues at the University of
Illinois School of Information Sciences and the anonymous Balisage
reviewers for discussions and feedback on earlier versions of this
paper.
[Sahle and Vogeler, 2014]
Sahle, Patrick, and Georg Vogeler. 2014. “Criteria for Reviewing
Scholarly Digital Editions, Version 1.1.” Institut für
Dokumentologie und Editorik.
Bekiari, Chryssoula, Martin Doerr, Patrick Le Boeuf, and Pat Riva.
2016. “FRBR Object-Oriented Definition and Mapping from FRBRer, FRAD
and FRSAD (Version 2.4).” International Working Group on FRBR and CIDOC
CRM Harmonisation.
http://www.ifla.org/files/assets/cataloguing/FRBRoo/frbroo_v_2.4.pdf.
Crofts, Nick, Martin Doerr, Tony Gill, Stephen Stead, and
Matthew Stiff. 2008. “Definition of the CIDOC Conceptual Reference
Model.” ICOM/CIDOC Documentation Standards Group. CIDOC CRM Special
Interest Group, 2008.
Dubin, David, Megan Senseney and Jacob Jett. 2013. “What it is vs. how
we shall: complementary agendas for data models and
architectures.” Presented at Balisage: The Markup Conference 2013,
Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage:
The Markup Conference 2013. Balisage Series on Markup
Technologies, vol. 10.
doi:https://doi.org/10.4242/BalisageVol10.Dubin01.
Floyd, Ingbert R., and Allen H. Renear. 2008. “What Exactly Is an
Item in the Digital World?” Proceedings of the American
Society for Information Science and Technology 44 (1):
1–7.
doi:https://doi.org/10.1002/meet.1450440374.
IFLA Study Group on the Functional Requirements for Bibliographic
Records. 1998. Functional Requirements for Bibliographic
Records, Final Report. Berlin, Boston: De Gruyter Saur.
doi:https://doi.org/10.1515/9783110962451.
Renear, Allen H., Dave Dubin, Karen Wickett, Simone Sacchi, Richard
Urban, and Yunseon Choi. 2012. “Taking Modeling Seriously.”
Knowledge Organization and Data Modeling in the
Humanities 47.
https://datasymposium.wordpress.com/renear/.
Renear, Allen H., and Yunseon Choi. 2007. “Modeling Our
Understanding, Understanding Our Models - The Case of Inheritance in
FRBR.” Proceedings of the American Society for Information
Science and Technology 43 (1): 1–16.
doi:https://doi.org/10.1002/meet.14504301179.
Renear, Allen H., and David Dubin. 2008a. “FRBR as an
Interdisciplinary High-Middle-Range Theory for Information Science—a
Theoretical Perspective.” In Proceedings of iConference
2008, edited by Angela De Cenzo. Los Angeles.
http://www.ischools.org/conference08/pc/PA7-3_iconf08.doc.
———. 2008b. “Three of the Four FRBR Group 1 Entity Types Are Roles,
Not Types.” Proceedings of the American Society for
Information Science and Technology 44 (1): 1–19.
doi:https://doi.org/10.1002/meet.1450440248.
Renear, Allen, Christopher Phillippe, Pat Lawton, and David Dubin.
2003. “An XML Document Corresponds to Which FRBR Group 1 Entity?” In
Proceedings of Extreme Markup Languages 2003,
edited by B. T. Usdin. Montreal, Quebec.
http://hdl.handle.net/2142/11885.
Riva, Pat, Patrick Le Boeuf, and Maja Žumer. 2017. “IFLA Library
Reference Model: A Conceptual Model for Bibliographic Information.”
The Hague, Netherlands: International Federation of Library
Associations and Institutions.
van Zundert, Joris, and Peter Boot. 2011. “The Digital Edition 2.0
and the Digital Library: Services, Not Resources.”
Digitale Edition und Forschungsbibliothek (Bibliothek und
Wissenschaft) 44: 141–52.
Wickett, Karen, and Allen Renear. 2009. “A First Order Theory of
Bibliographic Objects.” Proceedings of the American
Society for Information Science and Technology 46 (1):
1–8.
doi:https://doi.org/10.1002/meet.2009.1450460378.