How to cite this paper
Beshero-Bondar, Elisa E. “Text Encoding and Processing as a University Writing Intensive Course.” Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27 - 31, 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). https://doi.org/10.4242/BalisageVol25.Beshero-Bondar01.
Balisage: The Markup Conference 2020
July 27 - 31, 2020
Balisage Paper: Text Encoding and Processing as a University Writing Intensive Course
Elisa E. Beshero-Bondar
Professor of Digital Humanities
Program Chair of Digital Media, Arts, and Technology
Penn State Erie, the Behrend College
Elisa Beshero-Bondar is a member of the TEI Technical Council, as well as
Professor of Digital Humanities and Program Chair of Digital Media, Arts, and
Technology at Penn State Erie, the Behrend College. Until June 2020, she was a
professor of English Literature and Director of the Center for the Digital Text
at Pitt-Greensburg, which has featured markup languages as a foundation of a
curriculum in Digital Studies. Her projects involve her in experimentations with
the TEI, including refining methods for computer-assisted collation of editions
and probing questions of interoperability to reconcile diplomatic and critical
edition encodings, as with the Frankenstein Variorum. She is the founder and
organizer of the Digital Mitford project and its usually annual coding school.
Her ongoing adventures with markup technologies are documented on her
development site at newtfire.org.
Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license
Abstract
Can learning markup languages and coding constitute a writing intensive
experience for university students? Having taught undergraduate students from a
wide range of majors, from the humanities to the sciences, to develop web-based
research projects with the XML family of languages, the author proposes that such
coursework should fulfill a common state university requirement of a
research-oriented writing intensive or "writing-across-the-curriculum" course.
Although coding and programming skills are often represented as a kind of
literacy, the idea that learning markup technologies may constitute an intensive
writing experience is less familiar. This paper calls for teaching markup
technologies widely in a cross-disciplinary context to give students across the
curriculum an accessible yet intensively challenging course that integrates
coding and writing to investigate research questions.
Not just any course that introduces students to markup should be considered
writing intensive. Arguably, a class that involves tagging exercises without project
development and that invites students to write reflectively about the experience is
not engaging in an intensive way with the writing work we
associate with coding in the full experience of developing a project. This paper
argues that a writing intensive course involving the XML family
of languages should require algorithmic problem-solving and decision making,
research and citation of related projects, task management, and documentation to
share the work and help others to build upon it. A course offering such experiences
should be accessible to students from several disciplines, whether a junior-year
English or history major with little to no programming experience, or a junior-year
computer science or information technology major with an interest in applied
programming but unused to the research questions that drive humanities scholarship. In
presenting its case, this paper discusses the pedagogical theory and practice of
teaching composition and code as well as the concepts of blind interchange and
literate programming important to the XML markup community.
Table of Contents
- Coding literacies and writing modalities
- Writing intensively with the XML family of languages
- Starting from a mess
- Developing a schema, cultivating interchange, and writing professionally
- Writing-intensive querying, processing, and transforming
- The writing intensive work of interchange: two university projects on Emily Dickinson
- Writing about code to pay a project forward
- Coda: Teaching markup languages during a pandemic
Coding literacies and writing modalities
To any academic who remembers the exhilaration of navigating around an Andrew File
System and posting on it a paperless syllabus in HTML back in the 1990s, the
application of the phrase "teaching with technology" in the university discourse of
the 2010s may have a jarring sound. Scholarly markup geeks from a past century have
lived to see "technology" for education packaged in learning management systems that
securely take care of all of our electronic interactions with students, from posting
announcements to grades. Even for 90s markup geeks, learning our way around the
technologies driving Blackboard, Canvas, or Moodle can be abstruse and fretfully
time-consuming, especially when we attempt to apply their byzantine integrations with
various proprietary software applications like Zoom or Panopto. Publications
supporting educational technology innovations make evident that educational software
applications are universally expected to be integrated and accessed from within a
learning management system (LMS). For example, Hypothes.is, an open-source and fully
open access technology for web annotation, is now packaged and funneled into an LMS
with a proprietary gradebook integration, without which faculty may not be aware of
its existence or understand how to apply it in their courses. A group of faculty from
Seton Hall University found that teaching with technology can be the way to bridge
perceived differences between disciplines, especially at the developmental stages of
a writing-across-the-curriculum program, and in their work to develop new
technologically-enhanced courses, they concentrate on the interactive writing
opportunities afforded by faculty’s encounter with a university-wide investment in a
learning management system. We now rely on markup-based management architectures that
control university website delivery, and our work in the "technology" of higher
education is widely presumed to be about form-filling, at most using a limited tag
set from a menu system. We have been acculturated to expect that composition and
digital media classes like "Writing for the Web" are and should be about writing in
WordPress templates that provide carefully constrained and little-investigated access
to the code layer. "We do not ride on the railroad; it rides upon us," wrote Henry
David Thoreau, and we might as well update that for today’s university content
management systems: we do not direct the content management system, it directs us.
Although we have little practical choice but to commit to developing course materials
in frameworks that we do not choose for ourselves, those frameworks (e.g. Blackboard,
Canvas, Moodle) are built from XML building blocks which the savvy customer can
occasionally find ways to modify "under the hood" where permitted. XML establishes the
context of educational technology in our educational institutions, but our access is
carefully gated in ways that make it abstruse and esoteric to modify the framework.
Those of us who teach students to write and build projects with markup technologies
may be few in number, but we are the ones who are aware of the skills our students
rapidly develop and hone over the course of a single semester: skills to construct
data models, search interfaces, and informational graphics according to their own
design. Learning XML with its family of languages helps students to design their own
projects independently from "black box" software, and also helps make them informed
consumers who can choose the software they want to commit to particular tasks and find
the tools most amenable to user alterations "under the hood" and most friendly to
transporting the data when the software inevitably upgrades or is no longer supported.
Learning markup technologies is a way of learning the structures of the web of
information, and much as we learn the formal genre expectations of an essay or a poem,
we learn to adapt formal rules and expectations to contain data and metadata in an XML
framework driven by our interests in processing, sharing, and accessing the data. We
write documents to be read and acted upon by humans, and we write markup to be read
and acted upon by humans and machines.
The XML family of languages has not become more abstruse or difficult to learn since
the millennium. Indeed, instruction is much easier in 2020, with many tutorials and
more powerful processing methods now available. However, the learning of markup
technologies and the XML stack has been cultivated too narrowly to be recognized as a
general skillset beneficial to students across disciplines. Given universities’ deep
investment in XML-based ecosystems, can we imagine a widely accessible
cross-disciplinary course that inverts the relationship between code and form-box,
that engages students and faculty in organizing ideas with markup languages that they
themselves control? This paper argues that a sustained application of markup
technologies in the contexts of document data modeling and web project development
should serve as a broadly accessible cross-disciplinary writing-intensive experience
for students. This exploration should open doors for students who would not otherwise
consider themselves "tech savvy" programmers to learn computational tools that make
for more powerful capacities to write, develop, and interact with document data and
the semantic web.
In awakening students to the technologies that shape composition, classes that teach
"digital writing modalities" (or composition that involves multiple media forms) may
provide faculty markup practitioners a well-established and widely lauded context for
teaching markup languages in a way that educational institutions can classify as a
writing-intensive experience. For writing instructors, "digital writing modalities" or
"multimodal writing" can motivate dialogue about the medium as the message and the
choice of form following function and audience. Addressing recent trends to assign
student writing compositions in audio and video formats, Laura Giovanelli and Molly
Keener observe that "Well-designed pedagogy recognizes multimodal writing’s potential
to foster student agency and ownership as increasingly participatory citizens where
literacy means composing in a range of print and digital media, genres, and modes,
where students are consumers and ethical creators."
Writing with markup languages can easily be taught in the context of digital
multimodal composition that fosters student agency and ownership of the code base.
Much as students compose by remixing and ironically applying visual or auditory memes,
they might apply markup languages to compose by self-consciously reordering scripts of
official documents and develop controlled vocabularies that help communities access
their heritage. For example, Jessica Lu and Caitlin Pollock’s 2019 HILT course,
Introduction to the Text Encoding Initiative (TEI) for Black Digital Humanities,
organized training in markup to support new ways of accessing, reading, sharing, and
analyzing texts of marginalized people.
Students learning markup technologies need good tools. Semester-long XML-based
writing-intensive courses need access to a good syntax-checking code editor (such as
the oXygen XML editor), writeable web space with secure FTP access, and possibly an
XML database (such as eXist-db) and a GitHub account, in place of the more ubiquitous
WordPress account and the Adobe suite as the students’ "multimodal writing desk."
While I do not expect every digital composition instructor to flock to these
technologies (though I wish they might), I do anticipate that those of us who can
teach the XML family of languages can do so in a way that constitutes a
writing-intensive experience valuable to students across the curriculum, and
especially valuable in programs with majors, minors, and certificate programs
supporting the digital humanities.
Perhaps the best-known university context for interdisciplinary pedagogy in digital
technologies is the call for "coding across the curriculum," which sometimes explores
the intersections between writing and coding, seeing both as a foundation for literacy
in the twenty-first century. Learning to write programming code engages a student in
writing commands, conditional expressions, and descriptive annotation as well as
metacommentary in documentation. Annette Vee finds parallels to the distinctly
imperative aspects of coding in Kenneth Burke’s speech-act theory of human expression
as a performance. Vee calls for greater awareness of the intersection of scripting
that connects the writing and coding process, since, according to speech-act theory,
we write or speak to move an audience to process ideas in response to a scripted
delivery, whether that audience is human or machine:
Exploring the nature of language, action, and expression through programming
allows us to think about the relationship between writing and speech differently
and also to consider the ways in which technologies can combine with and foster
human abilities. Computational and textual literacy are not simply parallel
abilities, but intersectional, part of a new and larger version of literacy.
When coding is understood as literacy, its nexus with writing becomes explicit when
we think of it on the same terms as developing language skills. But the coding
literacy movement may not be as profoundly educational as we imagine. A developer
writes in Slate that "coding literacy" books for children are not what students need
to learn to code when these books simply lead children to obediently follow scripts to
get a so-called correct answer. Rather, the article points out that the learning
process needed for coding is simply learning any process thoroughly and well. Far more
valuable than "code literacy" books for children are life experiences like repeatedly
taking a piece of furniture apart and putting it back together until we understand how
all the pieces connect with each other, or learning how to optimize the efficient use
of cookie cutters in rolled dough.
Really learning to code is not just about writing a correct syntax to earn points or
get the "correct answer," but is rather more experimental, creative, and purposeful,
with awareness of many possible paths to try. That kind of learning does not belong to
computer science departments any more than writing belongs to English departments, as
David J. Birnbaum and Alison Langmead make explicit:
The first step toward learning to code is to recognize that computer
programming is not computer science; it is more like writing. Everyone can learn
to do it, and can be given the opportunity to learn to do it in ways that are
appropriate for their disciplines. We offer humanists years of practice in
learning to write; let us give them the chance also to learn to code. The second
step is to recognize that learning a programming language is like learning a
foreign language, except that it is much easier.
These are analogies to learning to write or learning a foreign language, intended to
persuade humanists to adopt coding into their disciplines, but the analogies do not in
themselves constitute an argument for coding and programming as involving an intensive
professional writing experience for students of any discipline.
That argument for the writing-intensiveness of coding can be found outside the
humanities. In the context of programming education, Felienne Hermans and Marlies
Aldewereld propose that more students would be interested in computer science if they
learned to program in a way that followed models for learning to write. They suggest
that programming instruction would be improved if it adapted the way writing
instructors model examples of their writing process for students to break it down into
discrete tasks more efficiently. For example, they cite a study in which elementary
school students comprehended scientific concepts better when they were given short
writing assignments that engaged the topic. They suggest that understanding a
real-world context for a code script and writing observantly and reflectively about it
makes the material more widely accessible and could help a broader group of kids
identify as programmers!
Indeed, this idea that writing to document your observations can help you learn a
science enacts a return to an earlier and more integrated approach to education, before
the sciences had become formally distinct from the humanities: the late eighteenth
century saw the scientific poems of Erasmus Darwin and calls for poetry that would
bring zoology and botany to life and encourage sympathy with the natural world. We have
known for a long time now that the act of writing enhances learning across disciplines.
But further, as Hermans and Aldewereld point out, students can learn programming more
easily if they are taught in the mode of writing instruction, with modeled examples and
activities that hone observation skills. What might this mean for learning to work
with XML? It could involve students reviewing markup, schemas, interfaces, and
visualizations from established XML projects as part of their course experience. For
example, students can be given assignments to explore a project site like the Map of
Early Modern London or the Shelley-Godwin Archive to look at the code under the hood
and describe how they understand the component pieces of the project to fit together.
In exploring projects like these, students in the author’s classes have often found a
basis for understanding what kinds of research questions and web reading and research
tools they might design with the right resources, and students often build on what
they learn in their own projects to understand how they can organize information about
people, places, contexts, language patterns, revision history, and more.
It is probably no coincidence that Donald Knuth modeled his concept of "literate
programming" on code for document formatting in the 1980s. In this context the
programming of a machine to format electronic documents unites fundamentally with the
action and reproduction of writing. Knuth’s now-familiar concept seems simple in
hindsight: "Instead of imagining that our main task is to instruct a computer what to
do, let us concentrate rather on explaining to human beings what we want a computer to
do."
The process of literate programming heightens an old association of verbal text with
fabric textile, applying Knuth’s concepts of weave and tangle:
One line of processing is called weaving the web; it produces a document that
describes the program clearly and that facilitates program maintenance. The other line
of processing is called tangling the web; it produces a machine-executable program.
The program and its documentation are both generated from the same source, so they are
consistent with each other.
Literate programming is part of the XML specifications and became paradigmatic for
the XML family of languages by the turn of the new millennium. In 2002 Norm Walsh
modeled its application to the DocBook XSLT stylesheets by applying namespaces to
permit the tangling of actionable code with documentation. Responding to Walsh’s work,
Eric van der Vlist prepared a clear and thoroughly documented explanation of literate
programming applied to XML with embedded Relax NG code, readily transformable into
multiple formats to produce schema validation checking as well as human-readable
web-ready documentation. And over the course of its development from the early 1990s
onward, the One-Document-Does-It-All (ODD) system has modeled literate programming for
the purpose of compiling and delivering the Guidelines of the Text Encoding Initiative
(TEI) as a combination of documentation and processing code instantiating the schema
rules of the community-maintained XML vocabulary. With its early and continuing
investment in literate programming, from its specifications to its schema modeling,
the XML family of languages should be understood as thoroughly writerly.
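The namespace technique can be sketched in a small stylesheet. What follows is a minimal illustration only, not Walsh's actual DocBook code: the doc prefix, its namespace URI, and the doc:note element are hypothetical names invented for this example. Because an XSLT processor ignores top-level elements that belong to a namespace other than its own, explanatory prose can sit beside the template it describes, and a separate transformation could "weave" those notes into readable documentation.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:doc="http://example.org/ns/doc"
    exclude-result-prefixes="doc">

  <!-- Hypothetical documentation element: ignored when the stylesheet runs. -->
  <doc:note for="title">
    Every title in the source becomes a second-level HTML heading, so that
    readers of the output can scan the document by its section names.
  </doc:note>

  <!-- The actionable code that the note above explains. -->
  <xsl:template match="title">
    <h2><xsl:apply-templates/></h2>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the note and the template in one file means that a change to the rule and a change to its explanation can be made, reviewed, and shared together.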
Student coders can gain an introductory experience in literate programming in
designing ODD schemas with descriptive glosses and explanation encoded together with
schema rules in a project. Whether or not we understand the drafting of an ODD
customization to constitute an experience in embedded programming, we can recognize its
value as a writing intensive experience, because it involves students writing the
rules for a project to work systematically and designing its data for consistency and
precision, and because it provides for description and explanation that can make a
project sharable with others. Markup, documentation, and programming work in the
development of XML-based projects should be promoted within educational institutions
as a distinctly intensive experience of writing applied to design resources soundly and
well.
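As a rough illustration of rules and explanation written together, a fragment of an ODD customization might look like the following. This is a sketch under assumptions rather than a tested project schema: the ident value, the selection of modules, and the list of permitted values are invented for the example, and a working ODD would sit inside a full TEI document and be compiled with TEI tooling.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
    ident="classPoetryProject" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>

  <!-- Constrain the type attribute on line groups to terms the team agreed on. -->
  <elementSpec ident="lg" module="core" mode="change">
    <attList>
      <attDef ident="type" mode="change" usage="req">
        <desc>Names the kind of line group, using only the terms our project
          has defined and discussed.</desc>
        <valList type="closed" mode="add">
          <valItem ident="stanza">
            <desc>A group of lines set off by white space in the source.</desc>
          </valItem>
          <valItem ident="refrain">
            <desc>A group of lines repeated at intervals in the poem.</desc>
          </valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

Each desc element is prose written for human collaborators, while the surrounding elements express the rule a validator would enforce, so the same file serves as both documentation and schema source.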
Writing intensively with the XML family of languages
The aspect of XML encoding that makes it so problematic or impossible for
interoperational processing (even within a shared vocabulary like the TEI) is the
semantic naming of tags, what Desmond Schmidt has called its "illocutionary force."
This trouble for universalized processing is also a feature of XML that makes it
writerly and scholarly in nature, and gives it power as a research tool, a power that
is enhanced when new coders learn not just how to tag but also how to manage and
process their tagging as a controlled system. Confronting the challenge of processing
the markup and sharing it with others outside one’s discourse community (oneself,
one’s team, or one’s semester class) is what we recognize as intensive about a course
in writing with markup. An introductory writing experience with markup languages may
familiarize students with the data structure and well-formedness and some of the
issues of transformation and sharing the data, but intensification of the writing
challenge begins with confronting the management issues of a project: writing a
customization and designing schema validation rules. Still more intensification is
applied when students learn how they can navigate and process the markup data.
This issue with the writerly nature of XML coding and processing is not well
appreciated in the larger context of digital humanities work or by professional
developers eager to facilitate the publication process and separate it from the markup
practice. Coders can produce complicated, even intricate structures with XML and TEI
and give impatient developers headaches, but we would do well to encounter those
structures as building materials the way writers do and give our students the tools to
work with them. Rather than designing projects primarily to suit the needs of
developers or content management systems, markup practitioners can learn the writing
intensive way to control the developing tools as their writing instruments.
Taught how to manage and process their own markup, students will design their own
markup more efficiently and systematically than they do when shown only how to tag
texts, or to tag according to the rules imposed by a content management system that
magically shows their work. When ruled by an external content management system,
students are given an illusion that code is correct when it looks good on the screen
and conforms to expectations. Tagging according to the rules of externally imposed
publication software is not a writing-intensive experience, because it is stripped of
creative experimentation, rigorous decision making, and intellectual challenge.
A course in text encoding could be situated within a discipline like English
Literature, as Kate Singer’s class at Mount Holyoke was, to concentrate collectively on
constructing a digital edition of a collection of poems. Singer found that the
broad-based ‘humanities language’ of the TEI enabled students to question, historicize,
and reconsider the poetic terminology we use to describe poems. She and her students
found that the controlled vocabulary of the TEI gave considerable latitude to a
community of scholars to rethink or apply old conventional literary terms as they saw
fit. Because the TEI elements for poetry express simple structural forms rather than
specialized terms (line-groups rather than stanzas, for example), Singer’s students
recognized that it was up to the encoder to apply specialized, historically specific
poetic terms in their customized application of the TEI. In the context of a course,
that work of deciding the appropriate system of terms for oneself can be excitingly
experimental. The pedagogical benefit of engaging students in markup and its
applications was to foster decision-making, documentation, and design thinking, as
Singer found her students eager to take on design decisions for customizing their own
interface for their edition. Not only did they benefit by gaining tech skills, but they
also became more observant readers of poetry as well as the interfaces and
infrastructures of larger-scale digital scholarly editions they encountered. "This
kind of interpretive markup may, finally, give us some inkling of how TEI might be used
as an analytical tool for smaller-scale, case-based projects perfect for undergraduates
as they learn to parse and categorize their own textual situations." Courses like these
prioritize the intellectual engagement of a class with the document objects they are
investigating, and here the markup is clearly a research and investigation tool.
Fitting the students’ markup to an externally imposed uniform publication framework
would have made the work less messy and easier to publish, but would have stifled the
students' experimentation and removed them from the intellectual decision-making
process of doing their own project development. Even unfinished work in the course of
a short semester is a stepping stone to renewed engagement in a process of structured
work with document data modeling, querying, checking, testing, and transforming to
share their work. Such work can lead to impressive senior thesis projects.
The XML family of languages was designed not only to be widely accessible, but also
to be a vocabulary that the writer controls, consults, remixes, and transforms. It does
not take very long to acquaint new coders with the rules of how to tag a document, or
how to turn a plain text document into an XML document, though often people experience
a little frustration with figuring out what they can do with attributes. The first week
or two of a class that involves markup methods can orient people to the basic rules of
well-formedness, but that almost immediately introduces an engaging intellectual
experience when we ask students to develop their own hierarchies to organize what they
are reading, when we invite our students to try to recognize what is implicit and find
ways to use elements, attributes, or comments to make that explicit.
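A small invented example may show what making the implicit explicit can look like. The source sentence and every element and attribute name below are hypothetical, chosen only for illustration; students would devise their own. The plain text reads: Dear Anna, I arrived in Erie on Tuesday.

```xml
<letter>
  <!-- The greeting tells us who is being addressed, so we name that role. -->
  <salutation>Dear <person role="recipient">Anna</person>,</salutation>
  <!-- The source gives only a weekday; the full date is still unknown. -->
  <p>I arrived in <place type="city">Erie</place> on
    <date certainty="low">Tuesday</date>.</p>
</letter>
```

Here elements name what each string of text is, attributes record a judgment about it, and comments explain the encoder's reasoning to a later reader.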
Starting from a mess
Just as we think of free-writing as a valuable exercise to start a first preliminary
draft in a composition class, in teaching markup, a certain amount of mess and
unreliability is okay as we are figuring out what we want to prioritize. Often new
coders introduce far more differently named elements than they really need, in ways
that would be baffling to keep track of in a fully developed project. To understand how
to code helpfully and meaningfully, a student needs to confront the problem of sharing
and reproducibility. Thinking about how to share a decision process and a set of rules
for an XML project with others cultivates the awareness of audience that is emphasized
in rhetoric and composition classes as a means to craft better sentences, to trim out
verbosity, to outline a thesis project.
The road to improvement can be based on the same principles of understanding how
to convey relationships, and how to prioritize a main idea with a subordinate
clause. A student might write a paragraph like the following in a draft that could
benefit from some rounds of revision:
Historically, women have had a tough time when it comes to writing novels
and combatting prejudices and sexism. Many female authors have had to
publish their novels with a male pseudonym or as an anonymous author. When
writing Frankenstein, Mary shelley wrote it
among her friends to see who could write the best horror story and she did
not tag it with her name. The famous story was left anonymous so her friends
wouldn't have a prejudice view when reading it. The story ended up being her
most famous book and she was a female writer, who wrote a horror story about
a male creator and a frightening, male creature.
Instructors who teach writing courses understand the complexities of advising a
student on how to revise writing like this. We often comment that the student’s ideas
are "good" or "interesting," but we need a stronger sense of how the ideas connect, and
every sentence needs to support a single central idea. In this case we think the
central idea is about how Mary Shelley, like other women authors in English history,
opted to conceal a female identity in publishing her work.
Beyond simple misspellings and missed capitalization, we can identify conceptual
problems of subordination, especially evident in the last sentence where the fame of
the book and the female identity of the writer are placed at the same level as the main
idea and represented out of chronological sequence. The ordering of ideas and the
decision of where to place subordinate clauses is a problem we can associate with
organization or hierarchy. Rewriting such a paragraph sometimes involves reorganizing
and condensing, pulling together apparently disconnected parts in the first draft. For
example, here is my own attempt to rewrite the student’s paragraph:
Historically, many women authors faced a sexist and prejudiced publishing
industry and opted to conceal their identities either by publishing with a
male pseudonym or anonymously, as Mary Shelley did with the anonymous first
publication of Frankenstein in 1818. The
novel, a horror story about a male creator and a frightening male creature,
became her most famous work, and eventually was published with her name on
the title page.
My suggested rewriting of this seems a little unsatisfactory because I sense I have
removed something interesting that the student might have developed, a "loose end" from
the draft, something about the idea of a female author writing about a male scientist
creator and a male creature, whose violent conflict drives this book. Would the student
have wanted to explore that issue of a woman author investigating male conflict? Is
that perhaps the topic of another essay entirely? Writing and rewriting can impose
order but also cut out possible avenues of development.
I can find similar examples that demonstrate issues with subordination in students’
first XML encoding efforts. In my roughly seven consecutive years of teaching
undergraduates to code with the XML family of languages, I notice a recurring pattern
in the first three weeks of a semester: some students have difficulties with
conceptualizing dependencies, much like the issues we identified in the student
paragraph about Mary Shelley. Students just starting to code often prepare shallow
hierarchies. For example, instead of bundling list items together into a wrapped
cluster, they make a very flat tree where every line is its own entry, a child of the
root element.
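The difference can be sketched with an invented packing list (the element names here are hypothetical and not drawn from any student's work). In the first version every line is a sibling under the root; in the second, a wrapper element bundles the heading and its items into one cluster.

```xml
<examples>
  <!-- Flat: heading and items are indistinguishable children of the root. -->
  <notes>
    <line>Things to pack</line>
    <line>tent</line>
    <line>stove</line>
    <line>map</line>
  </notes>

  <!-- Nested: the list wraps its heading and items, showing what depends on what. -->
  <notes>
    <list>
      <heading>Things to pack</heading>
      <item>tent</item>
      <item>stove</item>
      <item>map</item>
    </list>
  </notes>
</examples>
```

With the nested version, an XPath expression such as //list[heading = 'Things to pack']/item can retrieve just the items of that list, something the flat version cannot support without guessing from line order.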
These students benefit from seeing examples of nested markup, and also from understanding
something about how the markup may be processed, so that they can work out how attributes
refine the markup by helping to categorize, describe, point out related resources, or
clarify something unclear in the source document. To help students discover the many
different ways they could apply markup, I invite them to write their first markup on a
recipe, one that contains an interesting variety of ingredients, measurement units, and
activities. A recipe provides a very clear and easily recognizable sense of structure
with lots of categories of information. I ask students to envision a scenario for the
encoding that tries to create a system for filing documents:
First, read this recipe for homemade bread, and pretend you are filing it
with hundreds of other recipes that you need to fit a set purpose, such as
running a restaurant, in which you need to keep track of kinds and
quantities of ingredients required. XML is written to store information, and
when we apply it to a situation with numbers and units, like with coding
recipes, the code we write can help make computerized calculations, and help
optimize searching across a collection for particular kinds of ingredients.
Your code might be designed to help categorize ingredients by what part of
the grocery store they can be found in. The challenge of the assignment is
to write code that helps categorize ingredients, mark necessary equipment,
and stages for processing, but the system you develop is up to you.
It is fascinating to see the variety of encodings students submit for
this assignment, with no two being much alike. With a few exceptions the students
usually are able to submit well-formed XML by day 2 of the course, but they
sometimes don't quite understand the concept of nesting structures or demonstrating
relationships, as in this tagging of the ingredients list of a sourdough
recipe:
<recipe type="allAges" name="sourdoughIngredients">
<measurement>1 1/4 cups</measurement> (<amount>160 grams</amount>)
<ingredient>white bread flour</ingredient>, plus more for dusting
<measurement>1/4 cup</measurement> (<amount>38 grams</amount>)
<ingredient>stone-ground whole-wheat flour</ingredient>
<measurement>1/4 cup</measurement> (<amount>32 grams</amount>)
<ingredient>stone-ground whole rye flour</ingredient>
<measurement>1/2 teaspoon</measurement>
<ingredient>instant yeast or bread machine yeast</ingredient>
<measurement>1 teaspoon</measurement>
<ingredient>table salt</ingredient>
<measurement>1/4 cup</measurement> (<amount>55 grams</amount>)
<ingredient>dry fermented cider</ingredient>
<measurement>1/2 cup</measurement> (<amount>120 grams</amount>)
<ingredient>lukewarm water</ingredient> (<temperature>80 degrees</temperature>),
plus an optional 1 tablespoon recipe></recipe>
<!--jgb: You may substitute a Pilserner beer for the dry fermented cider. -->
Here we see a common problem for new learners of XML: thinking that white space is
sufficient to relate like to like, rather than recognizing the need to position
wrapper elements. The student has not quite understood that the <measurement> and
<amount> elements are not really associated together. Nor has the student tried to
apply attributes, yet, but they have ventured an XML comment about ingredient
substitution. The code is promising for its regularity and consistency, but lacks an
understanding of how to work with the XML tree hierarchy. Another student seems to have
a stronger grasp on the assignment, but even here we can find some issues that relate to
writing problems of redundancy or overreliance on a particular word, here the attribute
@type being overused:
<recipe type="bread" name="country loaf (pain de campagne)">
<measurement type="cup">1 1/4 cups (<measurement type="gram">160 grams</measurement>)</measurement>
<!-- sd: is this a good/okay way to do measurement types? it seems weird but i don't really know -->
<ingredient type="dry">white bread flour</ingredient>, plus more for dusting
<measurement type="cup">1/4 cup (<measurement type="gram">38 grams</measurement>)</measurement>
<ingredient type="dry">stone-ground whole-wheat flour</ingredient>
<measurement type="cup">1/4 cup (<measurement type="gram">32 grams</measurement>)</measurement>
<ingredient type="dry">stone-ground whole rye flour</ingredient>
<measurement type="tsp">1/2 teaspoon</measurement>
<ingredient type="dry">instant yeast or bread machine yeast</ingredient>
<measurement type="tsp">1 teaspoon</measurement>
<ingredient type="dry">table salt</ingredient>
<measurement type="cup">1/4 cup (<measurement type="gram">55 grams</measurement>)</measurement>
<ingredient type="wet">dry fermented cider</ingredient> (may substitute Pilsener beer; see
headnote)
<measurement type="cup">1/2 cup (<measurement type="gram">120 grams</measurement>)</measurement>
<ingredient type="wet">lukewarm water</ingredient> (<temp>80 degrees</temp>),
plus an <measurement type="tsp">optional 1 tablespoon </measurement>
DIRECTIONS
<step n="1"><equipment type="utensil">Whisk</equipment> together the flours, yeast and salt in a
<equipment type="bakeware">mixing bowl</equipment></step>.
<step n="2">Combine the cider and water in a <equipment type="bakeware">liquid measuring cup</equipment></step>.
<step n="3">Add the liquid to the flour mixture; use a <equipment type="utensil">spatula</equipment>
or <equipment type="utensil">bench scraper</equipment> or your hand moistened with water to blend them for about a minute
</step>. The dough should be shaggy yet cohesive.
<step n="4">Cover the bowl with a <equipment type="cloth">towel</equipment>;
let the dough rest for <time>20 minutes</time></step>.
<step n="5">Moisten your kneading hand. If the dough seems stiff,
add the optional tablespoon of water.</step>
<step n="6">Stretch one edge of the dough (still in the bowl), then press it into the center of
the bowl. Repeat this about a dozen times, moving clockwise to catch all sides of the dough</step>.
(This should take <time>1 or 2 minutes</time>.)
<step n="7">Turn the dough over so the seams are on the bottom</step>.
<step n="8">Cover and let rest for <time>20 minutes</time></step>.
<step n="9">Repeat the clockwise stretching and folding two more
times, with <time>20-minute</time> rests after each</step>.
<step n="10">Cover and refrigerate <time>at least 8 hours and up to 24 hours</time></step>.
The dough should have doubled. If it hasn't, leave it on the counter until it does.
<step n="11">Lightly flour a work surface</step>.
<step n="12">Use a <equipment type="cloth">pastry cloth</equipment> or clean
<equipment type="cloth">dish towel</equipment> to line a round <equipment
type="bakeware">colander</equipment>. Dust the cloth with flour</step>.
<step n="13">Transfer the dough to the floured work surface. Fold the edges toward the center to create
a round shape, turning it over so the seams are on the bottom</step>.
<step n="14">Let it rest for <time>5 minutes</time>, then transfer to the colander, seam side up</step>.
<step n="15">Cover with a <equipment type="cloth">towel</equipment> and let the dough rise for
<time>1 1/2 hours</time>.</step>
<step n="16"><time>Thirty minutes before baking</time>,
place a <equipment type="bakeware">cast-iron Dutch oven (lid on)</equipment>
or <equipment type="bakeware">enameled cast-iron pot with a lid (on)</equipment>
in the oven; preheat to <temp>475 degrees</temp></step>.
<step n="17">Carefully remove the hot pot from the oven.</step>
<step n="18">Turn the dough out onto the counter so the seams are on the bottom</step>.
<step n="19">Use <equipment type="utensil">kitchen scissors</equipment>
to make 8 snips on the top of the dough in an evenly spaced spoke pattern,
each about 1/4-inch deep</step>.
<step n="20">Lift the dough and carefully drop it into the hot pot.
Immediately cover with the hot lid</step>.
<step n="21">Bake for <time>30 minutes</time>, then reduce the heat to
<temp>450 degrees</temp></step>.
<step n="22">Uncover and bake for <time>8 to 10 minutes</time> or
until the crust is dark brown</step>.
Try to minimize the amount of time the oven door is open. The bread is done when its internal temperature
registers <temp>205 degrees</temp> on an <equipment type="utensil">instant-read thermometer</equipment>
and the loaf sounds hollow when knocked on the underside.
<step n="23">Transfer the loaf to a <equipment type="bakeware">wire rack</equipment>
to cool for at least <time>1 hour</time> before cutting</step>.
</recipe>
The nested <measurement> elements inside <measurement> elements show a problem the
student was trying to solve with hierarchy, and make a good opportunity for the
instructor to discuss with the student how to deal with all the information given about
equivalent units (grams to cups). We could suggest the student use just one measurement
element and try encoding the equivalency information in attribute
values, for example. And we need to think about the representation of fractions: If
the code is going to be processed by a computer to, say, triple this recipe, how might
we write markup to represent the numerical quantities and conversion factors? What is
remarkable here
for an early XML assignment is the student’s decision to mark types of equipment
within the steps. The student could reconsider the variety of attributes, but the
effort to track categories of measurement as well as categories of ingredient (wet
vs. dry) is admirable on a first experience with angle brackets.
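The suggestion to use a single measurement element with equivalency information in attributes might be sketched as follows. This is only one possible model, not the student's revision: the wrapper elements <ingredients> and <item> and the attributes @quantity, @unit, and @grams are hypothetical names chosen for illustration, and the fractions are written as decimals on the assumption that a processor will need to do arithmetic with them.

```xml
<!-- Hypothetical sketch: element and attribute names are illustrative assumptions. -->
<ingredients>
  <item>
    <!-- One measurement per ingredient; the equivalent unit moves into an attribute. -->
    <measurement quantity="1.25" unit="cup" grams="160">1 1/4 cups (160 grams)</measurement>
    <ingredient type="dry">white bread flour</ingredient>, plus more for dusting
  </item>
  <item>
    <measurement quantity="0.25" unit="cup" grams="38">1/4 cup (38 grams)</measurement>
    <ingredient type="dry">stone-ground whole-wheat flour</ingredient>
  </item>
  <item>
    <!-- No gram equivalent is given in the source, so the attribute is simply omitted. -->
    <measurement quantity="0.5" unit="tsp">1/2 teaspoon</measurement>
    <ingredient type="dry">instant yeast or bread machine yeast</ingredient>
  </item>
</ingredients>
```

With the numbers stored this way, an XPath expression such as @quantity * 3 could scale each measurement, while the element content preserves the wording of the source recipe. Each <item> wrapper also makes explicit which measurement belongs to which ingredient.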
Even when students quickly learn how to apply hierarchies, they are, of course,
prone to inconsistencies before they learn to write schema validation
code, as in the following example:
<step n="9"><process type="action">Lightly flour</process> a
<item type="equimpent">work surface.</item>
Use a <item type="equipment">pastry cloth</item> or
<item type="equipment">clean dish towel</item> to
<process type="action">line a
<item type="equipment"><adj type="equipment">round</adj>
colander</item></process>.</step>
<step n="10"><process type="action">Dust</process> the
<item type="equipment">cloth</item> with
<item type="ingredient">flour</item>.</step>
<step n="11"><process type="action">Transfer</process> the
<item type="ingredient">dough</item> to the
<adj type="equipment">floured</adj>
<item type="equipment">work surface</item>.</step>
This student reveled in coming up with complex hierarchies in his first week of
coding, but a typo means that his attribute values are inconsistently marked. Such
imprecision might pass unnoticed as a relatively harmless error in a student essay,
but here it becomes an opportunity to introduce the power of schema writing to
students, to write their own spell checkers for their attribute values and control
which elements and attributes are permitted to appear at each level of the
hierarchy.
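As one hedged illustration of such a "spell checker," the sketch below restricts @type on <item> and <adj> to an enumerated list, so that a value like "equimpent" would be flagged as invalid. It is written in Relax NG's XML syntax rather than the compact syntax students write in the course. The permitted values and content models are assumptions inferred from the excerpt above, not the student's actual schema.

```xml
<!-- Hypothetical sketch: permitted values and content models are assumptions. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- For brevity this sketch validates a single <item>;
       a fuller schema would start at the recipe's root element. -->
  <start>
    <ref name="item"/>
  </start>
  <define name="item">
    <element name="item">
      <ref name="type"/>
      <!-- Mixed content: text with any number of <adj> elements inside. -->
      <mixed>
        <zeroOrMore>
          <ref name="adj"/>
        </zeroOrMore>
      </mixed>
    </element>
  </define>
  <define name="adj">
    <element name="adj">
      <ref name="type"/>
      <text/>
    </element>
  </define>
  <!-- The controlled vocabulary: only these spellings are permitted. -->
  <define name="type">
    <attribute name="type">
      <choice>
        <value>equipment</value>
        <value>ingredient</value>
      </choice>
    </attribute>
  </define>
</grammar>
```

Defining the attribute once and referencing it from both elements keeps the list of permitted values in a single place, which is where a project team would revise it.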
In the first week of my class, students usually move from encoding a recipe to
marking up a poem or a piece of historical correspondence. Encoding a
different genre of document can lead students to recognize different kinds of data
and observations about the formal dimensions and organization of patterns. They also
often take more of an interest in referencing different kinds of information like names,
dates, people, and places, as well as images, motifs, and rhyme. Quite frequently student
beginners will take the text content of a document and repeat it in an attribute
value, for example by wrapping an element around a name as given in a text and using
that same name in an attribute value on the element. They do this until they receive
some suggestions that they might want to use the attribute as a key for a standard
identifier whenever this individual is mentioned by their various names. Students who
have difficulty constructing dependent clauses may find the preparation of an informative,
non-redundant hierarchy just as challenging as their composition courses. While
their first efforts are observably messy, they can be discussed in terms of how to
simplify if one were to prepare a large collection and wanted to work systematically
with a particularly interesting and tractable kind of data.
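The difference between repeating a name and keying it to an identifier can be shown in a small contrast. Everything in this sketch is invented for illustration: the element names, the @name and @ref attributes, the "MWS" key, and the sentences themselves are assumptions, not drawn from any student's file.

```xml
<!-- Hypothetical sketch: names, attributes, the "MWS" key, and the text are invented. -->
<letter>
  <!-- Redundant: the attribute value merely repeats the element's text content. -->
  <p>I dined with <person name="Mrs. Shelley">Mrs. Shelley</person> last evening.</p>
  <!-- Keyed: every mention points to one identifier, however the name is written. -->
  <p>I dined with <person ref="#MWS">Mrs. Shelley</person> last evening, and
    <person ref="#MWS">the author of Frankenstein</person> spoke of her travels.</p>
</letter>
```

In the keyed version, a single XPath such as //person[@ref = '#MWS'] would retrieve every mention of the same individual across a collection, which the redundant version cannot do.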
Developing a schema, cultivating interchange, and writing professionally
Students often improve their markup dramatically when they learn to write Relax NG
schema code that creates rules for encoding. This may be a first data-modeling
experience for students in a general education context, when they are called upon
to think in a “meta” or higher order reflective way about formalizing their
code, and to make it possible for others to understand and apply it. Learning to
write schema code also leads to writing
comments to explain decisions and document the code. The following example pairs a
short coded document with a student’s schema, and a conversation with another student
who was reviewing the code and offering advice. First, here is the XML the student
prepared with some good-natured snark from my own assignment instructions:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="01-22_SCHEMA_rngEx02.rnc" type="application/relax-ng-compact-syntax"?>
<root>
<intro> Make sure you do the following: </intro>
<step num="1"> 1) <act type="spec">Save</act> your <obj type="comp">Relax NG file</obj>
<desc type="rel">with the <obj type="comp">.rnc extension</obj> at the <obj type="conc"
>end</obj></desc> and <act type="unspec">work with it</act> in the <desc type="adj"
>same</desc>
<obj type="comp">file directory</obj>
<desc type="rel">with your <obj type="comp">.xml file</obj>.</desc>
</step>
<step num="2"> 2) <act type="spec">Associate</act> your <obj type="comp">.rnc schema</obj>
<desc type="rel">with your <obj type="comp">.xml file.</obj></desc>
<exp>(You are <desc type="rel">finished</desc> with this <obj type="conc">exercise</obj> if
your <obj type="comp">XML</obj> is <desc type="rel"><act type="spec">associated</act>
with your schema</desc> and <desc type="adj">both</desc>
<obj type="comp">files</obj> have <act type="spec">come out</act>
<desc type="adj">"green"</desc> in <obj type="comp">oXygen</obj>.)</exp>
</step>
<step num="3"> 3) <act type="spec">Upload</act>
<desc type="adj">BOTH</desc>
<obj type="comp">files</obj> here. We <act type="unspec">need to see</act> your <desc
type="rel"><obj type="comp">.xml file</obj> and your <obj type="comp">.rnc
file</obj>.</desc>
</step>
</root>
<!-- bb_1/22/20: I wanted to keep it simple for once, so I literally used the assignment as a text. Sue me.-->
Here is the Relax NG schema in compact syntax, where a new pattern of writing extensive
commentary is emerging in the student (whose initials are bb). The peer-reviewing
student is amp, who had learned to write schemas and design projects in a previous
semester:
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
start = root
#bb_1/22/20: root is the root element
root = element root{intro, step+}
#bb_1/22/20: step is one of our 'highest' content objects on the hierarchy
step = element step{num, (exp|act|obj|desc|text)+}
#amp: This would be better as mixed content! (We've already touched on this in class, but I'll leave a note here as well) So, it would look like this: step = element step{num, mixed{exp | act | obj | desc)+}}
#bb_1/22/20: intro is a misc element
intro = element intro{text}
#bb_1/22/20: other important places in the hierarchy
exp = element exp{(type|act|obj|desc|text)*} #exp = explanation
desc = element desc{(type|act|obj|text)*} #desc = description
#amp: This is the same as before: these two would be better written with mixed content! So: exp = element exp{mixed{(type | act | obj | desc)*}}
#desc = element desc{mixed{(type | act | obj)*}}
act = element act{type, text} #act = action
obj = element obj{type, text} #obj = object
#bb_1/22/20: attributes
num = attribute num{xsd:int}
type = attribute type{text} #types:
#comp = computer
#conc = concept
#rel = relate
#adj = adjective
#spec = specific
#unspec = unspecific
#amp: My first comment is about comments! I'm glad to see you using comments
not only to leave notes to the instructors, but also to give information about decisions
that you're making while coding.
This is so helpful when it comes to projects that you're going to share publicly,
because it allows others to see these choices and better understand your code.
So, great work with that!
#amp: Another thing that I like with your schema is the organization!
You have clearly designated sections for your elements and attributes that make your code really clear.
#amp: For the future, I would try working more with the other repetition indicators,
as well as with datatypes! In this specific assignment, you could code for the date of the
assignment, and use xsd:date or xsd:YearMonth (or both!) to give even more metadata in
your xml. Overall, this is a well-organized, simple schema that you'll be able to develop
into a more complex schema if you were to continue with it (and with future projects!)
The beginner student, bb, is experimenting with tidy, highly legible commentary on his
code, including mapping out attribute values he might use to replace the ambiguous
text. But we also see students beginning to hear from each other about project work
on which they will eventually collaborate. Students are getting used to seeing a code
file as a site of productive conversation and ongoing revision.
When students learn only tagging and do not learn how to write schema validation or
process their own code
to visualize and analyze the data they have marked, they remain unaware of what the
markup makes possible or of the problems imposed by
imprecision and inconsistency. They may be tagging correctly, but not in a way that
communicates meaningfully or reliably. As they first become aware of the human
unreliabilities in applying markup, they may come away with a limited idea that the
tree structures we create are necessarily subjective and arbitrary. They would be
reinforced in this thinking by old arguments from those who find embedded markup a
source of intrusive confusion. For example, Johanna Drucker has asserted that
embedded markup confuses levels of discourse: Putting content markers into
the plane of discourse (a tag that identifies the semantic value of a text
relies on reference, even though it is put directly into the character string)
as if they are marking the plane of reference is a
flawed practice. Markup, in its very basis, embodies a contradiction. It
collapses two distinct orders of linguistic operation in a confused and messy
way.
Certainly the charge of messiness and confusion can apply to an ill-conceived data
model as well as to a piece of disorganized and unrevised writing. But
embedded encoding itself is only a “mess” to the extent that it defies
comprehension, navigation, and processing by an informed reader of markup who is a
member of a community of practice. Against Drucker’s dismissal of markup as mess we
should set the frequent practice in the markup community of blind
interchange, as Syd Bauman defined it in 2011:
you want my data; you go to my website or load my CD and download or copy
both the data of interest and any associated files (e.g., documentation or
specifications like a TEI ODD, a METS profile, or the Balisage tag library);
based on your knowledge of my data that comes from either the documents
themselves or from the associated files (or both), you either change my data
to suit your system or change your system to suit my data as needed. Human
intervention, but not direct communication, is required.
Far from posing a “mess,” the actions of documenting
descriptive markup make it sustainable and sharable when the encoder is removed in
time and space and technological delivery system from those who encounter the code.
Blind interchange is the benefit of the tangle and weave of literate programming,
and it is reinforced by communities that encounter the code and interact with it.
Understood in this way, the “mess” of markup is indistinguishable from
the “mess” of writing; both may be ordered with care and explanation
for an audience, first an audience of one’s instructor and peers, and then an
audience that one does not necessarily meet in person.
Writing-intensive querying, processing, and transforming
Teaching students how to query their XML code and how to transform it for publication
encourages new systems of thinking and
gives writers access to their own means of production. The learning required takes
weeks, not years, and is suitably incorporated in a university semester without needing
specialized computing prerequisites. Not until we teach students how to customize,
query, or transform their markup can
they engage with it in a way that educational institutions might characterize as
writing intensive. By way of reference, the Pennsylvania State
University’s cross-curricular definition of a writing intensive course requires that
writing be used to help students learn course content, as well as
ways of writing in the discipline, and that it have formal
expectations delivered in structured assignments. These expectations are familiar
to us from university composition courses in the requirements for research papers and
thesis documents with structured sections and appendices, but they can also be
communicated in the context of encoding XML projects according to a carefully developed
schema and a well-documented codebase. Most important and pertinent to XML project
development is the expectation that writing-intensive courses engage in significant
rounds of revision:
Markup and coding with the XML family of languages becomes an
intensive writing experience when students return to it and
revise it, to better document decisions for the project team, to better document
decisions for readers outside the project, to improve the precision of the data, and
to
simplify the categories to make the code more coherent in categorizing and processing
information. When students learn to inspect the code and share its customization in
project teams, the markup becomes subject to intensive review and systematic revision
to
make it sharable rather than subjective. The more this is done, and the more experience
that students and scholars gain, the more prepared they are to share in wider
conversations. For example, markup practitioners in the classicist community share
applications of the
EpiDoc guidelines, and a medievalist graduate student prepares to
speak at the annual conference of the Text Encoding Initiative or at the annual
Kalamazoo International Congress on Medieval Studies. As with peer-reviewed scholarship
in any discipline, the XML code-base is optimally subject to heated debate, and decisions
are made as befits communities of practice.
Because the XML family of languages is amenable to rapid learning, a student can
become a stack developer
easily in the course of a semester, as
Clifford Anderson observed of the course he taught students in XQuery: XQuery
makes it possible for students to become productive without having to learn as
many computer science or software engineering concepts. A simple four or five
line FLWOR expression can easily demonstrate the power of XQuery and provide a
basis for students' tinkering and exploration.
As XQuery developers know, the simple for, let, where, order by, and return
statements that make a FLWOR are good ways to introduce students to
programming concepts quickly and give them powers to construct all kinds of new data
structures from an XML document, whether HTML, SVG, or structured text formats like
CSVs to be imported into spreadsheets, or JSON formats for structured maps and
arrays. Students work at the intersections of different data formats while exploring
what they can build out of XML trees. Bringing students to work at these
intersections leads code-writers to make challenging decisions
about streamlining the code-base, making it more legible, tractable, XPath-able.
Learning XPath and writing XSLT or XQuery to process XML moves a coder from
following rules obediently so that others will someday process the data, to becoming
an active intellectual investigator who can wield markup as a skilled professional
writer.
The writing intensive work of interchange: two university projects on Emily
Dickinson
University students can and do create projects built to last, that is,
launched in a way that others can build on and continue based on the documentation
they provide. As a case in point, let us consider two markup projects, decades apart,
addressing the poetry of Emily Dickinson.
The first is a proof-of-concept proposal for a PhD thesis prepared by Michele Ierardi,
then a graduate student at the University of Virginia. As of 2020, it is accessible
on the web only from the Wayback Machine: Translating Emily: Digitally Re-Presenting Fascicle 16. The project applied 1990s HTML and an early form of
JavaScript to render Emily Dickinson’s handwritten variants on her own poems in a
way
that did not demote those variants to a footnote, but gave them equal space using
the
capacities of hypertext. On reading one of the poems on the website, the reader would
encounter Dickinson’s own different versions of a line in slowly flashing text. The
editor’s hope in designing her interface was to make readers more aware of Dickinson’s
open-endedness, in not cancelling out multiple versions of a line so that all
possibilities could coexist. The site was a proof of concept that did not materialize
into a PhD thesis project, but it persisted and influenced my teaching of American
Literature courses when I wanted to share Dickinson’s unusual writing process with
my
students and give them an experience of an interesting and accessible (if slightly
hypnotizing) digital edition interface. The JavaScript on the site ceased to function
around 2010, and soon thereafter I began seeking a way to continue accessing this
cleanly and simply encoded project in a way that would still benefit my students.
In 2015, soon after I had begun teaching courses in coding and the XML stack at
Pitt-Greensburg, I was fortunate to find a group of students interested in poetry
and
fascinated by the possibilities of restoring a digital archive. The students and I
contacted Michele Ierardi and obtained her permission to reconstruct her site. This
involved converting the code from HTML with unmatched tags to TEI P5, as well as adding
new research. My students investigated additional versions of the poems and added
more data to include TEI critical apparatus markup that would encode
Dickinson’s variants as well as other printed versions of the same poems in a series
of
editions published after Dickinson’s death. Their new goal was to build on Ierardi’s
work and create a readable interface for comparing the multiple versions of Dickinson’s
poems, and to begin expanding that work to include other fascicles, or bundles of
poems that Dickinson created beyond the original collection Ierardi presented. My
students’ site is the
second Emily Dickinson project,
strongly bound to the first. These students have since graduated from university, but they continue to
work on this project, adding a new fascicle and tinkering with the interface, and
I
understand from the ongoing project director, Nicole Lottig, that she intends to
continue coding and developing the site to display all of Dickinson’s fascicles as
a
long-term project. An excellent sampling of the project’s interface for reading a
Dickinson poem and seeing its variant texts and images together is its display of
Poem 1605, which
shows how most of the early print editions cut out Dickinson’s entire last stanza
and
typically ignored her variants.
The editors share their TEI code
from the interface, which applies critical apparatus markup with parallel segmentation in
the TEI. The students opted to represent all witnesses even when they were silent,
as demonstrated in their coding decisions for the last stanza of poem 1605. They also
faced a significant challenge to encode Dickinson’s uncanceled variant passages in
her manuscript, and found a way to do this by applying the @type
attribute with values of "var0" and "var1":
<lg>
<l n="17"><app>
<rdg wit="#df16 #fh">And then a Plank in Reason, broke,</rdg>
<rdg wit="#ce #poems3"></rdg>
</app></l>
<l n="18"><app>
<rdg wit="#df16 #fh">And I dropped down, and down—</rdg>
<rdg wit="#ce #poems3"></rdg>
</app></l>
<l n="19"><app>
<rdg wit="#df16 #fh" type="var0">And hit a World, at every plunge,</rdg>
<rdg wit="#df16" type="var1">And hit a World, at every Crash—</rdg>
<rdg wit="#ce #poems3"></rdg>
</app></l>
<l n="20"><app>
<rdg wit="#df16 #fh" type="var0">And Finished knowing—then—</rdg>
<rdg wit="#df16" type="var1">And Got through—knowing—then—</rdg>
<rdg wit="#ce #poems3"></rdg>
</app></l>
</lg>
In their transformation to HTML (linked above), the students applied JavaScript and
CSS to the variant data coded in these attributes to produce a dynamic and distinctive
reading interface permitting the reader a ready view of comparison data.
In preparing this project, the students had the benefit of an earlier and
simpler markup model, as well as a sense of purpose in giving a remarkably interesting
project a new lease on life. As undergraduates from a range of majors in English,
Creative Writing, and Information Sciences, they needed to study the TEI P5 guidelines
and essentially took a “crash course” in manuscript encoding and textual
scholarship in the course of a semester. They learned to transform the code and
designed the interface in the fall of 2015, and then redesigned and improved the
interface while investigating a new research question in the following spring 2016 term.
Over the course of one year, in the context of university coursework, the students
not only designed a new reading interface but also explored a
serious research question of how these editions compare to one another.
Writing XSLT and
XQuery on the project, these students produced SVG visualizations of Comparative Dash
Reduction (measuring which editions most frequently normalized
Dickinson’s dash punctuation into commas, semicolons, or periods), and a network
analysis to investigate which editions share the most variants in common.
They created the network analysis with XQuery, pulling and calculating data from the
TEI critical apparatus markup they had modelled for the poems.
It explored how frequently published versions aligned with Dickinson’s
writing in the manuscript versions, based on generating counts with XPath of how frequently
a particular version (coded as a reading witness) aligned with other versions. The
students wrote XQuery code to extract this data into a simple TSV
(tab-separated values) file and plotted it in Cytoscape network analysis software. Their
programming work with XQuery depended on their care in designing the rules for the
project schema and frequently correcting the markup.
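To give a sense of the kind of counting involved, here is a minimal sketch written in XSLT 2.0 rather than the XQuery the students used; it is not their code. It assumes the witness lists are space-separated pointers in @wit, as in the stanza excerpted above, and it matches <rdg> in any namespace. For every pair of witnesses it counts the readings in which both are listed and writes one tab-separated line per pair.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch, not the project's code: counts shared readings per witness pair. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Every reading in the document, whatever namespace it is in. -->
    <xsl:variable name="rdgs" select="//*:rdg"/>
    <!-- Each distinct witness pointer, e.g. #df16, split out of the @wit lists. -->
    <xsl:variable name="wits"
      select="distinct-values($rdgs/tokenize(normalize-space(@wit), ' '))"/>
    <xsl:for-each select="$wits">
      <xsl:variable name="a" select="."/>
      <!-- Pair each witness only with those that sort after it, so no pair repeats. -->
      <xsl:for-each select="$wits[. gt $a]">
        <xsl:variable name="b" select="."/>
        <!-- One TSV row: source, target, and the number of readings listing both.
             Empty readings count too, since a shared omission is also an agreement. -->
        <xsl:value-of separator="&#9;"
          select="$a, $b,
                  count($rdgs[tokenize(normalize-space(@wit), ' ') = $a]
                             [tokenize(normalize-space(@wit), ' ') = $b])"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each output row has the shape of an edge list (source, target, weight), a format that network analysis software can generally import. A count like this is only as reliable as the @wit values themselves, which is why the schema rules and markup corrections mattered.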
The student website documents its methodology extensively and I now use it as a model
for my current students to prepare documentation that features code and
coding decisions. In the course of producing it and sustaining their
codebase, the project team cultivated multiple GitHub repositories
with issue tracking as they turned to new sources of data and worked to
combine Fascicles 6 and 16 into a new site. Their writing intensive experience involved
countless messages to each other to fix broken code, make a visualization work, update
the website, refine the CSS and JavaScript. The professional experience with web
development took them far beyond what would be possible in a course in tagging and
markup alone, or a course in web development within a content management system. The
writing
intensive part involved recursively producing and testing and refining their own
interface. And the project keeps on giving to future students.
In Fall 2019, a colleague of mine from the History department, William Campbell, did
me
the great honor of taking my coding course, following a tradition at Pitt begun when
I
took David Birnbaum’s XML-stack coding course on Obdurodon in Spring 2013. Campbell launched The Brecon
Project, together with students on his team, to study the manuscript
tradition of the foundation charter of a Reformation-era collegiate church and school
in Wales. The students did not need to know medieval Latin to work on the document
data
modeling or even to apply critical apparatus markup with a tightly controlled schema
combining Relax NG and Schematron, generated from a TEI ODD customization that they
devised and revised over the course of their project meetings. For the students and
my
History colleague involved in the project, this was their introduction to the TEI
critical apparatus as a document data model. They turned to previous projects
from our course to follow some examples of critical apparatus markup in order to
understand how to prepare their own.
Without needing to consult the Dickinson project team, the Brecon team was able to
adapt
and build on the example of their markup to take their own study in new directions,
recognizing how their project diverged. For the Brecon team, the text of the charter
was a prose text rather than a
bundle of poems, but nevertheless required a modeling of textual variation over time.
Alyssa Argento, a returning student who was mentoring project teams and continuing
to learn XSLT and SVG on her own, took on the challenge of showing how the
manuscript and print witnesses compared quantitatively, that is, which versions of the charter
shared the most material. She was
inspired by the example of the network analysis on the Dickinson project. Finding
her
computer unable to install the latest version of the network analysis software that
the
Dickinson team had relied on for their visualization, Argento studied the project
data, worked out
how to arrange eight witnesses as nodes in a circle, and produced a network graph
with weighted edges and sized nodes, based on calculations she made with XPath, to provide
a detailed visual summary of
how much the eight different versions shared across 25 sections of the
charter. Having produced a static network visualization, she then studied how to
make it interactive by applying JavaScript to address attributes on the SVG elements
and
to associate those SVG elements with corresponding columns and rows in HTML tables
containing data from each section of the charter. Her
interactive visualization, accessible at http://brecon.newtfire.org/html/analysis/network.html, represents work that
she envisioned and worked out by herself with occasional input from me and the project
team, and while it needs work in the documentation area, it represents a line of
succession from earlier projects in my course. I share it here to demonstrate what is
possible for undergraduates to build on their own with the benefit of learning the
XML
stack.
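As an illustration only, the circular layout described above can be sketched in a few lines. This is not Argento's code, which was built with XPath and SVG on the XML stack; it is a Python approximation under stated assumptions. The witness names W1 through W8, the shared counts, the canvas size, and the radius formula are all hypothetical placeholders. Each SVG element receives an id attribute of the sort that a script could later address to add interactivity.

```python
import math
from xml.sax.saxutils import quoteattr


def circle_positions(names, cx=200.0, cy=200.0, radius=150.0):
    """Place each name evenly around a circle, starting at the top."""
    step = 2 * math.pi / len(names)
    return {
        name: (cx + radius * math.sin(i * step), cy - radius * math.cos(i * step))
        for i, name in enumerate(names)
    }


def network_svg(names, shared):
    """Return an SVG string; shared maps (name_a, name_b) to a count."""
    pos = circle_positions(names)
    # Node size grows with the total weight of the edges touching the node.
    totals = {name: 0 for name in names}
    for (a, b), count in shared.items():
        totals[a] += count
        totals[b] += count
    parts = ['<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 400 400">']
    # Draw edges first so that the nodes are painted on top of them.
    for (a, b), count in sorted(shared.items()):
        (x1, y1), (x2, y2) = pos[a], pos[b]
        parts.append(
            f'<line id={quoteattr(a + "-" + b)} x1="{x1:.1f}" y1="{y1:.1f}" '
            f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="gray" stroke-width="{count}"/>'
        )
    for name in names:
        x, y = pos[name]
        parts.append(
            f'<circle id={quoteattr(name)} cx="{x:.1f}" cy="{y:.1f}" '
            f'r="{5 + totals[name]}"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    witnesses = [f"W{i}" for i in range(1, 9)]  # invented witness names
    print(network_svg(witnesses, {("W1", "W2"): 3, ("W1", "W5"): 1}))
```

Stroke width stands in for edge weight and circle radius for node size. A real project would scale both to its own data range.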
The many student projects developed in the two sibling
University of
Pittsburgh coding courses taught by David J. Birnbaum (see
https://dh.obdurodon.org) and by me with our respective cohorts of
student peer-mentors over the past decade are now my richest data set for comprehending
the possibilities of interchange and up-conversion and development on a code base. Our
students show us how this work is not only writing intensive in the moment of
application, but intensifies over time as we learn new ways of doing things and build
on
the model of previous projects.
Writing about code to pay a project forward
When students prepare a project in markup, they are not simply writing papers to
be
filed in a course-specific context. They are preparing a research site, and their
work
can often be continued by themselves or others. Students can be building beyond the
constraints of a single semester, and even if they are not tempted to return to the
project, they can leave scaffolding behind for others to continue the work or alter
its
direction, or to retrieve the source document files and start afresh. Awareness of
the
potential energy of the work they are doing can give shape to an encounter with markup
languages
in the course of a semester. The energy input over a course of weeks puts emphasis
on preparing material
that others can read and reuse. Assignments for a writing intensive experience can
be constructed
with attention to:
-
reading the code and documentation of other projects, critiquing it, and
building on it
-
preparing documentation meant to be read by peers and professor(s) working
with you, and meant to be read by others who access your code from a
repository. Such documentation may include:
-
Kanban board workflow management
-
README.md files in a GitHub repository
-
developing task lists and issue tracking
-
responding to questions and assisting your teammates on a
project discussion forum like Slack
These are informal writing-intensive activities to do with taking
responsibility for method and processes and for managing the intellectual content
of a
project. Studying how copyright applies to code and markup and choosing a license
for
sharing the work should be part of this experience. Students should learn how to credit
their own and others' work, and how to transport their data files when they need to
move to a new publishing environment.
Giving students access to the full set of tools in the XML arsenal and establishing
both immediate (in-semester) short-range and long-range possibilities for their work
will introduce students from any background to the potential of writing with markup
languages to form and connect with communities of practice. This cannot
fail to be a professionalizing experience. Even in developing projects that are not
successful, there will be opportunities to recognize in failures what to document,
how
to redo the work differently, or how another group might start over if a team must
sunset
the work. Whether or
not students go on to use markup again after the course is over, they will have engaged
with a powerful form of writing that marks up, investigates, curates, propagates,
and
conserves textual data. This is the very definition of a writing-intensive experience
that is both professionalizing and cross-disciplinary in its reach.
Coda: Teaching markup languages during a pandemic
A writing-intensive course in markup modalities adapts to a remote learning
environment with little distress. In March 2020, when the author’s
university (then the University of Pittsburgh at Greensburg) closed the campus and
moved
all courses online, there was little difficulty in transferring learning materials
to a
new format because we had already been relying on tutorials and assignments we had
written and posted on the web, but more importantly because we had already developed
a
sense of community in the forms of asynchronous conversation cultivated not in the
learning management system but rather in GitHub and on Slack. Prior to the pandemic
quarantine, during January and February, project teams had already developed asynchronous
connections, reinforced by their own emojis (a rubber duck meme associated with rubber
duck debugging, for example). The coding class might properly be recognized as
flipped, in that most of the learning was already taking place in an
applied context outside of class, while student and faculty class meetings reviewed
content all together, to learn how to interact with an unfamiliar interface, to
review
issues students were having, or to help address something that was not working. It was easy
to
continue the management of a course in which students had
been trained to work with project management tools and to be writing and sharing their
documentation in GitHub repositories and over Slack channels. We did miss the in-person
interaction on which this class relied, with instructors able to look over the shoulders
of students to help resolve a problem on their computers. Synchronous virtual meetings
could not replace this, but using screen captures in asynchronous chat became more
necessary. Students were certainly challenged to verbalize things that were not
working properly when an instructor could not physically come around behind the student’s
computer to see what was
going wrong. These communication problems were sometimes resolved by connecting with
an instructor over Zoom to share a computer screen. Adding comments to code stored
in the class’s shared eXist-db XML database and their project GitHub repositories
continued as it had from the beginning of the semester. Because this class had cultivated
tools to be able to
communicate and work together online, they were perhaps less challenged and more bonded
virtually than
students in the courses managed only within the learning management system. In the
pandemic crisis of 2020,
the writing intensive nature of shared project development seemed especially beneficial
in
supporting our hive of student coders. Build teams
of developers in a class, and a pandemic may slow but not stop them.