How to cite this paper
Graham, Tony. “Call me Pastichemael: Recreating the Moby-Dick first edition.” Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Graham01.
Balisage: The Markup Conference 2021
August 2 - 6, 2021
Balisage Paper: Call me Pastichemael
Recreating the Moby-Dick first edition
Tony Graham
Tony Graham is a Senior Architect with Antenna House, where he works on their XSL-FO
and CSS formatter, cloud-based authoring solution, and related products. He also provides
XSL-FO and XSLT consulting and training services on behalf of Antenna House.
Tony has been working with markup since 1991, with XML since 1996, and with XSLT/XSL-FO
since 1998. He is Chair of the Print and Page Layout Community Group at the W3C and
was previously an invited expert on the W3C XML Print and Page Layout Working Group (XPPL)
defining the XSL-FO specification, as well as an acknowledged expert in XSLT. Tony
is the developer of the ‘stf’ Schematron testing framework and also Antenna House’s
‘focheck’ XSL-FO validation tool, a committer to both the XSpec and Juxy XSLT testing
frameworks, the author of “Unicode: A Primer”, and a qualified trainer.
Tony’s career in XML and SGML spans Japan, USA, UK, and Ireland. Before joining Antenna
House, he had previously been an independent consultant, a Staff Engineer with Sun
Microsystems, a Senior Consultant with Mulberry Technologies, and a Document Analyst
with Uniscope. He has worked with data in English, Chinese, Japanese, and Korean,
and with academic, automotive, publishing, software, and telecommunications applications.
He has also spoken about XML, XSLT, XSL-FO, EPUB, and related technologies to clients
and conferences in North America, Europe, Japan, and Australia.
©2021 Antenna House, Inc.
Abstract
Moby-Dick by Herman Melville is frequently used as the example document for EPUB and
CSS applications. At around 670 pages, it is also a good choice for demonstrating
the automated analysis features of AH Formatter. This presentation describes features
of working with – and sometimes augmenting, sometimes correcting – the TEI source
for the American first edition of Moby-Dick to create a PDF version in the style of
the 1851 original.
Table of Contents
- Introduction
- Successive Approximations
- Styling from Page Images
- Front Matter
  - Title page
    - Book title
  - Contents
  - ‘Etymology’ and ‘Extracts’
- Body
  - Chapter separator
  - Footnotes
    - Duplicate footnotes
    - Footnote size
  - Block
    - Widows and orphans
    - Hyphen at end of page
  - Text
    - Italics and small-caps
    - ‘Curly’ quotes
    - Consecutive em dashes
  - Baseline grid
- Headers and Footers
- Conclusion
- Acknowledgments
Introduction
This paper describes aspects of the stylesheets that were developed to format the first
American edition of Moby-Dick by Herman Melville. The stylesheets illustrate one way to
approach developing a stylesheet for XSL-FO, and they also illustrate how to use some AH
Formatter extensions. The stylesheets were developed for a project to demonstrate how to
use the Automated Analysis feature [6] of AH Formatter V7.1 [7].
AH Formatter V7.1 is able to automatically detect a range of typographic problems
in a formatted document. Solving these problems usually requires editorial or stylistic
changes, and sometimes both. Automated analysis of formatting problems is most useful
with longer documents. With shorter documents, the user might decide they can find
all of the problems just by looking at the few pages.
The first American edition of Moby-Dick was chosen because:
- Moby-Dick is frequently used as a sample document for EPUB and CSS examples.
- At around 670 pages when formatted, it is obvious that automated analysis will be both
quicker and more consistent than visually inspecting each page.
- The book is out of copyright.
- The text is freely available in XML.
- Scans of the original pages are available on the web. [1] [2] [3]
The source for Moby-Dick [3] is TEI-encoded XML [4] from the Wright American Fiction
project [5]. Moby-Dick is also available as a Project Gutenberg eBook [11], but the tagging
in that version lacks sufficient detail.
Because this was the testbed for the automated analysis feature, the initial emphasis
was on getting
the text block of the body pages correct. The styles for everything outside the text
block –
headers and footers, the front-matter, and the advertisements at the back of the first
edition
– were initially developed as a rough approximation of the formatting used in the
first
edition. Over time, the styles have been refined to more accurately mimic the printed
first
edition.
Successive Approximations
To develop a stylesheet for formatting with either XSL-FO or CSS is usually a process
of
developing successive approximations of the final result. This is true whether the
look of
the document is being developed on the fly, developed according to a design brief,
or
developed to match an existing document, as with Moby-Dick.
The first draft of a stylesheet will likely produce only a rough approximation of
the
final result. If you are developing on the fly, then you haven’t made up your mind
about the
final look at that point anyway. If you are developing according to a design brief,
then the
first version that you format is likely to have the correct page size and the correct
fonts
and font sizes for major titles and paragraphs but may omit more context-specific
styles
such as for the table of contents, index, tables, nested lists, and so on. It is similar
for
developing styles to match an existing document.
That is usually followed by a sequence of making and reviewing changes to bring the
styles closer to the final result. This is true, of course, when you are developing
on the
fly, because the final result isn’t known until you say that you have the result that
you
want. It is also true for both developing according to a design brief and developing
to
match an existing document, because there are additional contexts that you know you
have not
handled yet and, quite likely, more contexts that neither you nor the designer had
anticipated. These might include nesting lists of different types or handling a figure or
table immediately after a title or, for Moby-Dick, handling Queequeg’s mark or stage
directions and songs.
Successive changes should, of course, bring you closer to the final result. In reality,
some changes will have to be redone, and some changes will throw up new problems,
but the
overall movement is to close in on the final result.
Styling from Page Images
The initial styles for the pages – particularly for the front-matter – were refined
by
setting a photograph of a page from the first edition as the background image for
the
corresponding page and adjusting the XSL-FO to match. The following image shows the
formatted title page with the photograph of the first edition’s title page as the
page
background opened in the AH Formatter GUI.
The sequence of steps to adjust the styles to match a page scan used as a background
image is:
- Modify a copy of the XSL-FO to add axf:bleed and axf:crop-offset properties to each
fo:simple-page-master that will have a background image. For example:
<fo:simple-page-master master-name="First-PageMaster"
page-height="7.375in"
page-width="4.78in"
axf:bleed="0.5in"
axf:crop-offset="0.5in">
- If necessary, rotate the page image so that the text is as horizontal as possible.
The first edition is now 170 years old, and the available page images are photographs of
pages in the bound book, rather than scans of individual pages. The result is that the
lines of text in the images are not always perfectly parallel, either because of the
condition of the page or because of the curve of the paper when the page was photographed.
The following image shows the variation that can occur: the red lines are parallel, the
text is not.
- Specify the page image, scaled and positioned to match the formatted page, as the
background image of either the fo:simple-page-master:
<fo:simple-page-master
master-name="First-PageMaster"
page-height="7.375in"
page-width="4.875in"
background-image="page-images/MD_Amer_0038.jpg"
axf:background-size="5.21in"
background-position="-0.12in -0.15in"
axf:bleed="0.5in"
axf:crop-offset="0.5in">
or on the fo:page-sequence that generates the page:
<fo:page-sequence
master-reference="CoverFrontMaster"
background-image="page-images/MD_Amer_0019.jpg"
axf:background-size="5.7in"
background-position="-0.7in -0.3in">
Because the page images for the first edition are photographs, there was considerable
variation in the size and position of the page within each image. Getting the correct size
and position was an iterative process of modifying the XSL-FO and viewing the result in the
AH Formatter GUI, then repeating the process until the result was satisfactory. Enabling
‘Show Borders’ in the AH Formatter GUI makes it easier to judge how to adjust the
background image.
- Iteratively modify the XSL-FO then view it in the AH Formatter GUI until the
formatted document satisfactorily matches the page from the first edition.
- Modify the stylesheets for generating the XSL-FO to recreate the FOs and properties
that were arrived at manually.
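One way to carry the hand-tuned values back into the stylesheet is to emit the
background-image properties only when a debugging switch is on. The following is a
sketch under assumptions, not an excerpt from the project's stylesheets: the parameter
name page-images and the placeholder page content are hypothetical, while the property
values are copied from the fo:simple-page-master example above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions"
    exclude-result-prefixes="xs">

  <!-- Hypothetical switch: true while matching pages against the
       scans, false (the default) for the finished PDF. -->
  <xsl:param name="page-images" select="false()" as="xs:boolean" />

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="First-PageMaster"
            page-height="7.375in" page-width="4.875in">
          <!-- Attributes have to be added before any child element. -->
          <xsl:if test="$page-images">
            <xsl:attribute name="background-image"
                select="'page-images/MD_Amer_0038.jpg'" />
            <xsl:attribute name="axf:background-size"
                select="'5.21in'" />
            <xsl:attribute name="background-position"
                select="'-0.12in -0.15in'" />
            <xsl:attribute name="axf:bleed" select="'0.5in'" />
            <xsl:attribute name="axf:crop-offset" select="'0.5in'" />
          </xsl:if>
          <fo:region-body />
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="First-PageMaster">
        <fo:flow flow-name="xsl-region-body">
          <!-- Placeholder content; a real stylesheet formats the TEI. -->
          <fo:block text-align="center">MOBY-DICK;</fo:block>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
</xsl:stylesheet>
```

With the switch off, the same stylesheet produces the page without the scan, so the
tuned styles and the comparison aid can live in one place.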
The result can be quite a close approximation of the original:
The different parts of the front matter of the first edition show considerable variation
in fonts, font sizes, and letter- and word-spacing. That, combined with the necessarily
imprecise size and position of the background images, has resulted in a range of values
for
the same properties applied at different places on different pages. When time permits,
it
should be possible to rationalize these and use fewer, more consistent values and
still
reproduce the first edition pages with sufficient accuracy. After all, the first edition
was
printed with a fixed set of founts and with fixed increments of the space that could
be
added between letters. Font sizes, etc., were unlikely to have been specified in points
in
America in 1851, but the sizes would have been internally consistent.
Front Matter
The front matter of Moby-Dick comprises:
- Title page
- Copyright page
- Dedication
- Contents
- Fly title
- Etymology
- Extracts
Title page
As shown previously, it is possible to reproduce the title page fairly
accurately.
Book title
The markup for the book’s title does not include sufficient information to
accurately reproduce the formatted title:
<docTitle>
<titlePart>MOBY-DICK;</titlePart>
<titlePart type="sub">OR, THE WHALE.</titlePart>
</docTitle>
In addition, the book’s title is formatted identically on the fly title page, but its
markup there corresponds even less to the formatting:
<div type="fly_title">
<head>MOBY-DICK; OR, THE WHALE.</head>
</div>
Because the stylesheet is specific to Moby-Dick, it was simpler to ignore the markup
and to use xsl:analyze-string to generate FOs around parts of the title text:
<xsl:template match="docTitle | div[@type = 'fly_title']/head"
priority="5">
<fo:block
font-size="24pt"
letter-spacing="0.37em"
line-height="1"
text-align="center"
font-stretch="extra-condensed">
<xsl:analyze-string
select="normalize-space(.)"
regex="OR,">
<xsl:matching-substring>
<fo:block
font-size="8pt" font-variant="all-small-caps"
font-stretch="normal"
letter-spacing="0.125em" space-before="30pt">
<xsl:value-of select="." />
</fo:block>
</xsl:matching-substring>
<xsl:non-matching-substring>
<fo:block axf:letter-spacing-side="start">
<xsl:if test="contains(., 'THE WHALE.')">
<xsl:attribute name="space-before" select="'30pt'" />
<xsl:attribute name="letter-spacing" select="'0.9em'" />
</xsl:if>
<xsl:analyze-string
select="."
regex="\.| ">
<xsl:matching-substring>
<fo:inline letter-spacing="0.3em">
<xsl:value-of select="." />
</fo:inline>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="." />
</xsl:non-matching-substring>
</xsl:analyze-string>
</fo:block>
</xsl:non-matching-substring>
</xsl:analyze-string>
</fo:block>
</xsl:template>
The document, with a few minor exceptions, is formatted entirely in Source Serif Pro.
The font is both open source and a reasonable match for the font that was used for
paragraphs in the first edition. However, in the first edition, the title page (and some
other titles) uses both narrow and small-capital variants of the text font. Source Serif
Pro does not have a narrow version, so the narrow variants are achieved by setting the
font-stretch property (for example, font-stretch="extra-condensed") and relying on
AH Formatter to adjust each character’s width. Source Serif Pro does have true small
caps [8], but font-variant="small-caps" as defined in XSL 1.1 uses small caps only for
lower-case letters. Because the small caps in the title are represented in the XML as
capital letters, it is necessary to use the font-variant="all-small-caps" AH Formatter
extension to format the capital letters in the source as small caps.
Many of the titles in the first edition use letter-spaced characters. Letter spacing
is specified with the letter-spacing property, and values such as letter-spacing="0.37em"
were arrived at through trial and error to match the appearance of the page image when it
was used as the background. However, in the first edition, the letter spacing between an
alphabetic character and a following punctuation character is sometimes less than the
letter spacing between two alphabetic characters. The solution is to use
xsl:analyze-string to generate an fo:inline with a different letter-spacing value around
just those characters. A refinement, used elsewhere in the stylesheet, is to also use the
axf:letter-spacing-side AH Formatter extension so that all of the space that is added
between alphabetic characters is added at their start side and so does not contribute to
the space between an alphabetic character and a following punctuation character.
Contents
The Table of Contents is formatted in two columns. It is marked up as a list but is
rendered as a four-column table to be able to recreate the formatting of the first
edition.
The TEI for the Table of Contents begins:
<div type="contents">
<pb n="v (Table of Contents) " xml:id="VAC7237-00000003"/>
<head>CONTENTS.</head>
<list>
<item>I.—Loomings. <ref target="VAC7237-00000013" rend="right">1</ref>
</item>
<item>II.—The Carpet Bag. <ref target="VAC7237-00000017" rend="right">7</ref>
</item>
The content of the ref elements is each chapter’s page number in the first edition.
The target attribute, however, refers to pb milestone elements that mark the start of
each two-page spread. At least one of the cross-references was found to point to the
spread after the first page of its chapter and had to be corrected. More may yet be
found.
The cross-references could not be used anyway because the XSL-FO version does not
attempt to recreate the page breaks of the first edition. The cross-references from
Table
of Contents entries to chapters in the generated PDF are determined from the position
of
the list item for each chapter in the Table of Contents list:
<!-- Every chapter has a generated ID, and 'EPILOGUE.' is the only
ToC entry without a page number. -->
<xsl:variable
name="target"
select="if (exists(ref))
then concat('chapter-', position())
else 'epilogue'"
as="xs:string" />
The markup for each chapter begins <div type="chapter">, but generating the ID for
each chapter could not use position() because some of the pb milestones appear between
chapters:
<fo:block
id="{@type}-{count(preceding::div[@type = current()/@type]) + 1}">
The Table of Contents is formatted as a four-column table to keep the different parts
of the Table of Contents entries aligned:
The alignment and spacing of the leader dots is simple with XSL-FO:
<fo:leader leader-pattern="dots"
leader-pattern-width="1em"
leader-alignment="end" />
‘Etymology’ and ‘Extracts’
The ‘Etymology’ and ‘Extracts’ segments each consist of an introductory narrative
page
followed by quotes and, in ‘Etymology’, a table. The fonts used for the titles in
the
first edition are not consistent, so they each needed a separate template.
In both ‘Etymology’ and ‘Extracts’, each quote has an attribution, and each
attribution is marked up as following the quoted material:
<cit>
<q>
<p>"Very like a whale."</p>
</q>
<bibl>
<title>Hamlet</title>.</bibl>
</cit>
However, if there is enough space on the last line, the attribution is formatted in
the same fo:block as the quotation:
If there is not enough space, because either the last line or the attribution is too
long, then the attribution is formatted on the next line.
Placing the attribution in the same block is handled by the common XSLT pattern of not
formatting the bibl as part of the default processing and instead explicitly selecting
its content when processing the q:
<xsl:template match="q/p">
<fo:block>
<xsl:apply-templates />
<xsl:if test="position() = last() and
exists(../following-sibling::*[1][self::bibl])">
<fo:leader leader-pattern="space"/>
<fo:leader leader-pattern="space" leader-length.optimum="100%"/>
<fo:inline-container padding-left="2em" padding-right="0.125in"
max-width="80%" text-indent="0">
<fo:block text-align="right">
<xsl:apply-templates
select="../following-sibling::*[1]/node()" />
</fo:block>
</fo:inline-container>
</xsl:if>
</fo:block>
</xsl:template>
<xsl:template
match="bibl[exists(preceding-sibling::*[1][self::q[p]])]"
priority="5" />
Placing the attribution either on the last line of the quotation or on the next line
is handled by the common XSL-FO pattern of using two fo:leader elements.
Body
The majority of Moby-Dick is 135 chapters consisting largely of text. Melville scholars
like to find patterns in the structure of the chapters [9], but when formatting Moby-Dick,
the most useful distinctions are between paragraph-like blocks of text and other content.
The non-paragraph content includes:
- A single graphic (for Queequeg’s mark)
- Inscriptions from tombstones
- Songs and poems
- Speeches and stage directions as if for a play
Chapter separator
When a chapter in the first edition ends near the bottom of a page, the next chapter
begins on the following page with space before the chapter title:
When a chapter does not end near the bottom of a page, there is an additional
separator printed before the chapter title. To complicate matters, the space between
the
separator and the chapter title is less than the space before a chapter that starts
on a
new page:
When a chapter ends with some space left at the bottom of a page but not enough space
for the separator and the chapter title, the separator is printed at the end of the
chapter and the next chapter starts on the next page:
When the first edition was composed manually, it would have been straightforward to
add the separator when and where it was needed. It is not quite as straightforward
with
automated, ‘lights-out’ formatting using XSL-FO. Because the page breaks are not known
before the document is formatted, it is not possible to just insert as many separators
as
needed, and the XSL 1.1 Recommendation does not support conditional processing based
on an
area’s position on the page.
Two things make this possible with AH Formatter: firstly, the
axf:suppress-if-first-on-page extension property makes AH Formatter suppress the
separator for a chapter title at the top of a page; and, secondly, the standard
space-after.precedence="force" on the fo:block for the separator ensures the correct
distance between the separator and the chapter title when the separator is present
while allowing the different space-before value on the chapter title to apply when the
separator has been suppressed or is on the previous page.
<xsl:template
match="div[@type = 'chapter'][exists(head[@type = 'sub'] |
fw[@type = 'head'])]">
<xsl:if
test="exists(preceding-sibling::div[@type = current()/@type])">
<fo:block axf:suppress-if-first-on-page="true" text-align="center"
padding-top="0.125in"
space-after="0.2in" space-after.precedence="force"
axf:baseline-grid="none"
axf:baseline-block-snap="none">
<fo:external-graphic src="images/separator.svg" />
</fo:block>
</xsl:if>
<fo:block
id="{@type}-{count(preceding::div[@type = current()/@type]) + 1}">
<fo:marker marker-class-name="Chapter-Title">
<xsl:apply-templates
select="(fw[@type = 'head'], head[@type = 'sub'])[1]/node()"
mode="marker" />
</fo:marker>
<fo:block-container
axf:baseline-grid="none"
axf:baseline-block-snap="none"
keep-together.within-page="always"
keep-with-next.within-page="always"
space-before="{if (exists(preceding::div[1]
[@type = 'chapter']))
then '0.5in'
else '0.72in'}"
space-before.conditionality="retain">
<xsl:apply-templates select="head" />
</fo:block-container>
<xsl:apply-templates select="* except head" />
</fo:block>
</xsl:template>
Footnotes
Footnotes are marked up as a ref containing the footnote marker that refers to the
separate note containing the footnote content:
<p>
<emph>Whaling not respectable?</emph> Whaling is imperial! By old
English statutory law, the whale is declared "a royal fish."<ref
rend="super" target="#note_001" xml:id="return_001">*</ref>
<note place="foot" xml:id="note_001">
<p><ref target="#return_001">*</ref>See subsequent chapters for
something more on this head.</p>
</note>
</p>
The XSL-FO fo:footnote contains both an fo:inline for the footnote marker and an
fo:footnote-body for the footnote content, so the XSLT stylesheet does not process the
note where it occurs in the document but instead formats the content of the note by
using key() to find the note that is referred to by each ref:
<xsl:template match="note[@place = 'foot']" />
<xsl:template match="ref[exists(key('footnote',
substring-after(@target, '#')))]"
priority="5">
<fo:footnote
id="{@xml:id}"
axf:suppress-duplicate-footnote="true">
<fo:inline>
<fo:basic-link
internal-destination="{substring-after(@target, '#')}">
<xsl:value-of select="." />
</fo:basic-link>
</fo:inline>
<fo:footnote-body
id="{substring-after(@target, '#')}"
font-size="7pt"
line-height="10pt">
<xsl:apply-templates
select="key('footnote',
substring-after(@target, '#'))/node()" />
</fo:footnote-body>
</fo:footnote>
</xsl:template>
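The templates above call key('footnote', ...) but the declaration of that key is not
shown. A declaration along the following lines would be consistent with how the key is
used. It is an assumption rather than a quotation from the stylesheet and, like the
templates in this paper, it uses unprefixed element names.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed declaration: index every foot-of-page note by its
       xml:id, so that key('footnote', 'note_001') returns the note
       element that a ref points to with target="#note_001". -->
  <xsl:key name="footnote"
           match="note[@place = 'foot']"
           use="@xml:id" />

</xsl:stylesheet>
```

Because the ref template strips the leading ‘#’ from the target attribute with
substring-after(), the key values can be the bare xml:id values.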
Duplicate footnotes
One page of the first edition has two references to the same footnote:
The TEI XML repeats the footnote text:
<p>
<emph>Whaling not respectable?</emph> Whaling is imperial! By old
English statutory law, the whale is declared "a royal fish."<ref
rend="super" target="#note_001" xml:id="return_001">*</ref>
<note place="foot" xml:id="note_001">
<p><ref target="#return_001">*</ref>See subsequent chapters for
something more on this head.</p>
</note>
</p>
...
<p>
<emph>The whale never figured in any grand imposing way?</emph> ...
cymballed procession. <ref rend="super"
target="#note_002" xml:id="return_002">*</ref>
</p>
<note place="foot" xml:id="note_002">
<p><ref target="#return_002">*</ref>See subsequent chapters for
something more on this head.</p>
</note>
XSL 1.1 would render both footnotes, but the axf:suppress-duplicate-footnote
extension property causes AH Formatter to generate only one copy of the footnote when
both footnotes occur on the same page.
Footnote size
Moby-Dick also includes some whale-size footnotes:
Some things that could have been done were not needed:
- In the first edition, these two footnotes start on the same page, and it is the
second footnote that continues onto a second page. Even so, both footnotes have the
same ‘*’ footnote marker in the first edition.
Because the markers in the first edition are all the same, it is not necessary to
use the axf:footnote-number and axf:footnote-number-citation extension elements to
generate and use a sequence of footnote markers.
- It is possible to limit the height of the footnotes using axf:footnote-max-height,
but the height of the formatted footnotes is comparable to the height in the first
edition, so this also was not necessary.
Block
Widows and orphans
An orphan is too few lines before a page break, and a widow is too few lines after
a
page break.
The First Edition has multiple single-line orphans.
However, the only single lines at the top of a page are single-line dialogue. It is
impossible to say how many two-line widows were deliberately forced. For example,
page 610
ends with widely-spaced text, and page 611 begins with the last two lines of the
paragraph:
Some of the wide spacing is due to the white-space before ‘?’ and ‘!’ in the First
Edition, but compare the First Edition with the fewer lines when the paragraph is
formatted:
Similarly, the paragraph on pages 371–373 in the first edition is 26 lines, but is
25
lines when formatted on one page by AH Formatter. The first four formatted lines are
identical to the First Edition, but then they diverge.
The formatted version uses the XSL-FO 1.1 defaults of orphans="2" and widows="2".
Hyphen at end of page
The First Edition has multiple pages that end with a hyphen:
The formatted version specifies hyphenation-keep="page" on fo:root so that words are
not hyphenated across a page break. The hyphenation-keep-mode setting in the Option
Setting File is not overridden, so AH Formatter pushes only the otherwise-hyphenated
word to the next page, not the entire last line.
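The page-break settings discussed in this section can be tried in isolation with a small
XSL-FO document. This is a sketch rather than an excerpt from the generated FO: the
margins, language, and text are assumptions, while hyphenation-keep, orphans, and widows
are standard XSL 1.1 properties.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"
         xml:lang="en"
         hyphenation-keep="page">
  <!-- hyphenation-keep is an inherited property, so setting it on
       fo:root applies it to every block in the document. -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="PageMaster"
        page-height="7.375in" page-width="4.875in">
      <!-- Assumed margins. -->
      <fo:region-body margin="0.75in" />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="PageMaster">
    <!-- orphans="2" and widows="2" are the initial values in XSL 1.1;
         they are written out only to make the settings visible. -->
    <fo:flow flow-name="xsl-region-body" hyphenate="true"
        text-align="justify" orphans="2" widows="2">
      <fo:block>Call me Ishmael.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Replacing the single block with several pages of text makes it possible to compare the
page endings with and without hyphenation-keep="page".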
Text
Italics and small-caps
The markup for text in italics and in small-caps needed to be corrected for proper
formatting. In the original TEI XML, italic text was marked by an empty <hi rend="i"/>
element at the start of the italic text, but there was no indication of where the italic
text ended. It might be argued that not enclosing the italic text makes textual analysis
easier, but foreign words (or words thought to be foreign) were marked up with a start and
an end tag, for example: <foreign xml:lang="LAT">Folio</foreign>.
Text in small-caps in the first edition was included in the TEI XML as capital letters
without any extra markup. It was necessary to find the text that should be small-caps, add
markup, and change the text to mixed-case. For example, ‘THE’ at the start of a chapter
becomes <hi rend="small-caps">The</hi>.
‘Curly’ quotes
Moby-Dick makes extensive use of both single- and double-quotes. This includes
apostrophes replacing letters in broken English for speech from non-native speakers
of
English. In the first edition, the left and right quotes are visibly different:
In the TEI source XML, however, all quotes are the same:
<p>"Do you is all sharks, and by natur wery woracious, yet I zay to
you, fellow-critters, dat dat woraciousness—'top dat dam slappin' ob
de tail! How you tink to hear, 'spose you keep up such a dam slappin'
and bitin' dare?"</p>
Converting the straight quotes to ‘curly’ quotes initially seemed straightforward,
but
it was made complicated by quotes before emphasized text and the difference between
left
single quotes at the start of quoted text and right single quotes at the start of
a word
to indicate a dropped letter.
<!-- Convert single and double quotes to 'curly' quotes. -->
<xsl:template match="text()" name="ahf:text">
<xsl:param name="text" select="." as="text()" />
<xsl:value-of select="ahf:text($text)" />
</xsl:template>
<xsl:function name="ahf:text" as="xs:string">
<xsl:param name="text" as="text()" />
<!-- The replacement that depends on the current node must be
first. -->
<xsl:variable
name="text"
select="if (matches($text, '&quot;$') and
empty($text/following-sibling::node()))
then replace($text, '&quot;$', '”')
else $text"
as="xs:string" />
<!-- Moby-Dick uses broken English for speech from non-native
speakers of English. The speech can include words with the
dropped initial vowel indicated by a right single-quote.
Handle those before replacing any ' with left
single-quotes. -->
<xsl:variable
name="text"
select="replace($text, '''(s?t?(&quot;|\s|[.,;:]|(balmed|dention|em|gainst|ll|mong|parm|quid|specially|spose|stead|teak|till|[Tt]is|[Tt]was)(,|\.|\s)|$))', '’$1')"
as="xs:string" />
<xsl:variable
name="text"
select="replace($text, '(^|\s|&quot;|—)''([^&quot;]|$)', '$1‘$2')"
as="xs:string" />
<xsl:variable
name="text"
select="replace($text, '(^|—|\s)&quot;', '$1“')"
as="xs:string" />
<xsl:variable
name="text"
select="replace($text, '&quot;(\s|[—.,;:]|$)', '”$1')"
as="xs:string" />
<xsl:variable
name="text"
select="replace($text, '([^\s])''([^\s])', '$1’$2')"
as="xs:string" />
<!-- Variations on '* * *' in 'Extracts'. -->
<xsl:variable
name="text"
select="replace($text, ' \*', '  *')"
as="xs:string" />
<xsl:variable
name="text"
select="replace($text, '\* ', '*  ')"
as="xs:string" />
<xsl:sequence select="$text" />
</xsl:function>
The ahf:text() XSLT function is also used in other contexts; for example:
<xsl:template match="div[@type = 'fly_title']/bibl">
<fo:block text-align="center" hyphenate="false" font-size="5pt"
line-height="10.5pt"
space-before="2.33in" space-before.conditionality="retain">
<!-- Provide structure that is not in the source XML. -->
<xsl:analyze-string select="ahf:text(edition/text())"
regex="HERMAN MELVILLE,">
...
Consecutive em dashes
The First Edition uses two or three consecutive em dashes as a typographic effect
in
multiple places, for example:
Most typography books that cover the em dash recommend a thin space before and after
the dash. For example, Correct Composition [12] states:
As the dash entirely fills the body sideways, it should have before and after it a
thin space to prevent the interference with adjoining characters.
Many digital fonts preserve the letterpress practice that the em dash completely fills
its width. However, Source Serif Pro includes built-in white-space before and after
the
stroke. This is generally useful, but it looks bad when there are consecutive em
dashes:
For a time, it looked as though it would be necessary to wrap consecutive em dashes with
<fo:wrapper font-family="serif"> to select a font whose em dashes would join up.
However, a chance (re)discovery of the Unicode characters for two and three consecutive em
dashes (U+2E3A TWO-EM DASH and U+2E3B THREE-EM DASH) provided the way to show the correct
dashes without changing fonts. More steps were added to the text handling, with the
three-dash replacement first so that a run of three is not consumed as a two-em dash
followed by a single em dash:
<xsl:variable
name="text"
select="replace($text, '———', '⸻')"
as="xs:string" />
<xsl:variable
name="text"
select="replace($text, '——', '⸺')"
as="xs:string" />
It is now possible to use the correct dashes from the same font:
Baseline grid
‘Show-through’ occurs when text on the back of a page is visible through the paper.
The shadow of the text on the back reduces the legibility of the text on the front.
One
way to reduce show-through (aside from using thicker paper or only reading the document
electronically) is to align the lines of text on the front and back of the page.
This image from the first edition shows some show-through, but it also shows both
that
the lines mostly line up and that lines resume their alignment after the three irregular
lines:
Keeping lines aligned front-and-back is straightforward when all of the text is the
same font size and has the same line height. It becomes harder when the text includes
titles, etc., that have different font sizes, line heights, and space before and after.
It
is often possible to style a title such that the space before the title, the line
height
of the title, and the space after the title add up to a multiple of the base line
height.
However, this will fail if some titles extend over two lines and the line height of
the
title is not a multiple of the base line height.
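As a worked example with assumed values (the paper does not state the base line height),
suppose body text is set on a 12pt line. A one-line title with an 18pt line height, 18pt
before, and 12pt after occupies 18 + 18 + 12 = 48pt, which is four body lines, so the
following text stays on the grid. If the same title wraps to two lines, it occupies
18 + 36 + 12 = 66pt, or five and a half body lines, and the text after it is 6pt off the
grid. The arithmetic ignores how the spaces resolve against adjacent blocks and at page
boundaries. All sizes, margins, and text in this sketch are hypothetical.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="PageMaster"
        page-height="7.375in" page-width="4.875in">
      <!-- Assumed margins. -->
      <fo:region-body margin="0.75in" />
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="PageMaster">
    <!-- Assumed base grid: 10pt type on a 12pt line. -->
    <fo:flow flow-name="xsl-region-body"
        font-size="10pt" line-height="12pt">
      <fo:block>Body text before the first title.</fo:block>
      <!-- If this title fits on one line:
           18pt + 18pt + 12pt = 48pt = 4 x 12pt. -->
      <fo:block font-size="14pt" line-height="18pt"
          space-before="18pt" space-after="12pt">A Short Title</fo:block>
      <fo:block>Body text that is still on the 12pt grid.</fo:block>
      <!-- If this title wraps to two lines:
           18pt + 36pt + 12pt = 66pt = 5.5 x 12pt. -->
      <fo:block font-size="14pt" line-height="18pt"
          space-before="18pt" space-after="12pt">A Longer Title That
        Wraps onto a Second Line</fo:block>
      <fo:block>Body text that is off the grid if the title wrapped.</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Whether the longer title actually wraps depends on the font and the measure, which is
exactly why hand-tuned spacing of this kind is fragile.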
The AH Formatter baseline grid extension can both align lines to a common baseline
and
allow lines in specific blocks to either align to their own grid or align to no grid
at
all. The red lines in the following figure highlight that lines in ordinary paragraphs
are
aligned to the baseline grid even after the three irregular lines and after a chapter
number and title:
The first step is to specify the baseline grid using axf:baseline-grid:
<xsl:template match="body">
  <fo:page-sequence
      master-reference="PageMaster"
      writing-mode="from-page-master-region()"
      initial-page-number="1"
      axf:baseline-grid="root">
    <xsl:call-template name="PageMaster-static-content" />
    <fo:flow flow-name="xsl-region-body" hyphenate="true"
             text-align="justify">
      <xsl:apply-templates />
    </fo:flow>
  </fo:page-sequence>
</xsl:template>
The second step is for blocks that do not use the baseline grid to establish their own
grid, also using axf:baseline-grid:
<xsl:template match="body//q">
  <fo:block text-align="center"
            text-indent="0"
            space-before="0.25lh"
            font-size="7pt"
            line-height="9pt"
            axf:baseline-block-snap="before margin-box"
            axf:baseline-grid="new">
    <xsl:apply-templates />
  </fo:block>
</xsl:template>
axf:baseline-block-snap specifies how a block aligns with the baseline grid, if any,
of its parent block.
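For the other case mentioned above, a block that aligns to no grid at all, a rule might look like the following sketch. It assumes that axf:baseline-grid accepts the value "none" to switch the grid off for a block, as the AH Formatter documentation describes. The match pattern for TEI lg (line group) elements is hypothetical and is not taken from the Moby-Dick stylesheets.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions">

  <!-- Hypothetical rule: verse lines opt out of the baseline grid,
       so they keep their own 9pt line height. -->
  <xsl:template match="body//lg">
    <fo:block font-size="7pt"
              line-height="9pt"
              axf:baseline-grid="none">
      <xsl:apply-templates />
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Ordinary paragraphs that follow such a block would still be governed by the grid set on the fo:page-sequence.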
Headers and Footers
The headers and footers in the first edition, when present, are quite simple: just the
page number and the chapter title. However, an abbreviated title is used for some chapters,
even for chapters that do not have a long title. The TEI XML did not include the running
header text, so any abbreviated titles were added as fw (“forme work”) elements [10].
For example:
<div type="chapter">
  <head>CHAPTER XXIX.</head>
  <head type="sub">ENTER AHAB; TO HIM, STUBB.</head>
  <fw type="head" place="top-centre">ENTER AHAB.</fw>
It is simple to choose the fw element, if present, in preference to the title text
as the content of the fo:marker for the running header:
<fo:marker marker-class-name="Chapter-Title">
  <xsl:apply-templates
      select="(fw[@type = 'head'], head)[1]/node()"
      mode="marker" />
</fo:marker>
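The selection relies on the XPath 2.0 comma operator, which builds a sequence in the order written rather than in document order. The fw element therefore comes first whenever it exists, even though it follows the head elements in the source. A union such as (fw | head)[1] would instead return the first head. The following minimal stylesheet is a sketch for trying the expression on its own. It assumes the input elements are in no namespace, whereas a real TEI document would normally need xpath-default-namespace or prefixed names.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />
  <xsl:strip-space elements="*" />

  <!-- Print the running-header text chosen for each chapter. -->
  <xsl:template match="div[@type = 'chapter']">
    <!-- The comma keeps the written order, so fw, when present,
         is item 1; otherwise the first head child is. -->
    <xsl:value-of select="(fw[@type = 'head'], head)[1]" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Run against the Chapter XXIX fragment above (with the div element closed), this outputs ENTER AHAB. followed by a newline. Without an fw element, the same expression falls back to the first head child of the context node.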
The abbreviated title is ordinarily centered in the header:
However, even the abbreviated title can be quite long. At least one title is long enough
that it cannot be centered in the header without crowding the page number:
The solution is to let the header overflow when it is too wide and to specify
axf:overflow-align so the page number remains aligned with the outer edge of the text
block:
<xsl:template name="Odd-Header">
  <fo:block
      keep-together.within-line="always"
      text-align="center"
      font-size="8pt"
      border-bottom="1pt solid black"
      axf:leader-expansion="force"
      padding-bottom="5pt"
      margin-bottom="4pt"
      axf:overflow-align="end">
    <fo:page-number color="transparent"/>
    <fo:leader />
    <fo:inline letter-spacing="0.22em">
      <fo:retrieve-marker
          retrieve-class-name="Chapter-Title"
          retrieve-position="last-starting-within-page" />
    </fo:inline>
    <fo:leader />
    <fo:page-number />
  </fo:block>
</xsl:template>
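On even (left-hand) pages the outer edge is the start edge, so a matching header would presumably mirror the odd one. The following is only a sketch of that mirror image. The template name Even-Header is hypothetical, and the sketch assumes that axf:overflow-align accepts "start" in the same way it accepts "end". The transparent page number appears to serve as an invisible counterweight of the same width as the visible one, so that the title stays centered between the two leaders.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:axf="http://www.antennahouse.com/names/XSL/Extensions">

  <!-- Hypothetical mirror of Odd-Header for left-hand pages. -->
  <xsl:template name="Even-Header">
    <fo:block
        keep-together.within-line="always"
        text-align="center"
        font-size="8pt"
        border-bottom="1pt solid black"
        axf:leader-expansion="force"
        padding-bottom="5pt"
        margin-bottom="4pt"
        axf:overflow-align="start">
      <!-- Visible page number at the start (outer) edge. -->
      <fo:page-number />
      <fo:leader />
      <fo:inline letter-spacing="0.22em">
        <fo:retrieve-marker
            retrieve-class-name="Chapter-Title"
            retrieve-position="last-starting-within-page" />
      </fo:inline>
      <fo:leader />
      <!-- Invisible page number of the same width on the inner side. -->
      <fo:page-number color="transparent" />
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```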
Conclusion
Developing a stylesheet to format the first American edition of Moby-Dick by Herman
Melville presented several challenges, including challenges posed by the TEI markup for
the source. These challenges were solved using a combination of the features of XSLT,
XSL-FO, and AH Formatter extensions.
The stylesheets for formatting Moby-Dick are on GitHub at https://github.com/AntennaHouse/moby-dick. The TEI XML source is in a submodule of the main repository. The XML is also available
separately at https://github.com/AntennaHouse/moby-dick-tei.
Acknowledgments
Wendell Piez helped me navigate some of the details of TEI markup.
References
[1] Internet Archive. Moby-Dick, or, the Whale. Duke University Libraries. https://archive.org/details/mobydickorwhale01melv/page/n7/mode/2up.
[2] Melville Electronic Library. Moby-Dick Side-by-Side: The American And British First Editions. Melville Electronic Library. https://melville.electroniclibrary.org/moby-dick-side-by-side (Archive).
[3] IU Digital Library Program. Moby-Dick, or, The Whale. Melville, Herman, (1819–1891). http://webapp1.dlib.indiana.edu/TEIgeneral/view?docId=wright/VAC7237&brand=wright (Archive).
[4] IU Digital Library Program. Moby Dick, or, The Whale. http://dogwood.dlib.indiana.edu:8080/xubmit/rest/repository/wright/VAC7237.xml (Archive).
[5] IU Digital Library Program. Wright American Fiction. Indiana University. http://webapp1.dlib.indiana.edu/TEIgeneral/welcome.do?brand=wright (Archive).
[6] Antenna House. Automated Analysis. https://www.antenna.co.jp/AHF/help/en/ahf-analyzer.html.
[7] Antenna House. Antenna House Formatter V7. https://www.antennahouse.com/formatter-v7.
[8] Grießhammer, Frank. Introducing Source Serif 2.0. Adobe Typekit Blog. January 10, 2017. https://blog.typekit.com/2017/01/10/introducing-source-serif-2-0/.
[9] Wikipedia. Moby-Dick. Chapter structure. https://en.wikipedia.org/wiki/Moby-Dick#Chapter_structure.
[10] Text Encoding Initiative. Headers, Footers, and Similar Matter. P5: Guidelines for Electronic Text Encoding and Interchange. https://tei-c.org/release/doc/tei-p5-doc/en/html/PH.html#PHSK.
[11] Project Gutenberg. The Project Gutenberg eBook of Moby-Dick, by Herman Melville. https://www.gutenberg.org/files/15/15-h/15-h.htm.
[12] De Vinne, Theodore Lowe. Correct Composition. The Century Co., New York, 1904.