Birnbaum, David J., and Charlie Taylor. “How long is my SVG <text> element?” Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Birnbaum01.
Balisage: The Markup Conference 2021 August 2 - 6, 2021
Balisage Paper: How long is my SVG <text> element?
David J. Birnbaum is Professor of Slavic Languages and Literatures at the
University of Pittsburgh. He has been involved in the study of electronic text
technology since the mid-1980s, has delivered presentations at a variety of
digital humanities and electronic text technology conferences, and has served on
the board of the Association for Computers and the Humanities, the editorial
board of Markup languages: theory and practice,
and the Text Encoding Initiative Technical Council. Much of his electronic text
work intersects with his research in medieval Slavic manuscript studies, but he
also often writes about issues in the philosophy of markup. For the past ten
years he has been teaching an XML-oriented course in the University Honors
College at the University of Pittsburgh entitled Computational methods in
the humanities (http://dh.obdurodon.org).
Charlie Taylor is an undergraduate at the University of Pittsburgh studying
History of Art & Architecture and Classics. She was enrolled in Dr.
Birnbaum’s Computational methods in the humanities class in
Spring 2020, and currently serves as a teaching assistant for the course. She
will graduate in Spring 2023 and plans to pursue a graduate degree in Medieval
Studies.
SVG layout requires that the developer be in control of the dimensions of the
objects that must be placed in the coordinate space. It is easy to specify (or
compute based on other specifications) the size (bounding box height and width) of
many SVG objects (e.g., rectangles, circles, lines), but identifying the bounding
box for text is challenging because SVG text does not know its own length.
In this report we explore two methods for working around this limitation. The
first method, implemented in XSLT, consults exported font metrics to determine the
length of SVG <text> elements and uses that information to make
layout decisions as the SVG is created. The second method, implemented in
JavaScript, determines the length of SVG <text> objects as the
SVG is rendered in a browser and uses the information to control the layout at
rendering time.
SVG layout requires that the developer be in control of the dimensions of the objects
that must be placed in the coordinate space. It is easy to specify (or compute based on
other specifications) the size (bounding box height and width) of many SVG objects
(e.g., rectangles, circles, lines), but identifying the bounding box for text is
challenging because SVG text does not know its own length (and, perhaps surprisingly,
its height). Developers are able to specify a @font-size, which is related
to (but does not match) the height of the font (from baseline to baseline) as rendered[1], but the rendered length of a string of characters depends on the widths of
the individual characters, and those widths are not exposed in any
convenient, accessible way in a regular XML development environment.
Our test case for exploring and addressing the challenges <text>
elements pose for layout is an SVG bar graph like the following:[2]
In this graph the bars grow up and down from a central X axis, and they bear
individual slanted labels below the lowest horizontal ruling line.[3] Under those bar labels we label the X axis itself. The challenge involves
determining the Y position of the general X axis label under the graph, since that
position will depend on the length of the longest bar label, and will therefore vary
according to the specific values of those bar labels. If you are like us, you place that
general label with a combination of guesswork and trial and error and then hard-code a
magic number, which, predictably and understandably, breaks as soon as you try to reuse
the code with new input.[4]
Solutions
This report explores two strategies for working around the difficulty of knowing the
length of an SVG <text> element:
The XSLT approach: As a preprocessing step
before undertaking the transformation that outputs the SVG bar graph, create a
mapping between characters and their individual widths by querying the font metrics.[5] Font metrics are not accessible by default to an XSLT processor, but
we can extract them from the font file, format them in a preprocessing step, and
then import a machine-accessible mapping from glyph to width into the XSLT that
generates the SVG. With this approach, the length of a <text>
element is presumed to be the sum of the widths of its constituent characters.
If the text strings are horizontal or vertical, the length is one of the
dimensions of the bounding box, and the other dimension is (approximately) the
font size. If the text is rotated, we can apply trigonometric functions to the
length of the text (which functions as the hypotenuse) and the angle of rotation
to compute the dimensions (the adjacent and opposite sides of a right triangle)
of the bounding box.
The JavaScript approach: Output SVG
components that must be arranged horizontally or vertically as separate
<svg> elements with no width or height specified. Without
our JavaScript intervention, these would wind up with a default width and height
in a browser that would not match the actual dimensions of the contents. We use
CSS Flexbox to arrange the <svg> elements horizontally or
vertically and employ the JavaScript element.getBBox() function to
compute the dimensions of the bounding box for each of the
<svg> elements. We can then write the computed width and
height values of each <svg> element into the DOM, effectively
assigning a display width and height after loading the SVG into the browser,
instead of when creating it with XSLT.
These two approaches, each of which has advantages and disadvantages, are described in
more detail, with examples, below.[6]
An XSLT approach
Overview
The XSLT approach computes the dimensions of the text as it creates the SVG
and positions other elements accordingly. This approach is less precise than the
JavaScript method for reasons discussed below, but because in our actual use we
normally want to include a small amount of padding around our
<text> elements, we do not require placement with exact
precision.
In the sample SVG below, we compute the length of the text (see the discussion
of the XSLT Implementation, below) and employ it to draw a
line parallel to horizontal or vertical text, so that we can examine the
accuracy of our measurements visually.[7] (That is, the lines are not created as an underlining font effect;
the lines are drawn independently of the text as SVG <line>
elements.) In the case of diagonal text we draw a bounding rectangle. The third
of the four examples in each figure is a null string, so we expect to see no
text and no line. The images below are in SVG that is rendered in the browser,
which means that the exact appearance may vary depending on browser
features:
The examples show the following desirable results:
The overall precision is quite high.
A null string is correctly matched by a zero-length line.
At the same time, these examples reveal several limitations. One limitation to
computing the dimensions of SVG <text> elements while
creating them is that user agents (typically
browsers) may impose small unpredictable modifications on the layout. For
example, the last of the horizontal examples above specifies a
@font-size of "16" (which is the SVG and CSS
default when no size is specified) and computes a length of 297.703125. However,
when we open the SVG in Chrome, Firefox, and Safari and use the JavaScript
getBBox() method to query the size of the bounding box, the
three browsers report slightly different values for both height and width:[8]
Table I
Browser | Height | Width
Chrome | 17.796875 | 297.703125
Firefox | 18 | 298.3999938964844
Safari | 17.734375 | 297.703125
The retrieved length in Chrome and Safari matches the computed length,
but the value in Firefox does not.
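To put the mismatch in perspective, the short Python sketch below compares the computed width with the three widths in Table I. The function name `discrepancies` is ours and purely illustrative; the numbers are copied from the table.

```python
# Width computed by the XSLT method and widths reported by getBBox() (Table I).
computed = 297.703125
reported = {
    "Chrome": 297.703125,
    "Firefox": 298.3999938964844,
    "Safari": 297.703125,
}


def discrepancies(computed_width, reported_widths):
    """Return {browser: (absolute difference, relative difference)}."""
    return {
        browser: (width - computed_width, (width - computed_width) / computed_width)
        for browser, width in reported_widths.items()
    }


for browser, (diff, rel) in discrepancies(computed, reported).items():
    print(f"{browser}: {diff:.4f} units ({rel:.2%})")
```

The Firefox difference is well under one SVG unit, which is why a small amount of padding around the text is enough to absorb it.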
A second limitation to computing the dimensions of SVG
<text> elements while creating them is that the height of
an SVG <text> element is not fully predictable from its
@font-size attribute. In the table above, we see that none of
the browsers sets a height of 16 and each of them sets a different height from
the others.[9]
A third limitation to computing the dimensions of SVG
<text> elements while creating them is that kerning
support in browsers is inconsistent, and user-controlled kerning support in SVG
and CSS is unreliable. Whether you see kerning applied in the horizontal,
vertical, and diagonal samples above will depend on your browser. The images
below are PNG screen captures of the three samples in the versions of Firefox
(left), Chrome (middle), and Safari (right) mentioned above. All three apply
kerning to the horizontal text; Firefox and Chrome also apply kerning to the
vertical and diagonal text, but Safari does not.
SVG <text> elements may have a @kerning
attribute (although support for kerning is slated for removal from SVG 2[10]) and CSS supports a font-kerning property. Browser
support for both of these features is too inconsistent and unpredictable to be useful.[11]
Implementation
Because this method requires knowing the widths of the individual glyphs while
constructing the SVG, we first extract width information from TrueType TTF
files, which we format as XML for ease of access during subsequent processing.
We do this with the script in Appendix A, but that
particular script is not a requirement, and any method of reading, exporting,
and formatting the font metric information can be used.[12] The output of our extraction process is an XML document that looks
like the following:
<metrics>
<metadata
fontName="Times New Roman"
fontPath="/System/Library/Fonts/Supplemental/Times New Roman.ttf"
fontSize="16"/>
<character dec="32" hex="0x20" name="space" str=" " width="4.0"/>
<character dec="33" hex="0x21" name="exclam" str="!" width="5.328125"/>
<!-- more characters -->
</metrics>
We use the XPath doc() function to access this document (assigned
to the variable $times-new-roman-16-mapping in this case) from
within the XSLT that creates our SVG, and we compute the length (in SVG units)
of the text strings with the following XSLT function:
This function explodes the string into a sequence of single-character strings,
looks up the widths of each of them in the mapping table extracted from the font
metrics, and sums the widths.[13]
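The same lookup-and-sum logic can be sketched in Python, the language of the extraction script in Appendix A. This is a minimal illustration and not the XSLT function itself: the names `load_widths` and `text_length` are hypothetical, the embedded sample contains only the two characters shown in the metrics excerpt above, and treating unmapped characters as zero-width is an assumption.

```python
import xml.etree.ElementTree as ET

# Two-character excerpt of the metrics document shown above (Times New Roman, 16).
SAMPLE_METRICS = """<metrics>
  <metadata fontName="Times New Roman" fontSize="16"/>
  <character dec="32" hex="0x20" name="space" str=" " width="4.0"/>
  <character dec="33" hex="0x21" name="exclam" str="!" width="5.328125"/>
</metrics>"""


def load_widths(metrics_xml):
    """Build a mapping from each character (@str) to its width (@width)."""
    root = ET.fromstring(metrics_xml)
    return {c.get("str"): float(c.get("width")) for c in root.iter("character")}


def text_length(text, widths):
    """Sum the widths of the characters in text.

    Assumption: characters missing from the mapping count as zero width;
    a real implementation might prefer to raise an error instead.
    """
    return sum(widths.get(ch, 0.0) for ch in text)


widths = load_widths(SAMPLE_METRICS)
print(text_length("! !", widths))  # 5.328125 + 4.0 + 5.328125
```

An empty string sums to zero, which corresponds to the zero-length line drawn for the null string in the samples.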
Illustrations
In the horizontal butterfly chart below, the bars are labeled on the Y axis on
the right side, and a general label for the Y axis must be positioned to the
right of—but not too far to the right of—the longest bar label. Because we can
compute the lengths of the bar labels, we can find the length of the longest
bar, augment it with a predetermined amount of padding, and use that figure to
place the general Y axis label.
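The placement calculation in this paragraph amounts to taking a maximum and adding padding. A minimal Python sketch follows; the function name, the label lengths, the starting coordinate, and the padding value are all invented for illustration and are not taken from the Van Gogh data.

```python
def axis_label_offset(label_lengths, labels_start, padding):
    """Return the coordinate at which a general axis label can start.

    label_lengths: computed lengths of the individual bar labels
    labels_start:  coordinate where the bar labels begin
    padding:       extra space between the longest bar label and the axis label
    """
    longest = max(label_lengths, default=0)  # no labels means no extra offset
    return labels_start + longest + padding


# Hypothetical lengths (in SVG user units) for three bar labels.
lengths = [41.5, 87.25, 63.0]
print(axis_label_offset(lengths, labels_start=500, padding=10))
```

Because the offset is derived from the data, the same code keeps working when the bar labels change.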
The vertical butterfly chart below illustrates a variant of the same task. In
this case we set @writing-mode to "tb" (top to bottom)
to create vertical text and use the longest bar label to compute the position of
the general X axis label:
The strategy for positioning objects at a specific distance from rotated text
is similar to the strategy for vertical text, except that instead of using the
length of the diagonal text directly, we use it as input to compute the vertical
space the rotated text occupies. We again start with vertical text and transform
it by rotating it away from the vertical with transform="rotate()".
rotate() takes three arguments: the angle of rotation (in
degrees) and the X and Y coordinates of the center of rotation, which we map to
the X and Y coordinates of the <text> element.[14] If we regard the length of the text, which has been rotated counter-clockwise away from true vertical, as the hypotenuse of a right
triangle, the
vertical height of the rotated text is the adjacent side of the right triangle,
which we can compute as hypotenuse * cosθ.[15] The graph with diagonal bar labels looks like:
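As a side note on the trigonometry in the preceding paragraph, the cosine computation can be checked with a short Python sketch. The function name `rotated_extent` and the sample values (a text length of 100 units and a 30-degree rotation) are hypothetical, and the sketch ignores the height of the glyphs themselves, which is a simplifying assumption.

```python
import math


def rotated_extent(length, degrees_from_vertical):
    """Return (vertical, horizontal) extent of text rotated away from vertical.

    The text length is treated as the hypotenuse of a right triangle; the
    vertical extent is the adjacent side and the horizontal extent is the
    opposite side. Glyph height is ignored (a simplifying assumption).
    """
    theta = math.radians(degrees_from_vertical)
    return length * math.cos(theta), length * math.sin(theta)


vertical, horizontal = rotated_extent(100, 30)
print(round(vertical, 3), round(horizontal, 3))
```

With no rotation the vertical extent equals the full text length, which matches the vertical-text case described earlier.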
Discussion
Advantages
SVG created in this way looks, both in raw form and when rendered,
like the SVG we would have created by specifying the positioning as
we did previously, that is, by using trial and error to identify
positioning values that produce acceptable layout and then
hard-coding them. This means that adopting the XSLT approach
described here does not require changing how we work with the
resulting SVG, which is not the case with the JavaScript approach
described below.
SVG created in this way does not depend on specific browser (or
other user-agent) behaviors. Except for unpredictable browser
kerning (or other automatic glyph-modification behavior), SVG
<text> elements placed with this method
should be positioned as reliably as non-text SVG elements, and the
method should continue to function with the browsers of the
future.
The XSLT-based method described here does not depend on a browser
or other DOM-and-JavaScript-aware environment to compute the
placement and positioning of SVG elements after the DOM content has
been loaded. This means that this method may be easier to use
outside browser environments.
Because creating SVG in this way does not require JavaScript
programming, it poses a lower barrier to entry for XML developers
who are not also comfortable with JavaScript development.[16]
Disadvantages
This approach requires prior preparation of font metric
information, an extra step that also has scalability implications
because changing decisions about font families, font sizes, or other
font properties requires exporting and formatting separate metric
information for them. Over time developers might create libraries of
font metric information for reuse, but separate metric files will be
required for all combinations of font family, font size, and font
effects.
The method described above accesses only the glyph widths, which
represent a small part of the font metric information available from
a font file. A more robust and accurate implementation would employ
additional logic to deal with font effects (e.g., kerning,
ligatures, positional glyph variants) and transformations. That
information is present in the fonts, but the developer would have to
locate it in the font metric tables and then export it in a usable
format. We have not attempted to do that in this proof of concept
demonstration.
Because the metrics of client-side system fonts may differ in
unpredictable ways (the most extreme of which would be the complete
non-availability of a font used to compute the size of a
<text> element), developers may prefer to
rely on webfonts, which would need to be prepared and made
available. The approach described here is not compatible with font
stacks because font metrics cannot be expected to be consistent
across fonts, including across fonts commonly stacked.
The description above works with TTF fonts, but has not been
tested with other font formats, such as OTF and TTC—let alone font
formats of the future.
A JavaScript approach
Overview
The JavaScript approach leaves the SVG @width and
@height attributes unspecified if they cannot be computed at
the time the SVG is created, as is the case when the width and height depend on
the dimensions of <text> elements. When the SVG is eventually
loaded into a browser, a JavaScript function computes the missing dimensions and
writes them into the DOM in a way that controls the layout.[17]
Because the JavaScript approach relies on JavaScript applied in the browser
after the document loads, and on CSS that we specify, this method can be used
only where the developer has the file system or server permissions needed to
control these features. Because we cannot rely on being able to apply our own
CSS and JavaScript in the context of the Balisage Proceedings in which this report is being published, the output
of the JavaScript method is illustrated below with PNG screen captures.[18] The raw HTML+SVG for the mockup immediately below, with embedded CSS
and JavaScript, can be found in Appendix B. All of the
raw files used to create the examples based on the authentic Van Gogh data,
illustrated below, are available in our GitHub repository at
https://github.com/djbpitt/xstuff/tree/master/svg-text-size.
Implementation
Because we do not specify the width or height of the rectangular viewport of
our <svg> with explicit @width or
@height attributes, in the absence of our JavaScript
intervention the <svg> elements will assume whatever widths and
heights the browser assigns as defaults. Those default values are implementation
dependent, and therefore unpredictable and unusable for our purposes.[19] Additionally, <svg> elements are laid out inline by default (much like images), so we set their CSS
display value to
block to provide more transparent control over explicit
positioning.
Our general strategy relies on HTML and CSS Flexbox to manage the layout of
the SVG components that cannot be positioned without information about the size
of <text> elements. For this purpose what would normally be
constructed as a single SVG document (e.g., a single <svg>
element that contains a bar graph with all of its labels) is divided into
multiple SVG documents, each wrapped in an HTML <div> or
other block-level element. We use CSS Flexbox to render the
<div> elements where they would have been rendered in a
single, unified SVG document.[20]
We illustrate our positioning strategy in the image above. In that image the
<body> contains three <section>
elements, one for each of the horizontal rows (outlined in red). The
display property of the <body> is set to
flex with a flex-direction value of
column, so that the <section> elements will
be rendered from top to bottom in the browser window. The height of an HTML
<section> is, unless specified explicitly in other ways,
determined by the height of its contents, which, without our JavaScript
interventions, would be whatever default height the browser assigns to the
embedded <svg> elements. To override that default we use
JavaScript to compute the actual height of the <svg>
descendants of each <section> and write those values into
@height attributes on the <svg> elements. Once
those attributes have been added, the height of a section will be determined by
the now declared height of the tallest <svg> it contains. We
use the CSS row-gap property to introduce a gutter between the
sections.
Each of the three <section> elements also has its
display property set to flex, this time with a
flex-direction value of row, so that its
<div> children will be distributed horizontally across
the row. We use the CSS column-gap property to introduce a gutter
between the <div> elements within each row (each HTML
<section>). Our JavaScript writes @width
attributes with computed values into the <svg> elements
alongside the @height attributes discussed above, and once the
JavaScript has created those attributes, the <div> elements
will occupy the horizontal space required for their SVG contents, overriding the
default width values that would otherwise have been supplied by the
browser.
The JavaScript that supplies explicit @width and
@height values to the <svg> elements
is:
window.addEventListener('DOMContentLoaded', init, false);
function init() {
  // Each <div> wraps one <svg>, which in turn contains a single <g>
  const divs = document.querySelectorAll('div');
  for (let i = 0; i < divs.length; i++) {
    const svg = divs[i].querySelector('svg');
    const bb = divs[i].querySelector('g').getBBox();
    svg.setAttribute('height', bb.height);
    svg.setAttribute('width', bb.width);
  }
}
Our sample uses <div> elements only as wrappers for SVG
content (not elsewhere) and we place only a single SVG <g>
element inside each <svg> element, which simplifies our
subsequent processing. HTML structured differently may require fine-tuning, but
the general principle would be the same:
Structure SVG plots with separate <svg> elements
for the components that cannot be positioned, relative to one another,
without knowing the size of descendant <text>
elements. Use as few separate SVG documents as possible, wrapping each
one in an HTML <div> element. Do not specify
@width or @height on the
<svg> elements.
Use CSS (in particular, Flexbox) properties to control the relative
(but not absolute) horizontal and vertical position of the
<div> elements. Control spacing between them with
the Flexbox gutter properties column-gap and
row-gap.
Use the JavaScript getBBox() method to determine the size
of the bounding box for each SVG component after it has been loaded into
the DOM and write the width and height into the <svg>
wrapper element as @width and @height
attribute values. This step converts the relative positioning above into
positioning that depends on the exact computed size of the
<svg> elements.
Illustrations
The visualizations of the Van Gogh data below were created with our JavaScript
method, and have the same content as those created with the XSLT method above.
For inclusion in the Balisage Proceedings we
opened each of them in Firefox and captured the screen as a PNG image with a
transparent background. The position of the Y axis label at the right of the
first image, and of the X axis label at the bottom of the second and third, is
controlled by our JavaScript function.
Discussion
Advantages
The biggest advantage of the JavaScript approach is that it does
not require the preparation of font metric information in advance,
which is a benefit both initially and with respect to
scalability.
The JavaScript approach adapts to the viewing environment because
it reads the actual dimensions of the SVG objects after their
rendering information has been computed within the browser. This
means that the dimensions will be correct for any font, including
system fonts and fonts determined according to a font stack.
Because the JavaScript approach has access to the actual rendered
dimensions, it does not require additional logic (that is,
additional developer intervention) to deal with font effects (e.g.,
kerning, ligatures, positional glyph variants). The
getBBox() method computes width and height after
any such font adjustments have been applied.
Because the JavaScript approach relieves the developer entirely of
the responsibility of computing the dimensions of SVG
<text> elements by off-loading that task onto
a JavaScript function that is executed in the browser, it removes an
opportunity for user error.
Disadvantages
The JavaScript approach requires JavaScript programming knowledge,
which cannot be assumed for all XML-stack developers, and the same
is true with respect to knowledge of CSS Flexbox. The actual CSS and
JavaScript required is light and the examples provided here can
serve as models, but both the CSS and JavaScript employed here will
typically have to be adjusted according to the specific structures
employed in individual projects.
Because we rely on HTML and CSS to support the relative
positioning of the SVG components of what is conceptually a single
image, this method is not convenient for the creation of stand-alone
SVG.
Having to split what is conceptually a single SVG graph or chart
into multiple <svg> elements, each inside its own
HTML <div>, complicates the specification of the
components of the SVG and thus creates an additional potential locus
for miscalculation by the developer.
Although the user cannot miscalculate the dimensions of an SVG
<text> element with the JavaScript method
because the user does not perform that calculation directly, the
user can misplace a component in the Flexbox portion of the
implementation.
Centering text in an ellipse
Centering text inside an SVG ellipse is easy because the SVG
<ellipse> element includes a specification of its center
(@cx and @cy), and those same values can be repurposed as
the @x and @y values of a contained <text>
element, which will center the text inside the ellipse if the <text>
element also specifies the value of both @dominant-baseline and
@text-anchor as "middle". If the length of the ellipse on
the X axis needs to adapt to the length of the text the ellipse will contain, we can use
the XSLT method described above as input into computing the horizontal radius
(@rx value) of the ellipse (half of the length of the contained text
plus however much padding we want to include between the sides of the text and the
surrounding ellipse), and also to position a row of ellipses with even spacing. The code
in Appendix C takes a sequence of words that will be familiar to a
Balisage audience, surrounds each with an ellipse that is sized according to the length
of the word, and spaces the ellipses evenly, producing the following output:
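As a side note on the arithmetic, the sizing and spacing rule described in this paragraph can be sketched in Python. This is not the Appendix C code: the function name `layout_ellipses`, its parameters, the text lengths, and the fixed @cy and @ry values are hypothetical and chosen only to show the calculation.

```python
def layout_ellipses(text_lengths, padding, gap, start_x=0.0):
    """Return a list of (cx, rx) pairs for a row of evenly spaced ellipses.

    rx is half the text length plus padding; gap is the space between the
    edges of neighboring ellipses. All parameter names are illustrative.
    """
    positions = []
    left_edge = start_x
    for length in text_lengths:
        rx = length / 2 + padding
        cx = left_edge + rx        # center sits one radius in from the left edge
        positions.append((cx, rx))
        left_edge = cx + rx + gap  # next ellipse starts after this one plus the gap
    return positions


# Hypothetical text lengths for three words.
for cx, rx in layout_ellipses([40.0, 80.0, 20.0], padding=10, gap=5):
    print(f'<ellipse cx="{cx}" cy="50" rx="{rx}" ry="20"/>')
```

Each computed cx value can be reused as the @x value of the contained text, as described at the start of this section.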
Loose ends and future directions
Our XSLT approach assumes that the font metrics that can be extracted from the font
files are sufficient to compute the length of an SVG line of text. Our implementation
does not attempt to access kerning information, and it has not been tested on writing
systems that involve character substitution (including positional glyph variation or
ligation) or glyph reordering. We have confirmed that our XSLT approach is not impacted
negatively by zero-width characters as long as the font metrics for those glyphs
correctly report a width of zero, as is the case with the Times New Roman font that we
used for testing and illustration.
Saxon-JS relies on a JavaScript implementation of the Saxon XSLT engine that is
designed to perform transformations inside the browser. Insofar as Saxon-JS incorporates
both XSLT and JavaScript functionality, it may support a cleaner integration of our two
approaches. We do not explore a Saxon-JS strategy here.
Our definition of the problem to be solved presumes that an algorithmic solution is
appropriate. Our experimentation above confirms that we can ask XSLT or JavaScript to
compute the dimensions of SVG <text> elements, dimensions that we had
previously regarded as unknowable (that is, not knowable in the same way as the
dimensions of SVG elements like rectangles, circles, and lines), and we can use that
information to automate the placement of other SVG components that must be positioned in
ways that depend on the dimensions of <text> elements. At the same
time, algorithmic layout as described and implemented here is a crude method because it
is not sensitive to context-specific details. For example, we have assumed that the
general labels on X and Y axes should be positioned not too close to and not too far
from the longest data labels on those axes, but those assumptions, although
self-evidently correct as far as they go, are insufficient. In practice, a human layout
designer might want to customize the positioning according to specific features of the
data, e.g., by labeling an axis in a way that is closer than the distance dictated by
the longest data label, but that happens not to overlap that label. We have mocked up
the following example to illustrate the possible placement of an axis label with
coordinates that would lead to overlap with the longest data label except that the
longest data label happens to be conveniently out of the way:
This sort of detail is also computable in principle, although it would require taking
more information into consideration, and we have not attempted to do that here.
Conclusion
The goal of this report has been to identify strategies for working around the fact
that SVG <text> elements do not know their length (and, to a lesser
extent, their height), which makes it difficult to position other elements in relation
to them without a combination of guesswork and a potentially tedious trial-and-error
approach. Furthermore, manual positioning is brittle under reuse because it requires new
human intervention whenever the visualization code is applied to different data.
We have identified, implemented, illustrated, and discussed two approaches to meeting
this need, one in XSLT and one in JavaScript. Each of these approaches has advantages
and disadvantages, and both have been shown to work with the authentic examples used
here. We have identified some limitations and some directions for future improvement,
but even with their limitations, both methods provide realistic alternatives to
guesswork, trial-and-error, and magic numbers.
Appendix A. Script to extract font widths as XML
The following Python script exports the widths of all glyphs in a TrueType font. The
script takes two positional arguments: a required font name (the font file name without
its extension) and an optional point size. If the supplied font is not found, the program
lists the names of all fonts that it does find. The point size defaults to 16, which,
unless modified with CSS, is the size of SVG text that either does not specify a
@font-size or specifies a size of "medium".
#!/usr/bin/env python
# https://stackoverflow.com/questions/4190667/how-to-get-width-of-a-truetype-font-character-in-1200ths-of-an-inch-with-python
# https://fonttools.readthedocs.io/en/latest/index.html
# https://www.geeksforgeeks.org/create-xml-documents-using-python/
# https://stackoverflow.com/questions/678236/how-to-get-the-filename-without-the-extension-from-a-path-in-python
from fontTools.ttLib import TTFont
from fontTools.ttLib.tables._c_m_a_p import CmapSubtable
from xml.dom import minidom
from matplotlib import font_manager
from pathlib import Path
import argparse
import pprint
import sys

pp = pprint.PrettyPrinter(indent=2)

# Validate fontname
# https://stackoverflow.com/questions/15203829/python-argparse-file-extension-checking
def validateFont(fontName):
    """Find full font system path from bare name (without extensions)"""
    installed_fonts = {Path(item).stem: item for item in font_manager.findSystemFonts()}
    return installed_fonts.get(fontName)  # returns None if the name is not found

def allInstalledFonts():
    """Report all available fonts if user supplies erroneous value"""
    installed_fonts = {Path(item).stem: item for item in font_manager.findSystemFonts()}
    return sorted(installed_fonts.keys())

# Handle command line arguments: font name and optional size
parser = argparse.ArgumentParser()
parser.add_argument("ttf", help="TrueType font name without extension (quote names with spaces)")
parser.add_argument("size", help="size in points (defaults to 16pt)", type=int, nargs="?", default=16)
args = parser.parse_args()
fontName = args.ttf
size = args.size
fontPath = validateFont(fontName)  # returns None for erroneous value
if not fontPath:  # bail out if font not found
    print(f"Font '{fontName}' not found. Legal fontnames are:")
    pp.pprint(allInstalledFonts())
    sys.exit(1)

# from StackOverflow
font = TTFont(fontPath, fontNumber=0)  # BUG: breaks on ttc, even with fontNumber; table is different?
cmap = font['cmap']
t = cmap.getcmap(3,1).cmap  # map of decimal values to glyph names
s = font.getGlyphSet()
units_per_em = font['head'].unitsPerEm

def getTextWidth(text, pointSize):
    """Sum glyph widths for text and scale from font units to the point size"""
    total = 0
    for c in text:
        if ord(c) in t and t[ord(c)] in s:
            total += s[t[ord(c)]].width
        else:
            total += s['.notdef'].width
    total = total * float(pointSize) / units_per_em
    return total

# from minidom documentation
root = minidom.Document()
xml = root.createElement('metrics')
root.appendChild(xml)
metadata = root.createElement('metadata')
metadata.setAttribute('fontName', fontName)
metadata.setAttribute('fontPath', fontPath)
metadata.setAttribute('fontSize', str(size))  # appears in the sample output
xml.appendChild(metadata)

c_dict = dict()
for num_dec in range(65536):  # entire BMP (U+0000 through U+FFFF); decimal Unicode value
    char = chr(num_dec)  # character as string
    c_dict[char] = getTextWidth(char, size)  # default SVG font-size is 16 (medium)

for item in c_dict.items():  # string-value : width
    char = item[0]  # string value of character
    num_dec = ord(char)  # Unicode value (decimal)
    num_hex = hex(num_dec)  # Unicode value (hex)
    width = item[1]  # glyph width
    if num_dec in t:  # not all values are present in font
        name = t[num_dec]  # look up name by decimal value
        e = root.createElement('character')
        e.setAttribute('str', str(char))  # attributes have to be set as strings
        e.setAttribute('dec', str(num_dec))
        e.setAttribute('hex', str(num_hex))
        e.setAttribute('width', str(width))
        e.setAttribute('name', name)
        xml.appendChild(e)

# serialize and render XML
xml_str = root.toprettyxml(indent=" ")
print(xml_str)
The output is an XML document that looks like the following:
<metrics>
<metadata
fontName="Times New Roman"
fontPath="/System/Library/Fonts/Supplemental/Times New Roman.ttf"
fontSize="16"/>
<character dec="32" hex="0x20" name="space" str=" " width="4.0"/>
<character dec="33" hex="0x21" name="exclam" str="!" width="5.328125"/>
<!-- more characters -->
</metrics>
The SVG layout strategy described here uses the @str value to retrieve
the @width value. The @dec (decimal) and @hex
(hexadecimal) values are not currently used.
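The lookup described above (use @str to find @width, then add up the widths) can also be illustrated outside XSLT. The following is a minimal Python sketch, not the method used in this report; the function names load_widths and text_length are hypothetical, and the inline sample contains only the two characters shown above.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for a metrics file; only the two characters shown above.
SAMPLE = """<metrics>
  <metadata fontName="Times New Roman" fontSize="16"/>
  <character dec="32" hex="0x20" name="space" str=" " width="4.0"/>
  <character dec="33" hex="0x21" name="exclam" str="!" width="5.328125"/>
</metrics>"""


def load_widths(xml_text):
    """Map each @str value to its @width as a float."""
    root = ET.fromstring(xml_text)
    return {c.get("str"): float(c.get("width")) for c in root.iter("character")}


def text_length(text, widths):
    """Sum the widths of the characters in text (KeyError if one is missing)."""
    return sum(widths[ch] for ch in text)


widths = load_widths(SAMPLE)
print(text_length("! !", widths))
```

A character that is absent from the mapping raises a KeyError here; a production version would need to decide how to handle such characters.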
Appendix B. Sample HTML for JavaScript method
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- ================================================================ -->
<!-- Illustrates JavaScript mediated layout of SVG <text> -->
<!-- -->
<!-- <body> has <section> children arranged vertically and outlined -->
<!-- in red -->
<!-- <section> elements have <div> children arranged horizontally -->
<!-- and bordered in blue -->
<!-- ================================================================ -->
<head>
<title>JavaScript method mockup</title>
<style type="text/css">
svg {
display: block;
overflow: visible; /* for tightly rotated text */
}
body {
display: flex;
flex-direction: column;
row-gap: 1em;
}
section {
display: flex;
flex-direction: row;
column-gap: 1em;
outline: 4px red solid;
}
div {
border: 2px blue solid;
}</style>
<script type="text/javascript">
//<![CDATA[
window.addEventListener('DOMContentLoaded', init, false);
function init() {
const divs = document.querySelectorAll('div');
for (let i = 0; i < divs.length; i++) {
const bb = divs[i].querySelector('g').getBBox();
divs[i].querySelector('svg').setAttribute('height', bb.height);
divs[i].querySelector('svg').setAttribute('width', bb.width);
}
}//]]></script>
</head>
<body>
<section>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" font-family="Times New Roman" font-size="16"
dominant-baseline="middle" dy=".5em">Sample horizontal text</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" dominant-baseline="middle" dx=".5em">Sample vertical
text</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" transform="rotate(-30)">Sample diagonal text (30°)</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" transform="rotate(-60)">Sample diagonal text (60°)</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" font-family="Times New Roman" font-size="16"
dominant-baseline="middle" dy=".5em">Sample horizontal text</text>
</g>
</svg>
</div>
</section>
<section>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" font-family="Times New Roman" font-size="16"
dominant-baseline="middle" dy=".5em">Sample horizontal text</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" dominant-baseline="middle" dx=".5em">Sample vertical
text</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" transform="rotate(-30)">Sample diagonal text (30°)</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" transform="rotate(-60)">Sample diagonal text (60°)</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" font-family="Times New Roman" font-size="16"
dominant-baseline="middle" dy=".5em">Sample horizontal text</text>
</g>
</svg>
</div>
</section>
<section>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" font-family="Times New Roman" font-size="16"
dominant-baseline="middle" dy=".5em">Sample horizontal text</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" dominant-baseline="middle" dx=".5em">Sample vertical
text</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" transform="rotate(-30)">Sample diagonal text (30°)</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" writing-mode="tb" font-family="Times New Roman"
font-size="16" transform="rotate(-60)">Sample diagonal text (60°)</text>
</g>
</svg>
</div>
<div>
<svg xmlns="http://www.w3.org/2000/svg">
<g>
<text x="0" y="0" font-family="Times New Roman" font-size="16"
dominant-baseline="middle" dy=".5em">Sample horizontal text</text>
</g>
</svg>
</div>
</section>
</body>
</html>
Appendix C. XSLT to size ellipses according to contained text
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:djb="http://www.obdurodon.org"
xmlns:math="http://www.w3.org/2005/xpath-functions/math" exclude-result-prefixes="#all"
xmlns="http://www.w3.org/2000/svg" version="3.0">
<xsl:output method="xml" indent="yes"/>
<!-- ================================================================ -->
<!-- Variables: data -->
<!-- ================================================================ -->
<xsl:variable name="texts" as="xs:string+"
select="('There', 'is', 'nothing', 'so', 'practical', 'as', 'a', 'good', 'theory')"/>
<xsl:variable name="text-lengths" as="xs:double+" select="$texts ! djb:get-text-length(.)"/>
<!-- ================================================================ -->
<!-- Variables: font -->
<!-- ================================================================ -->
<xsl:variable name="times-new-roman-16-mapping" as="document-node()"
select="doc('times-new-roman-16.xml')"/>
<xsl:key name="lengthByChar" match="character" use="@str"/>
<xsl:variable name="font-size" as="xs:double"
select="$times-new-roman-16-mapping/descendant::metadata/@fontSize"/>
<!-- ================================================================ -->
<!-- Constants -->
<!-- ================================================================ -->
<xsl:variable name="inter-ellipse-spacing" as="xs:integer" select="20"/>
<xsl:variable name="text-x-padding" as="xs:integer" select="10"/>
<xsl:variable name="y-pos" as="xs:double" select="$font-size div 2"/>
<!-- ================================================================ -->
<!-- Functions -->
<!-- ================================================================ -->
<xsl:function name="djb:get-text-length" as="xs:double" cache="yes">
<!-- ============================================================ -->
<!-- djb:get-text-length -->
<!-- -->
<!-- Parameters: -->
<!-- $in as xs:string : input text string -->
<!-- -->
<!-- Returns: length of text string as xs:double -->
<!-- ============================================================ -->
<xsl:param name="in" as="xs:string"/>
<xsl:sequence select="
string-to-codepoints($in)
! codepoints-to-string(.)
! key('lengthByChar', ., $times-new-roman-16-mapping)/@width
=> sum()"/>
</xsl:function>
<xsl:function name="djb:x-pos" as="xs:double">
<!-- ============================================================ -->
<!-- djb:x-pos -->
<!-- -->
<!-- Parameters: -->
<!-- $in as xs:integer : offset of string in sequence -->
<!-- -->
<!-- Stylesheet variables used: -->
<!-- $inter-ellipse-spacing as xs:integer : between edges -->
<!-- $text-x-padding as xs:integer : padding on each side -->
<!-- -->
<!-- Returns: x position for center of ellipse and text -->
<!-- sum of: all preceding string widths -->
<!-- 2 * padding for all preceding -->
<!-- inter-ellipse-spacing for all preceding -->
<!-- half width of current -->
<!-- padding left of current -->
<!-- ============================================================ -->
<xsl:param name="in" as="xs:integer"/>
<xsl:sequence select="
$text-lengths[position() lt $in] => sum() +
$inter-ellipse-spacing * ($in - 1) +
$text-x-padding * ($in - 1) * 2 +
$text-lengths[$in] div 2 +
$text-x-padding"/>
</xsl:function>
<!-- ================================================================ -->
<!-- Main -->
<!-- ================================================================ -->
<xsl:template name="xsl:initial-template">
<!-- ============================================================ -->
<!-- Compute X values for @viewBox -->
<!-- ============================================================ -->
<xsl:variable name="text-count" as="xs:integer" select="count($texts)"/>
<xsl:variable name="left-edge" as="xs:double" select="
djb:get-text-length($texts[1]) div 2 +
$text-x-padding +
$inter-ellipse-spacing (: extra padding at start :)"/>
<xsl:variable name="total-width" as="xs:double" select="
sum($text-lengths) +
$text-x-padding * 2 * $text-count +
$inter-ellipse-spacing * ($text-count - 1) +
$left-edge"/>
<xsl:variable name="padding" as="xs:integer" select="$inter-ellipse-spacing idiv 2"/>
<!-- ============================================================ -->
<!-- Create SVG -->
<!-- ============================================================ -->
<svg viewBox="
-{$left-edge + $padding}
-{$font-size + $padding}
{$total-width + 2 * $padding}
{($font-size + $padding) * 2}">
<g>
<xsl:for-each select="$texts">
<xsl:variable name="text-offset" as="xs:integer" select="position()"/>
<xsl:variable name="x-pos" as="xs:double" select="
djb:x-pos($text-offset) (: x center of ellipse :)"/>
<ellipse cx="{$x-pos}" cy="{$y-pos}"
rx="{$text-lengths[$text-offset] div 2 + $text-x-padding}" ry="{$font-size}"
fill="none" stroke="black" stroke-width="1"/>
<text x="{$x-pos}" y="{$y-pos}" dominant-baseline="middle" text-anchor="middle"
font-family="Times New Roman" font-size="{$font-size}">
<xsl:value-of select="."/>
</text>
</xsl:for-each>
</g>
</svg>
</xsl:template>
</xsl:stylesheet>
[1] If the user does not specify a font size, the default is 16 pixels, which
corresponds to specifying the value "medium" for a
@font-size attribute on the SVG <text>
element. See below concerning the actual height of SVG <text>
elements, which is not the same as the value of the declared or implicit
@font-size.
[2] The sample graphs for this report are based on materials from Charlie Taylor’s
Van Gogh as a ‘tortured genius’
(http://vangogh.obdurodon.org,
https://github.com/charlietaylor98/vangogh-gang), which she
undertook in Spring 2020 together with Colin Woelfel, Nate McDowell, and Rachel
Saula as part of David J. Birnbaum’s Computational methods in the
humanities course (http://dh.obdurodon.org) in the
University Honors College at the University of Pittsburgh.
[3] There is no data for two of the places because the developers used all place
names in their corpus for all graphs even though not all places included data
for every graph. In this example, letters written in Brussels and Nuenen within
the corpus did not address mental health or stress factors, although letters
from those places did contribute data to graphs of other topics. The place names
are sorted and labeled according to their first appearance in the corpus, and
the labels are not meant to imply that Van Gogh did not return to some of those
places in later years.
[4] Placing the general horizontal X axis label more precisely in a way that takes
into account the lengths and positions of all of the bar labels (and not only
the length of the longest one) is a more complicated version of the same task.
For example, if the leftmost and rightmost bar labels are the longest, a human
designer might choose to place the general X axis label, if it is short enough,
between them, instead of fully below them. Similarly, a human designer might
choose, if the bar labels happen to be longer on one side of the graph than the
other, to place the general X axis label on the shorter side, below the bar
labels on that side, instead of centering it.
Our modest goal is simply to place the general label for the X axis with
reliable precision below the longest bar label. Whether the more exact placement
described above would be worth the extra effort cannot be determined in any
general way, but because it builds on the same methods as the simpler task, it
could, at least in principle, be implemented as an extension of the solution to
that simpler task by taking more measurements into consideration when placing
the label.
[5] Character refers to an informational
unit of writing independently of its presentation. Glyph refers to a font resource used to render a
character. A text string in our SVG is made up of characters; the
rendering of the text inside a browser is made up of glyphs.
The distinction between character and glyph is important in many
contexts because it is not necessarily one to one. For example, the
writing unit "ä" may be represented as either a single
composite character (Unicode U+00E4 Latin Small Letter A with
Diaeresis) or a sequence of two Unicode characters, one for
the base character "a" (U+0061 Latin Small Letter
A) and one for the combining diacritic (U+0308
Combining Diaeresis). Whether a writing unit is
represented as two (or more) characters informationally or as one is
independent of whether it is rendered in a browser or other platform as
two (or more) glyphs or as one.
In this report we use both terms because we focus on situations where
the mapping between characters and glyphs is one to one, or where the
distinction does not affect the computation of text length.
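The two representations of "ä" described in this note can be inspected directly. The following Python sketch is only an illustration; the suggestion that input might be normalized to NFC before a per-character width lookup is an assumption, not something this report implements.

```python
import unicodedata

composed = "\u00e4"      # U+00E4 Latin Small Letter A with Diaeresis
decomposed = "a\u0308"   # U+0061 followed by U+0308 Combining Diaeresis

# The same writing unit is one character in one form and two in the other.
print(len(composed), len(decomposed))

# NFC normalization composes the sequence into the single character.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed)
```

A per-character width lookup would see two characters in the decomposed form, so the two encodings could yield different computed lengths unless the input is normalized first.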
[6] We are grateful to the colleagues who participated in the discussion we opened
about this topic in the xml.com Slack workspace, and especially to Gerrit
Imsieke and Liam Quin. Slack postings are ephemeral, but although that
conversation will not be preserved, we are happy to be able to acknowledge it,
more sustainably, here.
[7] Browsers interpret the values of the SVG
@dominant-baseline attribute inconsistently. See Gudehus 2021 for discussion, an interactive diagnostic
interface, and a suggested work-around, which we have adopted where
needed in this report.
[8] We tested these examples under macOS 11.3.1 (Big Sur) with Chrome
Version 90.0.4430.212 (Official Build) (x86_64), Firefox Developer
89.0b11 (64-bit), and Safari Version 14.1 (16611.1.21.161.6).
[9] The common assumption that a @font-size value specifies
the vertical distance between baselines is incorrect because, among
other things, the actual spacing depends on the font design, and not
only on font size. See De Oliveira 2017 for details and
discussion, as well as Chase 2019. In principle it
should be possible to extract the actual height from the font metrics,
as we do for character width, but we have not explored that possibility
for this report.
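As a rough indication of what such an extraction might involve, the sketch below scales vertical font metrics to a font size in the same way that glyph widths are scaled. It is an unexplored assumption that ascent, descent, and line gap values from the font are what a given browser actually uses; the function name line_height and the numeric values are hypothetical and serve only as illustration.

```python
def line_height(ascent, descent, line_gap, units_per_em, font_size):
    """Scale vertical font metrics (in font units) to the given font size.

    descent is expected as a negative number, as is common in font tables.
    """
    return (ascent - descent + line_gap) * font_size / units_per_em


# Hypothetical metric values, for illustration only.
height = line_height(ascent=1825, descent=-443, line_gap=87,
                     units_per_em=2048, font_size=16)
print(height)
```

With these illustrative values the result is larger than the declared font size of 16, which is consistent with the observation that @font-size does not by itself determine the distance between baselines.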
[11] With no SVG @kerning attribute, with an SVG
@kerning attribute value of "auto" (the
default, which enables kerning), and with an SVG @kerning
attribute value of "0" (disables kerning), kerning was
applied consistently to horizontal text in all three browsers and to
vertical (@writing-mode="tb") text in Firefox and Chrome,
but not in Safari. The CSS font-kerning property worked as
advertised in horizontal text in all three browsers, and had no effect
on vertical text in any of them: kerning was always applied to vertical
text in Firefox and Chrome and never applied to vertical text in
Safari.
Kerning can be either positive or negative. Negative kerning decreases
the space between glyphs, and is used where allowing the bounding boxes
around two characters to overlap makes them appear more evenly spaced to
a human, as is the case with the sequence AV. Positive
kerning increases the space between glyphs, and may be used when rounded
letters would otherwise look too close together (e.g.,
OC). Because kerning contributes, positively or
negatively, to the overall length of the text, the sum of glyph widths
in the string may only approximate the actual width of the bounding box,
since some <text> elements may become longer or (less
commonly) shorter when kerning is applied. It is possible to extract
kerning information from a font file, as we already do with glyph
widths, but because, for reasons described above, it is not possible to
control kerning reliably in the rendering, we cannot be confident that
our computed widths would match the rendered widths even with that
additional adjustment.
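The arithmetic of such an adjustment is simple even though, as noted, its result cannot be relied on to match the rendering. The following Python sketch assumes that kerning values have already been extracted into a dictionary keyed by character pairs; the function name kerned_length and all widths and kerning values are hypothetical and are not taken from any real font.

```python
def kerned_length(text, widths, kerning):
    """Sum glyph widths plus the kerning adjustment for each adjacent pair."""
    total = sum(widths[ch] for ch in text)
    for left, right in zip(text, text[1:]):
        total += kerning.get((left, right), 0.0)  # pairs without an entry are not kerned
    return total


# Hypothetical widths and kerning values, not taken from any real font.
widths = {"A": 11.5, "V": 11.5}
kerning = {("A", "V"): -2.0, ("V", "A"): -2.0}

print(kerned_length("AVA", widths, kerning))  # three glyphs, two kerned pairs
print(kerned_length("AA", widths, kerning))   # no kerning entry for this pair
```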
[12] In its current form our script to extract glyph widths processes only
TTF fonts and fails on, for example, TTC and OTF. Those font formats
also include metric information, which means that we could amend our
script to process them, as well.
[13] The function can also help find the bounding box of multiline text
where each line is a <tspan> child of the
<text> element. The length of the
<text> element then becomes the length of its
longest child <tspan> element, and the height of the
<text> element becomes the sum of the heights of
all <tspan> child elements.
For that matter, we could reverse our perspective and instead of
letting the text length tell us the length of the SVG
<text> element, we could use our function to
compute the lengths of individual words as input into a line-wrapping
routine that breaks a single text string into <tspan>
elements that fit within a predetermined <text> size.
We have not attempted these enhancements in this report, but they are
natural next steps insofar as they depend on the same type of
information as the <text> handling that we do
implement.
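Both ideas can be sketched briefly. The Python below is a minimal illustration under stated assumptions, not an implementation from this report: the greedy wrapping strategy, the function names wrap_words and bounding_box, the fixed line height, and the stand-in measure function (every character is 8 units wide) are all hypothetical. In practice the measure function would be a text-length function based on font metrics.

```python
def wrap_words(text, max_width, measure):
    """Greedily break text into lines whose measured length fits max_width."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if current and measure(candidate) > max_width:
            lines.append(current)  # current line is full; start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def bounding_box(lines, line_height, measure):
    """Width is that of the longest line; height is the sum of the line heights."""
    return max(measure(line) for line in lines), line_height * len(lines)


def measure(s):
    """Stand-in for a metrics-based length function: 8 units per character."""
    return 8.0 * len(s)


lines = wrap_words("There is nothing so practical as a good theory", 120, measure)
print(lines)
print(bounding_box(lines, 18.0, measure))
```

Each resulting line would correspond to one <tspan> child of the <text> element. A single word that is longer than the maximum width is placed on a line of its own and is not broken.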
[14] This approach rotates the text around its starting position (left edge
and vertical center in this left-to-right example), so that the upper
left corner of the diagonal rendering protrudes slightly to the left and
top of the center of rotation. Because we do not require high precision,
we ignore these protrusions in positioning our elements, and we set the
CSS overflow property to visible where needed
to ensure that the image is not cropped. An alternative approach might
be to add single space characters at the beginning and end of the string
before performing our computation, so that any overlap after rotation
would affect only the invisible space characters, and not the visible
text.
[15] Inconveniently, SVG rotation requires that angles be expressed in
degrees, while XPath trigonometric functions in the math:
namespace require that they be expressed in radians. We convert degrees
to radians with radians = degrees * π / 180. The width of the text,
should we need it, is the side opposite the angle, and its length can
be determined as hypotenuse * sin θ.
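The same arithmetic can be checked in a few lines of Python. This is a sketch under assumptions: the function name rotated_extent is hypothetical, the angle is taken to be measured from the vertical, and the height is computed as the adjacent side (hypotenuse * cos θ), which follows from the same right triangle.

```python
import math


def rotated_extent(length, degrees):
    """Return (width, height) of text of the given length rotated by
    `degrees` away from the vertical; the text length is the hypotenuse."""
    theta = math.radians(degrees)  # equivalent to degrees * math.pi / 180
    width = length * math.sin(theta)   # side opposite the angle
    height = length * math.cos(theta)  # side adjacent to the angle
    return width, height


width, height = rotated_extent(100.0, 30)
print(round(width, 3), round(height, 3))
```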
[16] We used Python to create XML font metrics files, but
Python programming knowledge is not a requirement to access
font metric information inside font files. Among other
things, there is an open-source stand-alone command-line
ttx program that can
export metric information from fonts as XML. See An intro to FontTools for more information.
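To suggest how XML exported by ttx might be turned into per-character widths, the following Python sketch parses a hand-written, heavily abbreviated fragment in the style of ttx output. The fragment is an assumption made for illustration: its numbers were chosen to reproduce the sample widths for the space and exclamation mark at size 16, and a real export contains many more tables and attributes. The function name widths_from_ttx is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated fragment in the style of ttx output.
# All numeric values are assumptions chosen for illustration.
TTX = """<ttFont>
  <head><unitsPerEm value="2048"/></head>
  <hmtx>
    <mtx name="space" width="512" lsb="0"/>
    <mtx name="exclam" width="682" lsb="0"/>
  </hmtx>
  <cmap>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x20" name="space"/>
      <map code="0x21" name="exclam"/>
    </cmap_format_4>
  </cmap>
</ttFont>"""


def widths_from_ttx(ttx_text, font_size=16):
    """Map each character to its advance width scaled to font_size."""
    root = ET.fromstring(ttx_text)
    units_per_em = int(root.find("head/unitsPerEm").get("value"))
    advance = {m.get("name"): int(m.get("width")) for m in root.iter("mtx")}
    result = {}
    for m in root.iter("map"):
        char = chr(int(m.get("code"), 16))  # code is a hexadecimal string
        result[char] = advance[m.get("name")] * font_size / units_per_em
    return result


print(widths_from_ttx(TTX))
```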
[17] Our JavaScript method, as applied to real data (see the Van Gogh
examples, below), uses XSLT to create the SVG. What we mean, then, when
we refer to this method as a JavaScript approach
(contrasted to an XSLT approach, above) is that we use
JavaScript, rather than XSLT, specifically to compute the size of SVG
<text> elements so that we can use those sizes to
position other SVG elements. Except for the dimensions of
<text> elements, both methods use XSLT
computation to transform XML source documents to SVG.
[18] Our screen captures in this case are taken from the Firefox version
identified above, but we have verified that the method yields a
comparable rendering (except for cosmetic differences in browser
defaults) in Chrome and Safari.
[19] The default size for HTML replaced elements will be used:
300px wide, 150px tall. This applies for <img>,
<object> or <iframe>. The
default 300×150 size also applies to inline
<svg> elements within HTML documents, but
that’s a relatively recent consensus from the HTML5
specifications: other browsers will by default expand inline SVG
to the full size of the viewport—equivalent to width:
100vw; height: 100vh; — which is the default size for
SVG files that are opened directly in their own browser tab.
Internet Explorer cuts the difference, using width of 100% and
height of 150px for images and inline SVG. (Bellamy-Royds 2017)
[20] The <div> wrappers are not strictly necessary
because we could, alternatively, have applied CSS rules to the
<svg> elements directly. For that matter, we
could even have applied the JavaScript getBBox() function
to individual SVG <text> elements and arranged those
using CSS Flexbox. The hierarchical level at which to manage the
in-browser positioning is a matter of preference, and we found our
approach the simplest for us to understand and implement.