Documents, text, and models

SOCRATES: Suppose that when a person asked you... either about figure or colour, you were to reply, Man, I do not understand what you want.... Meno, he might say, what is that simile in multis which you call figure, and which includes not only round and straight figures, but all? Could you not answer that question, Meno? I wish that you would try; the attempt will be good practice with a view to the answer about virtue....

— Plato, Meno

XML in the text world

Much XML usage involves text documents, where a large part of the content consists of readable natural language sentences (though a large share also include non-text parts[1]). We have a fairly well-developed understanding of how to model text documents and of the tradeoffs involved. But XML documents are not all primarily text; some have little or no text at all. This paper examines how common document models we use in the text world fit one quite different kind of data: vector graphic images, and in particular their very common application for many kinds of diagrams. SVG is a common XML schema for such data, and I will use it in many examples; but other schemas are also available.

Our models for text documents are not only orthographic but also conceptual, categorizing parts of documents in terms of their rhetorical or broadly semantic significance. Some such models enumerate essential features of documents: properties whose equivalence seems necessary and sufficient for two artifacts to be considered the same document for various purposes, roughly glossed as being the same work. Renear and Dubin (2003) provide a cogent analysis of such sameness conditions. Different models serve different needs, from printing to paleography to genre studies. Some of the essentialist models are associated with the shorthand OHCO, an acronym based on the claim:

(1) A document is an ordered hierarchy of content-based objects.

Order in (most) text documents seems largely uncontroversial. Hierarchy merely notes that parts posited in a document can nest. Content-based means that the objects' categories (ideally) reflect conceptual significance rather than layout (for example, heading rather than big bold Helvetica): the objects posited in the text are of types intended to reflect the natural (or natural-language) kinds found in the text itself, plus tables, illustrations, etc. when more than decorative. In these models, the focus is on assigning document parts to classes (categories) based on their rhetorical or ontological significance, not appearance.

Objects is reminiscent of Object-Oriented programming, but the usage is not identical.[2]

One similarity is that document components fall into classes, which are (a) defined for each application but (b) commonly mirror real world conceptual categories. Document objects also typically have data such as XML attributes and content, as well as self-identity and connections (interaction) with others.

As in OOP, document objects have behaviors, which are largely associated with classes but commonly refer to properties of instances. Behaviors may be thought of slightly differently, however: Document systems often isolate them from data, on the rationale that a given class may need wildly different behaviors in different contexts. A date object may be displayed in a special font and color when printing, but spoken in a distinct tone by a speech interface. It may lead to wildly different displays in a calendar tool; and it may be treated in yet other ways for a retrieval system, or systems not yet imagined. Still, the fact of being a date, personal name, heading, abbreviation, quotation, or cross-reference leads to some behavioral expectations.

For most objects in very generic schemas as discussed earlier, the assignable behaviors are commonly chosen from a small set largely involving font, color, margins, and wrapping. A few types require less generic but still conventional support: footnotes, index entries, auto-numbered lists, links, and so on.

Other much more specialized schemas commonly require specialized rendering and other behaviors. Forms, MathML, SVG, and instrument telemetry schemas fall in this class: rendering them with generic features such as font choices or colors is much less useful than rendering abbreviations, emphasis, foreign words, and defined terms using only those same choices.

Objects in documents may tend more strongly to be ordered than is typical in OOP, and make different use of hierarchy. OOP typically organizes object classes into is-a (ontological) subclasses, which inherit both attributes and methods. Some OOP classes also organize instances into has-part (mereological) structures, but many do not. Document models tend to reverse that focus: content objects are nearly always nested, creating mereological structures, while the ontological relationships between content object classes may be left to documentation, left implicit, or even left unconsidered.

That is not to say ontology is absent or unimportant. The field treats of superclasses all the time: lists, sections, links, form elements, block vs. inline elements, soup, and so on. Many schemas developed conventions to express subclasses: In SGML this was often done via parameter entities grouping various elements or attributes and providing a name, such as "heading" in general, or "soup". HyTime architectural forms, XML namespaces, HTML class attributes, and XSD complex types do similar service.

XML and schema languages provide no means of determining whether elements are content-based, either for text or for diagrams. However, OHCO and similar models argue that such elements express more essential properties, and document representations that use them better support preservation, update, and re-use.[3]

Modelling levels

OHCO and similar models of documents are often contrasted with 3 other levels of abstraction:

  1. Image representations, where even the fact that some appearance is an instance of the letter e requires an interpretative act;

  2. Grapheme representations, where orthographic symbols are explicit, including spaces and punctuation that mark a few kinds of components. Symbols form a very small vocabulary, and are rarely generatively combined or coined. This level represents very limited objects (although conventional additions commonly creep in).

  3. Layout representations, where content is grouped into geometric objects such as pages, columns, and text boxes (possibly auto-wrapped, possibly absolutely positioned). Some regions may correspond to content-based objects, but they are not those things per se -- for example, a rhetorical paragraph may be broken across regions (even pages), with other regions geometrically but not logically between.

  4. What I will call Object representations or descriptive markup, where content is grouped into larger meaningful units as described earlier.

Coombs et al. (1987) refer to procedural markup rather than layout, focusing on the instructions often used to create a layout, as distinct from the layout itself. Changing margins, font, justification, etc. can be performed by selecting individual scopes and setting properties in a GUI (like many word processors); inserting program-like commands (as in troff); or choosing a canned layout. The common factor is focus on physical layout for a target medium. Fundamentally, people choose layouts to convey conceptual distinctions (as well as for aesthetic reasons), not the reverse: one does not simply walk into text turning layout into quotations, stage directions, or chapters because of appearance – rather the reverse.

The levels of abstract models differ in a variety of specific features. One feature to note about this taxonomy of models is that the objects used at each level are drawn from different domains with very different cardinalities, and whose members typically combine in quite distinct ways. Respectively, the domains can be glossed as (1) colors; (2) orthographic symbols; (3) symbols plus regions such as rectangles; and (4) rhetorical and semantic categories.

A second feature is that these levels all order their objects, but in different ways: 1 and 3 are dominated by spatial placement, while 2 and 4 are dominated by nested linear orderings (which only sometimes correspond to space).

A third key feature is the sort of meanings the objects bear.

At the Image level, the units of representation are pixels or particles of ink, which have practically no meaning singly. But all the higher levels involve signs.[4] Orthographic systems seem to lack features aimed directly at representing the visual regions of the Layout level, or the amorphous visual atoms of Image level -- unless perhaps one counts lead type sorts.

A fourth key feature is whether objects are composed of other objects (not counting hierarchy in the grammar of natural language itself).

Pixels per se form rows and columns, but many image technologies do not use them. Graphemic punctuation can express a few nestable categories such as word, sentence, paragraph, and quotation.[5] A Layout representation may be hierarchical or not. TeX can stack boxes generally, and CSS has a nested box model, recursive table layouts, and absolute positioning (thus enabling nearly any geometry). The Object level differs by supporting hierarchical structures in general, rather than particular (linguistic or geometric) ones.[6]

The Object level can represent objects that are not as easily excluded as part of language. Authors often mark objects that are distinct in their content but not in the linguistic system per se. For example, even users of FRESS would create macros for components such as axioms, lemmas, etc. DocBook similarly distinguishes many components of technical manuals, such as key-names, option values, and so on.

Sometimes XML imposes hierarchy unnecessarily or unambiguously. For example, a phrase that is both a foreign phrase and a definitional use of a term could be tagged with either on the outside, intending no difference in meaning. Few document representations other than LMNL and standoff markup directly provide for truly coterminous elements, but doing so is not especially difficult (e.g. <foreign+defn>). This edge case arises even in layout-oriented schemas that differ in whether they provide only bold and italic, or also a combined 'bolditalic' construct.

Beyond hierarchy

Natural language has numerous non-hierarchical and even unordered phenomena, such as pronouns, coreference, alternative word orders, interrupted dialog, cross-reference, and multiple attachment of noun phrases that serve roles in both a main and a subordinate clause. In addition, there are often multiple perspectives to annotate, which do not nest neatly with each other. Adequate document models must give some account of such phenomena as well.

Syntax vs. model

It is well known that many general syntaxes can represent data even beyond their design spaces. A pixel can be modeled as a field. Anything in computer storage can be modeled by a long string of bits (and, at bottom, is). As is important for images, any arrangement of boxes and text can be represented by a list of a handful of basic objects like

    <text x="72" y="90" font="Helvetica 12 bold">Introduction</text>
    <rect x1="60" y1="70" x2="300" y2="122" lineColor="#FF0000" />

XML can be represented by a linear sequence of SAX-like events, which are in turn easily represented even in CSV. XML is particularly good for documents not because of syntax details, but because its native constructs map readily to document models which have proven useful for serious work with non-ephemeral text documents. Here we will examine how and to what extent this holds true for diagrams as well.
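The point about SAX-like events and CSV can be made concrete with a minimal sketch. It uses Python's standard xml.sax and csv modules; the three-column row layout, the way attributes are packed into one cell, and the sample sentence are all assumptions made for illustration, not an established interchange format.

```python
import csv
import io
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collect SAX callbacks as flat (event, name, data) rows."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def startElement(self, name, attrs):
        # Attributes are packed into one cell as name=value pairs (an arbitrary choice).
        packed = ";".join(f"{k}={attrs.getValue(k)}" for k in attrs.getNames())
        self.rows.append(("start", name, packed))

    def characters(self, content):
        # Whitespace-only text is dropped to keep the sketch short.
        if content.strip():
            self.rows.append(("text", "", content))

    def endElement(self, name):
        self.rows.append(("end", name, ""))


def xml_to_csv(xml_text):
    """Return the document's event stream serialized as CSV text."""
    recorder = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), recorder)
    out = io.StringIO()
    csv.writer(out).writerows(recorder.rows)
    return out.getvalue()


# Hypothetical sample document.
sample = '<p>A <emph rend="italic">short</emph> example.</p>'
csv_text = xml_to_csv(sample)
print(csv_text)
```

The hierarchy survives only implicitly, in the pairing of start and end rows. That is the sense in which the syntax is adequate while the model is left for the reader of the CSV to reconstruct.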

The Essential Document

As mentioned earlier, the claim that some model is close to what text is, really, is largely a claim about essentials:

  • What remains across all versions of the same document?

  • What, if changed, suffices to make something no longer the same document?

These questions depend on the purpose at hand: For some purposes, only the very same physical artifact will do; for a slightly broader range of purposes, only a very accurate facsimile. For many purposes, a document with the same graphemic text and at least roughly similar layout is fine; for example, layouts that enable the reader to infer the same categories for the same document portions. A large print edition, different fonts, or another page size is not "different" for everyday purposes. Consequent mechanical changes such as ligatures, soft hyphen use, and completely different page decoration such as headers and page numbers, go a bit further. Even renumbering figures, lists, or items may be a non-essential change.

The boundary between essential and inessential changes to a document can often be illuminated through considering the visually impaired reader. A document printed in Braille or read aloud, is for many purposes "the same document", even considering Braille symbols that signal concepts more like typography than orthography. Similarly, an audio text must in many cases represent emphasis and text object boundaries somehow.

Because text can use the full range of natural language abilities to refer to itself and its properties, it is easy to construct cases where even small changes become essential. A document may refer to its own formatting (a book about typography may do this a lot), or an active reader may refer to layout details, as Fermat referenced regrettably small margins. St. Paul even does this in manuscript: See with what large letters I am writing to you with my own hand. (Gal. 6:11). Thus, no property of a text document is in principle non-essential.

On the other hand, even large changes can be inessential if they preserve consistency: cross-references such as see page 27 or section VII, or guides such as indexes and tables of contents. In short, changes must not impact readers' ability to perceive intended assertions or distinctions.

For text documents, having the same sequence of text characters seems typically essential, which is approximately the FRBR level of Expressions. Nevertheless, changes to coded character sets, ligatures, hyphenation, modernizing away the English long s (ſ), or changing between Hiragana and Kanji or Latin letters and Braille, could change every character in the text yet may (arguably) not make an essential difference.
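A small sketch shows one way such character-level differences might be set aside when comparing two transcriptions. It assumes that Unicode NFKC normalization plus removal of soft hyphens is an acceptable notion of "inessential" for the purpose at hand; the fold function and the sample strings are hypothetical, and script changes such as Hiragana vs. Kanji or Braille are well beyond what this handles.

```python
import unicodedata


def fold(text):
    """Fold compatibility variants (ligatures, long s) and drop soft hyphens."""
    text = text.replace("\u00ad", "")  # U+00AD SOFT HYPHEN
    return unicodedata.normalize("NFKC", text)


# "Congress finds" set with a soft hyphen, a long s (U+017F), and an fi ligature (U+FB01).
old = "Con\u00adgre\u017fs \ufb01nds"
new = "Congress finds"

print(old == new)              # False: the character sequences differ
print(fold(old) == fold(new))  # True: they agree once the variants are folded
```

Whether folding is legitimate remains a judgment about purpose: a paleographer may need exactly the distinctions this comparison discards.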

Document structure components pose issues similar to characters: Changing the encoding or markup, or even changing the schema to have completely different names but the same abstract meanings, seems as inessential as changing coded character sets. But merging speeches in a play, turning a quotation into a plain paragraph, or messing with the nesting relationships of sections, all seem essential.

Essentials

With respect to text documents, a key argument for OHCO-like models is that these content-based objects are, like the words, essential to the text: they persist across various expressions of the text, in a way that layout does not. For example:

Changing whether a given span of text is a quotation or not can radically change the meaning, for example by causing the reader to associate certain beliefs, sentiments, or actions with the wrong party.

Changing spacing or punctuation (which can be considered forms of markup) can change meaning, as in We are now here vs. We are nowhere, or the oft-memed Let's eat, grandma.

Losing the emphasis tag in World hunger is not a problem completely reverses the meaning.

Cases like tagging <foreign> and <emph> both as <italic> also seem to compromise the essentials of a document (even if that might not be visible to the reader), much as substituting same-shape Greek or Cyrillic Unicode letters for Latin ones seems misleading even though the result may be visually identical, or narrowly transcribing a speaker who cannot pronounce certain phonemes distinctly.

Beyond such essentials, XML is commonly used to attach information that is not considered part of the document at all -- for example, part-of-speech tags or other linguistic annotations; metadata that typical readers never see; red-lining or other representation of document editing history.

Isomorphism

OHCO-like models speak of making content-based objects explicit. This involves assigning concrete names (or numbers, etc) to abstract notions such as quotation or section, and concrete scopes to their instances. In a particular representation such as XML, it further involves representing those named objects with actual syntax. At all these stages, alternative choices are possible, meaning that multiple concrete expressions can represent the same abstract structure. For example:

  1. Most syntactic representations define some details as not being information: comments, variations in how line boundaries are coded, order of certain things, etc. Canonical XML provides such a definition for XML.

  2. The same content objects can have different names and delimiters: p vs. para vs. parafo.

  3. Information can be shifted between syntactic constructs, such as attributes vs. elements in XML (TEI P3 sic vs. corr being one example).

  4. Schemas may differ in how, and how finely, they distinguish related element types. div may distinguish levels purely by being nested, or the schema may pack level numbers into the name: div1, div2, etc.; this hardly seems an essential difference.

  5. Containers for large units such as chapters and sections can be reliably inferred from level-specific headings. Conversely, heading levels can be inferred from container nesting. Thus, several different arrangements of such elements can express the same abstract ordered hierarchy.

  6. A document can be translated without loss to an inverted representation where each vocabulary item occurs once, with a list of the positions where it occurs. It may be unintuitive to claim such a document is the same as one represented in a more conventional order; but the information content is the same:

     <doc>
         <type orth="be"  tokens="2 6" />
         <type orth="not" tokens="4" />
         <type orth="or"  tokens="3" />
         <type orth="to"  tokens="1 5" />
         <type orth=","   tokens="7" />
     </doc>
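The claim in item 6 that the inverted form loses no information can be checked with a short sketch that rebuilds the token sequence and then inverts it again. The function names linearize and invert are hypothetical; the element and attribute names are those of the example above.

```python
import xml.etree.ElementTree as ET

INVERTED = """<doc>
    <type orth="be"  tokens="2 6" />
    <type orth="not" tokens="4" />
    <type orth="or"  tokens="3" />
    <type orth="to"  tokens="1 5" />
    <type orth=","   tokens="7" />
</doc>"""


def linearize(xml_text):
    """Rebuild the running token sequence from the inverted representation."""
    by_position = {}
    for entry in ET.fromstring(xml_text).iter("type"):
        for position in entry.get("tokens").split():
            by_position[int(position)] = entry.get("orth")
    return [by_position[i] for i in sorted(by_position)]


def invert(tokens):
    """Map each distinct token to the 1-based positions where it occurs."""
    index = {}
    for position, orth in enumerate(tokens, start=1):
        index.setdefault(orth, []).append(position)
    return index


tokens = linearize(INVERTED)
print(tokens)          # ['to', 'be', 'or', 'not', 'to', 'be', ',']
print(invert(tokens))  # the same positions as in the XML above
```

The round trip works here because every position from 1 to 7 is listed exactly once. A fuller treatment would have to say what happens when positions are missing or duplicated.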

Diagrams as non-textual XML

XML is not only used for text documents. It is also used for many other kinds of data, by which I mean data differing much more fundamentally than the already substantial differences between (say) DocBook and TEI, or HTML and RSS. For example, music, databases, calendars, and knowledge representation; and, of course, vector images.

How do the features and characteristics of text document models (and their essential features) serve for analyzing such data?

Vector images are composed of discrete components of various types, with properties. Rendered images such as photographs or freehand drawings, commonly represent objects that are discrete in the real world (such as fruit in a still life), but they are often not discrete in the drawing: they overlap each other almost arbitrarily, and finding objects is a matter of AI -- unlike most objects in text documents and vector image representations.

Even before one considers the ontology of the objects depicted, this distinction has huge effects for authors and recipients:

  • Vector objects can be scaled without degradation: the scaled image may have an arbitrarily different number of pixels (or no pixels, for example if displayed via a vector scope or pen plotter).

  • A drawn shape that is overlapped by others, can still be picked up and moved later. Each has a persistent existence. In that environment, consistent appearance requires that objects are kept in some "stacking order": which is drawn first matters.

  • Objects have identity, but also properties. A rectangle has a certain position, dimension, color and thickness of border, and color of interior (possibly including quasi-colors like transparent or gradients). But those properties can be changed without changing the object's identity.

  • Objects in vector images typically are instances of a category or class. In many GUI applications the class is merely a shape: rectangle, star, etc., and chosen from a visual menu. However, many applications include libraries of named entities used in a certain field of work, each with a conventionally-associated shape. Some applications (often ones with an accessible underlying program-like language) allow users to define custom classes and subclasses, and a few allow changing the class of a drawing object while retaining all applicable properties (color, border line type, label, position, etc).

  • Vector objects have persistent identities as well as persistent existence. This may be an internal code invisible to users, or a human-legible value (SVG uses XML IDs).

Vector images can usefully be divided into two subcategories.

Some are maps, in which the layout and often the scale are part of the information being conveyed. A floor plan for a building is clearly in this class. The fact that one room connects to another is necessary but not sufficient information; the actual dimensions of each room, the position of doors and windows, and so on are necessary as well.

Another way to think of this is that maps are not well represented as mathematical graph structures. Creating a node for each room, hall, closet, etc., and arcs connecting them might occasionally be useful, but not for most floor plan uses. The same is true of a circuit-board layout (though somewhat less so, since distances are only critical in some cases, such as extremely high-speed interconnects or limiting EM interference, capacitance, etc.).

Diagrams, in contrast, have a much greater separation between layout and meaning. An org chart is the same if it has the same people and jobs connected in the same pattern. It may be sloppily laid out, with randomly differing box sizes and colors, and lines crossing all over the place; that makes it ugly and annoying to interpret, but little or no information has been lost. Electronic schematics and exploded parts diagrams are also of this general kind (see below).[7]

Some connections may have an order which is part of the essential information. The connections to a logic chip must go to the right pins, and for some purposes genealogical charts must keep track of the order of children and marriages. On the other hand, resistors in schematics can go either way, and not all genealogical charts care about such orders. Even so, the order of drawing need not always match the logical order.

In computer science, such diagrammatic information structures are called Graphs, and they come in Directed and Ordered sub-types as needed. Instances (or layouts) that preserve the same graph structure are isomorphic.
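As a minimal sketch of this notion of sameness, the following strips the geometry from two layouts of a small org chart and tests whether the remaining labeled, directed graphs are isomorphic. The data layout (dictionaries with id, label, x, y) and the function names are assumptions made for illustration, and the brute-force search over permutations is only practical for very small diagrams.

```python
from itertools import permutations


def structure(shapes, connectors):
    """Discard positions; keep only node labels and directed connections."""
    nodes = {s["id"]: s["label"] for s in shapes}
    edges = {(c["from"], c["to"]) for c in connectors}
    return nodes, edges


def isomorphic(graph_a, graph_b):
    """Brute-force test for a label-preserving isomorphism of directed graphs."""
    (nodes_a, edges_a), (nodes_b, edges_b) = graph_a, graph_b
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    ids_a = list(nodes_a)
    for candidate in permutations(nodes_b):
        mapping = dict(zip(ids_a, candidate))
        if any(nodes_a[a] != nodes_b[b] for a, b in mapping.items()):
            continue  # this mapping pairs boxes with different labels
        if {(mapping[s], mapping[t]) for s, t in edges_a} == edges_b:
            return True
    return False


# Hypothetical org chart, laid out tidily.
tidy = structure(
    [{"id": "a", "label": "CEO", "x": 100, "y": 0},
     {"id": "b", "label": "CTO", "x": 50, "y": 80},
     {"id": "c", "label": "CFO", "x": 150, "y": 80}],
    [{"from": "a", "to": "b"}, {"from": "a", "to": "c"}])

# The same chart drawn sloppily, with different ids and positions.
sloppy = structure(
    [{"id": "n1", "label": "CFO", "x": 3, "y": 400},
     {"id": "n2", "label": "CEO", "x": 310, "y": 12},
     {"id": "n3", "label": "CTO", "x": 77, "y": 77}],
    [{"from": "n2", "to": "n3"}, {"from": "n2", "to": "n1"}])

# A chart with the same boxes but a different reporting line.
rewired = structure(
    [{"id": "n1", "label": "CFO", "x": 3, "y": 400},
     {"id": "n2", "label": "CEO", "x": 310, "y": 12},
     {"id": "n3", "label": "CTO", "x": 77, "y": 77}],
    [{"from": "n2", "to": "n3"}, {"from": "n3", "to": "n1"}])

print(isomorphic(tidy, sloppy))   # True
print(isomorphic(tidy, rewired))  # False
```

This treats edges as directed but unordered. Diagrams in which the order of connections matters, such as pins on a logic chip, would need that order carried as additional edge data.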

As to order, diagrams are very much like OHCO text documents. Printing each act, scene, speech and stage direction in Hamlet in a random font and size, but preserving their ordered hierarchical relations, would maintain the same Expression. Components can sometimes be moved even more freely, so long as the logical order(s) of reading are not broken: figures can be floated, footnotes moved, etc. And usually some re-arrangements are inessential, such as re-line-wrapping, or setting in one vs. several columns.

What I am calling maps cannot be abstracted in the same way.

Vector objects can be combined to represent any image: at the limit, any bitmap can be represented by a large number of tiny shapes, as in CAD and animation models. However, countless users use them for what I will call maps and diagrams.

By maps I mean images that represent (often but not always to scale), some real or putative object. For example, a building floor plan, a blueprint, a CAD or wireframe model, or an electronic circuit board (as in Figure 1); or even an artistic drawing. In maps, the layout is often essential: a circuit board is meant to be realized physically, with copper traces and holes that have to match up with real-world electronic parts. Perhaps in that respect they are more akin to copyfitting than to representations of document structure. Thus, maps have a very substantial difference from the way we tend to conceptualize text documents (at least in the XML world).

Figure 1

A circuit board as a map (from [jimblom]).

I will instead focus here on diagrams, which are much less essentially tied to physical layout. The circuit diagram (schematic) corresponding to the board layout of Figure 1 is shown in Figure 2. In this case, dimensions and positions are nearly irrelevant: the operative information is the components themselves (a term even more literally applicable here than in text documents), and their logical connections. These are represented by conventional visualizations (actual resistors are not jagged, nor diodes triangular).

In designing an electronic circuit, one combines components drawn from a variety of categories such as chips, resistors, switches, and many more. Subclasses are important in some cases, such as non-inductive resistors or electrolytic capacitors. Components have both common properties (say, maximum voltage rating), and type-specific ones (such as resistance). As with authors writing text, engineers design circuits by combining and organizing components (each with its own meaning and purpose), into aggregates with higher-level meaning and purpose. In both cases the components and their organization matter a great deal, but their physical placement usually matters much less to the design.[8]

Figure 2

A circuit diagram, or schematic (from [jimblom]).

Like most other vector drawing apps, a circuit design tool provides objects of the necessary kinds (or a user with enough free time can create them). A single drawing tool may or may not be suited for creating both of the figures above. If it is, it will have quite different properties and rendering even for the same component in the two cases. Such tools may or may not know anything about the legitimate combinations or constraints, just as an XML editor may or may not know about validation.

At the most basic level, then, diagrams consist of objects in various categories (for which particular images or shapes serve as proxies), connected in a certain way, and rendered into some space. Both shapes and connections are often equipped with names, numbers, and/or descriptive text. Particular shapes may be conventional symbols in a domain, or simply assigned by the author. This is very similar to how content objects are applied in text documents, and in both cases authors may opt to bend or break conventions, even so far as laying out every individual object uniquely.

A Canticle for Leibowitz (Miller 1959) is a story set in a post-apocalyptic future, where monks preserve what is left of technical and scientific knowledge. Brother Francis discovers the distinction of structure and layout for blueprints:

The knowledge that the color scheme of blueprints was an accidental feature of those ancient drawings lent impetus to his plan. A glorified copy of the Leibowitz print could be made without incorporating the accidental feature. With the color scheme reversed, no one would recognize the drawing at first. Certain other features could obviously be modified. He dared change nothing that he did not understand, but surely the parts tables and the block-lettered information could be spread symmetrically around the diagram on scrolls and shields. Because the meaning of the diagram itself was obscure, he dared not alter its shape or plan by a hair; but since its color scheme was unimportant, it might as well be beautiful. He considered gold inlay for the squiggles and doohickii.... When Brother Horner illuminated a capital M, transmuting it into a wonderful jungle of leaves, berries, branches, and perhaps a wily serpent, it nevertheless remained legible. Brother Francis saw no reason for supposing that the same would not apply to the diagram.

In text documents, connections between document components are usually expressed by proximity, size, and relative indentation; though numbering, boxing things together, co-indexing (such as footnote numbers), and other methods also come up. In diagrams, all of those may still be used, but connections are often represented by lines and arrows of various styles, as well as proximity and geometric containment (boxes that group other boxes). Both modalities commonly co-index sets of similar things by co-ordinating the choice of font, color, and other obvious properties.

Diagrams are used in countless fields. Some familiar uses include: slide presentations, organizational charts, flowcharts (as well as UML diagrams, workflows, state diagrams, etc.), schematics, some exploded parts diagrams, or even graphs and charts.

Broadly speaking, vector graphics, like content-based document models, can be viewed as an object-oriented notion.

SVG is an XML schema for vector graphic interchange that is widely supported. This idea was so popular even at the start, that there were six competing submissions to the W3C in the Web vector graphics space the year work on SVG began (https://www.w3.org/Graphics/SVG/WG/wiki/Secret_Origin_of_SVG).

Figure 3 is an extremely simple diagram: two boxes, of different types, each with a label, and joined by a connector. I will use this to illustrate many of the issues involved in mapping between the level of diagrams in the abstract, and diagrams as expressed in various apps and in SVG.

Figure 3

An overly simple block diagram
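To make the mapping from abstract diagram to SVG concrete, here is a minimal sketch that keeps the abstract structure (nodes, kinds, labels, connections) apart from the layout and renders both into SVG. The node kinds, labels, coordinates, and the use of corner rounding to distinguish the two kinds are all hypothetical; they are not taken from Figure 3 or from any particular drawing application.

```python
import xml.etree.ElementTree as ET

# Abstract diagram: what the boxes are and how they connect (hypothetical names).
nodes = [
    {"id": "n1", "kind": "process", "label": "Parse"},
    {"id": "n2", "kind": "store", "label": "Index"},
]
edges = [("n1", "n2")]

# Layout: where each box happens to be drawn, kept separate from the structure.
layout = {"n1": (20, 20), "n2": (200, 20)}
WIDTH, HEIGHT = 120, 50


def render(nodes, edges, layout):
    """Serialize the diagram as an SVG string."""
    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                             "width": "340", "height": "90"})
    for source, target in edges:
        (x1, y1), (x2, y2) = layout[source], layout[target]
        # Connector from the right edge of the source to the left edge of the target.
        ET.SubElement(svg, "line", {
            "x1": str(x1 + WIDTH), "y1": str(y1 + HEIGHT // 2),
            "x2": str(x2), "y2": str(y2 + HEIGHT // 2),
            "stroke": "black"})
    for node in nodes:
        x, y = layout[node["id"]]
        group = ET.SubElement(svg, "g", {"id": node["id"], "class": node["kind"]})
        # The two kinds differ only by corner rounding here, an arbitrary convention.
        radius = "12" if node["kind"] == "store" else "0"
        ET.SubElement(group, "rect", {
            "x": str(x), "y": str(y),
            "width": str(WIDTH), "height": str(HEIGHT),
            "rx": radius, "fill": "none", "stroke": "black"})
        label = ET.SubElement(group, "text", {
            "x": str(x + WIDTH // 2), "y": str(y + HEIGHT // 2),
            "text-anchor": "middle"})
        label.text = node["label"]
    return ET.tostring(svg, encoding="unicode")


svg_text = render(nodes, edges, layout)
print(svg_text)
```

Note what the output does and does not record. The kind of each box survives only as a class attribute, and the connection survives only as a line whose endpoints coincide with box edges; nothing in the SVG itself says that the line joins n1 to n2.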

Most of the same issues of sameness and difference arise for diagrams as for text documents: Just how much can a diagram change, and in what ways, before it becomes a different diagram? So the question arises: do such diagrams fit a conceptual model like that of text documents? Is (2) true?[9]

(2) A diagram is an ordered hierarchy of content-based shapes.

To decide, we must first analyze the structure and implications involved.

Related work on diagrams and documents with XML

There is a substantial literature involving diagrams in relation to documents, but much of it examines visualization of documents as images. Eliot Kimber (2013) presented tools to map text documents to slide presentations. Wendell Piez (2018) has used XSLT to generate fascinating document views in SVG. Yves Marcoux (2008) created visualizations for GODDAG structures. Borovsky, Birnbaum, Lancaster, and Danowski (2009) discussed visualizations for XML collections. A different take by Hugh Cayless (2008) applied SVG to link images and transcriptions. Liam Quin (2015) discussed diagrams for visualizing XML structure, and noted that

In practice any graphical representation will almost certainly be used for both exploration and storytelling, and so we see the wisdom in Alberto Cairo’s observation that there’s a continuous spectrum rather than two distinct sorts of picture (Cairo 2013).

Tomokazu Fujino et al. (2004) discuss tradeoffs in using XML-based graphics (SVG and X3D) in statistics:

The point that we would like to emphasize here is its portability. Many features such as cooperation with other XML-based format and interactivity can be included into one graphics file so it can be used not only as a part of an application but also as an independent application itself. Even if it is generated on server side, there is no traffic between the server and the client when the interactive function works.... The big difference between FLASH and XML-based vector graphics would be that one is closed binary file, and the other is XML format as open and standard specification.

I find vector images (particularly diagrams) interesting as a modelling case because they share some broad characteristics with text documents, yet have great differences in the kind of information involved. For example:

  • Vector image drawing programs provide users with an inventory of familiar basic objects (shapes), such as various boxes, arrows, or schematic symbols. This is very like word processors providing paragraphs, footnotes, index entries, etc., or XML schemas providing an inventory of element types.

  • Shape instances have properties that can be set, many of which end up affecting how they appear: colors, line styles.

  • Shapes typically have associated text such as labels, reference numbers, etc., just as most objects in an OHCO text document do. Diagramming apps often support simple formatting within the text, but rarely any notion of content markup or even named styles within it.

  • Many vector drawing programs include "snap to regions" and glue points on shapes that provide points of attachment for other objects such as arrows, labels, or other shapes. These bear some resemblance to explicit anchors in text documents, but are usually defined on a shape class rather than on instances.[10]

  • Vector images are commonly converted to Image representations, but can be represented at any of the modelling levels discussed earlier.

  • Getting higher-level representations back from pixel ones (moving up) is a very common practical difficulty for vector image users, as it is for text document users.

  • Moving up levels is especially hard for computers; it is easier for humans (perhaps because humans intuitively grasp implicit structures), but extremely tedious to carry out.

  • Vector images frequently contain a lot of natural-language text, if only in small snippets such as box labels, callouts, etc.

There are also some key differences. For example, most apps have a grouping operation, with which one can select multiple objects and join them into one. This has little or no effect on appearance, but after grouping one can select, move, and resize them as a unit. Groups, however, are not typically typed objects themselves. One can group some shapes, apparently making a new symbol for the diagram's visual vocabulary. However, the group is (in many, perhaps most, applications) only an instance, not a class. It has no name or associated data beyond that of its parts. In some apps a group can be cloned by reference, in others copied, but neither operation works quite as in OOP. In most drawing programs the set of shape classes is fixed, or modifiable only by writing actual code.

In SVG the <g> element accomplishes just such grouping. A group, however, can be assigned an ID, by which it can be instantiated elsewhere (and modified by overriding some properties). However, this is only (so to speak) a second-class class: one cannot add new elements to the SVG schema.[11] In contrast, with any XML schema language one can define a new type, and then instantiate it as needed.

This is a crucial difference. In XML, if one defines a new kind of element (or even a sub-class using one or another technique), then one can instantiate it, and changes to the class affect all instances. For example, a new style can attach to it and affect all instances. Copies of vector groups, in contrast, often bear no relation to their original, so changing properties of one does not affect others. SVG can <use> any object by ID, dynamically copying it elsewhere; but many properties cannot then be overridden. This puts the user back in the same position as with raster graphics: the group does not exist as a type, but merely as instances (which may happen to still look similar, or may diverge).

This leads directly to tedious work to modify a shape. As with procedural markup, one must either find enough common features to characterize just certain instances (which may be arbitrarily difficult, or even impossible); or check everything by hand.

Drawing programs rarely enable modifying or sub-classing predefined shapes. For example, a workflow tool may have steps of certain specific types such as edit, verify, and approve. At one point they might all appear as rectangles, but later (or for a different audience or publisher) as distinct shapes or colors. In most vector drawing apps there is no notion of a class of shapes apart from how a thing is rendered. That is, the app may well define rectangle or diamond, but users cannot subclass any of those with a new name and/or a new appearance.

Commonly, a shape (say, a diamond) cannot be converted to another (say, a hexagon) with the same general properties. The best one can do is create a new hexagon, and manually set each property. One can then at least copy that hexagon; though changing all the new hexagons to a new color will require selecting every one -- they have no true name by which they can be controlled.[12]
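The contrast between unrelated copies and instances of a named type can be sketched in a few lines of Python. This is only an illustration: the dictionaries, the type name "approve", and the resolve() helper are hypothetical, and do not model any particular drawing program.

```python
import copy

# Approach 1: copies with no link to their original (as with many grouped shapes).
original = {"shape": "hexagon", "fill": "#FF0000"}
copies = [copy.deepcopy(original) for _ in range(3)]
original["fill"] = "#0000FF"  # the copies do not notice this change

# Approach 2: instances that refer to a named type (hypothetical name "approve").
shape_types = {"approve": {"shape": "hexagon", "fill": "#FF0000"}}
instances = [{"type": "approve", "x": 60 * i} for i in range(3)]


def resolve(instance):
    """Merge the type's properties with the instance's own (the instance wins)."""
    return {**shape_types[instance["type"]], **instance}


shape_types["approve"]["fill"] = "#0000FF"  # one change; every instance follows

copy_fills = [c["fill"] for c in copies]
instance_fills = [resolve(i)["fill"] for i in instances]
```

After the two changes, each copy still carries the old fill, while every resolved instance reports the new one. The second approach is what a true name for the shape type would provide.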

Vector programs vs. vector APIs

Vector image drawing tools come in two broad forms, GUI and programmatic: one either deploys shapes by sketching or choosing from menus (like WYSIWYG), or writes programs that call functions to make and modify shapes.

So far I have mainly focused on vector diagrams as managed through GUIs: the equivalent of WYSIWYG word processing. Vector graphics, however, can also be created programmatically. Instead of dragging a rectangle from a tool bar and using various mouse actions to modify it, one can write something in a (quasi-)programming language, such as:

    new Diamond(loc="10 10 50 50", label="Analyze", bgColor="#F00", lineColor="#00F",...);

or

    <Diamond loc="10 10 50 50" label="Analyze" bgColor="#F00" lineColor="#00F"/>

This approach easily supports factoring out repeatedly-used sets of properties. SVG has a style mechanism fairly similar to CSS, and even re-uses many of the very same styling properties.

Many drawing systems are integrated into full-fledged programming languages, in which case the programmer can create new abstractions outside of the vector system itself. One can address the lack of shape classes by creating functions to draw shapes and then calling them as needed. For the earlier example of distinguishing edit, verify, and approve shapes, one can create a class for each, with render() or other methods as needed. The functions can change over time, much like an element's style definition for XML.
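A minimal sketch of this approach follows, assuming Python as the host language and emitting plain SVG strings. The class names (Step, Edit, Verify, Approve), the colors, and the choice of a diamond for Verify are all illustrative assumptions, not taken from any actual drawing library.

```python
from xml.sax.saxutils import escape


class Step:
    """A workflow step; subclasses decide the type name and appearance."""
    kind = "step"      # hypothetical type name, written onto the SVG group
    fill = "#FFFFFF"   # default fill, overridable per subclass

    def __init__(self, label, x, y, width=100, height=50):
        self.label, self.x, self.y = label, x, y
        self.width, self.height = width, height

    def center(self):
        return self.x + self.width / 2, self.y + self.height / 2

    def outline(self):
        # Default appearance: a plain rectangle.
        return (f'<rect x="{self.x}" y="{self.y}" '
                f'width="{self.width}" height="{self.height}"/>')

    def render(self):
        cx, cy = self.center()
        return (f'<g class="{self.kind}" fill="{self.fill}" stroke="#000000">'
                f'{self.outline()}'
                f'<text x="{cx}" y="{cy}" text-anchor="middle" '
                f'fill="#000000" stroke="none">{escape(self.label)}</text></g>')


class Edit(Step):
    kind = "edit"


class Verify(Step):
    kind = "verify"
    fill = "#FFCC00"

    def outline(self):
        # Later design decision: verification steps appear as diamonds.
        cx, cy = self.center()
        right, bottom = self.x + self.width, self.y + self.height
        points = f"{cx},{self.y} {right},{cy} {cx},{bottom} {self.x},{cy}"
        return f'<polygon points="{points}"/>'


class Approve(Step):
    kind = "approve"
    fill = "#CCFFCC"


steps = [Edit("Edit", 10, 10), Verify("Verify", 130, 10), Approve("Approve", 250, 10)]
svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="360" height="70">'
       + "".join(step.render() for step in steps) + "</svg>")
```

Changing Verify.outline() or Verify.fill restyles every verification step at once, and the type name survives into the output as a class attribute on each group.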

SVG is similar to other vector languages, but it has a few less-typical features, some of which naturally follow from its XML base:

In SVG a shape instance (or a drawing of any complexity) can have an XML ID. One can instantiate it at will by referring to the ID. Referring instances can override some properties if desired. Because the reference is itself an SVG construct, which has independent existence in the SVG drawing, changing it does change all instances. Instantiation can be multi-level.
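As a hedged illustration of instantiation by ID, the following Python sketch builds a small SVG in which a document shape is defined once and referenced twice. The ID "shape_document" and the coordinates are assumptions made for the example; the plain href attribute follows SVG 2 usage, whereas SVG 1.1 renderers expect xlink:href.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)


def q(tag):
    """Qualify a tag name with the SVG namespace."""
    return f"{{{SVG_NS}}}{tag}"


svg = ET.Element(q("svg"), {"width": "300", "height": "120"})

# Define the shape once, under a hypothetical ID.
defs = ET.SubElement(svg, q("defs"))
doc_shape = ET.SubElement(defs, q("g"), {"id": "shape_document"})
ET.SubElement(doc_shape, q("path"), {
    "d": "M0 0 L87 0 L87 63.875 Q65.25 45.625 43.5 63.875 "
         "Q21.75 82.125 0 63.875 Z",
    "fill": "none",
    "stroke": "black",
})

# Instantiate it twice by reference; only the position differs.
for x, y in [(20, 20), (180, 20)]:
    ET.SubElement(svg, q("use"),
                  {"href": "#shape_document", "x": str(x), "y": str(y)})

# Changing the single definition affects every instance that refers to it.
doc_shape.find(q("path")).set("stroke", "blue")

svg_text = ET.tostring(svg, encoding="unicode")
```

The path data appears once in the serialized file, however many times the shape is used.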

One big difference, however, is that an SVG drawing tends to have a lot of top-level objects, and their order matters far less than in most natural languages. Scrambling the drawing order of vector shapes (or the SVG file order) makes no difference except where they overlap. Most vector drawing systems address this by a settable explicit Z-order for shapes.[13]

SVG of course defines some XML elements and their semantics. But unlike text documents, the elements have only fixed geometric meanings. Its most relevant elements include text, circle, rect, line, path, textPath, polygon, g and use (the grouping constructs mentioned above), and image (much like img in HTML). Thus, SVG off the shelf operates essentially at the presentation or layout level. It is not wildly inaccurate to think of it as procedural markup for drawing, with g and use providing functionality somewhat like troff macros.

Other elements define styles; deal with fonts, colors, and patterns; animate; filter; and handle lighting. Attributes specify shape properties, apply affine transformations, define IDs, and so on. But fundamentally, these are all in the service of rendering a few basic shapes, which can be compounded into larger ones (accomplished by the g (group) element).

SVG in typical drawing apps

SVG provides few native shapes, but there are customizable editors. Many vector drawing apps provide enormous libraries of stock shapes, but also offer export to SVG. What do these do, and how do they leverage SVG to model their worlds? To examine the issues, we will use the extremely simple diagram shown in Figure 4.

Figure 4

A simple block diagram

This diagram was drawn with YEd, which provides box shapes, multi-segment connector lines with arrows, and the ability to attach connectors to boxes (so the connector resizes if the box is moved). YEd can export SVG (discussed below) but, like many drawing programs, has a separate native format, in this case "GraphML" (Brandes 2014).

I have shown YEd first because its model for diagrams is extremely focused on the logical structure of diagrams, with GraphML's fundamental objects being simply node and edge. Thus, it can express a graph structure entirely apart from rendering or even labels:

<graphml>
  <graph id="G" edgedefault="undirected">
    <node id="big_box"/>
    <node id="doc"/>
    <edge id="e1" source="big_box" target="doc"/>
  </graph>
</graphml>

This is about as bare-bones as one can get, but also (if labels are added) very close to a respected approach to text documents: just structure, no rendering until downstream.[14]
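Because such a file carries only structure, recovering the graph from it takes very little code. The sketch below uses Python's standard ElementTree on the namespace-free example above; real GraphML files declare a namespace, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

GRAPHML = """<graphml>
  <graph id="G" edgedefault="undirected">
    <node id="big_box"/>
    <node id="doc"/>
    <edge id="e1" source="big_box" target="doc"/>
  </graph>
</graphml>"""

graph = ET.fromstring(GRAPHML).find("graph")
nodes = [n.get("id") for n in graph.findall("node")]
edges = [(e.get("source"), e.get("target")) for e in graph.findall("edge")]

# Build an adjacency map; undirected edges are recorded in both directions.
adjacency = {node_id: set() for node_id in nodes}
for source, target in edges:
    adjacency[source].add(target)
    if graph.get("edgedefault") == "undirected":
        adjacency[target].add(source)
```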

Files created in YEd add a lot of information to specify box, label, and connector formatting, but retain a basic structure like that above. Just as with many word-processors that export XML, YEd appears to simply write most (all non-default?) properties on every shape instance, rather than writing style definitions once and then using them by reference. When (as is typical) there are only a few distinct styles but each is used very many times, this costs substantial space, though the excess is easy to remove with sed or XSLT.[15]
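The factoring described here is mechanical. As a rough sketch (in Python rather than sed or XSLT, and over plain dictionaries rather than real GraphML), one can collect each distinct set of formatting properties once and leave only a reference on each shape. The property names and the generated style names "s0", "s1" are illustrative assumptions.

```python
shapes = [
    {"id": "n0", "fill": "#FFCC00", "stroke": "#000000", "stroke-width": "1.0"},
    {"id": "n1", "fill": "#FFCC00", "stroke": "#000000", "stroke-width": "1.0"},
    {"id": "n2", "fill": "#FFFFFF", "stroke": "#000000", "stroke-width": "1.0"},
]


def factor_styles(shapes, key="id"):
    """Return (style definitions, shape id -> style name)."""
    names = {}  # sorted property tuple -> generated style name
    refs = {}   # shape id -> style name
    for shape in shapes:
        props = tuple(sorted((k, v) for k, v in shape.items() if k != key))
        # Reuse the existing name for this property set, or mint a new one.
        name = names.setdefault(props, f"s{len(names)}")
        refs[shape[key]] = name
    definitions = {name: dict(props) for props, name in names.items()}
    return definitions, refs


style_defs, style_refs = factor_styles(shapes)
```

Three shapes here reduce to two style definitions; with hundreds of shapes and a handful of styles the saving is proportionally larger.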

<graphml xmlns="http://graphml.graphdrawing.org/xmlns" ...>
  <key for="port" id="d1" yfiles.type="portgraphics"/>
  ...
  <graph edgedefault="directed" id="G">
    <data key="d0" xml:space="preserve"/>
    <node id="n0">
      <data key="d6">
        <y:ShapeNode>
          <y:Geometry height="73.0" width="121.0" x="-529.0" y="-424.0"/>
          <y:Fill color="#FFCC00" transparent="false"/>
          <y:BorderStyle color="#000000" raised="false" type="line" width="1.0"/>
          <y:NodeLabel ... fontStyle="italic" ...>A big box<y:LabelModel>
          <y:SmartNodeLabelModel distance="4.0"/></y:LabelModel><y:ModelParameter>
          <y:SmartNodeLabelModelParameter labelRatioX="0.0" .../></y:ModelParameter></y:NodeLabel>
          <y:Shape type="rectangle"/>
        </y:ShapeNode>
      </data>
    </node>
    <node id="n1">
      <data key="d6">
        <y:GenericNode configuration="com.yworks.flowchart.document">...</y:GenericNode>
      </data>
    </node>
    <edge id="e0" source="n0" target="n1">
      <data key="d10">
        <y:PolyLineEdge>
          <y:Path sx="0.0" sy="0.0" tx="0.0" ty="0.0">
            <y:Point x="-468.5" y="-326.0"/>
            <y:Point x="-368.0" y="-326.0"/>
            <y:Point x="-368.0" y="-444.0"/>
            <y:Point x="-282.0" y="-444.0"/>
          </y:Path>
          <y:LineStyle color="#000000" type="line" width="1.0"/>
          <y:Arrows source="none" target="standard"/>
          <y:BendStyle smoothed="false"/>
        </y:PolyLineEdge>
      </data>
    </edge>
  </graph>
  <data key="d7"><y:Resources/></data>
</graphml>

What happens, then, when YEd exports this diagram to SVG? One might expect a fairly direct mapping:

  1. SVG has a native rectangle object and plenty of style features, so the left box could just be there.

  2. SVG lacks a native document symbol, but YEd could write out an SVG group element that draws its own stock version, add an ID (say, "yed_shape_document"), and then use it as if SVG had it natively.

  3. SVG does not have an obvious way to position anything relative to the position of a prior object, much less an auto-routing capability. So the connector must end up something like YEd's <y:PolyLineEdge> element above.[16]

This would lead to something like:

<svg>
  <defs id="genericDefs"/>
  <g>
    <defs id="defs1">
      <g id="yed_shape_document"
        text-rendering="geometricPrecision" stroke-miterlimit="1.45" 
        shape-rendering="geometricPrecision" transform="matrix(1,0,0,1,544,459)" 
        stroke-linecap="butt">
        <path fill="none" d="M-331 -424 L-244 -424 L-244 -360.125 Q-265.75 
        -378.375 -287.5 -360.125 Q-309.25 -341.875 -331 -360.125 Z"/>
      </g>
    </defs>
    
    <g fill="white" transform="translate(544,459)">
      <rect x="-544" width="315" height="148" y="-459"/>
    </g>
    
    <g fill="rgb(255,204,0)" transform="matrix(1,0,0,1,544,459)">
      <rect x="-529" width="121" height="73" y="-424"/>
    </g>
    
    <g transform="matrix(1,0,0,1,544,459)">
      <rect x="-529" width="121" height="73" y="-424"/>
      <text x="-496.8301" y="-382.9648">A big box</text>
    </g>
    
    <g>
      <use href="#yed_shape_document"/>
      <text x="-305.4238" y="-390.0312">and a </text>
      <text x="-312.9854" y="-375.8984">printout </text>
    </g>
    
    <g transform="matrix(1,0,0,1,544,459)" stroke-linecap="butt">
      <path fill="none" d="M-468.5 -351 L-468.5 -326 L-368 -326 L-368 -444 L-282 -444 L-283.1719 -431.9612" clip-path="url(#clipPath2)"/>
      <path d="M-283.947 -423.9988 L-277.8079 -435.4579 L-283.075 -432.9565 L-287.7609 -436.4268 Z" clip-path="url(#clipPath2)" stroke="none"/>
    </g>
  </g>
</svg>

In fact, the result takes a bit more work (I omit namespace and some formatting attributes for readability). Note that each box is drawn twice, in slightly different sizes, perhaps to inset the filled inner region (despite there being no fill) from the outer frame. Even the curved bottom edge for the document icon is drawn twice, via an explicit path:

<?xml version="1.0" encoding="UTF-8"?>
<svg>
  <defs id="genericDefs"/>
  
  <g>
    <defs id="defs1">
      <clipPath clipPathUnits="userSpaceOnUse" id="clipPath1">
        <path d="M0 0 L315 0 L315 148 L0 148 L0 0 Z"/>
      </clipPath>
      <clipPath clipPathUnits="userSpaceOnUse" id="clipPath2">
        <path d="M-544 -459 L-229 -459 L-229 -311 L-544 -311 L-544 -459 Z"/>
      </clipPath>
    </defs>
    
    <g fill="white" text-rendering="geometricPrecision" shape-rendering="geometricPrecision" transform="translate(544,459)" stroke="white">
      <rect x="-544" width="315" height="148" y="-459" clip-path="url(#clipPath2)" stroke="none"/>
    </g>
    
    <g text-rendering="geometricPrecision" stroke-miterlimit="1.45" shape-rendering="geometricPrecision" transform="matrix(1,0,0,1,544,459)" stroke-linecap="butt">
      <rect fill="none" x="-529" width="121" height="73" y="-424" clip-path="url(#clipPath2)"/>
      <text x="-496.8301" y="-382.9648" clip-path="url(#clipPath2)" font-family="sans-serif" font-style="italic" stroke="none" xml:space="preserve">A big box</text>
    </g>
    
    <g text-rendering="geometricPrecision" stroke-miterlimit="1.45" shape-rendering="geometricPrecision" transform="matrix(1,0,0,1,544,459)" stroke-linecap="butt">
      <path fill="none" d="M-331 -424 L-244 -424 L-244 -360.125 Q-265.75 -378.375 -287.5 -360.125 Q-309.25 -341.875 -331 -360.125 Z" clip-path="url(#clipPath2)"/>
      <text x="-305.4238" xml:space="preserve" y="-390.0312" clip-path="url(#clipPath2)" font-family="sans-serif" stroke="none">and a </text>
      <text x="-312.9854" xml:space="preserve" y="-375.8984" clip-path="url(#clipPath2)" font-family="sans-serif" stroke="none">printout </text>
    </g>
    
    <g text-rendering="geometricPrecision" stroke-miterlimit="1.45" shape-rendering="geometricPrecision" transform="matrix(1,0,0,1,544,459)" stroke-linecap="butt">
      <path fill="none" d="M-468.5 -351 L-468.5 -326 L-368 -326 L-368 -444 L-287.5 -444 L-287.5 -431.9988" clip-path="url(#clipPath2)"/>
      <path d="M-287.5 -423.9988 L-282.5 -435.9988 L-287.5 -432.9988 L-292.5 -435.9988 Z" clip-path="url(#clipPath2)" stroke="none"/>
    </g>
  </g>
</svg> 

Although not shown here, if the document shape were copied and re-used in the diagram, the whole would be repeated, but with the path giving gratuitously different coordinates. Thus the number and kinds of objects in the user's mind are unlike those in the SVG (after all, I picked the icons from the default menu of shapes): I drew two shapes, but the SVG has four (not even grouped). There is nothing in the SVG to say they are instances of the same type. If the shape were defined and instanced, the path could appear once even though its final coordinates (and possibly other properties) differ. The user's model of what is the same and different would then approximate the file's model. Even failing that, if YEd's name for each shape type were merely put on an attribute, the members of this equivalence class could trivially be related and processed together.

SVG does not provide a native way to associate a label directly with a shape -- for example, via an attribute such as label="a big box" or labelRef="#textObj12". So one can hardly blame YEd for having separate text objects. However, those could easily be grouped with the corresponding boxes (thankfully the text object seems always to be written to the SVG immediately afterward).

To the reader I leave the design of a program to discover what shapes paths should be mapped back to; though even if that were solved, the user might desire different concepts that at one time or another happen to be rendered as the same shape, just as in text documents authors may want multiple content objects all rendered as italic.

YEd's native format is near the extreme of treating diagrams as mathematical graph structures: the basic objects are just nodes and edges. Yet merely writing an SVG group around each node and edge as it is exported, with a name for the relevant shape type, would greatly aid search, transformation, and conversion to other formats, and would make many other commonly-desired operations trivial rather than very difficult.

Similarly, factoring out highly-reusable things like definitions of how to draw their shapes and factoring out formatting attributes would greatly decrease file sizes, ease readability and reusability, and make the stored components directly correspond to those the user created in the first place.

Perhaps, however, despite its very sparse and simple model displayed in GraphML, YEd is just unusually poor at export? How do other drawing apps stack up on similar metrics?

inkScape

inkScape is a native SVG editor. It provides few shapes beyond SVG's native ones, though it has a wide range of image operations and drawing tools. Like YEd, it has no automated connector routing. Labels are not a property of other shapes, but must be manually created as separate text objects, then positioned and grouped.

Drawing the same diagram is a bit harder due to the lack of a pre-made document shape. Once the shape is drawn, one can clone it and move or resize the clone. However, many properties of the clone cannot override those of the original (for example, color or label text), so cloning doesn't work as well as it might.[17]

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg>
  <defs id="defs2">
    <inkscape:perspective id="perspective876"
       inkscape:persp3d-origin="105 : 99 : 1"
       inkscape:vp_z="210 : 148.5 : 1" inkscape:vp_y="0 : 1000 : 0"
       inkscape:vp_x="0 : 148.5 : 1" sodipodi:type="inkscape:persp3d" />
    <marker
       inkscape:stockid="Arrow1Lend"
       orient="auto" refY="0.0" refX="0.0" id="Arrow1Lend"
       style="overflow:visible;" inkscape:isstock="true">
      <path id="path1469"
         d="M 0.0,0.0 L 5.0,-5.0 L -12.5,0.0 L 5.0,5.0 L 0.0,0.0 z "
         style="fill-rule:evenodd;stroke:#000000..."
         transform="scale(0.8) rotate(180) translate(12.5,0)" />
    </marker>
    <symbol id="Document">
      <title id="title880">Document</title>
      <desc id="desc882">A document or report</desc>
      <path id="path884" style="stroke-width:0.529167"
         d="m 3.96875,9.2604167 h 31.75 V 25.135417 C 22.489583,23.8125
         17.197917,34.395833 3.96875,27.78125 Z" />
    </symbol>
  </defs>
  <sodipodi:namedview id="base" pagecolor="#ffffff"
     bordercolor="#666666" inkscape:pageopacity="0.0"... />
  <metadata id="metadata5">
    <rdf:RDF>
      <cc:Work rdf:about="">
        <dc:format>image/svg+xml</dc:format>
        <dc:type rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
      </cc:Work>
    </rdf:RDF>
  </metadata>
  <g inkscape:label="Layer 1" inkscape:groupmode="layer" id="layer1">
    <rect style="...stroke-width:0.264583;stroke-opacity:1" id="rect10"
       width="46.113098" height="30.238094" x="26.620625" y="38.957844"
       inkscape:label="A big box" />
    <use xlink:href="#Document"
       style="fill:#ffffff;stroke:#000000;fill-opacity:1" id="use1393"
       x="0" y="0" width="100%" height="100%"
       transform="matrix(1.2277034,0,0,1.4855036,100.60507,21.608701)" />
    <text xml:space="preserve" style="font-style:normal;..."
       x="29.835394" y="57.723057" id="text1450">
       <tspan sodipodi:role="line" x="29.835394" y="57.723057"
         style="..." id="tspan1452" rotate="0 0 0 0 0 0 0 0 0 0">a
       <tspan style="..." id="tspan1462" rotate="0 0 0 0">big</tspan>
       box</tspan></text>
    <text xml:space="preserve" style="..."
       x="106.98212" y="45.019466" id="text1458">
       <tspan sodipodi:role="line" id="tspan1456"
         x="106.98212" y="45.019466" style="...">and a </tspan>
       <tspan sodipodi:role="line"
         x="106.98212" y="55.602802" style="..."
         id="tspan1460">printout</tspan></text>
    <path style="...marker-end:url(#Arrow1Lend)"
       d="M 49.07351,68.705024 V 77.44602 L 87.544834,77.843342
       86.81896,24.205391 h 39.1972 v 10.72759" id="path1464" />
  </g>
</svg>

LibreOffice Draw

LibreOffice is an open source office suite. Its drawing component is not a native SVG editor, but can import and export formats including SVG, Visio, and others. For the same drawing, created natively and exported to SVG, the result is approximately as shown below (edited for readability).

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg>

 <defs class="ClipPathGroup">
  <clipPath id="presentation_clip_path" clipPathUnits="userSpaceOnUse">
   <rect x="0" y="0" width="21590" height="27940"/>
  </clipPath>
  <clipPath id="presentation_clip_path_shrink" clipPathUnits="userSpaceOnUse">
   <rect x="21" y="27" width="21547" height="27885"/>
  </clipPath>
 </defs>

 <defs>
  <font id="EmbeddedFont_1" horiz-adv-x="2048">
  </font>
 </defs>

 <defs class="TextShapeIndex">
  <g ooo:slide="id1" ooo:id-list="id3 id4 id5"/>
 </defs>

 <defs class="EmbeddedBulletChars">
  <g id="bullet-char-template-57356" transform="scale(0.00048828125,-0.00048828125)">
   <path d="M 580,1141 L 1163,571 580,0 -4,571 580,1141 Z"/>
  </g>
 </defs>

 <g>
  <g id="id2" class="Master_Slide">
   <g id="bg-id2" class="Background"/>
   <g id="bo-id2" class="BackgroundObjects"/>
  </g>
 </g>

 <g class="SlideGroup">
  <g>
   <g id="container-id1">
    <g id="id1" class="Slide" clip-path="url(#presentation_clip_path)">
     <g class="Page">

      <g class="com.sun.star.drawing.CustomShape">
       <g id="id3">
        <rect class="BoundingBox" stroke="none" fill="none" x="3284" y="3030" width="3178" height="1908"/>
        <path fill="none" stroke="rgb(0,0,0)" d="M 4873,4936 L 3285,4936 3285,3031 6460,3031 6460,4936 4873,4936 Z"/>
        <text class="TextShape"><tspan class="TextParagraph" font-family="Liberation Sans, sans-serif" font-size="635px" font-style="italic" font-weight="400"><tspan class="TextPosition" x="3588" y="4204"><tspan fill="rgb(0,0,0)" stroke="none">a big box</tspan></tspan></tspan></text>
       </g>
      </g>

      <g class="com.sun.star.drawing.CustomShape">
       <g id="id4">
        <rect class="BoundingBox" stroke="none" fill="none" x="9254" y="3157" width="2798" height="1909"/>
        <path fill="none" stroke="rgb(0,0,0)" d="M 9255,3158 L 12050,3158 12050,4689 C 10943,4677 10981,4990 9982,5064 9625,5019 9494,4984 9255,4933 L 9255,3158 Z"/>
        <text class="TextShape"><tspan class="TextParagraph" font-family="Liberation Sans, sans-serif" font-size="635px" font-weight="400"><tspan class="TextPosition" x="9861" y="3789"><tspan fill="rgb(0,0,0)" stroke="none">and a</tspan></tspan></tspan><tspan class="TextParagraph" font-family="Liberation Sans, sans-serif" font-size="635px" font-weight="400"><tspan class="TextPosition" x="9597" y="4500"><tspan fill="rgb(0,0,0)" stroke="none">printout</tspan></tspan></tspan></text>
       </g>
      </g>

      <g class="com.sun.star.drawing.ConnectorShape">
       <g id="id5">
        <rect class="BoundingBox" stroke="none" fill="none" x="4872" y="2014" width="5931" height="3941"/>
        <path fill="none" stroke="rgb(0,0,0)" d="M 4873,4937 L 4873,5953 7858,5953 7858,2015 10652,2015 10652,2729"/>
        <path fill="rgb(0,0,0)" stroke="none" d="M 10652,3159 L 10802,2709 10502,2709 10652,3159 Z"/>
       </g>
      </g>

     </g><!--Page-->
    </g><!--Slide-->
   </g><!--container-id1-->
  </g>
 </g><!--SlideGroup-->
</svg>

This is largely similar to YEd's output, but has one distinct advantage: it encloses each object in a group (g), and assigns it a class which identifies it at least as a LibreOffice shape. The class com.sun.star.drawing.CustomShape uses Java-style hierarchical naming. If the program merely added one more component, naming the particular shape, portability would be easy. Names already exist for the user interface, and probably have localizations in all of LibreOffice's language versions. But even without that, at least there is one particular element in the SVG file corresponding to each object the user drew.
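To suggest how much even this coarse class attribute helps, here is a short Python sketch that recovers one record per drawn object from a heavily trimmed version of the export above. The trimmed sample and the assumption that the file declares the standard SVG namespace are mine; the class names are those in the listing.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed stand-in for the exported file (geometry and styling omitted).
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
 <g class="Page">
  <g class="com.sun.star.drawing.CustomShape"><g id="id3"><text>a big box</text></g></g>
  <g class="com.sun.star.drawing.CustomShape"><g id="id4"><text>and a printout</text></g></g>
  <g class="com.sun.star.drawing.ConnectorShape"><g id="id5"/></g>
 </g>
</svg>"""

NS = {"svg": "http://www.w3.org/2000/svg"}
PREFIX = "com.sun.star.drawing."

root = ET.fromstring(SVG)
counts = Counter()   # object kind -> number of objects
labels = {}          # object id -> label text (empty if none)

for group in root.iterfind(".//svg:g[@class]", NS):
    cls = group.get("class")
    if not cls.startswith(PREFIX):
        continue  # structural groups such as "Page"
    counts[cls[len(PREFIX):]] += 1
    inner = group.find("svg:g", NS)
    labels[inner.get("id")] = "".join(inner.itertext()).strip()
```

The result distinguishes shapes from connectors and pairs each object with its label, but it cannot tell the rectangle from the document symbol; that is exactly the one further name component the export lacks.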

Summary

SVG is XML applied to vector images. Some images (such as maps) are fundamentally about their image, and thus quite unlike nearly all text documents as we commonly think of them. However, many vector images are about structures of related units. Examples include org charts, flowcharts, schematics, and countless others. These are much more similar to text documents, in being composed of discrete units that are members of conceptual categories. The categories are often conventional in a given field, and have fairly conventional ways of being visualized. To make a diagram, such units are instantiated, organized (commonly hierarchically), and placed in relation with each other. All this is quite similar between text documents and diagrams.

Many relations are expressed by proximity and order in both domains. However, many diagrams make heavy use of lines and arrows for connections, which are relatively rare for text documents. On the other hand, text documents more often use hyperlinks for long-distance connections, which are uncommon (but hardly absent) in diagrams.

For diagrams (though not for all vector graphics), an XML representation can easily be devised in which the elements correspond intuitively to the objects the user created. This could be done either by creating a schema for each domain, or a generic schema such as <shape class="decision".../>.

However, a variety of current vector image drawing applications do not write SVG anything like this. One, YEd, writes GraphML, an XML schema that is much like this; but even it writes very different SVG. Apps writing SVG are reminiscent of word processing around the time descriptive markup became prominent (hand-crafted SVG can easily be better):

  • In practice, every vector drawing program has its own structures, tools, and representation, and it is hard to move data from one to another.

  • Translations often fail to retain essentials of users' work, retaining only the raw geometry.

  • Because of this, it is very difficult to edit or otherwise process such SVG.

  • SVG written out from apps rarely treats even the app's own native objects as first-class SVG objects (for example, by defining and naming each shape, then using them).

The difficulty does not seem to center around syntax, verbosity, or readability. Rather, it seems to arise from a conceptual mismatch between the user and the software: the user forms beliefs about what things there are, and in what groups (classes?) they go. Drawing apps' behavior gives strong justification for those beliefs through the way it makes one operate, such as picking named kinds of things from a menu, or being unable to change the kind (shape type) of existing objects. Yet the beliefs are almost always false. When one exports to a format which is widely known as portable and capable of directly representing such ontological structures, the result is not readily mappable to the user's model.

This seems very like procedural markup (despite SVG being an XML application): expression of the user's conceptual categories is implicit or indirect. The user's notions of which kinds of things are around, and how many, bear only indirect relationships to the software's notion. Thus, reconstructing something even roughly isomorphic to the user's model requires AI.

This kind of conceptual mismatch is hardly limited to documents and diagrams. Norman (1988, 2013) has written extensively about such model mismatches in general. A well-known example he gives is a refrigerator with these controls:

Figure 5

Refrigerator controls from Norman (2013, p. 28).

The manufacturer labelled two knobs, corresponding to a division the user knows and uses: the freezer vs. the refrigerator compartment. The natural assumption is that each knob controls the temperature in the corresponding compartment. As Norman states:

A good conceptual model allows us to predict the effects of our actions. Without a good model, we operate by rote, blindly; we do operations as we were told to do them; we can't fully appreciate why, what effects to expect, or what to do if things go wrong.... There is no need to understand the underlying physics or chemistry of each device we own, just the relationship between the controls and the outcomes.

Sadly, the intuitively suggested model can be wrong, as in this case. In fact one control sets a thermostat in one of the two compartments, while the other apportions cold airflow between compartments. Without the correct model in mind, adjustment is nigh impossible; even with it, it is difficult because the relationship between the control combination and the typically-desired effects is complex and indirect. This is much like the relationship between two identical document shapes in most of the SVG shown, or between various ways of achieving a visual effect (say, for a block quote) in troff or another procedural formatter.

For the refrigerator user, the desire is almost always to make one compartment more or less cold; almost never to repartition an abstract amount of coldness between compartments. Yet the common goal is very hard to achieve, while improbable goals are easy.

Similarly, when saving or exporting data of any kind, one might wish merely to ensure that the data exists for re-use by the same application; in that case anything works. But very often indeed, one wants to give the data to someone: a co-worker who may or may not have the same drawing program; oneself at a future time; a client who can only handle certain formats; or even an archive. It will likely be possible to load SVG such as we have seen, but only to edit it after a fashion similar to editing a PDF or page-scan. Many things will not work as one might wish:

  • YEd's doubled-up boxes are not a unit of any kind, so picking one up and moving it doesn't move the other.

  • Connectors are not attached to objects, so do not follow when objects move. Auto-routing varies greatly in general, and several of these apps do not auto-route at all -- but a poorly routed connector that still connected the right objects would normally be correct, even if unattractive.

  • When objects are drawn individually and have no overt type, they do not readily map to corresponding objects in a target application. For example, most drawing programs provide a document shape similar to that in Figure 1 – but in no case tested will their saved SVG enable another app to use its corresponding shape. This has practical consequences: if one imports SVG and then adds a document symbol at the destination, it will have a completely different look -- as if one added text to an imported text document, but could not get it into the same font as the imported text.[18]

The wide variety of complex SVG that is written even for simple cases does not mirror a real complexity in the user's model, and that mismatch leads to problems.

The potential solutions, however, seem slightly different here than with text documents. Text work commonly uses a two-level model: the level of conceptual objects, and the level of layout. The conceptual objects are dictated by the demands of a domain (are there "stanzas" or departure times or code listings or notes?). Layout is dictated by medium and by aesthetic and design choices, and executed mostly automatically. OHCO then suggests that the former is the text, while the latter is epiphenomenal.

Does this approach suffice for diagrams? Diagrams do have conceptual objects, often in standardized sets (indeed, many drawing apps have named shape libraries for various applications). It seems no harder to create a schema for a flowchart than for a poetry collection. Many shapes used in diagrams have connection requirements: A choice box in a flowchart might expect at least one in connection, and exactly two out connections, while vastly more complex rules can be stated for schematics.
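
Connection requirements of this kind are easy to state and check mechanically. The following is a minimal Python sketch, not a description of any existing tool. The rule table, the shape type names other than the choice box, and the function name check_connections are all assumptions made for illustration; only the rule for the choice box (at least one in connection, exactly two out) comes from the text above.

```python
from collections import Counter

# Hypothetical connection rules per shape type:
# (min_in, max_in, min_out, max_out); None means "no upper bound".
# Only the "choice" rule is taken from the text; the rest are illustrative.
RULES = {
    "start":   (0, 0, 1, 1),
    "process": (1, None, 1, 1),
    "choice":  (1, None, 2, 2),
    "end":     (1, None, 0, 0),
}


def check_connections(shapes, connectors):
    """Return a list of human-readable rule violations.

    shapes:     dict mapping shape id -> shape type name
    connectors: iterable of (source_id, target_id) pairs
    """
    connectors = list(connectors)
    outs = Counter(src for src, _ in connectors)
    ins = Counter(dst for _, dst in connectors)
    problems = []
    for shape_id, shape_type in shapes.items():
        min_in, max_in, min_out, max_out = RULES[shape_type]
        for label, count, lo, hi in (
            ("in", ins[shape_id], min_in, max_in),
            ("out", outs[shape_id], min_out, max_out),
        ):
            if count < lo or (hi is not None and count > hi):
                problems.append(
                    f"{shape_id} ({shape_type}): {count} {label} connection(s)"
                )
    return problems


shapes = {"s": "start", "c": "choice", "p": "process", "e": "end"}
good = [("s", "c"), ("c", "p"), ("c", "e"), ("p", "e")]
bad = [("s", "c"), ("c", "p"), ("p", "e")]  # the choice box has only one exit
```

Such a check is only possible when connectors are recorded as relations between typed objects; it cannot be run against SVG in which a connector is merely a path that happens to end near a shape.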

Diagrams also have layout, and as with text documents, this is more a matter of aesthetics and design. Much diagram layout is manual (though not in tools like Graphviz), but some is automatic. Connector routing is often automatic even though it is far more complex than most text-layout algorithms. Features like snap-to-grid can be enabled to impose simple layout rules if desired, and there are manually-invoked but automatically executed processes such as lining up all selected objects, spreading them evenly along a path, and so on.
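
The manually-invoked layout operations are simple functions over object positions. As a rough Python sketch (the dictionary representation of a box and both function names are assumptions for illustration, and the "path" is reduced to a horizontal line), lining up and spreading out a selection might look like this:

```python
def align_left(boxes):
    """Return copies of the boxes with every x set to the smallest x."""
    left = min(b["x"] for b in boxes)
    return [{**b, "x": left} for b in boxes]


def distribute_horizontally(boxes):
    """Return copies whose x origins are evenly spaced between the
    leftmost and rightmost box, keeping the left-to-right order."""
    if len(boxes) < 3:
        return [dict(b) for b in boxes]  # nothing to spread
    ordered = sorted(boxes, key=lambda b: b["x"])
    first, last = ordered[0]["x"], ordered[-1]["x"]
    step = (last - first) / (len(ordered) - 1)
    return [{**b, "x": first + i * step} for i, b in enumerate(ordered)]


selection = [
    {"id": "a", "x": 0, "y": 5},
    {"id": "b", "x": 10, "y": 7},
    {"id": "c", "x": 100, "y": 9},
]
```

Note that both operations act on layout alone: the identity and type of each object are untouched, which is what the two-level model would lead one to expect.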

However, some large differences remain. With text documents, nearly all the formatting properties of objects are separable, and applied uniformly to all instances of a type of component. "Type" need not mean element type name, but can depend on class attributes, context, or other intensions.[19] Diagram software only rarely provides a comparable mechanism, where the user can define (say) three named types of connector lines, each with its own color, stroke pattern, arrowhead style, or other properties – and then use them for three distinct purposes, retaining the ability to change their look en masse. Many drawing programs do not even provide a way to define and modify color schemes – a familiar problem when a new projector, lighting, and venue affect contrast and require re-coloring an entire presentation on short notice.
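
SVG itself can already express named connector types, because a style element can bind properties to class names just as in HTML. The Python sketch below writes such a file. The three type names, their property values, and the function names are hypothetical, chosen only to illustrate the idea; arrowheads are omitted for brevity.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical named connector types, each with its own look.
CONNECTOR_STYLES = {
    "data-flow": {"stroke": "navy", "stroke-width": "2"},
    "control-flow": {"stroke": "black", "stroke-width": "1",
                     "stroke-dasharray": "4 2"},
    "error-path": {"stroke": "crimson", "stroke-width": "2"},
}


def stylesheet(styles):
    """Render one CSS rule per named connector type."""
    rules = []
    for name, props in styles.items():
        body = "; ".join(f"{key}: {value}" for key, value in props.items())
        rules.append(f".{name} {{ {body} }}")
    return "\n".join(rules)


def build_svg(connectors, styles=CONNECTOR_STYLES):
    """connectors: iterable of (type_name, (x1, y1, x2, y2))."""
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "200", "height": "100"})
    ET.SubElement(svg, f"{{{SVG_NS}}}style").text = stylesheet(styles)
    for kind, (x1, y1, x2, y2) in connectors:
        # Each line carries only its type and geometry, never its look.
        ET.SubElement(svg, f"{{{SVG_NS}}}line", {
            "class": kind,
            "x1": str(x1), "y1": str(y1), "x2": str(x2), "y2": str(y2),
        })
    return ET.tostring(svg, encoding="unicode")


drawing = build_svg([
    ("data-flow", (0, 0, 10, 10)),
    ("error-path", (0, 20, 10, 30)),
])
```

Changing one entry in the style table and re-exporting changes every connector of that type at once, which is the en masse control that drawing tools rarely offer.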

Each drawing tool has one or more palettes of shapes it knows about, usually all with names. But many do not facilitate user construction of new shapes. And if they have SVG export at all, they may not create SVG objects for each of their types, but only for each of the instances. It would take little effort to write a shape once, the first time it is used, and put an ID on it (ideally, a readable one based on the user-visible shape name), and thereafter simply to <use> it by reference.
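
A sketch of such an exporter, in Python, shows how little is involved. The shape library, its two type names, and the function name export are assumptions for illustration. The sketch uses the plain href attribute of SVG 2; consumers limited to SVG 1.1 would expect xlink:href instead.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def q(tag):
    """Qualify a tag name with the SVG namespace."""
    return f"{{{SVG_NS}}}{tag}"


# Hypothetical shape library: readable ID -> (primitive, attributes).
SHAPE_TYPES = {
    "process": ("rect", {"width": "80", "height": "40"}),
    "terminal": ("rect", {"width": "80", "height": "40", "rx": "20"}),
}


def export(instances):
    """instances: iterable of (shape_type, x, y).

    Each shape type is written once, inside defs, the first time it
    is needed; every instance is a use element that refers to it.
    """
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(q("svg"), {"width": "300", "height": "200"})
    defs = ET.SubElement(svg, q("defs"))
    written = set()
    for shape_type, x, y in instances:
        if shape_type not in written:
            tag, attrs = SHAPE_TYPES[shape_type]
            group = ET.SubElement(defs, q("g"), {
                "id": shape_type, "fill": "white", "stroke": "black",
            })
            ET.SubElement(group, q(tag), attrs)
            written.add(shape_type)
        # SVG 2 style reference (assumption: the consumer supports it).
        ET.SubElement(svg, q("use"), {
            "href": f"#{shape_type}", "x": str(x), "y": str(y),
        })
    return ET.tostring(svg, encoding="unicode")


output = export([("process", 10, 10), ("process", 10, 80), ("terminal", 150, 10)])
```

An importing application that finds a use element pointing at an ID named "process" has at least a chance of mapping it to its own process shape, which is not true of an anonymous path.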

In fairness, SVG could make this a bit easier for SVG generators with a few features such as relative positioning and more general property overrides.

Does an OHCO-like model fit SVG?

1: Order

In text documents, there is typically an overall reading order to all the text (users of course may read in any order they like), though there are various exceptions.

In vector graphics, it may seem that shapes are not ordered at all, unless as communicated by the structure of flow diagrams, electronic schematics, etc.; and by Z-order.

2: Hierarchy

In text documents, many content objects contain others.

In diagrams, shapes are composable into other shapes. Composition in diagrams seems also to be of two kinds. First, many drawing programs provide a grouping operation that makes new composite shapes. These can then be instantiated in various sizes and places, and even combined into further shapes. Second, end users frequently place shapes inside other shapes to express relationships, such as several staff forming a team, or several components forming a sub-assembly. Only the second of these is comparable to text document authors assembling paragraphs, lists, sections, speeches and the like (or even tables). The first is more like a schema designer creating new complex types.

3: Content-based shapes

With text, most of the content-based objects have names that are familiar to many literate people: chapter, section, paragraph, list, and quotation are widely known (if imprecise) concepts. A large share of content-based objects are the subject of sections in style manuals, and taught in primary school.

Specialized domains of all kinds add their own vocabulary to English and other languages. Specialized document genres such as semiconductor data sheets, invoices, *nix man pages, and many more, introduce objects based on their specific content needs, such as pin definitions, extended prices, command and option names, and so on. This is unremarkable.

Many kinds of drawings are similarly conventional. People draw box-and-arrow diagrams on napkins all the time. Flowcharts use standardized shapes to represent processes, choices, documents, and other concepts.[20] Long before that, engineers assigned conventional shapes to components such as resistors and batteries (as well as abstract electrical notions such as ground) in order to draw schematics. And in a reasonable sense, literacy itself is the result of assigning conventional shapes to symbolize sounds and/or meanings. This differs from artistic or idealized representations that are more iconic than symbolic.

What could we gain by applying OHCO concepts to drawings?

A common frustration in preparing technical drawings is maintaining consistency. For example, a flow diagram or business process description, or for that matter a slide presentation, commonly uses a small number of distinct concepts.

This problem is much the same as the problems for which XML was devised, albeit with differences in detail. But in drawing, software often represents only the primitives out of which the symbols are created, rather than the various kinds of symbols themselves. A more descriptive approach might use different schemas for different symbolic domains, with a more primitive drawing level filling a role comparable to CSS or XSL-FO. This can fairly easily be implemented for a given domain, with XSLT transforming the domain drawing schema to SVG, GraphML, or another schema with direct rendering support.
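
To suggest how small such a transformation can be, here is a sketch in Python rather than XSLT (a substitution made only to keep the example self-contained). The flowchart vocabulary (flowchart, step, flow, and their attributes), the layout constants, and the trivial top-to-bottom layout are all hypothetical; a real domain schema and layout policy would be richer.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
LEFT, TOP, W, H, GAP = 10, 10, 120, 40, 30  # arbitrary layout constants

# Presentation is kept in one stylesheet, keyed by the domain's type names.
STYLE = (
    ".process rect, .choice polygon { fill: white; stroke: black } "
    ".flow { stroke: black } "
    "text { text-anchor: middle }"
)

# A document in the hypothetical domain schema: types and relations only.
SOURCE = """<flowchart>
  <step id="a" kind="process" label="Read input"/>
  <step id="b" kind="choice" label="Valid?"/>
  <flow from="a" to="b"/>
</flowchart>"""


def q(tag):
    return f"{{{SVG_NS}}}{tag}"


def flowchart_to_svg(source):
    chart = ET.fromstring(source)
    steps = chart.findall("step")
    ET.register_namespace("", SVG_NS)
    height = 2 * TOP + len(steps) * (H + GAP)
    svg = ET.Element(q("svg"), {"width": str(W + 2 * LEFT), "height": str(height)})
    ET.SubElement(svg, q("style")).text = STYLE
    cx = LEFT + W // 2          # every step is centred on the same axis
    tops = {}                   # step id -> y of its top edge
    for i, step in enumerate(steps):
        y = TOP + i * (H + GAP)
        tops[step.get("id")] = y
        g = ET.SubElement(svg, q("g"), {"id": step.get("id"), "class": step.get("kind")})
        if step.get("kind") == "choice":   # diamond
            points = f"{cx},{y} {LEFT + W},{y + H // 2} {cx},{y + H} {LEFT},{y + H // 2}"
            ET.SubElement(g, q("polygon"), {"points": points})
        else:                              # plain box
            ET.SubElement(g, q("rect"), {"x": str(LEFT), "y": str(y),
                                         "width": str(W), "height": str(H)})
        label = ET.SubElement(g, q("text"), {"x": str(cx), "y": str(y + H // 2 + 5)})
        label.text = step.get("label")
    for flow in chart.findall("flow"):
        # Connectors run from the bottom of the source to the top of the target.
        ET.SubElement(svg, q("line"), {
            "class": "flow",
            "x1": str(cx), "y1": str(tops[flow.get("from")] + H),
            "x2": str(cx), "y2": str(tops[flow.get("to")]),
        })
    return ET.tostring(svg, encoding="unicode")
```

Calling flowchart_to_svg(SOURCE) yields SVG in which each group still carries the ID and type of the step it came from, so the conceptual level remains recoverable from the rendering level.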

Off the shelf, the state of the art for drawing appears slightly better than the reactionary stage lamented by Coombs et al. (1987). Although the readily available models may not be ideal, we at least have an open syntax, so it is much more feasible to get in and change things to one's liking than with unlamented binary file formats. Being XML, SVG makes raw, syntactic portability simple. Implementations do not need to change their models, or even their user interfaces, much at all to achieve far greater portability (in both directions!) and ease of use. They can merely take their existing, fairly object-oriented model of what drawings are, and map it directly into more usable SVG conventions; in many cases this mapping is probably easier than what they are doing now, because after a modicum of setup it can be much more direct.

References

Borovsky, Zoe, David J. Birnbaum, Lewis R. Lancaster and James A. Danowski. 2009. The Graphic Visualization of XML Documents. Presented at Balisage: The Markup Conference 2009, Montréal, Canada, August 11 - 14, 2009. In Proceedings of Balisage: The Markup Conference 2009. Balisage Series on Markup Technologies, vol. 3 (2009). doi:https://doi.org/10.4242/BalisageVol3.Borovsky01.

Brandes, Ulrik, Markus Eiglsperger, Jürgen Lerner, and Christian Pich. 2014. Graph Markup Language (GraphML). In Roberto Tamassia (ed.), Handbook of Graph Drawing and Visualization. CRC Press. pp. 517–541. ISBN-13: 978-1138034242. http://cs.brown.edu/people/rtamassi/gdhandbook/chapters/graphml.pdf

Cairo, Alberto. 2013. The Functional Art: An introduction to information graphics and visualization. New Riders. Cited in Quin (2015).

Cayless, Hugh A. 2008. Linking Page Images to Transcriptions with SVG. Presented at Balisage: The Markup Conference 2008, Montréal, Canada, August 12 - 15, 2008. In Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). doi:https://doi.org/10.4242/BalisageVol1.Cayless01.

Coombs, James H., Allen H. Renear, and Steven J. DeRose. 1987. Markup Systems and the Future of Scholarly Text Processing. Communications of the Association for Computing Machinery 30 (11), pp. 933-947. doi:https://doi.org/10.1145/32206.32209.

DeRose, Steven J. 1997. Navigation, Access, and Control Using Structured Information. American Archivist 60 (Summer): 298-309. doi:https://doi.org/10.17723/aarc.60.3.0777u1361u62tqp6.

Renear, Allen and David Dubin. 2003. Towards identity conditions for digital documents. In DCMI '03: Proceedings of the 2003 international conference on Dublin Core and metadata applications: supporting communities of discourse and practice -- metadata research & applications. September 2003. https://www.academia.edu/2796396/Towards_identity_conditions_for_digital_documents.

Durand, David, Elli Mylonas, and Steven J. DeRose. 1996. What Should Markup Really Be? Applying Theories of Text to the Design of Markup Systems. In Joint International Conference: ALLC/ACH. http://xml.coverpages.org/DurandWhatShouldTextBe-ALLC1996.pdf.

Ferraiolo, Jon. 4 September 2001. Scalable Vector Graphics (SVG) 1.0 Specification. World Wide Web Consortium. https://www.w3.org/TR/SVG10.

Fujino, Tomokazu, Yoshikazu Yamamoto and Tomouki Tarumi. 2004. Possibilities and Problems of the XML-Based Graphics in Statistics. COMPSTAT 2004: Proceedings in Computational Statistics. 16th Symposium held in Prague, Czech Republic. Jaromir Antoch (ed). Section: Internet based methods: 1043-1052.

Halliday, M. A. K. 1985. Spoken and Written Language. Victoria: Deakin University Press.

jimblom. Using EAGLE: Board Layout (sparkfun tutorial). https://learn.sparkfun.com/tutorials/using-eagle-board-layout.

Kimber, Eliot. 2013. General Architecture for Generation of Slide Presentations, including PowerPoint, from arbitrary XML Documents. Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). doi:https://doi.org/10.4242/BalisageVol10.Kimber01.

Marcoux, Yves. 2008. Graph characterization of overlap-only TexMECS and other overlapping markup formalisms. Presented at Balisage: The Markup Conference 2008, Montréal, Canada, August 12 - 15, 2008. In Proceedings of Balisage: The Markup Conference 2008. Balisage Series on Markup Technologies, vol. 1 (2008). doi:https://doi.org/10.4242/BalisageVol1.Marcoux01.

Norman, Donald A. 2013. The Design of Everyday Things: Revised and Expanded Edition. New York: Basic Books. ISBN 978-0465050659. 1st edition: 1988.

Peirce, Charles Sanders. 1998. The Essential Peirce. Volume 2. Eds. Peirce Edition Project. Bloomington IN: Indiana University Press. Cited in Peirce’s Theory of Signs in The Stanford Encyclopedia of Philosophy, Nov 15, 2010. https://plato.stanford.edu/entries/peirce-semiotics/.

Piez, Wendell. 2014. Hierarchies within range space: From LMNL to OHCO. Presented at Balisage: The Markup Conference 2014, Washington, DC, August 5 - 8, 2014. In Proceedings of Balisage: The Markup Conference 2014. Balisage Series on Markup Technologies, vol. 13 (2014). doi:https://doi.org/10.4242/BalisageVol13.Piez01.

Piez, Wendell. 2018. Fractal information is. Presented at Balisage: The Markup Conference 2018, Washington, DC, July 31 - August 3, 2018. In Proceedings of Balisage: The Markup Conference 2018. Balisage Series on Markup Technologies, vol. 21 (2018). doi:https://doi.org/10.4242/BalisageVol21.Piez01.

Quin, Liam R. E. 2015. Diagramming XML: Exploring Concepts, Constraints and Affordances. Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). doi:https://doi.org/10.4242/BalisageVol15.Quin01.



[1] There are many edge cases between text and non-text, such as tables, form labels, index entries, program code examples, alt text for figures, etc.

[2] Definitions of OOP vary considerably. Merriam-Webster (https://www.merriam-webster.com/dictionary/object-oriented%20programming) describes objects which communicate with each other, which may be arranged into hierarchies, and which can be combined to form additional objects. Wikipedia notes that objects have data, in the form of fields (often known as attributes or properties), and code, in the form of procedures (often known as methods).... objects have a notion of 'this' or 'self')..., and that objects interact and are class-based.

[3] This very basic model has been refined, extended, or replaced in various ways; for discussion of a few, see Durand (1996).

[4] Some characters do have substantial meaning even in isolation: mathematical symbols, ? µ ι @ $ ¶, etc. A few alternate with explicit content objects such as word, sentence, or paragraph.

[5] Depending on what count as different characters (in digital form, the encoding), additional distinctions may be managed. For example, Unicode's "Mathematical" alphabet variations (italic, bold, sans serif, double-struck, etc) could be used to represent emphasis.

[6] An interesting edge case arises when specialists annotate documents by making grammatical structures overt. It can be argued whether this adds any content objects to a document, since (to the extent the analysis is accurate) they were already implicit in the linguistic content. The phenomena certainly seem essential, but whether marking them up overtly should be taken to change the document into another seems unclear.

[7] As mentioned earlier for text documents, it is possible for diagrams to make reference to their layout, slightly blurring this distinction.

[8] There are significant exceptions, such as components that produce or are sensitive to electromagnetic interference, or parts that must communicate rapidly.

[9] A diagram may include text labels that are not in boxes at all. One can take these to have an implicit shape (their bounding box), or adjust (2) to permit some non-shapes.

[10] A tighter analogy might be if XML schemas provided affirmative places within a given element type, where anchors could be placed: for example a definition list's model might only permit inbound (or outbound) anchors to (or just before) terms, not definitions. This could be accomplished with various schema languages, but is not common.

[11] One could, of course, define an entirely new schema that includes any classes one likes, and then transform it to SVG as a step in rendering.

[12] A few applications may be exceptions, and in handcrafted SVG this is easier.

[13] SVG 1.1 drew things in document order; SVG 2 adds nestable stacking contexts.

[14] Though I will not discuss it further here, Graphviz (https://graphviz.org) has a quite similar sparse node/edge model, with a similarly large set of options to affect rendering, and a wide variety of automatic layout methods.

[15] This seems an odd choice, because it wastes far more space than XML's purported verbosity ever did, makes consistency hard to discover or maintain, and violates programming dicta such as don't repeat yourself.

[16] However, Soueidan (2016) notes that one can simulate relative positioning by nesting an entire <svg> within another, which establishes a nested coordinate system that is then placed into the container coordinate system (https://www.sarasoueidan.com/blog/mimic-relative-positioning-in-svg).

[17] One can clone just the shape, not group it with a text object, then add a separate text object that happens to overlay each clone. But this fails to express the relationship.

[18] Yes, that does happen (or nearly so) with some word processors; but rarely if ever with markup languages of any kind; and if it does, it is relatively easy to fix compared to, say, the path-interpretation problem discussed above.

[19] As noted earlier, there are exceptional element types that cannot be adequately rendered or interpreted by generic tools, such as math, HTML canvas, and the interactive aspects of links and forms.

[20] Flowchart symbols reached much of their current form with ISO 5807 (1970), in turn based on a 1960 ANSI standard.

Author's keywords for this paper:
XML; SVG; Vector graphics; Document models