Overview

This paper is about using Invisible XML (IXML) to experiment with a textual interface over a complex underlying set of capabilities. The output from IXML is an XML document representing the parse tree, which an XQuery program interprets to make the appropriate API calls and produce the desired behaviour.

The IXML grammar specifies which non-terminals will be represented in the tree and whether they will be represented by elements or attributes. IXML rules can also suppress terminals in the output or insert new ones. Because of this flexibility in determining what the XML parse tree will look like, processing of the parse tree can be decoupled from the details of the textual interface. This makes experimentation with those details easier.

Given a framework for experimenting with the formulation of a textual interface, how should we proceed with that experimentation? I suggest an approach based on the idea of treating the API as a conlang (constructed language). That means we think about the total space of syntactic, morphological, and symbolic possibilities, and choose which to apply with intention. It also means acknowledging that we are dealing with a language that, despite being interpreted by computer programs, is also and primarily for human beings to communicate with each other. The choices we make in designing our conlang will make things more or less difficult for them.

My context here is making a textual interface for creating complex drawings from high-level artistic components. The idea is that to create a scene of five small silver fish covering a black canvas, you don't write a pile of code to organize the appropriate API calls: you give the driver an input string like "5 small silver fish cover black canvas" and it does all that for you. I will show something of my experimentation and discuss how analyzing the conlang for that interface has also helped me understand and rationalize the underlying components and how they function.

Normalizing with IXML

An IXML grammar consists of a series of rules relating a non-terminal symbol to a series of non-terminal symbols and terminal strings along with various combining operators. The grammar is used to parse an input string to produce an XML document. Each non-terminal can be marked to indicate whether it should be included as an element with the same name (the default), included as an attribute with that name, or excluded entirely. Terminals are included as content unless marked to be excluded. In addition, terminals can also be introduced de novo without having to be present in the string being parsed. The current working draft of the IXML specification also supports the ability to provide new names for the elements and attributes instead of using the non-terminal names directly. These features can be used to normalize the output.

IXML can normalize output in four ways:

  1. By using the inclusion and exclusion symbols on terminals, IXML can normalize element and attribute contents

  2. By using inclusion and exclusion symbols on non-terminals, IXML can normalize the nesting structure

  3. By using attribute markers on non-terminals, IXML can normalize the output ordering

  4. By using rename markers on non-terminals, IXML (working draft) can normalize the names of elements and attributes

Let's look at a concrete example.

The grammar fragment in Figure 1 gives a simple infix grammar for addition and subtraction expressions. Read from top to bottom it says that an expression (expr) is a left operand (left) followed by an operator (op) followed by a right operand (right). Each of these operands is a number (number) and should be exposed as attributes (the @ symbol). A number is a sequence of digits from 0 to 9 but there should be no number element in the output (the - symbol). The operator is either addition (plus) or subtraction (minus) and should also be exposed as an attribute. Finally the plus and minus operators are represented with the appropriate symbol and those elements are also removed from the output.

Figure 1: IXML grammar for simple infix expression

expr = left, op, right .
@left = number .
@right = number .
-number = ["0" - "9"]+ .
@op = plus | minus .
-plus = "+" .
-minus = "-" .

Given this grammar, parsing of the string "4+5" gives the XML document <expr left="4" op="+" right="5"/>.

The expr non-terminal is included as an element name, the left, right, and op non-terminals are included as attributes (using the @ symbols), and the plus, minus, and number non-terminals are represented only by their content in the parsed XML (due to the - symbols).

A different representation for simple arithmetic operators is a functional notation. Figure 2 gives a grammar for that. It says an expression is an operator followed by the left and right operands in parentheses and separated by a comma. Here the operator is represented by a name ("plus" or "minus") but that name should be removed from the output and replaced with a corresponding symbol.

Figure 2: Alternative IXML grammar for simple prefix expression

expr = op, -"(", left, -",", right, -")" .
@left = number .
@right = number .
-number = ["0" - "9"]+ .
@op = plus | minus .
-plus = -"plus",+"+" .
-minus = -"minus",+"-" .

In this grammar the operator is given in functional form with a name instead of using an infix operator symbol. The rules for the plus and minus non-terminals normalize the result by removing the operator name (with the - symbol) and introducing the operator symbol (with the + symbol).

Parsing of the string "plus(4,5)" therefore also gives <expr left="4" op="+" right="5"/>.

IXML has normalized the operator names to the operator symbols, and since both the operator and the operands are attributes, their order doesn't matter, so that has been normalized as well. By removing extraneous terminals and non-terminals, the expression has been boiled down to its essence, so the nesting structure has been normalized too.
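To make the normalization concrete without an IXML processor, here is a minimal Python sketch that hand-codes the effect of the Figure 1 and Figure 2 grammars with regular expressions. It is only a stand-in: the functions parse_infix, parse_prefix, and same_tree are hypothetical, and a real grammar would be run by an IXML implementation. What it illustrates is that both surface syntaxes yield an expr element with the same attributes.

```python
import re
import xml.etree.ElementTree as ET

# Regex stand-ins for the Figure 1 and Figure 2 grammars; not an IXML processor.
INFIX = re.compile(r"(?P<left>[0-9]+)(?P<op>[+-])(?P<right>[0-9]+)")
PREFIX = re.compile(r"(?P<name>plus|minus)\((?P<left>[0-9]+),(?P<right>[0-9]+)\)")
# Figure 2 deletes the operator name and inserts the matching symbol.
NAME_TO_SYMBOL = {"plus": "+", "minus": "-"}


def parse_infix(text: str) -> ET.Element:
    m = INFIX.fullmatch(text)
    if m is None:
        raise ValueError(f"not an infix expression: {text!r}")
    return ET.Element("expr", left=m["left"], op=m["op"], right=m["right"])


def parse_prefix(text: str) -> ET.Element:
    m = PREFIX.fullmatch(text)
    if m is None:
        raise ValueError(f"not a prefix expression: {text!r}")
    return ET.Element("expr", op=NAME_TO_SYMBOL[m["name"]], left=m["left"], right=m["right"])


def same_tree(a: ET.Element, b: ET.Element) -> bool:
    # Attribute order carries no meaning in XML, so compare attributes as dicts.
    return a.tag == b.tag and a.attrib == b.attrib and len(a) == 0 and len(b) == 0


print(ET.tostring(parse_infix("4+5"), encoding="unicode"))
print(same_tree(parse_infix("4+5"), parse_prefix("plus(4,5)")))
```

Comparing the attributes as a dictionary mirrors the point above: once the grammar has normalized the output, the order in which attributes were produced is irrelevant.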

Attribute order might be different, but that doesn't matter. If the interpreter of the XML doesn't care about element order either, that gives us even more flexibility. While IXML 1.0 doesn't currently let you change the names of elements and attributes, the working draft does. If all else fails, adding a simple XSLT transform can make the adjustment. We can then experiment with different ways of expressing the operations of the API without having to touch the underlying interpreter. IXML normalization insulates the interpreter from experimentation in the textual interface.
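A sketch of what that insulation buys, assuming the interpreter were written in Python rather than XQuery: evaluate() reads only the normalized attributes, and rename_attributes() plays the role of the fallback XSLT renaming step. Both functions, and the alternative attribute names lhs, rhs, and operator, are invented for illustration.

```python
import operator
import xml.etree.ElementTree as ET

# The (hypothetical) interpreter depends only on the normalized <expr> element.
OPS = {"+": operator.add, "-": operator.sub}


def evaluate(expr: ET.Element) -> int:
    return OPS[expr.get("op")](int(expr.get("left")), int(expr.get("right")))


# Stand-in for the fallback XSLT rename step; these source names are invented.
RENAMES = {"lhs": "left", "rhs": "right", "operator": "op"}


def rename_attributes(expr: ET.Element) -> ET.Element:
    renamed = ET.Element(expr.tag)
    for name, value in expr.attrib.items():
        renamed.set(RENAMES.get(name, name), value)
    return renamed


doc = ET.fromstring('<expr right="5" op="+" left="4"/>')
variant = ET.fromstring('<expr lhs="4" operator="-" rhs="5"/>')
print(evaluate(doc))                       # 9
print(evaluate(rename_attributes(variant)))  # -1
```

Neither function cares which textual syntax produced the element, which is exactly the property that makes the surface language cheap to change.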

In my case I am experimenting with a textual interface to various drawing components. The textual interface describes the desired scene. The code that interprets the interface and makes the appropriate calls to the drawing components is fairly complex. Being able to experiment with how to express the desired outcome without having to perform major rewrites of that code is very helpful.

Conlangs

After being exposed to the wonderful variety of morphological and syntactic features present in the world's languages, students of linguistics are often tempted to try their own hand at pulling together a few of their favourites to make their own language. Long before Hollywood science fiction and fantasy epics turned this into a cottage industry, many a linguist filled little notebooks with syntax diagrams and morpheme lists for their own constructed language (conlang for short).

I am no exception.

Rather than invent Dothraki, I went on to a career in software. My experiences with conlangs stuck with me, however. When I analyze computer languages and APIs, or design them, I try to take a conlang stance:

  1. Be intentional in your choices

  2. Enlarge the range of choices you consider

  3. Remember this is a language for humans: be systematic, but allow for appropriate exceptions

The actionable implementation of this approach is to determine what the important entities and grammatical classes and relationships are in your domain and then decide how to indicate them with vocabulary and syntactic devices. Here are some of the things to think about.

Consistency sets expectations

Even where function syntax is determined by the programming language, the syntax of the function and parameter names is not. The ordering of parameters is a choice. Even in cases where parameters are identified by name rather than ordering, the order used in examples and API documentation is still a choice. The set of parameters and how to expose them is a choice. Each of these choices creates expectations that make the API easier or more difficult to grasp.

Let's look at a simple example.

Figure 3 shows a set of function names. In this case the names are all consistently a target (a noun) followed by an action (a verb). Almost. The morpheme "properties" is not an action at all: it names the result of the action. Still, the symmetric force of the API encourages us to think of it as an action, perhaps an elided form of "get-properties".

Figure 3: Simple API: Target plus action

document-get()
document-delete()
document-properties()
collection-get()
collection-delete()
collection-properties()
link-get()
link-delete()
link-properties()

What other inferences does this API encourage us to make? We probably assume that "document", "collection", and "link" are all alike in some way, and that "get", "delete", and "properties" are all alike in some way. The API is also telling us that the most important thing is the target. What the functions actually produce is less important. We probably further assume that if we encounter a new relevant target "thingy" and a new relevant action "reckon", we would find functions named thingy-reckon() and collection-reckon().

In this example we're looking at pure function names, but this is the same approach taken in object-oriented programming when classes are named for targets and their methods for simple actions. It is also the approach taken when fine-grained namespaces are named for targets and the functions within each namespace are named for simple actions.

Figure 4: Simple API: Target plus action, namespace style

document:get()
document:delete()
document:properties()
collection:get()
collection:delete()
collection:properties()
link:get()
link:delete()
link:properties()

What happens if we turn things around: action first?

Figure 5: Simple API: Action plus target

get-document()
get-collection()
get-link()
delete-document()
delete-collection()
delete-link()
properties-document()
properties-collection()
properties-link()

First, the non-action "properties" starts looking ever more problematic. It really isn't an action, and this formulation makes that gratingly obvious. "get-properties" really would work much better here.

We're still going to assume that "document", "collection", and "link" are all alike in some way, and that "get", "delete", and "properties" (urgh) are all alike in some way, although the mismatch between what we expect to be an action and its English name might make us start to assume that there is something really quite different about the "properties" series of functions. We're going to expect get-thingy() and reckon-collection(). Furthermore, this leads us to think of the actions as the main thing. We may even expect to see generic get(), delete(), and properties() methods that dispatch to these specific ones. This is not at all an expectation from the target-first approach, and indeed with that approach it can be harder to imagine such methods or how to name them.

This is the approach taken when we use namespaces for specific kinds of actions, where the actions operate over different targets.

Figure 6: Simple API: Action plus target, namespace style

get:document()
get:collection()
get:link()
delete:document()
delete:collection()
delete:link()
properties:document()
properties:collection()
properties:link()

From a conlang perspective we are collecting vocabulary lists here, and inventing grammatical categories and syntax rules for combining items in these categories. Users of these APIs can use these rules and implicit categories to navigate the API and make predictions about it. Developers of the API can use the rules to make decisions about how to extend it.

Where there are no rules, APIs become less predictable and therefore less usable. People being pattern-finding creatures, they will try to make sense of the mess in front of them, finding faces in the random rocks of Mars.

Figure 7: Simple API: Mayhem induces pareidolia

get-document()
document-delete()
properties-of-document()
collection-get()
delete-collection()
collection-properties()
link()
link-delete()
link-properties()

Folks might assume from the collection of names in Figure 7 that "document", "collection", and "link" are all different. They might assume that "get" is an operation that doesn't apply to "link", that there is something fundamentally different about "properties-of" applied to "document" and "properties" applied to "collection". Perhaps getting properties is a more complex operation for "collection" and "link". What should we predict about "thingy" and "reckon"? thingy-reckon() or reckon-thingy()? Hard to know. As folklore has taught us for millennia: names matter.

People will invent reasons for differences and similarities, regardless of intention. Best make them intentional.

Word order

Let's take another example. Consider what it looks like when you are applying a series of operations one after another. Programming languages tend to conceive of operations as being done by some anonymous and ubiquitous agent (the computer), hence the imperative style. The functional language viewpoint, however, takes the "thing being operated on" (in the classic view) as the agent, the one performing the action. So let's run with that viewpoint and treat that agent as the subject (S), the action as the verb (V), and the operands as the object (O). When we combine a series, one whole operation (e.g. S+V+O) embeds in the next as its subject. What does embedding look like with different word orderings?

Consider this representation in XQuery arrow syntax of a series of operations: first a negative translation, then a scaling by half, and then a positive translation.

Figure 8: Left embedding: (S V O) V O

$g=>
  transform:translate(-800,-800)=>
  transform:scale(0.5,0.5)=>
  transform:translate(800,800)

This kind of embedding is like English syntax, and for English speakers reads naturally from left to right, telling us what happens in temporal order. It isn't the only way to go about things, however. The conlang perspective tells us to consider the range of possibilities, to look outside our familiar tongue.

Here is the same set of operations, but represented as the value of an SVG transform attribute. The operations apply from the right to the left.[1]

Figure 9: Right embedding: V O (V O S)

<g transform="
  translate(800, 800) scale(0.5, 0.5) translate(-800, -800)
"/>

SVG appears to use this ordering to be consistent with the ordering you'd see if the transforms were partitioned into nested child groups instead. You can add more operations by prepending them to this string, or by wrapping the transformed element in a new group whose transform attribute holds the new operation. The transforms apply from the inside out: in text order, from right to left. You can think of this as the "but first" approach: "translate by 800, but first scale by half, but first translate by negative 800". You're saying the most important thing is what happened last, at the cost of straining your readers' memory as they work out the order of events.
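The ordering claim can be checked numerically. This Python sketch assumes the group $g can be represented by a single point; it multiplies 3×3 affine matrices in SVG text order and compares the result with applying the operations one at a time in the left-embedded order of Figure 8. The helper names are made up for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Point = Tuple[float, float]


def translate(tx: float, ty: float) -> Matrix:
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scale(sx: float, sy: float) -> Matrix:
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m: Matrix, point: Point) -> Point:
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2], m[1][0] * x + m[1][1] * y + m[1][2])


def svg_transform(*ops: Matrix) -> Matrix:
    # Multiply in text order; applied to a point, the rightmost op acts first.
    result: Matrix = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    for op in ops:
        result = matmul(result, op)
    return result


def pipeline(point: Point, *ops: Matrix) -> Point:
    # Arrow style (Figure 8): apply each operation in text order.
    for op in ops:
        point = apply(op, point)
    return point


svg = svg_transform(translate(800, 800), scale(0.5, 0.5), translate(-800, -800))
print(apply(svg, (1000, 1000)))
print(pipeline((1000, 1000), translate(-800, -800), scale(0.5, 0.5), translate(800, 800)))
```

For (1000, 1000) both lines print (900.0, 900.0): the SVG list and the arrow pipeline describe the same operations in opposite text orders.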

XQuery functional syntax uses center embedding with VSO order.

Figure 10: Center embedding: V (V S O) O

transform:translate(
  transform:scale(
    transform:translate($g, -800, -800),
    0.5, 0.5
  ),
  800, 800
)

The English gloss of this is something like "translate the scaling of the translation of the item by -800 by half by 800". In natural language, center embedding often leads to comprehension difficulties, stressing human short-term memory as you build up a stack of pending items before getting to the center where you can start to unroll things.

There are other logical possibilities.

SOV ordering is actually more common among the world's languages than English's SVO. With left embedding this is the order used by certain kinds of calculators, such as reverse Polish notation (RPN) calculators.

Figure 11: Left embedding: (S O V) O V

$g ⟽ (-800, -800)translate ⟽ (0.5, 0.5)scale ⟽ (800, 800)translate

This ordering tends to emphasize the operands over the operators and creates a sense of closure for each operation. This is also an "about-that, and-then" kind of formulation, something like "take minus 800 and translate by that, and then take half and scale by that, and then take 800 and translate by that".
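Left-embedded SOV is what an RPN calculator evaluates. As a sketch, assuming each operand is a pair and $g stands for a single point, a small stack machine can run Figure 11's token sequence: operands are pushed, and each verb pops its operand and then the subject beneath it. The token encoding and the VERBS table are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple, Union

Point = Tuple[float, float]
Token = Union[str, Point]

# Each verb maps (subject, operand) to a new subject.
VERBS: Dict[str, Callable[[Point, Point], Point]] = {
    "translate": lambda p, d: (p[0] + d[0], p[1] + d[1]),
    "scale": lambda p, s: (p[0] * s[0], p[1] * s[1]),
}


def run_sov(tokens: List[Token], subject: Point) -> Point:
    """Evaluate left-embedded S O V tokens the way an RPN calculator would."""
    stack: List[Point] = []
    for tok in tokens:
        if tok == "$g":
            stack.append(subject)
        elif isinstance(tok, str):
            operand = stack.pop()  # O is on top...
            subj = stack.pop()     # ...with S (or the previous result) below it
            stack.append(VERBS[tok](subj, operand))
        else:
            stack.append(tok)
    (result,) = stack
    return result


# Figure 11: $g (-800, -800)translate (0.5, 0.5)scale (800, 800)translate
FIGURE_11: List[Token] = ["$g", (-800, -800), "translate", (0.5, 0.5), "scale", (800, 800), "translate"]
print(run_sov(FIGURE_11, (1000, 1000)))
```

Each verb closes off one complete operation as soon as it appears, which is the sense of closure this word order gives.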

OVS and OSV orders are quite rare in natural languages, but they do occur.

Figure 12: Center embedding: O (O S V) V

(800, 800) (0.5, 0.5) (-800, -800) $g translate scale translate

This ordering reminds me a little of COBOL's data and procedure divisions: first all the data you need, then the series of operations. For that matter, a lot of GPU programs look like this too. An English gloss of this uses the "respectively" construction, something like "take 800, and half, and -800: now translate, scale, and translate by those respectively".
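The center-embedded OSV order of Figure 12 runs on the same kind of stack machine with the pop order reversed: the subject (or the previous result) sits on top and its operand lies directly beneath it, so the operand nearest the subject pairs with the first verb. As before, this Python sketch assumes $g is a single point and uses an invented token encoding.

```python
from typing import Callable, Dict, List, Tuple, Union

Point = Tuple[float, float]
Token = Union[str, Point]

# Each verb maps (subject, operand) to a new subject.
VERBS: Dict[str, Callable[[Point, Point], Point]] = {
    "translate": lambda p, d: (p[0] + d[0], p[1] + d[1]),
    "scale": lambda p, s: (p[0] * s[0], p[1] * s[1]),
}


def run_osv(tokens: List[Token], subject: Point) -> Point:
    """Evaluate center-embedded O ... O S V ... V tokens with a stack."""
    stack: List[Point] = []
    for tok in tokens:
        if tok == "$g":
            stack.append(subject)
        elif isinstance(tok, str):
            subj = stack.pop()     # S (or the previous result) is on top...
            operand = stack.pop()  # ...and its operand lies just below
            stack.append(VERBS[tok](subj, operand))
        else:
            stack.append(tok)
    (result,) = stack
    return result


# Figure 12: (800, 800) (0.5, 0.5) (-800, -800) $g translate scale translate
FIGURE_12: List[Token] = [(800, 800), (0.5, 0.5), (-800, -800), "$g", "translate", "scale", "translate"]
print(run_osv(FIGURE_12, (1000, 1000)))
```

The whole operand list has to be held on the stack before the first verb arrives, which is the memory load that center embedding imposes on human readers as well.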

Finally, for completeness' sake, here's what OVS with right embedding looks like, a combination of "but first" and operand-forward:

Figure 13: Right embedding: O V (O V S)

(800, 800)translate (0.5, 0.5)scale (-800, -800)translate $g

"Take 800 and translate by it, but first take half and scale by it, but first take -800 and translate by it."

Grammatical gender

English is relatively impoverished when it comes to markings for case, mood, tense, person, gender, and so on. Our computer languages need not be similarly impoverished. One way to think about Hungarian notation is that it constructs grammatical gender classes for variables and marks them with gender prefixes. For example the variables named szName, pName, and nName are marked with the affixes sz, p, and n to indicate that they are, respectively, a zero-terminated string, a pointer, and a count. Number (1 versus many) is also marked: aszName for an array of strings as opposed to szName for a singleton.

Other gender and number systems are possible. Sometimes APIs will use their own markers for some idiosyncratic purpose. For example, the Windows API uses m_ to mark a data member of a class, g_ for a global, cr for a colour reference value, and so forth. These combine, so that m_psz is a pointer (p) to a zero-terminated string (sz) that is a member of the class (m_).[2]

This kind of thing gets criticized for getting in the way of readability (when the markers are small abbreviations), writability (when they are longer words), and for introducing the possibility of a mismatch between what a name claims and what it actually is and thus impeding refactoring. It is most useful when the naming convention tells you something the compiler or interpreter couldn't already tell you, such as whether a string is safe or unsafe, that is, an internal string or a string coming from input that needs to be sanitized.
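Because the markers combine by simple concatenation, decoding them is mechanical. This Python sketch handles only the markers mentioned above (m_, g_, a, p, sz, n, cr) and assumes the root of the name starts with an uppercase letter; it is illustrative, not a complete account of any Hungarian notation convention.

```python
import re
from typing import List, Tuple

# Scope markers and type markers from the examples above (not an exhaustive set).
SCOPE = {"m_": "member of the class", "g_": "global"}
TYPE = {
    "a": "array of",
    "p": "pointer to",
    "sz": "zero-terminated string",
    "n": "count",
    "cr": "colour reference value",
}


def decode(name: str) -> Tuple[List[str], str]:
    """Split a Hungarian-style name into marker glosses and the root."""
    glosses: List[str] = []
    for prefix, gloss in SCOPE.items():
        if name.startswith(prefix):
            glosses.append(gloss)
            name = name[len(prefix):]
            break
    # Assumption: markers are lowercase and the root starts with an uppercase letter.
    match = re.match(r"([a-z]*)(.*)", name)
    markers, root = match.group(1), match.group(2)
    i = 0
    while i < len(markers):
        # Try two-letter markers (sz, cr) before one-letter ones (a, p, n).
        for size in (2, 1):
            chunk = markers[i:i + size]
            if len(chunk) == size and chunk in TYPE:
                glosses.append(TYPE[chunk])
                i += size
                break
        else:
            raise ValueError(f"unknown marker in {markers!r}")
    return glosses, root


print(decode("m_pszName"))
print(decode("aszName"))
```

The decoder can only report what a name claims; it cannot check that the variable really is a pointer to a string, which is the mismatch risk mentioned above.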

Mood

Computer languages tend to operate in an imperative mood, giving commands to the computer, as it were. But let's consider, in the conlang way, the full range of possibilities. Is imperative really the best for a particular situation? Is there really only one mood in play? And if not, how should we indicate the variation?

First, let's look at some of the possibilities (by no means an exhaustive list):

Mood            Paradigm                           Usage
imperative      (you) do that                      commands
interrogative   does he do that?                   queries
indicative      he does that                       statements
subjunctive     he would do that if he were able   counterfactuals
conditional     he would do that if he were able   conditionals
presumptive     he must be doing that              probable truths
potential       he likely did that                 likely truths
hypothetical    he might have done that            possible truths
inferential     I conclude that he did that        inferences

Natural languages don't generally distinguish all these situations in a systematic way, and English typically just uses modal verbs of some sort for most of these. We can analyze what some computer languages do in these terms as well.

Consider SPARQL CONSTRUCT versus ASK. Both are in the usual imperative style, but we could also analyze "CONSTRUCT" as a modal verb marking the indicative and "ASK" as one marking the interrogative. Given that analysis, the distinction could have been marked in some other way.

If we consider a language such as RDF for representing knowledge, wouldn't it be interesting to consider the range of moods that express degrees of certainty (inferential, presumptive, hypothetical, and so on) and to find a way to express them in the knowledge representation language?

At the very least, it is worth considering whether a particular part of the API is operating in imperative, interrogative, or indicative mood. This does get reflected even in names. For example, get-properties() is imperative ("(computer) get me the properties of ..."), and properties() is indicative ("the properties of ... are ..."). I find my libraries in functional languages are drawn towards the indicative because it describes an outcome: this and that will happen (somehow). The imperative tends to suggest an order of operations: do this then do that.

The language is for humans

The conlang approach suggests that we consider that the language is for use by human beings. Software developers often forget this, assuming that the purpose of a computer language is to be interpreted by some bit of software. While that is true, it is also a misleading and limited view. Software APIs need to communicate with human beings as well: humans who use them, humans who debug them, humans who refactor and extend them.

The argument for Hungarian notation markers is exactly this: knowing at a glance that something is a counter versus an index helps humans better understand what the software does. Software libraries are full of all kinds of little conventions like this, even if they are not systematically defined. Personally, I use ix to mark an index that may be unordered rather than strictly sequential. Thus $jix is a positive integer somewhere in some range, but $j is a positive integer running sequentially from 1 to the upper limit.

A consequence of keeping humans in mind is that it becomes important to maintain fairly consistent rules even for aspects of the API that aren't important to the computer. We've seen a few examples of this already: a standard ordering for targets and actions in the naming of functions, or the use of markers to indicate classifications of variables that aren't important to the compiler.

Another example to consider is the robust, cross-linguistic observation that higher frequency (Zipf36), more predictable (Levshina22), or simpler (Lewis16) terms tend to be shorter. Thus: "a", "it", "cat" versus "jaguar", "caterpillar", "transcendentalism". Default case, gender, tense, etc. markers tend to disappear entirely. Thus we have "il entend" (present) but "il entendra" (future) and "il entendit" (past).

The application to APIs is direct: simple common functions should have short names. Markers for default or common situations should be shorter than those for non-default and less common situations, or absent entirely. This is a plausible rationale for the use of "properties" as an action in the prior examples: keeping common operations shorter in their expression.

Another thing humans expect to see is short and consistent case frames. Here's one analysis of case roles for English clauses (Winograd83, Figure B-13):

Process type                Central participants
Material Process            Agent, Medium, Range, Beneficiary
Mental Process              Cognizant, Phenomenon
Verbal Process              Sayer, Addressee, Verbalization
Attributive                 Carrier, Attribute
Equative                    Identified, Identifier

Circumstances (all types)   Location, Extent, Motion, Cause, Manner, Accompaniment, Purpose, Matter

The idea is that there are a few central roles for each kind of process. Given a set of API functions, it is useful to think through the types of processes they represent and the central participants for each collection of similar operations, and to be consistent in how those participants are ordered and indicated. The types of processes and their participants are not necessarily identical to those of English. It is also useful to distinguish central participants (such as Medium in Material Process) from secondary participants (such as Manner in Circumstances). There is always a temptation in software to start larding up functions with an endless array of parameters and options. The conlang lesson here is that less is more and that a little analysis and organization can prevent a complicated and unusable mess down the road.
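In a language like Python, one way to encode the distinction between central participants and circumstances is in the signature itself: central participants become required positional parameters, circumstances become optional keyword-only parameters. The MaterialProcess frame and the draw() function below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MaterialProcess:
    """Hypothetical case frame: central participants plus optional circumstances."""
    agent: str
    medium: str
    location: Optional[str] = None  # circumstance
    manner: Optional[str] = None    # circumstance


def draw(agent: str, medium: str, *, location: Optional[str] = None,
         manner: Optional[str] = None) -> MaterialProcess:
    # Central participants are positional and required; everything after
    # the bare * must be passed by keyword.
    return MaterialProcess(agent, medium, location=location, manner=manner)


frame = draw("artist", "fish", manner="speckled")
print(frame)
```

Because location and manner are keyword-only, a caller cannot accidentally pass a circumstance in a central participant's position, and adding another circumstance later does not disturb existing calls.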

Invisible Fish

Experiments with textual art interface

The textual art shell operates in the following fashion:

  1. Text is input to a command prompt, for example:

    art> 5 small silver fish cover deepsea canvas at random

  2. IXML parses and normalizes the input. A small XSLT stylesheet provides a bit of contextual defaulting. The result is an XML description of the desired scene:

    <canvas tries="1" resolution="medium">
       <drawing>
          <compound repeat="singly" fill="cover" placement="at">
             <chosen-group>
                <group-op group-kind="ordered"/>
                <group>
                   <counted-group count="5" diversity="unique">
                      <group>
                         <thing kind="fish">
                            <option name="scaling" number="0.0625"/>
                            <option name="colour" colour="silver"/>
                         </thing>
                      </group>
                   </counted-group>
                </group>
             </chosen-group>
             <quantified-division>
                <quantifier quantifier-kind="100"/>
                <division>
                   <space kind="canvas">
                      <option name="scaling" number="NaN"/>
                      <option name="visibility" boolean="true"/>
                      <option name="colour" colour="deepsea"/>
                   </space>
                </division>
             </quantified-division>
             <quantified-arrangement>
                <quantifier quantifier-kind="100"/>
                <arrangement>
                   <location count="5" location-kind="random"/>
                </arrangement>
             </quantified-arrangement>
          </compound>
       </drawing>
    </canvas>

  3. The art shell driver looks up, dynamically loads, and calls the appropriate components, makes any needed scene adjustments, and writes the result as an SVG file.

    [Image: 5 small silver cartoonish fish scattered randomly across a deep blue canvas, as requested]
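The driver's third step can be sketched as a registry from element names to handlers, walked bottom-up over the normalized XML. This Python sketch is not the art shell's code: the handlers, their text output, and the trimmed-down scene are assumptions, and the real driver loads components dynamically and emits SVG rather than strings.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

Handler = Callable[[ET.Element, List[str]], List[str]]
HANDLERS: Dict[str, Handler] = {}


def component(tag: str) -> Callable[[Handler], Handler]:
    """Register a handler for one element name (a stand-in for dynamic loading)."""
    def register(fn: Handler) -> Handler:
        HANDLERS[tag] = fn
        return fn
    return register


def options(elem: ET.Element) -> str:
    # Each <option> has a name plus one value attribute (number, colour, boolean...).
    pairs = {}
    for opt in elem.findall("option"):
        values = [v for k, v in opt.attrib.items() if k != "name"]
        pairs[opt.get("name")] = values[0] if values else ""
    return " ".join(f"{k}={v}" for k, v in sorted(pairs.items()))


def walk(elem: ET.Element) -> List[str]:
    # Children first, so each handler receives what its content produced.
    parts: List[str] = []
    for child in elem:
        parts.extend(walk(child))
    handler = HANDLERS.get(elem.tag)
    return handler(elem, parts) if handler else parts


@component("thing")
def thing(elem: ET.Element, parts: List[str]) -> List[str]:
    return [f"draw {elem.get('kind')} {options(elem)}"]


@component("space")
def space(elem: ET.Element, parts: List[str]) -> List[str]:
    return [f"paint {elem.get('kind')} {options(elem)}"]


@component("counted-group")
def counted_group(elem: ET.Element, parts: List[str]) -> List[str]:
    return parts * int(elem.get("count", "1"))


SCENE = """
<drawing>
  <space kind="canvas"><option name="colour" colour="deepsea"/></space>
  <counted-group count="2">
    <thing kind="fish">
      <option name="scaling" number="0.0625"/>
      <option name="colour" colour="silver"/>
    </thing>
  </counted-group>
</drawing>
"""

for step in walk(ET.fromstring(SCENE.strip())):
    print(step)
```

Because each handler keys off an element name that the grammar has already normalized, changing the surface syntax of the art shell leaves this dispatch untouched.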

What I'm going to focus on here is using IXML normalization to experiment with different kinds of expression of the desired outcome, and some possible variations driven from a conlang perspective.

The initial goal of this project was straightforward enough: drastically reduce the effort in combining some of the drawing components into a new work. Creating a new work already involved starting with a template and filling in the gaps with the appropriate combination of function calls. I was interested to see how much of this effort could be abstracted away while still producing a variety of interesting results. The driving conlang principle here is: this is a language for humans (specifically: me). I wanted to keep things simple enough to remember, while retaining as much power as possible.

The first question was one of mood: imperative or indicative? Since for me the only possible command was "draw", adding it seemed redundant. Operating in an indicative mood made it possible to use verbs to represent scene relationships in a way that worked out well.

A quick note on ambiguity

A number of the grammar fragments in this section are ambiguous, allowing multiple possible parses. While it is prudent to avoid ambiguity as much as possible, I don't tend to worry about it much. I use a system (CoffeeSacks) that lets me define a function to resolve ambiguity appropriately.

The function I use takes a greedy approach and always picks the option with the longest matching prefix. This resolution mechanism is easy to implement, easy to understand, and gives me good control over the results. It is consistent with how many regular expression processors resolve ambiguity, so it also has the advantage of familiarity. I recommend it.
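The longest-prefix preference is easy to state in code. This toy Python sketch applies it at a single choice point between regular-expression alternatives rather than to whole parse forests, and it breaks ties in favour of the earliest alternative, which is an assumption. It does not use or imitate CoffeeSacks' actual API.

```python
import re
from typing import Optional, Sequence


def longest_match(text: str, pos: int, alternatives: Sequence[str]) -> str:
    """Pick the alternative whose match starting at pos is longest.

    Ties go to the alternative listed first (an assumed tie-break rule).
    """
    best: Optional[str] = None
    best_len = -1
    for alt in alternatives:
        m = re.compile(alt).match(text, pos)
        if m is not None and m.end() - pos > best_len:
            best, best_len = alt, m.end() - pos
    if best is None:
        raise ValueError(f"no alternative matches at position {pos}")
    return best


# "fishzillas": both "zilla" and "zillas" match after "fish"; greedy picks "zillas".
print(longest_match("fishzillas", 4, ["zilla", "zillas"]))
```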

Representing things: characteristics of things

When something is drawn, it will have various characteristics: size, colour, and so on. How should these be expressed? There are a number of options: as a single function with a set of parameters, as a specific function with named parameters, in a more natural English style where the parameters become attributes modifying a noun, or, expanding our horizons in the conlang way, through attributive affixes. Each of these variations uses a different IXML grammar to produce the same XML for the driver to process. Let's look at a few of them, all expressing the idea of a group of five small fish rendered in red with a speckled stylistic effect.

Function-style syntax with ordered arguments

First let's use a functional representation, resembling a light gloss on an actual underlying function call. The expression would be:

thing(fish, 5, 0.25, red, speckled)

The relevant part of the IXML grammar turns the arguments into attributes on the thing element:

thing = -"thing", -"(", s?, (kind, -",", s), (count, -",", s), (size, -",", s), (colour, -",", s), (styling), s?, -")" .
@kind = "fish" | "octopus" | "jellyfish" | "ship" .
@count = integer .
@colour = name .
@size = number .
@styling = ("speckled" | "crosshatched") .

The XML output from parsing the example input with this grammar therefore looks like this:

<thing kind='fish' count='5' size='0.25' colour='red' styling='speckled'/>

This is straightforward, but we are already veering into fairly long parameter lists, and that's before we get into things like brush strokes, pen angles, and texture variations. Which parameter was colour, again? Let's see if we can do better.
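The same normalization can be mimicked outside IXML to see how brittle positional arguments are. This Python sketch, with a hypothetical parse_thing() function, splits the argument list on commas and assigns values to attributes purely by position; the int() and float() checks are its only guard against arguments given in the wrong order.

```python
import re
import xml.etree.ElementTree as ET

KINDS = {"fish", "octopus", "jellyfish", "ship"}
STYLES = {"speckled", "crosshatched"}
# The meaning of each argument is fixed by its position alone.
FIELDS = ("kind", "count", "size", "colour", "styling")
CALL = re.compile(r"thing\s*\((.*)\)")


def parse_thing(text: str) -> ET.Element:
    m = CALL.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a thing(...) call: {text!r}")
    args = [a.strip() for a in m.group(1).split(",")]
    if len(args) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} arguments, got {len(args)}")
    attrs = dict(zip(FIELDS, args))
    if attrs["kind"] not in KINDS or attrs["styling"] not in STYLES:
        raise ValueError(f"unknown kind or styling in {text!r}")
    int(attrs["count"])   # raises ValueError if count is not an integer
    float(attrs["size"])  # raises ValueError if size is not a number
    return ET.Element("thing", attrs)


print(ET.tostring(parse_thing("thing(fish, 5, 0.25, red, speckled)"), encoding="unicode"))
```

Swapping size and colour is caught only because "red" is not a number; swapping two values of the same type would slip through unnoticed.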

Functions with named arguments

The expression in this case shifts the kind of thing to the function name, and uses name equals value pairs for the drawing characteristics, allowing them in any order:

fish(count=5, size=0.25, colour=red, style=speckled)

The relevant fragment of the IXML grammar hides the intermediate options and option elements, uses the argument names to decide which attribute to create, and drops the extra bits of syntax:

thing = kind, (s?, options)? .
@kind = "fish" | "octopus" | "jellyfish" | "ship" .
-options = -"(", s?, option++(-",", s?), s?, -")" .
-option = count | colour | size | styling .
@count = -"count", -"=", integer .
@colour = -"colour", -"=", name .
@size = -"size", -"=", number .
@styling = -"style", -"=", ("speckled" | "crosshatched") .

Because of the normalization done via IXML, we still get the same XML output:

<thing kind="fish" count="5" colour="red" size="0.25" styling="speckled"/>

This formulation keeps us from having to remember the proper parameter order. It also lets us leave out some parameters, so defaults will need to be supplied. A more sophisticated grammar could do so, or we could rely on the driver to supply them.
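Here is a Python sketch of the named-argument variant, showing one place where defaults could be supplied. The DEFAULTS values and the parse_named() name are assumptions; the text leaves open whether the grammar or the driver should supply defaults.

```python
import re
import xml.etree.ElementTree as ET

KINDS = {"fish", "octopus", "jellyfish", "ship"}
# Argument name in the input -> attribute name in the output.
ARG_TO_ATTR = {"count": "count", "colour": "colour", "size": "size", "style": "styling"}
# Hypothetical defaults for arguments the user leaves out.
DEFAULTS = {"count": "1", "size": "1.0"}
CALL = re.compile(r"(?P<kind>[a-z]+)\s*(?:\((?P<args>.*)\))?")


def parse_named(text: str, fill_defaults: bool = True) -> ET.Element:
    m = CALL.fullmatch(text.strip())
    if m is None or m["kind"] not in KINDS:
        raise ValueError(f"cannot parse {text!r}")
    attrs = {"kind": m["kind"]}
    if m["args"]:
        for pair in m["args"].split(","):
            name, _, value = pair.partition("=")
            # KeyError here means an unknown argument name.
            attrs[ARG_TO_ATTR[name.strip()]] = value.strip()
    if fill_defaults:
        for attr, value in DEFAULTS.items():
            attrs.setdefault(attr, value)
    return ET.Element("thing", attrs)


print(parse_named("fish(count=5, size=0.25, colour=red, style=speckled)").attrib)
print(parse_named("octopus(colour=blue)").attrib)
```

Argument order no longer matters, and omitted arguments are filled in one place rather than scattered through the interpreter.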

Adjectives before noun

What about using a more natural English style with attributes modifying a noun? Something like:

5 wee red speckled fish

We could allow the adjectives in any order, but the truth is that English speakers expect certain classes of adjectives to come before other classes of adjectives, so that "five small red fish" is fine, but "five red small fish" is weird.[3] So this grammar fixes the order, which helps manage the ambiguity caused by colour being an open class.

The IXML grammar normalizes size adjectives to numeric values and suppresses the intermediate options element:

thing = (options, s)?, kind .
@kind = "fish" | "octopus" | "jellyfish" | "ship" .
-options = (count, s)?, (size, s)?, (colour, s)?, (styling)? .
@count = integer .
@size = (-"enormous", +"4.0") |
        (-"large", +"2.0") |
        (-"standard", +"1.0") |
        (-"small", +"0.5") |
        (-"wee", +"0.25")
.
@colour = name .
@styling = ("speckled" | "crosshatched") .

Again, we end up with the same XML output:

<thing count="5" colour="red" size="0.25" styling="speckled" kind="fish"/>
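For comparison, here is a hand-written Python recognizer for the same fixed-order phrase. It is a sketch, not anything IXML generates. It follows the fragment above: an optional count, size, colour, and styling, in that order, then the kind, with size words normalized to the same numbers. The grammar leaves one ambiguity open: a single word before the noun could be a colour or a styling. This sketch assumes that "speckled" and "crosshatched" are always stylings.

```python
import xml.etree.ElementTree as ET

KINDS = {"fish", "octopus", "jellyfish", "ship"}
SIZES = {"enormous": "4.0", "large": "2.0", "standard": "1.0",
         "small": "0.5", "wee": "0.25"}
STYLINGS = {"speckled", "crosshatched"}

def parse_thing(text: str) -> ET.Element:
    """Parse '[count] [size] [colour] [styling] kind' into a <thing> element."""
    *mods, kind = text.split()
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    attrs = {}
    if mods and mods[0].isdigit():
        attrs["count"] = mods.pop(0)
    if mods and mods[0] in SIZES:
        attrs["size"] = SIZES[mods.pop(0)]
    # Styling is a closed class, so check for it in the last slot before
    # treating whatever is left over as an (open-class) colour.
    if mods and mods[-1] in STYLINGS:
        attrs["styling"] = mods.pop()
    if mods:
        attrs["colour"] = mods.pop(0)
    if mods:
        # Anything still left is out of order, e.g. a size after a colour.
        raise ValueError(f"unexpected words: {mods}")
    attrs["kind"] = kind
    return ET.Element("thing", attrs)

print(ET.tostring(parse_thing("5 wee red speckled fish"), encoding="unicode"))
```

Because the order is fixed, "5 red small fish" is rejected rather than misread.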

Using affixes

It is time to push the envelope a bit in the conlang fashion. Let's use diminutive and augmentative suffixes for the size and an attributive prefix for the styling. An example of that looks a little like a snippet of Moby Dick:

5 red bespeck-fishies

The IXML grammar normalizes away a number of non-terminals, and turns the affixes back into the standard values we had before:

thing = (options, s)?, (prefixes)?, kind, (suffixes)? .
@kind = "fish" | "octopus" | "jellyfish" | "ship" .
-options = (count, s)?, (colour)? .
@count = integer .
@colour = name .
-prefixes = styling, -"-" .
@styling = (-"bespeck",+"speckled") | (-"behatch",+"crosshatched") .
-suffixes = size .
@size = ((-"zilla"|-"zillas"), +"4.0") | ((-"adon"|-"adons"), +"2.0") |
        (+"1.0") |
        ((-"ling"|-"lings"), +"0.5") | ((-"y"|-"ies"), +"0.25") .

And once again, the XML output is exactly the same:

<thing count="5" colour="red" size="0.25" styling="speckled" kind="fish"/>
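The affix handling can also be sketched in Python to make it concrete. This is an illustrative splitter, not part of the paper's toolchain. It maps the "bespeck-" and "behatch-" prefixes to stylings, finds a known kind at the start of the word, and looks up the remainder as a size suffix. A bare kind means standard size. The tables follow the grammar above.

```python
KINDS = ("jellyfish", "octopus", "fish", "ship")
PREFIXES = {"bespeck": "speckled", "behatch": "crosshatched"}
SUFFIXES = {"zilla": "4.0", "zillas": "4.0", "adon": "2.0", "adons": "2.0",
            "": "1.0", "ling": "0.5", "lings": "0.5", "y": "0.25", "ies": "0.25"}

def split_affixes(word: str) -> dict:
    """Split e.g. 'bespeck-fishies' into styling, kind and size attributes."""
    attrs = {}
    if "-" in word:
        prefix, word = word.split("-", 1)
        attrs["styling"] = PREFIXES[prefix]  # KeyError for an unknown prefix
    for kind in KINDS:
        suffix = word[len(kind):]
        if word.startswith(kind) and suffix in SUFFIXES:
            attrs.update(kind=kind, size=SUFFIXES[suffix])
            return attrs
    raise ValueError(f"cannot analyse {word!r}")

print(split_affixes("bespeck-fishies"))
```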

Things, spaces, arrangements

The basic drawing unit is not just some things (such as 5 wee red fish) but a thing at a particular location within some space. More generally, it is a group of things within a set of spaces, arranged in some particular way. For example, we might say "5 fish fill small box in spiral". The fish are the things, the box is the space, and the spiral is the arrangement. Each of these is constructed and managed in its own way. Arrangements are not rendered directly: only the locations associated with them matter. Spaces may or may not be rendered, depending on their characteristics: a blue box would need to be rendered, but a box with no colour or other drawing characteristics would not. A thing is always rendered. The arrangement is relative to the space. Here's the thing: the same kind of item may operate as a thing, a space, or an arrangement. How can we indicate and distinguish these roles?

Another way to think about this is that the major participants in a Drawing process are a Group (of things), a Division (of spaces), and an Arrangement (of locations). Given that case frame, how should we mark the case roles?

Word order

The first thing to try is basic word order: Group fill Division in Arrangement. For example:

5 wee circles fill large circle in standard circle

The IXML grammar fragment does the usual kinds of normalization we've seen before, dropping extraneous information from the output:

drawing = thing, s, -"fill", s, space, s, -"in", s, arrangement .

thing = (options, s)?, kind .
@kind = "circle",(-"s")? .
space = (options, s)?, kind .
arrangement = (options, s)?, kind .
-options = (count, s)?, (size)? .
@count = integer .
@size = (-"enormous", +"4.0") | (-"large", +"2.0") | (-"standard", +"1.0") |
        (-"small", +"0.5") | (-"wee", +"0.25") .

The result is the set of participants in this case frame:

<drawing>
   <thing count='5' size='0.25' kind='circle'/>
   <space size='2.0' kind='circle'/>
   <arrangement size='1.0' kind='circle'/>
</drawing>

This approach gives us something more like a natural English sentence. The normalization of plural forms helps here as well. The input text reads easily and could be used directly as the basis for the image's alternative text. Having a verb also lets different verbs express important aspects of the relationships in the scene. In the full grammar I use "fill" to indicate that the things should be scaled to fit the space, "cover" to leave them as they are, and "inhabit" to clip them to the space.

On the other hand, this formulation is "natural" only within strong limits on what can be expressed, and it is hard to know where the boundaries are until we trip over them. So it is naturalistic, but not really natural. Depending on the specific spaces and arrangements used, the verb "fill" and the preposition "in" may be more or less apt. For example, "in" reads well in "in standard circle", but "at" reads better in "at top-left". Furthermore, once we get into more complex scenarios, such as building divisions of spaces from points on the boundaries of things, "in" becomes an interesting preposition to use for other purposes.

Using case markers

In the conlang fashion, we consider some alternatives. Many languages indicate case roles not with word order but with particles or inflections. Compare Latin "equus" (nominative), "equum" (accusative), and "equo" (dative), or Russian "работа", "работу", and "работе" (the same three cases of "work"). Here I leave the Group unmarked and use prepositions as particles for the Division and Arrangement.

5 wee circles large in-circle standard at-circle

The IXML grammar uses the particles to determine which participant is which and removes the extraneous item element.

drawing = item++s .

-item = thing | space | arrangement .
thing = (options, s)?, kind .
@kind = "circle",(-"s")? .
space = (options, s)?, -"in-", kind .
arrangement = (options, s)?, -"at-", kind .
-options = (count, s)?, (size)? .
@count = integer .
@size = (-"enormous", +"4.0") | (-"large", +"2.0") | (-"standard", +"1.0") |
        (-"small", +"0.5") | (-"wee", +"0.25") .

The result is the same as before:

<drawing>
   <thing count='5' size='0.25' kind='circle'/>
   <space size='2.0' kind='circle'/>
   <arrangement size='1.0' kind='circle'/>
</drawing>

Notice that the grammar does not impose an order on the participants any more. We could give them in any order. For example:

large in-circle 5 wee circles standard at-circle

The XML output differs only in the order of the participants. The interpreter doesn't care about that order, so this amounts to the same thing:

<drawing>
   <space size='2.0' kind='circle'/>
   <thing count='5' size='0.25' kind='circle'/>
   <arrangement size='1.0' kind='circle'/>
</drawing>
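The interpreter side of this is easy to sketch. The paper's driver is written in XQuery, so the Python below only illustrates the idea. It indexes the children of drawing by element name, which is what makes the order of participants irrelevant. The name case_frame is hypothetical, as is the assumption of exactly one participant per role.

```python
import xml.etree.ElementTree as ET

ROLES = ("thing", "space", "arrangement")

def case_frame(drawing: ET.Element) -> dict:
    """Map each case role to its participant's attributes, ignoring order."""
    frame = {}
    for child in drawing:
        # Assumption: at most one participant per role.
        if child.tag not in ROLES or child.tag in frame:
            raise ValueError(f"unexpected or repeated participant: {child.tag}")
        frame[child.tag] = dict(child.attrib)
    return frame

first = ET.fromstring("""<drawing>
   <thing count='5' size='0.25' kind='circle'/>
   <space size='2.0' kind='circle'/>
   <arrangement size='1.0' kind='circle'/>
</drawing>""")
second = ET.fromstring("""<drawing>
   <space size='2.0' kind='circle'/>
   <thing count='5' size='0.25' kind='circle'/>
   <arrangement size='1.0' kind='circle'/>
</drawing>""")
print(case_frame(first) == case_frame(second))
```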

The pluses and minuses of this formulation are the inverse of the word order approach. It reads less naturally, even using English prepositions rather than Latin case endings or something made up, but its artificiality makes it less likely to confuse. Free word order seems like a boon, but in natural language it mainly helps by allowing us to emphasize new or important information in conversation. That may apply to image descriptions for a series of related works: this image is about using a large circle as the space, and that image is about using a large square instead. It seems a fairly tenuous benefit, especially given the artificiality of the expression.

Let's get agglutinative

Finally, let's just have a little fun pushing the envelope and taking a non-English approach entirely, where both size and case role are represented in suffixes. Here's the example:

5 circley circleadontown circledot

The suffix "y" means wee, "adon" means large, "town" marks the Division, and "dot" marks the Arrangement. Standard size and Group are unmarked.

In this grammar the order is still free, and we normalize the suffixes:

drawing = item++s .

-item = thing | space | arrangement .
thing = (quantifier, s)?, kind, suffixes .
@kind = "circle",(-"s")? .
space = (quantifier, s)?, kind, suffixes, -"town" .
arrangement = (quantifier, s)?, kind, suffixes, -"dot" .
-quantifier = count .
@count = integer .
-suffixes = size .
@size = ((-"zilla"|-"zillas"), +"4.0") | ((-"adon"|-"adons"), +"2.0") |
        (+"1.0") |
        ((-"ling"|-"lings"), +"0.5") | ((-"y"|-"ies"), +"0.25") .

The result, as ever, is the same:

<drawing>
   <thing count='5' kind='circle' size='0.25'/>
   <space kind='circle' size='2.0'/>
   <arrangement kind='circle' size='1.0'/>
</drawing>

This is playful and allows free order, but it takes extra knowledge to understand, even with suffixes chosen to resonate with English speakers.
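A Python sketch of the same word analysis shows why the order of affixes matters. The case suffix is outermost, so it is stripped first, and the size suffix is then looked up on what remains. This is an illustration, not the paper's implementation. Following the fragment above, it recognizes only the kind "circle" with an optional plural "s".

```python
CASE_SUFFIXES = (("town", "space"), ("dot", "arrangement"))
SIZE_SUFFIXES = {"zilla": "4.0", "zillas": "4.0", "adon": "2.0", "adons": "2.0",
                 "": "1.0", "ling": "0.5", "lings": "0.5", "y": "0.25", "ies": "0.25"}

def analyse(word: str, kind: str = "circle") -> tuple:
    """Return (role, attributes) for a word such as 'circleadontown'."""
    role = "thing"  # the Group is unmarked
    for suffix, marked_role in CASE_SUFFIXES:
        if word.endswith(suffix):
            role, word = marked_role, word[: -len(suffix)]
            break
    if not word.startswith(kind):
        raise ValueError(f"expected {kind!r} in {word!r}")
    rest = word[len(kind):]
    if rest.startswith("s"):  # optional plural; no size suffix begins with "s"
        rest = rest[1:]
    if rest not in SIZE_SUFFIXES:
        raise ValueError(f"unknown size suffix {rest!r}")
    return role, {"kind": kind, "size": SIZE_SUFFIXES[rest]}

for word in "circley circleadontown circledot".split():
    print(analyse(word))
```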

Open classes, closed classes

So far the grammar fragments have enumerated the allowable things, spaces, and so on. The real, more complex grammar leaves some of these open. A thing can be anything: it is an open class. A simple space is now also an open class, but it started as a closed class, with an enumeration of valid members:

thing = thing-pre-options, kind, (s?, options)? .
@kind = simplename .

space = space-pre-options, space-kind, (s?, options)? .
@space-kind = "box" | "circle" | "ngon" | "canvas" .

The driver looked up all the kind attributes to load the appropriate components. As the language developed and component bindings were created for more spaces, I shifted space to an open class. The grammar uses the renaming marker (>) from the working draft to rename the space-kind attribute to kind. The lookup code stayed the same, and so did any grammar rules that used space-kind:

thing = thing-pre-options, kind, (s?, options)? .
@kind = simplename .

space = space-pre-options, space-kind, (s?, options)? .
@space-kind>kind = simplename .

I could have just changed space-kind to kind throughout the grammar, but I liked having the grammar rules be more specific in this way.
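Because renaming gives every space a plain kind attribute, one lookup routine can serve things and spaces alike. A minimal Python sketch follows. It assumes a hypothetical registry that maps kind names to maker functions. The real driver is XQuery, and the paper does not show its component bindings.

```python
import xml.etree.ElementTree as ET

# Hypothetical component bindings: each kind name maps to a maker function.
# These makers just echo the kind and attributes they were given.
MAKERS = {
    "fish": lambda attrs: ("fish", attrs),
    "box": lambda attrs: ("box", attrs),
    "circle": lambda attrs: ("circle", attrs),
}

def make(element: ET.Element):
    """Find the maker for a thing or a space through its kind attribute."""
    kind = element.get("kind")
    if kind not in MAKERS:
        # With an open class, the grammar accepts any simple name, so an
        # unbound kind is caught here instead of at parse time.
        raise LookupError(f"no component bound to kind {kind!r}")
    return MAKERS[kind](dict(element.attrib))

print(make(ET.fromstring('<thing kind="fish" count="5"/>')))
print(make(ET.fromstring('<space kind="box"/>')))
```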

Locations posed a bigger challenge. Location is a closed class, complicated by the presence of additional information for some kinds of locations and not others. Specific locations such as "center" are unique: there can be only one. Random locations, however, can be multiple and accept a count. I wanted to simplify the representation for processing so that there would always be a location-kind attribute, along with a count attribute where relevant. Renaming solves this:

location =
  location-kind |
  (random-location-kind, ((-"-" | s), count)?)
.
@location-kind =
  ("center" |
  "top-left" | "bottom-right" | "top-right" | "bottom-left" |
  "top-center" | "bottom-center" | "left-center" | "right-center"
  ), -"s"?
.
@random-location-kind>location-kind =
  ("random" | "radial" | "poisson"), -"s"?
.

Both the location kind and the count will appear as attributes on the location element, and the kind will be named consistently whether it is countable or unique. The grammar still ensures that only the countable kinds accept a count.
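On the driver side, the uniform representation means a single function can turn any location element into points. The Python sketch below is hypothetical in several respects. The unit-square coordinates (with y increasing downward, as in SVG), the default count of 1, and the use of uniform random points for "random" are all assumptions. "radial" and "poisson" are not covered.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical positions in a unit square, with y increasing downward as in SVG.
NAMED = {
    "center": (0.5, 0.5),
    "top-left": (0.0, 0.0), "top-center": (0.5, 0.0), "top-right": (1.0, 0.0),
    "left-center": (0.0, 0.5), "right-center": (1.0, 0.5),
    "bottom-left": (0.0, 1.0), "bottom-center": (0.5, 1.0), "bottom-right": (1.0, 1.0),
}

def points(location: ET.Element, rng: random.Random) -> list:
    """Return the points for a <location location-kind=... count=...> element."""
    kind = location.get("location-kind")
    if kind in NAMED:
        return [NAMED[kind]]  # unique locations: there can be only one
    count = int(location.get("count", "1"))
    if kind == "random":
        return [(rng.random(), rng.random()) for _ in range(count)]
    raise NotImplementedError(f"{kind!r} is not covered by this sketch")

rng = random.Random(42)
print(points(ET.fromstring('<location location-kind="center"/>'), rng))
print(len(points(ET.fromstring('<location location-kind="random" count="5"/>'), rng)))
```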

A similar problem arises with drawing options. Rather than having a special attribute for colour, another for size, and separate elements for various other options, I wanted them all unified and represented consistently as option elements. Each option element has a name attribute plus an attribute named for the value's type (number, string, boolean, or colour) that holds the value.

The grammar uses renaming extensively to normalize the names:

thing = thing-pre-options, kind, (s?, options)? .
@kind = simplename .
-thing-pre-options = size-option?, colour-option? .
size-option>option = size-option-name, size, s .
@size-option-name>name = +"size" .
@size>number =
  (-"enormous", +"4.0") |
  (-"large", +"2.0") |
  (-"standard", +"1.0") |
  (-"small", +"0.5") |
  (-"wee", +"0.25")
.

colour-option>option = colour-option-name, @colour, s .
@colour-option-name>name = +"colour" .

-options = -"(", option++(s?,-",",s?), -")" .
option = @name, s?, -"=", s?, value .
-value = @number | @string | @boolean | @colour .

Although size is written as a word such as "wee", it ends up represented as an option with a numerical value. A string like "wee red fish(styling="speckled")" produces a consistent set of options:

<thing kind="fish">
  <option name="size" number="0.25"/>
  <option name="colour" colour="red"/>
  <option name="styling" string="speckled"/>
</thing>

This simplifies passing drawing options to the maker functions.
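A minimal Python sketch of that last step follows. It assumes the driver turns each option element into a keyword argument whose Python type depends on the typed attribute present. The conversion rules (for instance, reading booleans as "true" or "false") and the name option_kwargs are assumptions. The function collects every option element beneath the thing.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversions, keyed by the typed attribute an option carries.
CONVERTERS = {
    "number": float,
    "string": str,
    "boolean": lambda v: v == "true",
    "colour": str,
}

def option_kwargs(thing: ET.Element) -> dict:
    """Collect <option name=... TYPE=...> descendants into maker keyword args."""
    kwargs = {}
    for option in thing.iter("option"):
        name = option.get("name")
        for type_name, convert in CONVERTERS.items():
            if type_name in option.attrib:
                kwargs[name] = convert(option.get(type_name))
                break
    return kwargs

thing = ET.fromstring("""<thing kind="fish">
  <option name="size" number="0.25"/>
  <option name="colour" colour="red"/>
  <option name="styling" string="speckled"/>
</thing>""")
print(option_kwargs(thing))
```

A maker could then be called as make_fish(**option_kwargs(thing)), where make_fish is a hypothetical maker function.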

Grammatical gender

Some things cover an area. We could use them as clipping bounds or as spaces, and we could decorate the area they cover in various ways (crosshatching or speckling, for example). Other things consist of strokes. They can be used as paths, and we could draw them with different pens (tapered strokes or dashed strokes, for example), but they have no interior to decorate. Some things can be viewed either way. Is a red circle a circular red line or a circular red area?

It is the kind of distinction that grammatical gender can help with.

thing = thing-pre-options, kind, gender?, (s?, options)? .
@gender = (-"-", "stroke") | (-"-", "zone") .

The driver can then determine that:

5 wee red dashed circles-stroke

should produce a dashed red circular line, and that:

5 wee red dashed circles-zone

should produce a red circular area filled with dashes.
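The following Python sketch shows one way the driver might act on the gender marker, assuming the output is SVG. A stroke-gender thing gets its colour and dash pattern on the outline and no fill. A zone-gender thing has its interior filled with a dash pattern. The function name, the dash lengths, and the pattern id scheme are hypothetical.

```python
def dashed_style(gender: str, colour: str) -> dict:
    """Return SVG presentation attributes for a dashed thing of a given gender."""
    if gender == "stroke":
        # A path: the dashes go on the outline, and there is no interior to fill.
        return {"fill": "none", "stroke": colour, "stroke-dasharray": "4 2"}
    if gender == "zone":
        # An area: the interior is filled with dashes. The pattern id scheme is
        # hypothetical; a matching <pattern id="dashes-red"> would be defined elsewhere.
        return {"fill": f"url(#dashes-{colour})", "stroke": "none"}
    raise ValueError(f"unknown gender: {gender!r}")

for gender in ("stroke", "zone"):
    print(gender, dashed_style(gender, "red"))
```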

Conclusions and Summary

IXML can be used to normalize the parse tree for textual languages. That is, by applying different grammars that use IXML's normalization capabilities, we can get the same parse tree from radically different textual interfaces. This lets the rules of the textual interface language evolve without any change to the underlying interpretation of what the textual interface describes.

IXML accomplishes this normalization by providing controls for which non-terminals to include as elements or attributes, and which to hide so that only their content appears. Terminals can be elided or introduced. Finally, in the working draft at least, non-terminals can be renamed for further normalization.

One approach to developing a textual API is to treat it as a constructed language, or conlang. That means being mindful of the choices, looking further afield for the true range of options, and remembering that even a computer language is a language for humans. Computer languages and APIs may not be as complex and subtle as natural languages, but a linguistic analysis can nevertheless help us understand the moving parts.

We've seen various examples drawn from a textual interface created for exposing complex drawing operations in a simplified way.

Applying the conlang approach helped me realize what the different kinds of participants in a drawing were, which led to the construction of maker functions that took these roles into account. It also gave me an understanding of the different gender classes of things, which in turn led to augmentations of the construction functions.

There is much more to be done.

References

[Invisible XML] W3C: Steven Pemberton, editor. Invisible XML Specification Final Community Group Report. W3C, 12 December 2023. https://www.w3.org/community/reports/ixml/CG-FINAL-ixml-20231212/.

[Levshina22] Natalia Levshina. "Frequency, Informativity and Word Length: Insights from Typologically Diverse Corpora". In Entropy (Basel) 2022;24(2):280. doi:https://doi.org/10.3390/e24020280. Available at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8870940/.

[Lewis16] Molly L. Lewis and Michael C. Frank. "The length of words reflects their conceptual complexity". In Cognition 2016;153:182-195. doi:https://doi.org/10.1016/j.cognition.2016.04.003. Available at https://www.sciencedirect.com/science/article/pii/S0010027716300919.

[XSLT] W3C: Michael Kay, editor. XSL Transformations (XSLT) Version 3.0 Recommendation. W3C, 8 June 2017. http://www.w3.org/TR/xslt-30/.

[XQuery] W3C: Jonathan Robie, Michael Dyck, Josh Spiegel, editors. XQuery 3.1: An XML Query Language Recommendation. W3C, 21 March 2017. http://www.w3.org/TR/xquery-31/.

[Saxon] Saxonica. Saxon, https://www.saxonica.com/products/products.xml.

[CoffeeSacks] Norman Walsh. CoffeeSacks: A Saxon API for Invisible XML, https://docs.nineml.org/current/coffeesacks/.

[Winograd83] Terry Winograd. Language as a Cognitive Process: Volume I: Syntax. Addison-Wesley, 1983.

[Zipf36] George Kingsley Zipf. The psycho-biology of language: An introduction to dynamic philology. Routledge, 1936. Reprinted, republished as eBook 2013. doi:https://doi.org/10.4324/9781315009421. Available at https://www.taylorfrancis.com/books/mono/10.4324/9781315009421/psycho-biology-language-george-kingsley-zipf.

[Scontras23] Gregory Scontras. "Adjective Ordering Across Languages". In Annual Review of Linguistics 2023;9:357-376. doi:https://doi.org/10.1146/annurev-linguistics-030521-041835.



[1] Although the SVG specification says the operations are applied in the order provided, that is not the case at all. The specification says:

<g transform="translate(-10,-20) scale(2) rotate(45) translate(5,10)">
  <!-- graphics elements go here -->
</g>

is functionally equivalent to:

<g transform="translate(-10,-20)">
  <g transform="scale(2)">
    <g transform="rotate(45)">
      <g transform="translate(5,10)">
        <!-- graphics elements go here -->
      </g>
    </g>
  </g>
</g>

But that means the first operation applied to the graphics elements is translate(5,10), not translate(-10,-20). So: right to left.
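The right-to-left reading can be checked numerically. This Python sketch uses the standard 3x3 affine matrix forms of translate, scale, and rotate, and its helper names are illustrative. The matrices are multiplied in the order written and the product is applied to a point. The result matches applying translate(5,10) first and translate(-10,-20) last. It differs from applying the operations left to right.

```python
import math

def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]

def scale(s):
    return [[s, 0, 0], [0, s, 0], [0, 0, 1]]

def rotate(degrees):
    a = math.radians(degrees)
    return [[math.cos(a), -math.sin(a), 0],
            [math.sin(a), math.cos(a), 0],
            [0, 0, 1]]

def matmul(m, n):
    """Multiply two 3x3 matrices."""
    return [[sum(m[i][k] * n[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_to(m, point):
    """Apply an affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# transform="translate(-10,-20) scale(2) rotate(45) translate(5,10)"
ops = [translate(-10, -20), scale(2), rotate(45), translate(5, 10)]

# The attribute's matrix is the product of the operations in the order written.
combined = ops[0]
for m in ops[1:]:
    combined = matmul(combined, m)

point = (1.0, 0.0)
right_to_left = point
for m in reversed(ops):  # translate(5,10) first, translate(-10,-20) last
    right_to_left = apply_to(m, right_to_left)
left_to_right = point
for m in ops:
    left_to_right = apply_to(m, left_to_right)

print(apply_to(combined, point), right_to_left, left_to_right)
```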

[2] I am tickled by the idea that there is an unwritten ordering rule here. m_pszThing is OK, but *pszm_Thing and *m_szpThing are not.

[3] The standard ordering is (at least by some accounts): opinion, size, age, shape, colour, origin, material, purpose, noun. There is good evidence that this is even a linguistic universal; see Scontras23.

Author's keywords for this paper:
InvisibleXML; Linguistic analysis; API design; Conlangs

Mary Holstege

Mary Holstege spent decades developing software in Silicon Valley, in and around markup technologies and information extraction. She has most recently been pursuing artistic endeavours. She holds a Ph.D. from Stanford University in Computer Science, and has a long history of throwing concepts from linguistics at computing problems.