Note: Acknowledgments

This paper describes research being submitted as part of a PhD in the Department of Applied Psychology, UCC (Human Factors Research Group).

Background

Writers and editors have until recently used a wide range of editing software. In the Humanities, document-based collaboration is relatively uncommon [Lariviere2006], so authors were less constrained by compatibility issues; whereas within the IT and scientific communities, the range has been smaller, partly because of the need to use specialist notations, and partly because of the need to remain compatible with co-authors [Anghelache2004]. In the wider world of writing outside research and academic institutions the difference was wider, because writers do not typically form homogeneous categories. Not all such software was necessarily ideal for the purpose, and much of it has fallen by the wayside in the path of technological advance; particularly in the face of the predominance of a single operating system and a single wordprocessor.

The onus of interpretation, customization, and rendering of publishable material has customarily been the responsibility of the publisher or other intermediary, who expends large sums on specialist labor for this purpose. However, in some fields, publishers have been asking authors for camera-ready copy for over 20 years to minimize costs, and the effort involved and the quality of these submitted documents has been a cause for complaint on both sides [Luey2007]. Businesses, database publishers, libraries, search-engine optimizers, and printers have similar concerns: the quality of the documents (in source or camera-ready form) is often insufficient for meaningful capture, re-use, or formatting [Williams1995]. The causes are many, and have been well-established for many years, but among them is a lack of suitable software and lack of author education [Denning1986, Heck1993].

As we showed in [Flynn2006], there is no lack of software as such, only a lack of suitable software — for some value of suitable described by the respondents. In that report, we discussed three principal findings:

  1. In a study of editing software, we found that all XML editors are basically the same XML editor: the facilities provided are virtually identical, and the differences are in the interface to these facilities, such as the depth of menu traversal and the naming of actions, rather than in the facilities themselves. The same is nearly as true of editors for LaTeX, but as the markup can be changed arbitrarily by the author, the manipulations required of the editing software are not mandated or implied by the system as they are with XML. Some specific deficiencies were noted which inform the adaptations made in section “The interface is the product”.

  2. In a survey of expert practitioners in the field of markup-directed authoring and editing (taken to obtain a baseline of recognized problems), we found that the principal criterion for the recommendation of software was familiarity or acceptability to the user, rather than applicability to the tasks, largely because of the difficulties in [re-]training users to an interface seen as less suitable for non-experts.

  3. An analysis of requests to the principal discussion forums on XML and LaTeX asking for recommendations, suggestions, or advice on the selection of editing software was used to estimate the requirements of users. The four most frequently-cited criteria were Cost (free or Open Source software), a WYSIWYG interface, Ease of Use, and Simplicity — over and above any structure-related or markup-related facilities.

The investigations into software and user requirements (items 1 and 3 above) were updated to 2008, measuring more recent applications and requests, but this showed that the original findings still hold. An additional inquiry was then undertaken to resolve the software facilities and the user requirements with actual current practice (section “How authors and editors use their software”) and to build a model of a user interface to test the findings (section “The interface is the product”).

How authors and editors use their software

The survey of users was constructed to find out what software the authors and editors used and how they used it. The survey was piloted for a week in April 2006 and the revised version made available online between February and April 2008. It used the phpESP web survey package, and was publicized via the online forums to which it was addressed (the comp.text.tex, comp.text.xml, and comp.text.sgml Usenet newsgroups, and the XML-L, TeXhax, and LaTeX (Google) mailing lists).

There were 62 valid responses. All respondents were guaranteed anonymity. An estimate of the population (those who might have seen the announcements of the survey) is difficult to make: the membership of the mailing lists was approximately 700; but the readership of the Usenet newsgroups is not knowable, and may extend to many thousands. The responses therefore represent a small interested sample, and there will have been many readers whose interests in XML were in its use for data representation, not structured documents.

Several attempts were made to widen the scope and seek the interest of publishers and writers' organisations in participating, but all were unsuccessful. The survey does not therefore represent the interests of the generality of writers but concentrates on identifying how users of structured editing software use their interfaces. Further work would be required to extend the reach of this enquiry into other fields.

The questions asked the respondents to describe their way of performing a set of actions, which were obtained from the requirements expressed in the updated analysis of user requests. The available responses in each case were presented as a multiple-choice list which was constructed from the updated survey of editing software to represent the known affordances (ways of doing things which are more or less obvious to the user: [Gibson1979]). An Other category was also provided, but rarely used.

Figure 1: User Survey: background variables

Number of subjects by industry and occupation by years of experience and multiple responses to operating system

A preliminary set of background questions was included to see if the responses varied by experience (years), operating system, occupation, or document types most commonly used. Unfortunately there was insufficient variability among the background data for any such effects to be detected, possibly due to the nature of an unavoidably self-selected sample (Figure 1).

Most respondents worked principally with text documents (77%); other document classes were in the single digit percentages. XML and HTML accounted for 44% of usage of markup systems; LaTeX, Word, and OpenOffice were roughly equal at 13–14% each. Wiki experience was surprisingly high at 8% but all others (including SGML) were at 2% or below (see Figure 2).

Figure 2: User Survey: Markup types used

Percentages of all multiple responses

The editors most respondents had experience of were oXygen (24%) followed by Word (13%), Emacs (11%), and vi (7%) [both used for both XML and LaTeX]. Other LaTeX editors accounted for another 12%. The Arbortext editor rated 6% along with OpenOffice, but there was a long tail of other products.

Results

The questions used a multiple-choice answer format because respondents were expected to use (or have used) many different systems. This enabled the survey to reflect a much wider range of behavior than would otherwise have been the case. In the following list, percentages are therefore of all responses to the question, not of the number of individuals.
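The distinction between percentages of responses and percentages of individuals can be illustrated with a small worked example. This is a minimal sketch using invented data (three hypothetical respondents), not figures from the survey; the function name response_shares is ours.

```python
from collections import Counter

def response_shares(answers):
    """Return each option's share of all responses, as whole percentages."""
    # Flatten every respondent's selections into one pool of responses.
    counts = Counter(option for chosen in answers for option in chosen)
    total = sum(counts.values())
    return {option: round(100 * n / total) for option, n in counts.items()}

# Hypothetical data: three respondents giving five responses in total.
answers = [
    ["empty file", "New Document menu"],
    ["empty file"],
    ["empty existing document", "empty file"],
]
shares = response_shares(answers)
print(shares)
```

In this invented sample every individual selected "empty file", yet it accounts for only three of the five responses, which is why the two kinds of percentage should not be compared directly.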

Creating a new document

The method of creating a new document was divided approximately equally between creating an empty file (31%), emptying an existing document (29%) and using the New Document menu item (28%). Most of the remainder did not find the menu system useful for creating new documents as it was faster to do it by hand.

Adding standard metadata

To add metadata, 34% of responses used markup insertion menus; 28% used fill-in-the-blanks customizations; and 14% typed the markup manually. Most of the rest used preset values or the metadata was determined elsewhere.

Starting a new sectional division

Nearly two-thirds of new sections were started by positioning the cursor manually and typing or otherwise inserting the markup (63%). Style menus were used in 14% of responses and splitting an existing section in 11%. Only 5% of responses used a New section menu item.

Styling a document

Nearly half the responses said styling was automated via a stylesheet (47%). The use of a style menu and manual styling were about equal at 15–16%, and for most of the others it was done by a production team. Only 2% of responses mentioned using manual methods like toolbar buttons.

Moving element content

Moving blocks of text around was overwhelmingly done by cursor highlighting and cut-and-paste. Only 15% used a structure window to select the material. However, 11% said they had to reorganize the markup manually after a move (to promote or demote sectioning).

Adding new elements

To add new block-level material (elements in element content), 35% of responses mentioned manual insertion by keyboard shortcuts or typing; 26% via toolbar buttons; and 21% via the Insert menu. Style-driven addition was listed by 7% and there was another 7% of Other which were application-specific.

Linking and cross-referencing

For linking items (cross-references, footnotes, citations, and hyperlinks; which are largely mixed-content insertions), 35% again specified keyboard shortcuts or typing, and 23% via toolbar buttons. 21% mentioned having to create the ID target before they could add an IDREF link to it, but 16% used a menu-and-dialog mechanism.

Viewing the formatted result

Fully 40% of responses did not preview a document in development, and relied on the structuring of the markup to ensure it would be formatted correctly. 14% felt that the synchronous typographic display was adequate as a check, another 14% used a toolbar button to show a typeset preview window, and yet another 14% used a browser preview. 11% ran a continuous synchronous previewer and 7% relied on a separate production team.

General approach to markup

On the general question of the respondents' approach to editing, roughly equal numbers (37–38%) were comfortable with a system that allowed them to specify markup without prescription, and with systems that encouraged but did not enforce a specific way of marking up (for example, optional structural-based styling in wordprocessors). However, 20% did prefer a prescriptive system.

Respondents were also asked in open-ended questions about what features they would recommend to others looking for software; what features they found best and worst in their software; and for any features they would like to see added.

Advice to others

The most important feature was seen as robust compliance with standards (15%). 7% recommended customizability, and 6% felt integration with other systems was important. Level at 4% were Avoid WYSIWYG, Avoid Wordprocessors, and Keep it simple (the converse, Use WYSIWYG rated 3%). Again there was a very long tail of other factors, but without significant distinction.

Most useful features

For usefulness, keyboard shortcuts rated the highest at 14% of mentions. Regular Expression searches followed at 11%, and Integration and Validation features at 8% each. A cluster of editorial functions came next at 5–6% each: context-sensitive or structure-sensitive editing; spell-checking, grammar-checking, and thesaurus; autocompletion; and WYSIWYG display. At a lower level (3%) were colored editing, adaptability and customizability, and the quality of formatting, DTD/Schema handling, and cross-referencing. Price and multi-platform availability were not significant at 2%.

Worst features

By contrast, the worst features of editors were led by automated or pre-emptive insertion problems at 20% — wrong types, interference with formatting, refusal to mark up as instructed, or insistence on adding incorrect markup. Validation errors (faulty editors, not faulty documents) followed at 11%, and WYSIWYG problems at 9%. Quirkiness or obtuseness of the interface were rated at 7% and 5% respectively (failure to follow established patterns), and matters related to styling and formatting also at 5%. Poor documentation, the need for manual intervention, lack of Unicode support, program stability, and support for different file formats all rated 4%, and there were others in a low long tail.

Failure to fulfil a task

When asked if they had ever failed to find out how to do something that an editor was actually capable of, 35% mentioned the difficulty of finding items in documentation, and 26% mentioned problems in getting an editor's special feature (one of its unique selling points) to work. 13% mentioned trying to overcome unnecessary complexity, and 6% felt that such failure was down to lack of awareness of a product's capabilities. The tail included mathematical features, searching, validation, and WYSIWYG problems.

Wish list

The opportunity to add to a Wish list led the Other category to account for 27% of responses. Heading the remainder were improved documentation (20%), better WYSIWYG (17%), and better interfaces (13%) and menu systems (10%). The addition of Regular Expressions rated 7% and better access to styling also 7%.

Discussion

In the open-ended questions, and in the Other areas of each question, users were able to elaborate on their responses. In these, there was sometimes extensive and persuasive argument both for and against the exposure of markup, the limitation of structural control, the adaptability of editing systems (including DTDs and Schemas), and the conflict between how a writer perceives interaction with a document and how the creator of the editing system perceives it. These views — necessarily one-sided, as they come from long-term authors with technical understanding, rather than from non-technical writers or newcomers (see Figure 1) — illustrate an important point about structure which has not been widely considered at a technical level.

While it is accepted wisdom that structure is A Good Thing in all writing (and this has become an article of faith in markup theory), there is a difference between what markup experts mean by structure and what writers understand by it. Both parties accept that there is a framework underlying all formal documents, usually in the conventional part-chapter-section-subsection hierarchy, with other components adduced where needed (principally figures, tables, lists, and their derivatives). The differences appear to lie in the perception of the relationship of these elements to each other.

The classical theory, derived from computer science and graph theory, is that the document is a hierarchical tree (actually inverted: a root-system) and that all necessary actions can be seen in terms of navigation around the tree, and of insertion into and withdrawal from the nodes which form the branches and leaves.

The conventional writer, however — and we expressly exclude the markup expert, as well as the experienced technical authors who responded to the survey — is by repute probably only marginally aware of this tree; but we have been unable to measure this at present. In this view, the document is seen as a continuous linear narrative, broken into successive divisions along semantic lines, and interspersed with explanatory material in the form of figures, tables, lists, and their derivatives. From inspection, this appears to hold true whether it is a sales report, a novel, a textbook, or an academic paper. The terminology used is therefore also different: inserting a node into the tree has meaning for the document engineer who designs the document type or the formatting engine, but is meaningless for the writer, who thinks in terms of new chapter or add a paragraph.[2]

This may explain to a considerable extent why the anything, anywhere document model in commonly-used wordprocessors has become so pervasive: it is virtually impossible expressly to allow an object to occur only in a specific place, or to forbid one from occurring at any point. The interface to such models has become widespread precisely because it allows this latitude, regardless of whether it makes structural sense or not, and because such interfaces are marketed for general-purpose, ad hoc, and trivial use, as well as for complex or sophisticated use. This is despite the result that in terms of formal structure, all wordprocessor documents are in effect a simple series of paragraphs one level deep (with a small exception for those that group list items in a container or provide containment at the mixed-content level).[3]

It would therefore appear that the lack of adoption of structured-editing interfaces could be due to a lack of understanding by authors of the tree model, or to a sense that it constrains them unreasonably during the writing process. But the existence of the tree, and its supervention in the interface, are artefacts of the way in which editing software has been written, and reflections of the preoccupations of the designers and programmers. This is made plain by the fact that the interface of structured editors implements the tree, rather than implementing a model of the document with which the author is more familiar. The use of the synchronous typographic interface (popularly, if erroneously, known as WYSIWYG) goes some way towards hiding the technicalities of tree-based editing, but our objective here is to investigate the extent to which it is possible to present writers with a model of the document which matches their expectations rather than those of the document engineer or programmer.

Taking this view, it is possible that an interface which provides the existing markup facilities (from a document engineering point of view) but replaces the engineering-oriented or technology-oriented approach with one more closely matched to the users' expectations, would stand a better chance of acceptance among authors. While this has been attempted in some recent products, it appears to have addressed specific individual demands rather than the general principle.

The interface is the product

The development of the graphical user interface, common support libraries, dynamic data exchange, object linking and embedding, context-awareness, and many other related technologies, has led to the frequent blurring of the distinction between applications for the user. Sending an email can now invoke the default wordprocessor as the editor; clicking on a hypertext link in a document will open a web browser; following a link in a browser will open the (usually) appropriate application for the type of file; and a table in a document could be provided by an embedded spreadsheet object.

The commonality of the interface framework (the position of the menus, arrangement of the toolbar, and availability of the other affordances) increases software reusability and makes it easier for the user to carry across skills from one application to another; but it also leads users into a state of unawareness of exactly which application is active at any one time. This also provides one of the building-blocks for the development of interface components which are generically grouped under the banner of Web 2.0, which attempts to imbue all visible objects with the status of an affordance.

A side-effect of this is that a large number of applications, even across platforms, share an increasingly common interface framework, and are increasingly expected by the users to provide the same affordances. User tolerance for differences based on platform, vendor, or application appears to be shrinking, such that a new product would have to offer some very radically new and valuable feature indeed for it to justify breaking the conceptual mold expected by the user.

Taking into account the expectations of users found in the survey above, there is a growing sense that the interface is the product, and the product is the interface, regardless of the technologies employed underneath. The structure-directed document editing model, which requires a foreknowledge or awareness of the underlying hierarchical document model, may prove to be unsatisfactory in the light of this approach.

Building on the information gathered in the surveys it was possible to construct a list of operations or actions (keystrokes, menu items, toolbar buttons) which were seen as problematic. This meant either that they were to be handled specially or even avoided because of their meaning or ambiguity (in the opinion of the expert practitioners); or that they were opaque to the user because of terminology, placement, expectation, or effect (in the opinion of the users).

From the requirements of users in the survey of requests to the forums, editing software is required by most users to be WYSIWYG; that is, to employ a synchronous typographic interface with no markup visible to the user. Whether or not an editor allows another form of access to the markup (tokenized, raw text, breadcrumb, or marching display), is not relevant for the present purpose: this is something the software creator can choose to do or not to do.

In all cases it was seen as a priority that the behavior of the interface should be what the user expects. Where in some cases this becomes context-dependent, it was regarded as essential that the behavior should not be the simple binary-strict IR or CS refusal on the grounds that you can't do that here. This was cited on numerous occasions both in the surveys and in related discussions as being The Wrong Thing, especially when the user's action was seen as perfectly reasonable, but simply happened to take place at a time when the cursor position indicated otherwise.

In all cases discussed, it was seen as important to avoid asking the user a question in order to determine what is The Right Thing unless absolutely essential. Additional interface features which learn from past behavior, and which allow preferences to be set where there might be ambiguity, were considered to be outside the scope of this model. While these have been implemented in some systems, the present author is unaware of any specifically related to structural editing, and this would be an important area for future work.

It cannot be emphasized too strongly that users, and especially intending users, vote with their feet when judging an application by its interface. In the absence of compelling direction from elsewhere in an organization, and where products are essentially comparable in function, an interface's appearance as well as its apparent usability are regarded as actually being the product.

By contrast, when functions are widely disparate, and the interfaces are roughly comparable, the functions may become the product. As we saw earlier, some interfaces fail to afford features that do actually exist in the product, and this may provide a third effect on the perceived usability.

In all three cases the quality, behavior, performance, and other attributes of the underlying engines and routines may only rarely be considered by the individual user except as part of a formal evaluation process, and may even then be dismissed in favor of the specific attractions of a particular interface. The importance of accurate interface usability testing before product release cannot therefore be ignored: while the market will always have the final say, releasing an untested interface is likely to be counterproductive.

The following changes to the interaction are derived from the findings of the four enquiries (experts' survey, software analysis, user requirements study, and user survey), and will be subject to testing in the final phase (see section “Testing”).

Keypresses

The Enter (Return) key

In most synchronous typographic wordprocessor environments the default action is to end the current paragraph and start a new one (the older conflation of new line and new paragraph has mostly disappeared from wordprocessors but is still present in the less capable web editors). In the case of list items, Enter starts a new item rather than a new paragraph within the same item.

The behavior in an environment like a list raises the question of how to exit the list environment — how to revert from list-item creation to normal paragraphs — when the markup is invisible. This is critically relevant where the system is unable to allow the placement of the cursor beyond the end of the last environment because no markup is visible.

The next-paragraph behavior can usually be modified in a stylesheet, so that a given paragraph style (for example, Title) can be programmed to create a new paragraph of another style (eg Author) when Enter is pressed. In tabular matter, it may navigate down a cell, or create a new empty row. In some SGML/XML editors investigated, however, it caused a system beep or the insertion of white-space in mixed content.

Because of its history, there is an expected down (linefeed) action associated with the key, and the first two examples above conform to this (new paragraph; new item), as does the stylesheet-directed creation of a specific following style, and this is the defined behavior.

The problem of terminating a list or similar second-level container was partly solved in Emacs psgml-mode, STiLO, and some other editors by detecting a repeated Enter or split-element instruction (C-c RET in Emacs) with no intervening keystroke, and interpreting this as signaling a demand to exit to the next level up in the hierarchy. In STiLO, a third and subsequent presses cycled through all element types available at that level. While this kind of complex behavior is very useful to the expert, it is not easily guessable, and is not obvious to the non-expert. Ctrl-Enter was adopted for this exit container action on the grounds that it is already familiar in the sense of a hard return, and with the possibility that this should be configurable by the user, perhaps via a beginner/expert mode. There appears to be no suitable alternative paradigm from online editing (wikis, blogs, IM) which could be adopted.
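The two behaviors described for Enter and Ctrl-Enter can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's xml.etree.ElementTree as a stand-in for the editor's document model, the element names (section, list, item, para) are hypothetical, and the choice of para as the element created on exit is hard-coded where a real editor would consult the DTD or Schema.

```python
import xml.etree.ElementTree as ET

def press_enter(root, current, ctrl=False):
    """Insert a new element in response to Enter and return it.

    Plain Enter adds a sibling of the same type after `current`.
    Ctrl-Enter leaves the container of `current` and adds a 'para'
    after that container (the 'para' name is an assumption here).
    """
    # ElementTree keeps no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    if ctrl:
        anchor = parents[current]      # e.g. the list around an item
        new = ET.Element("para")
    else:
        anchor = current
        new = ET.Element(current.tag)
    target = parents[anchor]           # KeyError if anchor is the root
    target.insert(list(target).index(anchor) + 1, new)
    return new

doc = ET.fromstring("<section><list><item>one</item></list></section>")
first = doc.find("list/item")
second = press_enter(doc, first)             # Enter: another list item
after = press_enter(doc, second, ctrl=True)  # Ctrl-Enter: leave the list
print(ET.tostring(doc, encoding="unicode"))
```

The sketch does not handle Ctrl-Enter at the top level of the document, where there is no enclosing container to exit from; what should happen there is left open.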

The TAB key

Informally, many experts would concur in banning this key altogether by disabling it. Its typewriter-style use to align text with locally-dependent locations across the user's window is a good example of a visual-only instantiation which is not stable.

In practice it appears to have two valid uses, given its association with the forward direction, especially in tabular matter.

One use is to navigate forward linearly through markup; that is, from one element or text node to the next in mixed content, identifying its location in a telltale or highlight (this might solve the problem of cursor placement beyond the last text node referred to above); and from one element to the next in serial order in element content, in effect performing a width-first traverse.

The other use is as an Insert Table key when outside a table, moving to the next available location where a table makes sense; and it would revert to the traditional spreadsheet-style cell-to-cell traverse when inside a table. Both uses were designed to be tested.
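The first of these uses, forward navigation, might be sketched as below. The sketch assumes that serial order means document order, and it treats an empty leaf element as a valid stop so that the cursor can reach places where no character data exists yet. It uses Python's xml.etree.ElementTree, in which the text following an element is held in that element's tail; the element names are hypothetical.

```python
import xml.etree.ElementTree as ET

def tab_stops(element):
    """Yield (element, kind) cursor stops in document order.

    kind is 'text' for a position directly inside the element and
    'tail' for character data that follows it within its parent.
    """
    has_text = bool(element.text and element.text.strip())
    # Empty leaf elements are stops too, so they remain reachable.
    if has_text or len(element) == 0:
        yield element, "text"
    for child in element:
        yield from tab_stops(child)
        if child.tail and child.tail.strip():
            yield child, "tail"

def next_stop(root, current):
    """Return the stop after `current`, or None at the end of the document."""
    stops = list(tab_stops(root))
    i = stops.index(current)
    return stops[i + 1] if i + 1 < len(stops) else None

doc = ET.fromstring(
    "<section><para>See <emph>this</emph> note.</para><para/></section>"
)
stops = list(tab_stops(doc))
print([(e.tag, kind) for e, kind in stops])
```

Recomputing the full list of stops on every keypress is done here for brevity only; an editor would presumably track the current position incrementally.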

The spacebar

Apart from inserting spaces in character data content, there appears to be no other legitimate use for the key in a structured editing environment. Its use as a pager key in Unix-based systems and its adoption by web browsers for a similar purpose, as well as its use as a button or link selector, are to be avoided in the current context except for accessibility functions when using the menus and toolbars.

Backspace and Delete

Backward and forward deletion in character data content would operate as expected. When adjacent to a markup boundary, however, it seems reasonable that deletion should continue in the same direction by jumping linearly to the next point where character data exists (if any; attribute values excluded), possibly accompanied by a transient audio or visual warning.

Another possibility is that when all character data content has been deleted from an element, and all descendant elements are similarly empty, an additional press of one or other of these keys should remove the containing element itself. This would conform to the expectation of deletion associated with both keys, but requires separate testing as it may or may not conform to the user's expectations when content has already been deleted, as the user will be unaware of the existence of any empty markup structure when there is no character data present.
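The second possibility, removing an element once it and all its descendants are empty, can be sketched as follows. This is an illustration only, again using xml.etree.ElementTree as an assumed document model with hypothetical element names. Whitespace-only content is treated as empty, which is itself an assumption, and any character data following the removed element is preserved by attaching it to the preceding sibling or to the parent.

```python
import xml.etree.ElementTree as ET

def is_empty(element):
    """True if neither the element nor any descendant holds character data."""
    if element.text and element.text.strip():
        return False
    for child in element:
        if not is_empty(child) or (child.tail and child.tail.strip()):
            return False
    return True

def remove_if_empty(root, element):
    """Remove `element` if it is empty; return True if it was removed."""
    if not is_empty(element):
        return False
    for parent in root.iter():
        children = list(parent)
        if element in children:
            i = children.index(element)
            # remove() would discard the tail, so move it elsewhere first.
            if element.tail:
                if i == 0:
                    parent.text = (parent.text or "") + element.tail
                else:
                    previous = children[i - 1]
                    previous.tail = (previous.tail or "") + element.tail
            parent.remove(element)
            return True
    return False  # the root itself is never removed

doc = ET.fromstring("<para>Keep <emph><sub/></emph>this.</para>")
removed = remove_if_empty(doc, doc.find("emph"))
result = ET.tostring(doc, encoding="unicode")
print(result)
```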

Menu items

The NEW menu item

Many writers on interface usability deprecate the use of nouns and adjectives on toolbar and menu labels, and insist that using verbs or attributes allows greater comprehension (the canonical example being [Apple2008]). In many cases they are right, but in the case of software for writing, the terms commonly used (in English) include phrases such as new chapter, new paragraph, or new section, and these are so prevalent a way of expressing the action that they justify being collected under a menu or toolbar button labelled New.

The first encounter with this is already familiar in many editors as New Document, which allows selection from a set of precompiled DTDs or Schemas. The user indications in the survey were that such a set needs to be very much wider, and must allow a much easier method of adding new document types. (Although that activity is outside the scope of this study, it does have implications for document type and stylesheet designers and for the introduction of a robust means of element type and attribute hint documentation.)

The use of Insert (which we discussed earlier), or Surround/Enclose, is always restricted to the element types available at the current cursor location (which may be indeterminable by the user when no markup is visible). By contrast, a selection from a New menu moves to the next available location where the selected item can be inserted, if the current location precludes it. If the user asks for a new chapter, and their cursor is currently in the middle of an acronym, they do not mean literally insert the new chapter markup there, or recursively split elements until a valid insertion point is reached; they mean go to the next place where a chapter can start, and start it there.
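The next-available-location rule for block-level items can be sketched as a walk up the ancestors of the cursor position. The sketch below is a simplification under stated assumptions: the ALLOWED table is a toy stand-in for a real DTD or Schema content model (it ignores ordering and occurrence constraints), the element names are hypothetical, and only the case where the current location precludes insertion is covered.

```python
import xml.etree.ElementTree as ET

# Toy content model (an assumption): parent tag -> permitted child tags.
ALLOWED = {
    "book": {"chapter"},
    "chapter": {"section", "para"},
    "section": {"para"},
    "para": {"acronym", "emph"},
}

def new_element(root, cursor, tag):
    """Create `tag` at the next place after the cursor where it is allowed.

    Nothing is split: the new element goes after the nearest ancestor
    branch whose parent permits it. Returns None if no ancestor does.
    """
    parents = {child: parent for parent in root.iter() for child in parent}
    node = cursor
    while node in parents:
        parent = parents[node]
        if tag in ALLOWED.get(parent.tag, set()):
            new = ET.Element(tag)
            parent.insert(list(parent).index(node) + 1, new)
            return new
        node = parent
    return None

doc = ET.fromstring(
    "<book><chapter><section>"
    "<para>The <acronym>XML</acronym> way</para>"
    "</section></chapter></book>"
)
cursor = doc.find(".//acronym")
chapter = new_element(doc, cursor, "chapter")
print([child.tag for child in doc])
```

With the cursor inside the acronym, the request for a new chapter leaves the existing paragraph intact and starts the chapter after the current one.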

When requesting the insertion of inline markup in mixed content, however, Add may be convenient semantic sugar for New (as in add quote, add emphasis). The same principle of next available location would be honored, as such markup can usually occur arbitrarily in mixed content.

As a corollary to this principle, where a new element has required element content, all required element types must be added, and the focus then returned to the first location of character data. Where there is a required choice, that must be presented to the user (one of the unavoidable occasions, and perhaps a suitable opportunity for the implementation of the first mode of TAB key operation explained above).
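The corollary can be illustrated with a short sketch that fills in required content recursively and then locates the first place where character data can be typed. The REQUIRED table is a hypothetical stand-in for the required parts of a content model, and the sketch assumes no required choices, which would need the user's input as noted above.

```python
import xml.etree.ElementTree as ET

# Toy model (an assumption): tag -> children that must always be present.
REQUIRED = {
    "chapter": ["title", "para"],
    "list": ["item"],
    "item": ["para"],
}

def create_with_required(tag):
    """Build `tag` together with all of its required descendants."""
    element = ET.Element(tag)
    for child_tag in REQUIRED.get(tag, []):
        element.append(create_with_required(child_tag))
    return element

def first_text_location(element):
    """Return the first leaf in document order, where typing can begin."""
    return next(e for e in element.iter() if len(e) == 0)

chapter = create_with_required("chapter")
focus = first_text_location(chapter)
print([child.tag for child in chapter], focus.tag)
```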

Formatting controls

Given that a synchronous typographical editor operating with a stylesheet would not normally have any use for the B, I, and U buttons, nor for the typeface and font-size dropdowns, it is tempting to abolish them completely except when in style-creation mode.

However, as the user expects to be able to control formatting from the menu and toolbars, the B, I, and U buttons should operate a drop-down of all the available markup which uses those styles in the current stylesheet. For example, as we have pointed out elsewhere, there are at least eight reasons[4] why an author or editor might want to use italics,

  • foreign words

  • scientific names

  • emphasis

  • titles of documents

  • names of products

  • mathematical variables

  • headings

  • decoration

and probably as many again for bold and underlining combined [Flynn2002].

By the same token, the typeface dropdown (restricted to those faces in use by the stylesheet) can be used to select from those elements currently employing those faces; and the font-size dropdown to select those employing those sizes. The effect for the user is identical to the existing usage, and requires no additional mouse-click, only a longer dwell time and a move to select the right usage.
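The lookup behind such a drop-down can be sketched as a query over the stylesheet. The STYLESHEET table, its property names, and the function dropdown_for are hypothetical; restricting the result to the element types permitted at the cursor is our reading of "available", not a stated requirement.

```python
# Hypothetical stylesheet: element type -> formatting properties.
STYLESHEET = {
    "foreign": {"font-style": "italic"},
    "emphasis": {"font-style": "italic"},
    "citetitle": {"font-style": "italic"},
    "strong": {"font-weight": "bold"},
    "title": {"font-weight": "bold", "font-size": "14pt"},
}

def dropdown_for(prop, value, stylesheet, permitted_here):
    """List element types styled with prop=value that are valid at the cursor."""
    return sorted(
        name
        for name, style in stylesheet.items()
        if style.get(prop) == value and name in permitted_here
    )
```

The same function serves the I button (font-style, italic), the B button (font-weight, bold), and the typeface and font-size dropdowns, differing only in the property and value passed.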

A similar argument can be made in favor of other visual selectors such as color. In stylesheet-editing mode, if one is provided, the buttons and dropdowns may revert to conventional usage to allow new styles to be constructed or existing ones to be modified.

The toolbar

Many remaining items on a conventional toolbar can to a large extent be replaced by markup-oriented controls when working with a stylesheet, using the principles given above.

Toolbar items with an application in markup control, such as those for use with tabular setting, can of course be retained largely unchanged, but by the same token they must disappear from the toolbar when the DTD or Schema has no tabular elements: a corollary of providing the user with the best affordances possible is that inapplicable ones should be eliminated.

The non-markup document controls such as Save, Open, and Print are of course retained in their normal form.

Additional buttons for cross-reference management, citation, indexing, and other apparatus common in structured formal documents are added where the DTD or Schema provides for such facilities. These are already familiar to many users from reference management software.
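The toolbar rules above amount to filtering the controls by what the DTD or Schema declares. The following sketch assumes a hypothetical TOOLBAR table and function visible_buttons; the button labels and element type names are illustrative only.

```python
# Hypothetical toolbar definition: button label -> element types it depends on.
# An empty set marks a non-markup control that is always shown.
TOOLBAR = {
    "Save": set(),
    "Print": set(),
    "Insert table": {"table"},
    "Cross-reference": {"xref"},
    "Index term": {"indexterm"},
}

def visible_buttons(schema_elements):
    """Return the buttons whose required element types all exist in the schema."""
    return [label for label, needs in TOOLBAR.items() if needs <= schema_elements]
```

A schema with no tabular elements therefore never shows the table control, while one that declares cross-reference or indexing elements gains the corresponding buttons.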

Generic tools such as spellcheckers, thesauruses, and grammar-checkers remain unaffected, but they need to be relevant and up-to-date: a number of applications tested failed to include common technical terms like filetype as well as recent everyday words like blog and wiki.

Other

Referencing

For normal cross-references (assuming the ID/IDREF mechanism is used), adding a reference to an existing target is non-problematic, requiring only a pop-up of available targets (or acceptance of a scroll to the target and a click on it). An attempt to add a reference to a non-existent target must create a placeholder for the point of reference, and then require the user to identify the target, completing the resolution when the target is established. In both cases, the stylesheet must know the correct generated text to add at the point of reference, if any, either based on the element type of the target (table number, section number), or as a page number. In all cases, moving the target ID to another element will update all references to it.
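The resolution step can be sketched as follows. The element and attribute names (xref, linkend, id), the placeholder text, and the function resolve_references are assumptions for illustration; generated text is limited here to a type name and sequence number, and page numbers are not attempted.

```python
import xml.etree.ElementTree as ET

def resolve_references(root):
    """Fill each <xref> with generated text; return the IDs still unresolved."""
    counters, labels = {}, {}
    # Number every target by element type, in document order.
    for el in root.iter():
        if "id" in el.attrib:
            counters[el.tag] = counters.get(el.tag, 0) + 1
            labels[el.get("id")] = f"{el.tag.capitalize()} {counters[el.tag]}"
    pending = []
    for ref in root.iter("xref"):
        target = ref.get("linkend")
        if target in labels:
            ref.text = labels[target]
        else:
            ref.text = "[?]"  # placeholder until the user identifies the target
            pending.append(target)
    return pending
```

Because the labels are recomputed from the current position of each ID, moving an ID to another element and resolving again updates every reference to it.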

For bibliographic references, the stylesheet must contain sufficient information for the correct formatting style (or choice) according to the conventions of the discipline. A similar behavior to the normal cross-reference action can be assumed when the reference entries are embedded in the document (as is possible with DocBook or LaTeX, for example), but this can be pre-empted by dragging and dropping a reference from an external citation database, either maintained locally like Zotero, EndNote, or BibTeX, or from suitable data in a browser page on a journal or reference database site; with the ID resolution being satisfied by the inclusion of the referenced item in a suitable format at the end of the document.

Mathematics

The visual control of mathematics poses special problems which have been addressed in several models developed by software writers and vendors (Euromath, Arbortext, LyX, Scientific Word, and others), and is not considered here.

Unstructured editing

Several respondents to the surveys mentioned the need for systems which deduce structure while the author writes without structural controls; for systems which can open documents with broken structure (that is, badly-formed or invalid documents) in order to allow them to be mended; and for systems which allow incomplete but otherwise well-formed or valid documents to be saved for later completion. While these are unquestionably still needed [Birnbaum1997], and mechanisms for their instantiation have been available for many years [Shafer1995], they are outside the scope of this research.

External files

The use of drag-and-drop is an essential interface component for the inclusion of images, real-time updates, file objects, and other linking actions (like the bibliographic citations mentioned earlier), although traditional attribute entry of filenames and URIs must remain accessible. The embedding of local (file:///) URIs is deprecated for reasons of non-portability, but no viable solution is apparent for standalone usage without widespread adoption of a catalog method (below). The embedding of non-standard methods such as links to OLE objects and local email repositories is a particular difficulty.

A particular demand was seen for the management of external entities, both parsed and unparsed, as this was given as a deficiency in many editors. The use of XML Catalogs is regrettably under-developed.

Character data

It ought to be unnecessary to mention explicitly, but all visible (printable) keyboard characters — indeed all of the Unicode repertoire — must be accepted without error. With markup hidden, there can be no excuse for markup characters entered from the keyboard being interpreted as markup characters.
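As a small illustration, the sketch below stores typed input as character data and leaves escaping to the serializer. The function type_text is hypothetical; the escaping itself is the standard behavior of Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def type_text(element, typed):
    """Append keyboard input to an element as character data, never as markup."""
    element.text = (element.text or "") + typed

para = ET.Element("para")
type_text(para, "if a < b && c > d then <stop>")
# Serialization escapes the markup characters, so no <stop> element is created.
serialized = ET.tostring(para, encoding="unicode")
```

Reading the serialized form back returns exactly the characters the user typed, and the paragraph still has no child elements.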

Where letters or symbols from outside the base character repertoire of the document are entered, editors for systems which require additional facilities to handle them (such as LaTeX) must automatically add the relevant modules (packages) to the Preamble (an approximate equivalent to the Internal Subset of an XML document).
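A sketch of this behavior for LaTeX might look as follows. The character-to-package table is illustrative only (the packages actually needed depend on the engine and fonts in use), the function add_packages is hypothetical, and the check for an existing package is a plain string match that would miss a \usepackage line carrying options.

```python
# Illustrative mapping only: which package a character needs depends on the
# LaTeX engine and fonts in use.
PACKAGE_FOR_CHAR = {"€": "eurosym", "°": "textcomp"}

def add_packages(source, typed):
    """Return `source` with any packages needed for `typed` added to the preamble."""
    needed = []
    for ch in typed:
        pkg = PACKAGE_FOR_CHAR.get(ch)
        # Skip packages already queued or already loaded (simple string match).
        if pkg and pkg not in needed and f"\\usepackage{{{pkg}}}" not in source:
            needed.append(pkg)
    lines = "".join(f"\\usepackage{{{pkg}}}\n" for pkg in needed)
    # Insert the new lines at the end of the preamble.
    head, marker, tail = source.partition("\\begin{document}")
    return head + lines + marker + tail
```

Typing a character a second time adds nothing, because the package is then already present in the preamble.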

Editing

The cut/copy/paste actions applied to character data in text nodes behave as normal. The three-button equivalent mouse actions common in some systems must remain available. Embedded whole-element markup in mixed content is cut/copied/pasted with any marked surrounding character data, but will silently disappear if pasted into a location where that markup would be invalid (see the rules governing Target Markup Adoption below).

The paradigm of clicking on the start-tag to mark the whole of an element is inapplicable when markup is invisible, and a tree or other diagrammatic representation of the document in a side-pane may be confusing for the non-expert, but an equivalent style-oriented margin similar to Word's allows whole-element selection in element content, as does the three-click selection in Mac OS X.

Cutting (or copying) whole elements in element content and pasting them elsewhere is subject to the rules of the DTD/Schema in use. If the user attempts to paste the material into a location where the markup would not be permitted (into mixed content, for example), the markup in the clipboard content is removed down to the mixed content level, and the result pasted as mixed content. Pasting whole-element material from element content into element content at a higher or lower level automatically promotes or demotes the container of the clipboard content to a suitable level to be allowed.
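The removal of markup "down to the mixed content level" can be sketched as a recursive flattening of the clipboard content. The function to_permitted and the set of permitted element types are assumptions for illustration; automatic promotion or demotion of containers is not attempted here.

```python
import copy
import xml.etree.ElementTree as ET

def to_permitted(node, permitted):
    """Flatten `node` into a list of strings and elements valid at the target."""
    if node.tag in permitted:
        kept = copy.deepcopy(node)
        kept.tail = None  # trailing text is carried separately by the caller
        return [kept]
    # The element itself is not allowed: drop its tags but keep its content.
    parts = [node.text] if node.text else []
    for child in node:
        parts.extend(to_permitted(child, permitted))
        if child.tail:
            parts.append(child.tail)
    return parts
```

Pasting a whole paragraph into a location that permits only emphasis keeps the emphasized phrase as markup and reduces everything else, including any embedded footnote, to its character data.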

Highlighting across markup boundaries copies the marked character data and any embedded whole-element markup. As mentioned above, cut/copy and paste then work on the text nodes in mixed content and any whole element nodes included in the selection. A principle which we term Target Markup Adoption determines that pasting fragmentary mixed content adopts the style of the target container and not the source style, whereas pasting whole elements (in element content) retains the internal consistency of styling, subject to any inheritance or disinheritance at the target location. This principle is already in partial use in some embedded XML editors designed for web applications.

An attempt to apply (inline) styling to marked text across element boundaries will surround any text with the appropriate markup where permitted, but leave text unmarked in elements where the relevant subelements cannot be applied.
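A minimal sketch of this behavior follows, under the simplifying assumption that the selection covers the whole text of each element and that those elements contain text only. The INLINE_ALLOWED table and the function style_selection are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: container type -> inline element types it permits.
INLINE_ALLOWED = {
    "para": {"emphasis"},
    "title": {"emphasis"},
    "programlisting": set(),
}

def style_selection(elements, inline):
    """Wrap the text of each selected element in <inline> where that is valid."""
    styled = []
    for el in elements:
        if inline in INLINE_ALLOWED.get(el.tag, set()) and el.text:
            wrapper = ET.SubElement(el, inline)
            # Move the text into the wrapper; the container keeps only the wrapper.
            wrapper.text, el.text = el.text, None
            styled.append(el)
    return styled
```

Elements that cannot contain the requested inline markup are left exactly as they were, so the operation never produces an invalid document.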

Testing

Implementing this in program code would, in effect, mean rewriting a large part of the interface of an existing editor, or writing an entire new one from scratch. As this is beyond the scope of the research, the Paper Prototyping method of testing is being used [Snyder2003].

This involves preparing sequences of screenshots or facsimiles on sheets of paper, and giving test subjects tasks which they carry out by indicating on the sheets what action they would take. The tester then replaces the sheet with the one which shows the result of that action, and the process is repeated. A record of the sequences is kept for analysis. The use of Personas (constructed psychological profiles of canonical users) enables experienced test subjects to match responses to those of the target audience. Testing will be carried out in the Usability Laboratory of the Human Factors Research Group at University College Cork.

The actions and behaviors are specified as sequences of keystrokes or mouse movements, and prepared for testing using generated simulations of screenshots (see Figure 3).

Figure 3: Paper prototyping: simulated screenshot of model editor

Use of the New menu in mixed content

Duplicates of each screen using the existing interfaces of OpenOffice, Word, or oXygen as appropriate will be used for a control sample (see Figure 4).

Figure 4: Paper prototyping: control screenshot using OpenOffice

Inserting a paragraph break

Testing will be conducted in the autumn of 2009.

References

[Anghelache2004] Anghelache, Romeo: The Meaning of Scientific Documents. In New Developments in Electronic Publishing (AMS/SMM Special Session, Houston, May 2004), ECM4 Satellite Conference, Stockholm, June 2004 pp. 5–7.

[Apple2008] Apple Corp, Menus: Naming Menu Items. In Human Interface Guidelines for OS X. Part III, at http://developer.apple.com/documentation/UserExperience/Conceptual/AppleHIGuidelines/XHIGMenus/XHIGMenus.html#//apple_ref/doc/uid/TP30000356-TPXREF117 (9 June 2008, retrieved 24 April 2009).

[Birnbaum1997] Birnbaum, David: In Defense of Invalid SGML. In Proc. Annual Joint Meeting of the Association for Computing in the Humanities and the Association for Literary and Linguistic Computing, Kingston, Ontario (1997)

[Denning1986] Denning, Peter J: Electronic Publishing. Technical Report 86:21, NASA Ames Research (Oct 1986)

[Flynn2002] Flynn, Peter: Formatting Information, TUGboat single issue 23:2 (2002), pp. 115–250

[Flynn2006] Flynn, Peter: If XML is so easy, how come it's so hard?: The usability of editing software for structured documents. Extreme Markup Conference 2006, Montréal, QC (Aug 2006)

[Gibson1979] Gibson, James J: The Ecological Approach to Visual Perception. Houghton Mifflin, Boston (1979), p.36 et seq.

[Heck1993] Heck, André: Electronic Publishing and Advanced Information Retrieval. In Astronomical Data Analysis Software and Systems II, 52 (1993)

[Lariviere2006] Larivière, Vincent; Gingras, Yves; and Archambault, Éric: Canadian collaboration networks: A comparative analysis of the natural sciences, social sciences and the humanities. In Scientometrics 68:3 (Dec 2006) pp. 519–533. doi:https://doi.org/10.1007/s11192-006-0127-8.

[Lombardi1983] Lombardi, John V: Computer Literacy: The Basic Concepts and Language, Indiana University Press (1983) 0253314011

[Luey2007] Luey, Beth: The education of academic authors. In Publishing Research Quarterly, 3:2 (June 1987) pp. 4–10. doi:https://doi.org/10.1007/BF02683607.

[Shafer1995] Shafer, Keith: Creating DTDs via the GB-Engine and Fred. OCLC Online Computer Library Center, Inc., Dublin, Ohio (1995)

[Snyder2003] Snyder, Carolyn: Paper Prototyping: The Fast and Easy Way to Design and Refine User Interfaces. Morgan Kaufmann, San Francisco (2003) 1558608702

[Williams1995] Williams, Martha: Database publishing statistics. In Publishing Research Quarterly, 11:3 (Sept 1995). doi:https://doi.org/10.1007/BF02680442.



[1] Or, indeed, LaTeX. Or even stylesheets for Word or OpenOffice.

[2] In the case of narrative or dramatic literature, structure has entirely other meanings, and concerns plot revelation, narrative pace, character development, and other factors completely unrelated to our use of the term.

[3] Containment has its own perils: the author has an example of an OpenOffice document, a book of a dozen chapters by different authors, in which the editor unwittingly pasted chapters two to twelve into the bounds of the last endnote at the end of the first chapter. The publisher asked for some endnotes to be subsumed into the text, and when the editor deleted the last endnote of the first chapter, all the remaining chapters vanished from the document.

[4] To which might validly be added illustrative for authors of manuals on typography.

Author's keywords for this paper:
documents; editing; interface; latex; software; structure; usability; xml

Peter Flynn

Department of Applied Psychology

University College Cork

Peter manages the academic advisory and electronic publishing unit at University College Cork, Ireland, and also runs the text management consultancy Silmaril. He was a member of the W3C's XML Special Interest Group and a member of the IETF's Working Group on HTML. He is maintainer of the XML FAQ and author of The World-Wide Web Handbook (ITCP, 1995) and Understanding SGML and XML Tools (Kluwer, 1998). He is completing a belated PhD in software usability, and in his copious spare time he surfs, cooks, and listens to early music.