How to cite this paper

Flynn, Peter. “Markup to generate markup to generate markup: Using XML to create and maintain LaTeX packages and classes.” Presented at Balisage: The Markup Conference 2013, Montréal, Canada, August 6 - 9, 2013. In Proceedings of Balisage: The Markup Conference 2013. Balisage Series on Markup Technologies, vol. 10 (2013). https://doi.org/10.4242/BalisageVol10.Flynn01.

Balisage: The Markup Conference 2013
August 6 - 9, 2013

Balisage Paper: Markup to generate markup to generate markup

Using XML to create and maintain LaTeX packages and classes

Peter Flynn

Peter Flynn runs the Electronic Publishing Group in IT Services at University College Cork. He is a graduate of the London College of Printing and the University of Westminster. He worked for the Printing and Publishing Industry Training Board and for United Information Services as IT consultant before joining UCC as Project Manager for academic and research computing. In 1990 he installed Ireland's first Web server and since then has been concentrating on electronic publishing support. He was Secretary of the TeX Users Group, and a member of the IETF Working Group on HTML and the W3C XML SIG, and he has published books on HTML, SGML/XML, and LaTeX. Peter is editor of the XML FAQ and an irregular contributor to conferences and journals in electronic publishing and Humanities computing. He is currently completing a part-time PhD in user interfaces with the Human Factors Research Group in UCC. He maintains a technical blog at http://blogs.silmaril.ie/peter

Abstract

This paper presents an experiment in using DocBook5 to mark up and maintain LaTeX classes and packages in the literate-programming style, using XSLT2 to generate the standard format of distribution files suitable for the CTAN repository. It identifies several benefits in automation and reusability of code; a number of areas where a customisation layer for DocBook would be useful; and a few unresolved restrictions that package and class authors or maintainers would need to be aware of when editing XML.

Background

Packages and Classes
Automation requirement

Implementation

Metadata

Annotated code

Options
Package specification
Modular code

User documentation

Preamble
Armoring the text
Inlines

Automation

Development from pilot to production
Markup load
Tag abuse

Conclusions

Background

The LaTeX document preparation system provides a framework of commands (markup) for the TeX typesetting program, designed to shield the writer from the need to know the internal programming required to format a document (Lamport1986, Lamport1994). It has been in widespread use in scientific, technical, and academic publishing since 1986, and more recently has experienced growth in the Humanities and in general publishing (Boggio2006, Ubuntu2012).

LaTeX relies for its extensibility on a library of over 4,000 style packages and document classes, which provide additional markup functionality, layouts, typography, and variant behaviour. The ltxdoc document class supplies features for maintaining these packages and classes in a literate programming style using interleaved code and annotation with end-user documentation in a single-file wrapper. The syntax to achieve this, however, is complex, as documentation must be shielded from interpretation as code, and vice versa.

Packages and Classes

A document class is a collection of macros providing both formatting and markup for a specific class of documents, such as the articles for a particular journal, the books by a particular publisher, the theses for a particular university, or any of over 400 other types of document. It is broadly equivalent to a DTD or Schema, although without prescription, and with formatting specifications embedded. The default document classes (report, book, article, and letter) are stylistically minimalist but provide sufficient markup for draft purposes.

A style package is a collection of macros providing a specific variant on formatting, such as hyperlinks in a PDF, the styling of footnotes or references, the use of additional typefaces, or any of over 3,600 other typographic or markup possibilities. There is no direct equivalent in the XML field, but a package can be regarded as broadly equivalent to a CSS or XSLT2 fragment, implementing a particular formatting requirement.

Document classes and packages are typically distributed as DocTeX (.dtx) files, which contain the LaTeX code implementing the features, interleaved with annotation in a literate programming manner, plus user documentation about how to use the additional markup provided (Carlisle2007). An installer (.ins) file uses LaTeX to extract the code as a class (.cls) or style package (.sty) file from the .dtx file, and LaTeX can then be run on the .dtx file directly to produce both user documentation and code annotation.^[1]

This method has proved a very reliable and compact means of distribution, but at the cost of some additional complexity in the construction of the master .dtx file:

documentation and annotation must be armored against extraction as code by prefixing each line with percent-space (%␣);
macro code must be identified for extraction by prefixing the \begin and \end commands (equivalent to start and end tags) with percent and exactly four spaces (%␣␣␣␣);
the regular comment character (%) must therefore be treated specially in some circumstances (doubled or tripled);
there are special tags (in pointy brackets!) like %<*driver> and %</driver> to identify certain sections or lines of the file that need extracting or ignoring in certain circumstances.

Against this must be set the advantages of robustness once constructed; the availability of all LaTeX facilities for writing and formatting the documentation; some added document-management features (version control, change-recording, checksumming, indexing of commands used in the code, etc), and the extensive supporting documentation (LaTeX2006, Mittelbach2004, Lehmann2011).
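To make the extraction rules above concrete, the following is a much-simplified sketch, in Python, of what a docstrip-style pass does with a .dtx file. It is not the real docstrip program: the function name `extract_code` is invented for illustration, only plain guard names are handled (no boolean guard expressions), and the `%%` meta-comment lines that the real tool preserves are simply dropped.

```python
import re

GUARD_OPEN = re.compile(r"^%<\*(\w+)>\s*$")    # e.g. %<*driver>
GUARD_CLOSE = re.compile(r"^%</(\w+)>\s*$")    # e.g. %</driver>
GUARD_LINE = re.compile(r"^%<(\w+)>(.*)$")     # e.g. %<class>\NeedsTeXFormat{...}


def extract_code(dtx_text, option):
    """Return the lines a docstrip-like pass would keep for `option` (simplified)."""
    kept = []
    open_blocks = []  # names of the guarded blocks currently open
    for line in dtx_text.splitlines():
        match = GUARD_OPEN.match(line)
        if match:
            open_blocks.append(match.group(1))
            continue
        if GUARD_CLOSE.match(line):
            if open_blocks:
                open_blocks.pop()
            continue
        # Inside a block guarded by a different option: skip everything.
        if any(name != option for name in open_blocks):
            continue
        match = GUARD_LINE.match(line)
        if match:
            # Single-line guard: keep the rest of the line if the option matches.
            if match.group(1) == option:
                kept.append(match.group(2))
            continue
        if line.startswith("%"):
            # Armored documentation, including the macrocode delimiters.
            continue
        kept.append(line)
    return kept
```

Called with the option `class`, this keeps the `%<class>` lines and the unarmored code but discards the driver block and all documentation, which is the behaviour the armoring conventions are designed to produce.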

Automation requirement

In 2005, I undertook to create a new thesis document class in my university which would implement stricter controls on the content and sequence of front matter (title page, legal, table of contents, declaration of originality, etc), and particularly on the naming and identity of schools, departments and research centers, and the bibliographic reference format used by each. Many users had become accustomed to designing their own title page, and to the re-wording of the names of their unit to suit their own perceptions or requirements. In some cases this involved inventing entirely new department names or descriptions of their degrees, which conflicted with the university's statutory requirements. While the new class would initially only affect the title page and preliminaries of a thesis, this is exactly where the Library catalog staff look for the metadata (in the case of electronic submissions, the PDF metadata is also required to provide the same information).

The data on course names and codes, the abbreviations and full titles of degrees, and the official names of departments and centers were all available from the institutional database, but were subject to annual change, as there were complex and overlapping administrative and pedagogical requirements to be satisfied. This data needed to be converted to the parameter syntax used by LaTeX on an ongoing basis to make it usable as selectable options by users, so a more robust and programmatic solution was needed to automate the process. The data was already available in a consistent XML format, so XML and XSLT2 were obvious candidates for the task. As a long-time user of XML for documentation, I felt it would be an advantage from the maintenance point of view to use the same syntax and method for writing the documentation, and this led to the experiment in using DocBook and generating the .dtx file with XSLT2.

Beyond the title page and the settings for margins and type size, the remainder of a student's thesis document would be largely unaffected, as LaTeX's report document class and existing packages already provided all the facilities needed. However, it had become clear from local LaTeX training sessions that some requirements of thesis writing would benefit from more automation, and that better use could be made of the layout specifications, which were, and remain, relatively lax (Flynn2012), so the decision was taken to experiment with using DocBook for the whole project.

Implementation

The use of XSLT2 to generate XML from XML is standard practice, and its use to generate LaTeX from XML is also well-established. However, in this case, the resulting LaTeX (.dtx) was going to be used to generate more LaTeX code (the .cls document), which itself would generate ancillary LaTeX files (Table of Contents, Index, etc) as well as the student's thesis final PDF.

Metadata

A .dtx file is made up of a number of well-defined sections:

an initialization block;
the LaTeX Preamble for the documentation;
a character checksum table;
a change history;
an indexing control block;
the user documentation;
the annotated code;
any ancillary files to be distributed with the class or package.

The design of a DocBook document does not of course align directly with this, but there is provision in one form or another for most of it, and XSLT2 can easily vary the order of processing. The initial metadata (mostly effectivities) is stored in the book root element start-tag:

<book xml:id="uccthesis" version="1" revision="03" xml:lang="en"
  xml:base="ucc" remap="a4paper,12pt" arch="class" audience="lppl"
  condition="2009/09/24" conformance="LaTeX2e" os="all"
  security="2070" userlevel="cls" vendor="UCC" status="beta">

The xml:base attribute specifies the ultimate destination directory within TeX's installation tree.
The remap attribute is [ab]used to hold document-class options for LaTeX so that the target document format can be switched between US Letter and ISO A4, and the base font size changed.
The audience attribute is used to select a boilerplate license document (here, the LaTeX Project Public License).
The security attribute holds a checksum which is validated during LaTeX processing, and which must be updated after changes to the code (or set to 0 to disable it).
The conformance and condition attributes hold the version and date of LaTeX required.

The info/cover element type was used to hold the document management data, principally the metadata, the lists of packages required by both the documentation and the class or package file itself; a file list for the manifest; and any setup commands for the documentation. The title, author, contact details, Abstract/Summary, and revision history are in the info container in the conventional manner.

Working from the DocTeX and ltxdoc specifications, with existing classes as examples, it was then possible to construct the .dtx initialization block as a literal result template, using the ID and version values from the book element's attributes. The preliminary LaTeX comments and the ‘driver’ block are shielded from processing by a conditional which always evaluates to false:

% \iffalse meta-comment
%
% Extracted from uccthesis.xml
[...licensing and descriptive comment...]
% \fi
% \iffalse
%<*driver>
\ProvidesFile{uccthesis.dtx}
%</driver>
%<class>\NeedsTeXFormat{LaTeX2e}[2009/09/24]
%<class>\ProvidesClass{uccthesis}[2012/12/18 v1.03 Typesetting a UCC thesis with LaTeX]
...
% \fi
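As an illustration of how the identification lines could be derived from the root element's attributes, here is a minimal sketch in Python (the system described in this paper uses an XSLT2 literal result template instead). The function name `class_header` is invented, the release date and description are passed in as parameters because their source is not on the book start-tag, and joining version and revision as `v1.03` is inferred from the example output above. The sketch also assumes un-namespaced element names.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def class_header(book_xml, release_date, description):
    """Build the guarded identification lines from the book attributes."""
    book = ET.fromstring(book_xml)
    name = book.get(XML_ID)
    # Assumption: version="1" and revision="03" combine to "v1.03".
    version = "v{}.{}".format(book.get("version"), book.get("revision"))
    return [
        "%<class>\\NeedsTeXFormat{{{}}}[{}]".format(
            book.get("conformance"), book.get("condition")),
        "%<class>\\ProvidesClass{{{}}}[{} {} {}]".format(
            name, release_date, version, description),
    ]
```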

Annotated code

The annotated code is stored in chapter elements in a part element with an ID of code. These can be subdivided into sections and subsections according to the modularity and complexity of the code. The annotations are output as part of the formatted documentation; the code is extracted to the class or package file. The ltxdoc class uses LaTeX \section as its top level, so a DocBook chapter is mapped in the XSLT to a LaTeX \section, a sect1 to a \subsection, and so on.

Options

The .dtx format requires any user-selectable options for the class or package to be declared and activated before any requisite style or utility packages are loaded, so the first chapter would typically contain the option code.

The large number of special-purpose definitions needed for the departmental controls in the UCC Thesis class were stored in methodsynopsis elements in external file entities per Faculty. This is probably the most blatant piece of tag abuse, but the structure seemed to offer an acceptable way to store the data transformed from the administrative system's export format:

<methodsynopsis xml:id="physio" arch="med">
  <methodname>Vancouver</methodname>
  <methodparam>
    <parameter role="department" remap="Department of">Physiology</parameter>
    <initializer>vancouver</initializer>
  </methodparam>
</methodsynopsis>

each department gets an ID value which becomes the departmental class option entered by the student (physio);
the school to which the department belongs (med) is stored in the arch attribute;
the method name becomes the printable name of the bibliographic format required (Vancouver);
the method parameters hold the type of organisational unit (department), the prefix for printing on the title page (Department of), and the actual name of the organisational unit (Physiology);
the initializer element is used for the name of the BibTeX style for this discipline (vancouver).

The XSLT transforms these to package options which define the official name of the department and fix the bibliographic format in that discipline. These are output before the annotated code itself starts, as described above.

%␣␣␣␣\begin{macrocode}
\DeclareOption{physio}{%
  \department{Physiology}
  \@usebib{vancouver}{Vancouver}{}
}
%␣␣␣␣\end{macrocode}
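The mapping from a methodsynopsis to an option declaration can be sketched as follows. This is Python standing in for the XSLT2 actually used, with un-namespaced element names assumed. The function name `declare_option` is invented, and deriving the macro name (`\department`) from the role attribute is an assumption made to reproduce the output shown above; the remap prefix is left unused here, as it is in that output.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def declare_option(synopsis_xml):
    """Turn one methodsynopsis element into an armored \\DeclareOption block."""
    synopsis = ET.fromstring(synopsis_xml)
    parameter = synopsis.find("methodparam/parameter")
    bib_style = synopsis.findtext("methodparam/initializer")
    bib_name = synopsis.findtext("methodname")
    unit = parameter.get("role")  # assumption: the role doubles as the macro name
    return "\n".join([
        "%    \\begin{macrocode}",
        "\\DeclareOption{{{}}}{{%".format(synopsis.get(XML_ID)),
        "  \\{}{{{}}}".format(unit, parameter.text),
        "  \\@usebib{{{}}}{{{}}}{{}}".format(bib_style, bib_name),
        "}",
        "%    \\end{macrocode}",
    ])
```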

Package specification

Classes and packages, as well as documentation, often use frequently-occurring sets of utility and style packages, with commonly-used setup commands before and after package invocation. To avoid class and package authors having to retype similar blocks of code for every class or package they create, an ancillary file prepost.xml stores an author's package preferences. The two lists of packages (for the documentation, and for the class or package itself) are therefore given in an XML structure rather than just typed in LaTeX format as code, so that preferences can be looked up and implemented. We used segmented lists in constraintdef elements in the info/cover to do this.

<info>
  <cover>
    <constraintdef xml:id="clspackages" linkend="options">
      <segmentedlist>
        <segtitle>Packages needed for this class</segtitle>
        <seglistitem>
          <seg>fix-cm</seg>
        </seglistitem>
        <seglistitem>
          <seg role="textwidth=159mm,textheight=229mm">geometry</seg>
        </seglistitem>
        <seglistitem>
          <seg>graphicx</seg>
        </seglistitem>
        [...]
      </segmentedlist>
    </constraintdef>
    [...]
  </cover>
</info>

Each seglistitem specifies a package required in the seg element. The role attribute holds any package options needed.^[2] A similar construct is used with an ID of docpackages for any packages required for the documentation.

The linkend attribute specifies the ID of a chapter or section in the annotated code after which the package loading commands are to be output.

\usepackage{fix-cm}
\usepackage[textwidth=159mm,textheight=229mm]{geometry}
\usepackage{graphicx}
...
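A minimal sketch of this step, written in Python rather than XSLT2 and assuming un-namespaced element names, reads each seg and emits the corresponding \usepackage line. The function name `usepackage_lines` is invented, and the lookup of author preferences in prepost.xml is deliberately left out.

```python
import xml.etree.ElementTree as ET


def usepackage_lines(constraintdef_xml):
    """Emit one \\usepackage command per seg, using role for any options."""
    root = ET.fromstring(constraintdef_xml)
    lines = []
    for seg in root.iter("seg"):
        options = seg.get("role")
        prefix = "[{}]".format(options) if options else ""
        lines.append("\\usepackage{}{{{}}}".format(prefix, seg.text.strip()))
    return lines
```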

Modular code

Code can be given in programlisting elements interspersed with para and other documentary elements of annotation. The amount of annotation and frequency of interruption is unrestricted: the ltxdoc extraction process simply stitches together all the code and outputs it; and the documentation formatting treats the code as verbatim blocks (line-numbered for convenience).

However, the literate-programming format used here relies on annotation elements to define the LaTeX commands and environments being provided. The role attribute defines the class of object being annotated, and the xreflabel attribute gives its name. Each such annotation element can contain paragraphs, lists, etc, plus the programlisting code, broken into whatever granularity is needed to explain what is being done.

  <annotation role="environment" xreflabel="epigraph">
    <para>Define an environment for Epigraphs. These would normally go immediately after the
      <command>chapter</command> command. This is basically the <envar>quotation</envar>
      environment modified, but it has to allow for <emphasis>either</emphasis> manual
      <emphasis>or</emphasis> automated citation (because it may just be a phrase needing 
      no citation), whereas a normal quotation <emphasis>must</emphasis> be cited. It 
      therefore has <emphasis>two</emphasis> arguments, described below:</para>
    <remark version="0.92" revision="2011-05-31">Added Epigraphs.</remark>
    <programlisting>
\newenvironment{epigraph}[2][\relax]{%
    </programlisting>
    <para>Record the argument values now, because they are needed in the end of the 
      environment, so they have to pass across the group boundary. The compulsory 
      argument is for a &BiBTeX; citation key, so that a proper citation can be 
      formatted; the optional argument is for when a pre-formed, 
      <wordasword>full</wordasword> (actually often simpler, non-rigorous) citation
      is wanted.</para>
    <programlisting>
  \gdef\@fullcite{#1}%
  \gdef\@quotcite{#2}%
    </programlisting>
 ...
  </annotation>

The remark element is used for noting updates: these get extracted to the revision history. The annotations are output using the armored ltxdoc code; the actual lines of code from the programlisting elements are output unarmored for extraction. This results in LaTeX code in the .dtx as shown below:

% \begin{environment}{epigraph}
% Define an environment for Epigraphs. These would normally go immediately after the
% \DescribeMacro{\chapter}\verb`\chapter` command. This is basically the
% \DescribeEnv{quotation}\texttt{quotation} environment modified, but it has to 
% allow for \emph{either} manual \emph{or} automated citation (because it may 
% just be a phrase needing no citation), whereas a normal quotation \emph{must} be 
% cited. It therefore has \emph{two} arguments, described below:\par
% \changes{v0.92}{2011/05/31}{Added Epigraphs.}
%    \begin{macrocode}
\newenvironment{epigraph}[2][\relax]{%
%    \end{macrocode}
% Record the argument values now, because they are needed in the end of the environment, 
% so they have to pass across the group boundary. The compulsory argument is for a 
% \BibTeX{} citation key, so that a proper citation can be formatted; the optional 
% argument is for when a pre-formed, `full' (actually often simpler, non-rigorous) 
% citation is wanted.\par
%    \begin{macrocode}
  \gdef\@fullcite{#1}%
  \gdef\@quotcite{#2}%
%    \end{macrocode}
...
% \end{environment}

The formatted result in the documentation PDF is shown in Figure 1, where the marginal annotation of the commands being documented can be seen.

The ltxdoc package provides only two documentary environments for annotated code: macro and environment. The dox utility package has been used to provide additional environments for other declarations such as counters, classes, options, templates, etc.

User documentation

User documentation is similarly stored in a part element, this time with the ID of doc. In the .dtx file, the user documentation starts with an unarmored LaTeX Preamble where settings and packages needed for formatting the documentation are specified, followed by a self-reference to the same .dtx file in place of the actual text. This enables LaTeX to read the Preamble and then switch to armored mode to input the same document to process the armored documentation at high speed (doing it all in a single pass would entail a more computationally-intensive process).

Preamble

Using the remap attribute from the book root element shown earlier (for any changes to the ltxdoc options) we can now output the start of the documentation and add the \usepackage commands for the packages specified. These are given in exactly the same way as those for the code (above), stored in a separate constraintdef element, and they use the same prepost.xml lookup mechanism for commonly-used options.

Unlike with the code, however, this mechanism is largely automated for documentation. This provides for a configurable basic set of packages (defined in prepost.xml) as well as the detection of packages required for specific formatting choices in the documentation. For example, using a compact list in the documentation (the spacing="compact" attribute on a list container) will automatically ensure that the relevant package (enumitem, in this case) is included in the .dtx file without the author needing to take any action (and removing it, should compact lists cease to be used).

%<*driver>
\documentclass[a4paper,12pt]{ltxdoc}
\usepackage[utf8x]{inputenc}
\usepackage[T1]{fontenc}
\usepackage[textwidth=159mm,textheight=229mm]{geometry}
\usepackage{graphicx}
\usepackage{fancyvrb}
[...]
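The automatic detection of packages described above might be sketched like this in Python (the real system does it in XSLT2). The `TRIGGERS` table and the function name `documentation_packages` are invented for illustration; the only trigger taken from this paper is the compact-list case, and un-namespaced element names are assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical trigger table: (attribute, value) -> package that must be loaded.
TRIGGERS = {("spacing", "compact"): "enumitem"}


def documentation_packages(doc_xml, base_packages):
    """Return the base package list plus any packages the markup itself implies."""
    root = ET.fromstring(doc_xml)
    packages = list(base_packages)
    for element in root.iter():
        for (attribute, value), package in TRIGGERS.items():
            if element.get(attribute) == value and package not in packages:
                packages.append(package)
    return packages
```

Because the list is recomputed from the markup on every run, a package disappears again as soon as the construct that needed it is removed from the documentation.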

Some additional ltxdoc commands are added to control the behaviour of the documentation cross-referencing and indexing. The \DocInput command then makes the .dtx file input itself as described earlier.

[...]
\EnableCrossrefs
\CodelineIndex
\RecordChanges
\begin{document}
\raggedright
\DocInput{uccthesis.dtx}
\end{document}
%</driver>

This driver block is followed by three blocks not illustrated here:

a character checksum table as a protection against file corruption in data transfer (output in a literal result template in the XSLT2 program);
a list of \changes commands for the Change History (taken from the DocBook revisionhistory and remark elements);
and a standard block of hard-coded \DoNotIndex commands to prevent ltxdoc indexing non-relevant internal LaTeX commands.

Armoring the text

After all this automated Preamble we can output the \title and \author, an Abstract or Summary, and then the chapters or sections of documentation text. These are all standard DocBook, handled with XSLT2 templates in the conventional manner, with the exception of adding the %␣ armor.

The armoring means that <sect1><title>Introduction</title>... is output as %␣\subsection{Introduction} (as noted above, the hierarchy is offset by one level to accommodate ltxdoc's default format). All text nodes are handed to a text() template which passes the content through a recursive named template filter, honoring hard-coded newlines but adding the %␣ prefix. The template also handles TeX special characters in filenames and other literals, detecting a parent::programlisting (where armoring is not required). It also removes any leading white-space after a newline (inserted by Emacs' psgml-mode's pretty-printing). The final token output is always a newline, so that we can start any element which occurs in element content with the armor.
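The behaviour of that text filter can be approximated by the following Python sketch. It is a simplification under stated assumptions: it works on a whole run of text rather than on individual text nodes in mixed content, it escapes special characters everywhere rather than only in filenames and literals, and the names `escape_tex` and `armor_text` are invented.

```python
# Single-pass character map, so replacements are never escaped twice.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_tex(text):
    """Replace each TeX special character with a printable equivalent."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def armor_text(text, in_programlisting=False):
    """Armor documentation text; pass programlisting content through untouched."""
    if in_programlisting:
        return text
    lines = []
    for line in text.split("\n"):
        # Honor the hard newline, drop pretty-printer indentation, add the armor.
        lines.append("% " + escape_tex(line.lstrip()))
    # Always finish with a newline so the next element can start with its own armor.
    return "\n".join(lines) + "\n"
```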

Verbatim code in programlisting examples presented a special case: not only must the code itself not be armored, the processor must be able to escape from the armored text mode, otherwise the verbatim material itself would still contain leading % signs.

<variablelist>
  <varlistentry>
    <term><envar>dedication</envar></term>
    <listitem>
      <para>The <envar>dedication</envar>
	environment is for you to add a dedication.</para>
      <programlisting annotations="dedication" language="LaTeX">
\begin{dedication}
...
\end{dedication}
      </programlisting>
    </listitem>
  </varlistentry>
[...]

This is done by escaping the %<*ignore> tag separately with the same \iffalse...\fi method seen earlier (the same is done for the end-tag). Between them comes the unarmored verbatim content (formatted here with the listings package, which automates per-language colored pretty-printing of the code).

% \item[Dedication:] The \texttt{dedication} 
% environment is for you to add a dedication. 
% \iffalse 
%<*ignore> 
% \fi
\begin{lstlisting}[language={[LaTeX]TeX},emph={dedication}]
\begin{dedication}
 ... 
\end{dedication} 
\end{lstlisting} 
% \iffalse 
%</ignore> 
% \fi

This results in formatting like this (minus the color, and using this conference's default variablelist layout):

Dedication:
The dedication environment is for you to add a dedication.
\begin{dedication} 
 ...
\end{dedication}
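The escape wrapper around a documentation programlisting can be sketched as a small Python function (a stand-in for the corresponding XSLT2 template). The function name `verbatim_block` and its parameters are invented; the emitted lines follow the .dtx fragment shown above.

```python
def verbatim_block(code, language="[LaTeX]TeX", emph=None):
    """Wrap unarmored code in an lstlisting guarded by escaped ignore tags."""
    options = "language={{{}}}".format(language)
    if emph:
        options += ",emph={{{}}}".format(emph)
    return "\n".join([
        "% \\iffalse",          # hide the guard tag from the documentation pass
        "%<*ignore>",
        "% \\fi",
        "\\begin{{lstlisting}}[{}]".format(options),
        code.strip("\n"),       # the code itself carries no armor
        "\\end{lstlisting}",
        "% \\iffalse",
        "%</ignore>",
        "% \\fi",
    ])
```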
		

A bibliography, if one is used, is output in a similar manner to the verbatim code mentioned above, using the %<*ignore> tags and the VerbatimOut environment from the fancyvrb package. When LaTeX is run on the .dtx file, this extracts the bibliographic content to an external (.bib) file so that on a subsequent pass it can be reprocessed with BibTeX or biblatex to recreate its own bibliography.

Inlines

A number of elements in mixed content are used to identify terms and values for indexing. The envar element type is used to identify a LaTeX environment name; classname for a document class name, package for a package name, and option for an option.

Automation

The advantages of literate programming (Knuth1992) — modular construction, hermetic testability, debugging tools, interspersed documentation, even pretty-printing — are well known (Thompson2000) and well-criticised (static representation; lack of folding structures, version control, alternate views of variables). In itself, literate programming does not solve any specific requirement for automation (although modularity may contribute to this). In developing this method, one of the objectives was to remove as much as possible the tedious and repetitive typing that program development and documentation writing engenders.

Development from pilot to production

The original thesis document class was successfully implemented, and the XML-based system as described is used to maintain it. The 50 or so class options specifying department and degree are used to simplify and rationalize the setup for the department name, title-page layout, and style of references, while the class itself presets the rest of the formatting (see Figure 2).

Figure 2: Thesis set-up

\documentclass[history,phd]{uccthesis}
\begin{document}
\title{The Application of XML to the Lexicography 
       of Old, Middle and Early Modern Irish}
\author{Julianne Nyhan}
\qualifications{BA}
\professor{Prof Dermot Keogh}
\supervisor{Prof Donnchadh Ó Corráin}
\date{June 2005}
\maketitle
... 
\end{document}

However, that class was a pilot: the result is that this XML-based mechanism is usable for the creation and maintenance of almost any LaTeX class or package. The system is used for all the author's classes and packages, and has significantly reduced development time on a new class or package. In the development of additional classes or packages in a series or suite (such as occurs in corporate use) the reduction is greater because of the ease and reliability with which modules of code can be included (as entities or XIncludes). The reuse of imported data specifications also has an important place in industrial documentation, where sets of part numbers or known production components need to be pre-specified, and the system has now been adapted twice to use this method.

Markup load

Many of the templates in the XSLT2 program make decisions about the markup they should emit according to the content of the element type they match. As an example, a firstterm element type can be made to identify from its position if it is indeed the first occurrence, and if so, to add a bold LaTeX \index entry rather than a plain one. The careful author can add an attribute to suppress this behaviour in cases where a first or early occurrence may be used en passant.
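A minimal sketch of the firstterm decision, in Python rather than XSLT2 and with un-namespaced element names assumed, is given below. The function name `index_entries` is invented, `role="passing"` is a hypothetical stand-in for the suppressing attribute (which is not named here), and `|textbf` is the usual makeindex way of requesting a bold page number.

```python
import xml.etree.ElementTree as ET


def index_entries(doc_xml):
    """Emit a bold \\index entry for the first real occurrence of each term."""
    root = ET.fromstring(doc_xml)
    seen = set()
    entries = []
    for term in root.iter("firstterm"):  # iter() visits elements in document order
        key = " ".join("".join(term.itertext()).split())
        suppressed = term.get("role") == "passing"  # hypothetical marker
        if key in seen or suppressed:
            entries.append("\\index{{{}}}".format(key))
        else:
            entries.append("\\index{{{}|textbf}}".format(key))
            seen.add(key)
    return entries
```

A suppressed occurrence is not recorded as seen, so the next unsuppressed use of the same term still receives the bold entry.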

In a more complex environment, such as a footnote or the term element of a variablelist containing code requiring a monospace font and LaTeX's verbatim formatting, the template will choose not to use LaTeX's \verb command because of its fragility inside other markup, and to use \texttt (simple monospace) instead, or even \url, according to content. This is something which would otherwise require the author to remember that certain special characters cause LaTeX problems when treated verbatim.

Cross-references which cannot be automated by LaTeX's otherwise excellent varioref package (such as references to an unnumbered list item, where by definition no reference number exists) are pre-empted in the XSLT2 code and the reference switches to the fmtcount package, which phrases a counter value as a spelled-out ordinal, for example “see the third item in the list on p.42”.

The objective in all these cases is to relieve class and package authors of the need to work manually around LaTeX's oddities and allow them to write unhindered, for example, by the need to remember that such-and-such a reference was to a table, or a figure, or a subsubsubsection, or a call-out; and to have the reference auto-adjust its semantics if the target element type gets changed.

As an example of the use of markup, the formatted annotation output (code documentation) usually requires a wider left margin than the user documentation because code fragments are identified by a marginal note showing the LaTeX command name. In order to accommodate the widest name used, a new value for the margin is calculated in the XSLT2 program, using the longest value of the various commands explained in the annotations. This ensures that an unexpectedly long command name will not extend beyond the left-hand edge of the page. This calculation, straightforward in XSLT2, would be computationally challenging in LaTeX and would need to be written to use the second pass of the document normally associated with LaTeX tables of contents and cross-references. This calculation can therefore be done first, before processing the content of the part element for annotated code.
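The first half of that calculation, finding the longest documented name, is easy to sketch in Python (standing in for XSLT2, with un-namespaced element names assumed). The function name `longest_command_name` is invented. How the character count is converted into a margin dimension is not specified here, so the sketch stops at the name and its length.

```python
import xml.etree.ElementTree as ET


def longest_command_name(code_part_xml):
    """Return (name, length) for the longest xreflabel among the annotations."""
    root = ET.fromstring(code_part_xml)
    names = [ann.get("xreflabel")
             for ann in root.iter("annotation") if ann.get("xreflabel")]
    if not names:
        return None
    longest = max(names, key=len)  # the first of any equally long names wins
    return longest, len(longest)
```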

The use of XML also makes it straightforward to query the document structure for control purposes. For example, using standard command-line tools such as the LTxml toolkit provides, a list of macros and environments defined can be extracted, or a list made of the packages used:

$ lxprintf -e annotation "%s (%s)\n" @xreflabel @role uccthesis.xml | sort
ackname (macro)
acknowledgements (environment)
author (macro)
bibliography (macro)
bibname (macro)
cjk (option)
dedication (environment)
department (macro)
draft (environment)
epigraph (environment)
...
$ lxprintf -e \
'constraintdef[@xml:id="clspackages"]/segmentedlist/seglistitem' \
"%s\n" seg uccthesis.xml
inputenc
fontenc
geometry
lmodern
url
graphicx
array
calc
soul
textcomp
ucccrest
setspace
float
$

Tag abuse

We said earlier in section “Annotated code” that some element types have been used for purposes not envisaged by DocBook, and that part of this experiment was to identify what the nature of these use cases in class and package maintenance was likely to be. As there are areas of DocBook into which the present author has never had need to stray, suggestions are welcomed for element types with a better fit. A future task is to write an RNG specialist modification layer for the DocBook schema to create some additional element types to avoid the current level of abuse.

`exceptionname`	Used to hold keywords of RFC 2119:1997 (Bradner1997) for direction on requirement or optionality. Formatted as small caps.
`methodsynopsis`	Holds the structured data for naming departments and degrees (here; extensible to other structured data).
`entry`	In a table, the attributes `wordsize`, `charoff`, `char`, and `morerows` are used to hold dimensions required for LaTeX to format a multi-row column containing a large vertical brace.
`classname`, `package`, `option`, `envar`	These are used to identify LaTeX class, package, option, and environment names or values.
`annotation`	Used as the container for modules or fragments of annotated code. In the `info/cover` element, this is used for the wording of the Notice which goes in the Preamble of the `.ins` file.
`cover`	Holds the setup specifications for packages.
`constraintdef`	Holds the structured lists of packages needed for documentation and for the class or package being written.
`procedure`	Used in the `prepost.xml` file to store the default settings for frequently-used packages with any ancillary commands needed before and after package load.
`cmdsynopsis`	Within a `constraintdef` in a `procedure/step`, holds commands which need to be output before (or after) a command.
`type`	In documentation, marks a span for which special typographical treatment is needed. The role attribute must be set to `font` and the remap attribute must be set to the NFSS2e three-character fontname code.
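
To make the last row concrete, here is a minimal sketch of how a `type` element marked up in this way might be turned into a LaTeX font switch. This is an assumption for illustration only: the LaTeX that the actual XSLT emits is not shown in this paper, the function name `type_to_latex` is invented, and the example is written in Python rather than XSLT2 purely for brevity. It uses the standard NFSS commands `\fontfamily` and `\selectfont`.

```python
import xml.etree.ElementTree as ET


def type_to_latex(elem):
    """Render <type role="font" remap="xyz">text</type> as an NFSS font switch.

    Hypothetical mapping: the real stylesheet may produce different LaTeX.
    """
    if elem.get("role") != "font":
        raise ValueError("role attribute must be 'font'")
    family = elem.get("remap", "")
    if len(family) != 3:
        raise ValueError("remap must hold a three-character font family code")
    # Group the switch in braces so that it applies only to this span.
    return "{\\fontfamily{%s}\\selectfont %s}" % (family, elem.text or "")


if __name__ == "__main__":
    span = ET.fromstring('<type role="font" remap="phv">Helvetica</type>')
    print(type_to_latex(span))
```

Checking the two attributes before emitting anything mirrors the constraint stated in the table: a `type` element without both settings is left as an error rather than silently passed through.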

At the moment, the XSLT also generates a shell script file which can be used to build the relevant LaTeX distribution package (a specially-formed zip file). This needs to be replaced by a parameterised Makefile, using the latexmk script.
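
As an indication of what such a build step involves, the following is a minimal sketch in Python of collecting the distribution files into a zip file with a single top-level directory named after the package. It is not the generated shell script: the function name `build_ctan_zip`, the choice of file suffixes, and the directory layout are all assumptions made for this illustration.

```python
import zipfile
from pathlib import Path


def build_ctan_zip(pkg, src_dir, out_dir, suffixes=(".dtx", ".ins", ".pdf")):
    """Zip the distribution files under a single top-level directory <pkg>/.

    Hypothetical selection rule: files with one of the given suffixes,
    plus any file whose name starts with README.
    """
    src = Path(src_dir)
    target = Path(out_dir) / f"{pkg}.zip"
    members = sorted(
        p for p in src.iterdir()
        if p.is_file() and (p.suffix in suffixes or p.name.startswith("README"))
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in members:
            # Store each file as pkg/filename, without the local path.
            zf.write(p, arcname=f"{pkg}/{p.name}")
    return target
```

Build by-products such as `.log` and `.aux` files are excluded simply by not being in the suffix list.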

Conclusions

The experience of this experiment has been fourfold:

It is certainly possible to use XML to define and maintain LaTeX document class and package data and documentation, and to use XSLT2 to create the distribution files. In conjunction with a small shell script or Makefile and a suitable repository mechanism (eg Subversion, Git, etc), a fairly complete process can be defined for versioning and production of LaTeX document classes and packages.
The benefits of reusability appear only when using this method for handling a number of classes or packages, where there is some re-use of commonly-occurring constructs (macros, environments, utilities, etc), or where the class or package is part of a series sharing common attributes.
It does require significant knowledge of XML and DocBook, regardless of the editor being used (it may be assumed that a class or package author is already well-skilled in the use of LaTeX).
It does save time and effort when actually writing the documentation, as there is no need to consider the various forms of escapement and armoring required by the .dtx file format, or the need to invoke particular packages when certain facilities are used.

The system has provisionally been called ClassPack, and is available on CTAN (Comprehensive TeX Archive Network) under the LaTeX Project Public License. At the moment there are substantial remnants of earlier code which need tidying up, and the mechanism for handling structured data for formal naming needs to be generalised.

References

[Lamport1986] Lamport, Leslie. LaTeX: A Document Preparation System. Addison-Wesley, 1986, 1st Ed., 0-201-15790-X. http://www.amazon.com/Latex-Document-Preparation-System-Users/dp/020115790X

[Lamport1994] Lamport, Leslie. LaTeX: A Document Preparation System. Addison-Wesley, 1994, 2nd Ed., 978-0201529838. http://www.amazon.com/LaTeX-Document-Preparation-System-2nd/dp/0201529831

[Boggio2006] Boggio-Togna, Gianfranco. Technica: Typesetting for the humanities. LaTeX package, November 2006. In CTAN, http://mirrors.ctan.org/macros/latex/contrib/technica/Technica.pdf

[Ubuntu2012] Ubuntu Core Developers. TeX Live: LaTeX support for the humanities. Debian package, June 2012. In Ubuntu repositories, http://packages.ubuntu.com/raring/texlive-humanities

[Carlisle2007] Carlisle, David. ltxdoc: Documentation support. LaTeX package, November 2007. In CTAN, http://ctan.org/pkg/ltxdoc

[Lehmann2011] Lehmann, Philipp. ltxdockit: Class for documented LaTeX macro files. LaTeX package, March 2011. In CTAN, http://ctan.org/pkg/ltxdockit

[LaTeX2006] The LaTeX3 Project. LaTeX2ε for class and package writers. LaTeX Project documentation, February 2006. In CTAN, http://mirrors.ctan.org/macros/latex/doc/clsguide.pdf

[Mittelbach2004] Mittelbach, Frank; Goossens, Michel; Braams, Johannes; Carlisle, David; Rowley, Chris. The LaTeX Companion. Addison-Wesley, May 2004, 2nd Ed., 978-0201362992. http://www.amazon.com/LaTeX-Companion-Techniques-Computer-Typesetting/dp/0201362996

[Bradner1997] Bradner, Scott. Key words for use in RFCs to Indicate Requirement Levels. RFC 2119, Internet Engineering Task Force, Fremont, CA, March 1997. http://www.ietf.org/rfc/rfc2119.txt

[Flynn2012] Flynn, Peter. A university thesis class: Automation and its pitfalls. Presented at TeX Users Group Conference 2012, Boston, MA, July 16–18, 2012. In TUGboat, 33:2, 2012, pp172–177. https://www.tug.org/members/TUGboat/tb33-2/tb104flynn.pdf

[Knuth1992] Knuth, Donald E. Literate Programming. Center for the Study of Language and Information, Stanford, CA (CSLI Lecture Notes, no.27) 1992, 0937073806, See http://www-cs-faculty.stanford.edu/~uno/lp.html

[Thompson2000] Thompson, David B. The Literate Programming FAQ. San Gabriel, CA, March 2000. http://www.literateprogramming.com/lpfaq.pdf

^[1] A few older packages are still distributed as raw .cls or .sty files with documentation in comments.

^[2] In review, it was suggested that reversing this and placing the package name in the role attribute and the options in the element content would be more natural. This would not be hard to change.
