How to cite this paper
Blažević, Mario. “Extending XML with SHORTREFs specified in RELAX NG.” Presented at Balisage: The Markup Conference 2012, Montréal, Canada, August 7 - 10, 2012. In Proceedings of Balisage: The Markup Conference 2012. Balisage Series on Markup Technologies, vol. 8 (2012). https://doi.org/10.4242/BalisageVol8.Blazevic01.
Balisage: The Markup Conference 2012
August 7 - 10, 2012
Balisage Paper:
Extending XML with SHORTREFs specified in RELAX NG
Mario Blažević
Senior software architect
Stilo International plc.
The author has a Master's degree in Computer Science from the University of Novi Sad,
Yugoslavia. Since moving to Canada in 2000, he has been working for OmniMark Technologies,
later acquired by Stilo International plc., mostly in the area of markup processing and on
development of the OmniMark programming language.
Copyright © 2012 Stilo International plc. All rights reserved.
Abstract
We present a novel method for specifying concrete syntax, based on and compatible
with the RELAX NG schema standard. A
parsing method is described for a well-formed XML document conforming to the given
concrete syntax specification. The
output of the parser is another XML document conforming to the abstract syntax described
by the base RELAX NG schema.
Table of Contents
- Introduction
- RELAX NG schema as a grammar
- Implementation
- Results and future directions
- Related work
- Appendix A. Concrete syntax schema extension for Balisage submissions
- Appendix B. Concrete syntax extension of XHTML schema
Note
This paper has been inspired in part by Sam Wilmott's 1993 internal report, Beyond SGML
[w93]. I
also want to thank my colleague Jacques Légaré for his valuable comments and clarifications,
and Stilo
International for giving me time to do interesting work.
Introduction
SGML had this feature called SHORTREF. It allowed the DTD designer to specify that
certain strings called shortrefs
should in some contexts be interpreted as markup tags. For the authors using an SGML
DTD with a well-designed set of
shortrefs, the effect was similar to using a kind of Wiki markup.
As with other parts of SGML, the specification syntax for shortrefs was idiosyncratic.[s86]
Furthermore, the method of their specification typically relied on some other rarely-used
features of SGML DTDs, such as
STARTTAG entities. This combination ensured that only an expert in SGML DTDs could
hope to design shortrefs correctly,
so they remained obscure and rarely used. When SGML was replaced by its simplified
successor XML, nobody regretted their
omission.
Or did they?
Many people stubbornly refuse to abandon their non-XML syntaxes. Programming language
designers still use the
old-fashioned EBNF grammars[b59] in their specifications instead of XML Schema. Even some languages
that are at the very core of various XML technologies, such as XPath, are not XML.
The RELAX NG schema language, though
specified in XML syntax[c01], defines a non-XML compact syntax
[c02c] as well.
The strongest evidence of yearning for shortrefs, however, is the myriad of Wiki languages in existence. Here we have a large family of actual markup
languages, whose main purpose is to be converted to HTML, another markup language,
and still they are not fully tagged
XML. SGML DTDs with shortrefs and appropriate declarations could accomplish the task.[j04] Instead,
Wiki engines typically store their pages as plain text, parse them using hand-coded
parsers written in various
general-purpose languages, and convert them directly to HTML for presentation.[b07]
There are many downsides to this architecture. Most Wiki pages are stored unvalidated
and unstructured, which makes them
suboptimal for searching and very difficult to automatically restructure. They are
missing all XML tool chain
support. All these problems are judged to be outweighed by the benefit of the special
notation. A solution that
preserves this notational convenience while keeping markup in XML documents would
be a clear winner.
The present paper aims to deliver one solution that satisfies these criteria: given
a relatively simple syntax
specification that follows the established standards, it allows the author to create
valid XML without entering XML
tags. In other words, it resurrects SGML shortrefs in a more modern context of well-formed
XML and RELAX NG schema
specifications.
RELAX NG schema as a grammar
If our job is to specify how some text is to be parsed, one obvious place to start
is from grammars, or more
specifically context-free grammars; they have been successfully used for this purpose
for more than half a
century[b59]. Here is an example of such a grammar for a small fragment of a Wiki markup language,
specified in a variant of the EBNF notation:
paragraph ::= (plain-text | bold | italic)* "\n\n"?
bold ::= "**" (plain-text | italic)* "**"
italic ::= "//" (plain-text | bold)* "//"
plain-text ::= ([^\n*/]+ | "\n" [^\n] | "*" [^*] | "/" [^/])+
The plain-text production is rather tricky. This context-free grammar is working directly on plain-text input with no
help from any lexical layer, so plain-text has to exclude the three markers (**, //, and the double newline) in order
to avoid ambiguity. The production would become even more complicated as more markup is added to the grammar.
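To make the behaviour of this production concrete, here is a minimal Python sketch that transcribes it into a regular expression. It is only an illustration and not part of the implementation described later; the names PLAIN_TEXT and plain_text_prefix are our own.

```python
import re

# Transcription of the EBNF production
#   plain-text ::= ([^\n*/]+ | "\n" [^\n] | "*" [^*] | "/" [^/])+
PLAIN_TEXT = re.compile(r"(?:[^\n*/]+|\n[^\n]|\*[^*]|/[^/])+")


def plain_text_prefix(source: str) -> str:
    """Return the prefix of `source` matched by the production ("" if none)."""
    match = PLAIN_TEXT.match(source)
    return match.group(0) if match else ""


print(repr(plain_text_prefix("Here's a **fat")))     # "Here's a "
print(repr(plain_text_prefix("a*b//c")))             # 'a*b'
print(repr(plain_text_prefix("one\ntwo\n\nthree")))  # 'one\ntwo'
```

A single asterisk, slash, or newline is accepted as ordinary text, while each of the three markers stops the match. Every new marker would require another hand-written alternative in the expression.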
Once the input text is parsed according to the grammar, we can represent the resulting
abstract syntax tree as XML and
use the following compact RELAX NG schema for its validation:
paragraph = element para { (plain-text | bold | italic)* }
bold = element bold { (plain-text | italic)* }
italic = element italic { (plain-text | bold)* }
plain-text = text
The similarities between the two notations above are striking. The main difference
is that the former specifies a
concrete syntax, and the latter the abstract syntax[m62]. To become concrete, and thus
useful for parsing text, the RELAX NG schema needs to specify the string markers,
or terminal symbols. We could try
the following modification, which brings the schema even closer to the EBNF grammar:
paragraph = element para {
(plain-text | bold | italic)*,
"

"?
}
bold = element bold { "**", (plain-text | italic)*, "**" }
italic = element italic { "//", (plain-text | bold)*, "//" }
plain-text = text
The RELAX NG specification[c01] unfortunately does not allow string-matching value patterns and element-matching
patterns to be grouped together, and that makes the above schema invalid. To make our concrete-syntax schema
syntactically correct, we need to enclose each string marker in an element of its own. These elements will belong to
the special terminal namespace so we can distinguish them from the structural elements:
paragraph = element para {
  (plain-text | bold | italic)*,
  paragraph_separator?
}
bold = element bold {
  bold_marker,
  (plain-text | italic)*,
  bold_marker
}
italic = element italic {
  italic_marker,
  (plain-text | bold)*,
  italic_marker
}
plain-text = text
bold_marker = element terminal:bold_marker { "**" }
italic_marker = element terminal:italic_marker { "//" }
paragraph_separator = element terminal:paragraph_separator {
"

"
}
We could also replace the text pattern by xsd:string { pattern = "([^\n*/]+|\n[^\n]|\*[^*]|/[^/])+" } to replicate the
grammar even more closely. As noted above, however, this pattern grows more complex as more markers are added to the grammar,
which makes it difficult to maintain. Another downside is that the schema would lose the modularity properties that
RELAX NG normally provides.
The plain-text pattern is meant to match any text up to any marker that is allowed in the context. Rather than require
the user to construct this pattern every time a new marker is introduced, we can change the meaning of the text
pattern to match what we need. In the standard RELAX NG semantics, text matches all text content up to the next
element tag; in our modified semantics, it will match all text content until the next marker recognizable in the
context, or until the next element tag.
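The modified meaning of the text pattern can be sketched in a few lines of Python. This is a hedged illustration only: the function match_text is a hypothetical name, the set of markers recognizable in the context is simply passed in as a list, and element tags are ignored because an XML parser would already have split the input into text nodes.

```python
from typing import Iterable, Tuple


def match_text(source: str, markers: Iterable[str]) -> Tuple[str, str]:
    """Split `source` at the earliest occurrence of any marker.

    Returns the text consumed by the modified `text` pattern and the
    unconsumed remainder, which starts with a marker if one was found.
    Markers are assumed to be non-empty strings.
    """
    positions = [p for p in (source.find(m) for m in markers) if p >= 0]
    cut = min(positions, default=len(source))
    return source[:cut], source[cut:]


# Inside a bold element, a closing "**" and an opening "//" are recognizable.
print(match_text("fat\nand somewhat //slanted", ["**", "//"]))
# ('fat\nand somewhat ', '//slanted')
```

Unlike the hand-written regular expression, this function needs no change when a new marker is introduced; only the list of markers grows.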
Our parser must construct an abstract syntax tree with element nodes like bold that are not present in the input. To
achieve this, we need to add another semantic extension and infer the missing element tags[b10]. This is especially
necessary for features like Wiki lists, where a single indented asterisk can denote
the beginning of both a list and a list item. This is similar to the OMITTAG feature of SGML, the main difference being
that our input must be well-formed XML; the element's start-tag and its end-tag must both be present or both omitted.
The only elements with omissible tags will be those in the terminal namespace and those whose namespace URI begins
with the prefix omissible+ (which is perfectly legal according to RFC 2396). In the schema fragment above, the default
namespace should be made omissible; in other words, the schema should be preceded by
default namespace = "omissible+http://my.namespace.com/"
namespace terminal = "http://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Terminal_symbols"
The elements in the terminal namespace are perfectly ordinary XML elements; what gives them a special meaning is
that the parser deletes them from the constructed syntax tree together with their content. The elements with the
omissible+ namespace prefix will be kept in the normalized XML output, but their URI prefix will be removed. This
stripping of terminal elements and omissible namespace prefixes is the default mode of operation. The parser can also be
made to emit all the terminal nodes and keep the omissible namespace prefixes. For the above example schema and the
input paragraph
Here's a **fat
and somewhat //slanted
// text**
example.
the default output of the parser is
<para xmlns="http://my.namespace.com/">Here's a <bold>fat
and somewhat <italic>slanted
</italic> text</bold>
example.</para>
and the raw output, if requested, would be
<para
xmlns="omissible+http://my.namespace.com/"
xmlns:terminal="http://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Terminal_symbols"
>Here's a <bold><terminal:bold_marker>**</terminal:bold_marker>fat
and somewhat <italic><terminal:italic_marker>//</terminal:italic_marker>slanted
<terminal:italic_marker>//</terminal:italic_marker></italic> text<terminal:bold_marker>**</terminal:bold_marker></bold>
example.<terminal:paragraph_separator>
</terminal:paragraph_separator></para>
Both these outputs are well-formed XML and contain no text markers. The former is valid against the original RELAX NG
schema, and the latter is valid against the enriched schema. If we want to replicate the behaviour of an SGML DTD, where
one can alternate between shortrefs and regular element tags, all we need to do is combine the two schemata into one. The
cleanest way to accomplish the same effect is to have the concrete-syntax schema include the original one, combining the
original definitions with its own. If the original schema was defined in file strict.rng, the extended schema could be
defined in a separate file as follows:
default namespace = "omissible+http://my.namespace.com/"
namespace terminal = "http://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Terminal_symbols"
include "strict.rng"
paragraph |= element para {
  (plain-text | bold | italic)*,
  paragraph_separator?
}
bold |= element bold {
  bold_marker,
  (plain-text | italic)*,
  bold_marker
}
italic |= element italic {
  italic_marker,
  (plain-text | bold)*,
  italic_marker
}
# plain-text = text is inherited from strict.rng; repeating it here without a combine method would be an error
bold_marker = element terminal:bold_marker { "**" }
italic_marker = element terminal:italic_marker { "//" }
paragraph_separator = element terminal:paragraph_separator {
"

"
}
Both the default and the raw output (i.e., the abstract and the concrete syntax tree)
now conform to the same RELAX NG
schema, and we can use any conforming RELAX NG validator to verify this.
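The relationship between the raw and the default output described earlier in this section can be illustrated with a short Python sketch based on the standard xml.etree.ElementTree module. It is an assumption-laden illustration, not the parser's actual code: the function strip_terminals is a hypothetical name, and the sample instance is a shortened, single-line variant of the raw output.

```python
import xml.etree.ElementTree as ET

TERMINAL_NS = ("http://en.wikipedia.org/wiki/"
               "Terminal_and_nonterminal_symbols#Terminal_symbols")
OMISSIBLE = "omissible+"


def strip_terminals(elem: ET.Element) -> ET.Element:
    """Delete terminal elements with their content and drop the
    omissible+ prefix from the namespace URI of all other elements."""
    if elem.tag.startswith("{" + OMISSIBLE):
        elem.tag = "{" + elem.tag[len("{" + OMISSIBLE):]
    previous = None
    for child in list(elem):
        if child.tag.startswith("{" + TERMINAL_NS + "}"):
            # Keep the text that follows the deleted terminal element.
            tail = child.tail or ""
            if previous is None:
                elem.text = (elem.text or "") + tail
            else:
                previous.tail = (previous.tail or "") + tail
            elem.remove(child)
        else:
            strip_terminals(child)
            previous = child
    return elem


RAW = (
    '<para xmlns="omissible+http://my.namespace.com/" '
    'xmlns:terminal="' + TERMINAL_NS + '">'
    "Here's a <bold><terminal:bold_marker>**</terminal:bold_marker>fat "
    "<italic><terminal:italic_marker>//</terminal:italic_marker>slanted"
    "<terminal:italic_marker>//</terminal:italic_marker></italic> text"
    "<terminal:bold_marker>**</terminal:bold_marker></bold> example.</para>"
)

root = strip_terminals(ET.fromstring(RAW))
print(root.tag)                  # {http://my.namespace.com/}para
print("".join(root.itertext()))  # Here's a fat slanted text example.
```

The sketch shows only the direction from raw to default output; the parser itself produces either form directly from the marked-up input.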
Implementation
The parser for the schema specifications described in the previous section has been
implemented in Haskell and can be
found at http://hackage.haskell.org/package/concrete-relaxng-parser. It compiles to a standalone executable
that requires two file names as arguments: the target RELAX NG schema (with or without
any concrete-syntax extensions),
and the input XML document.
The implementation of the concrete-syntax parser is based on the RELAX NG reference
implementation[c02] with its novel algorithm based on Brzozowski derivatives[b64] [s05], together with some extensions described in our previous work[b10]. In particular,
the inference of the missing element tags is the same as in [b10], the only change being its
restriction to the set of elements whose namespace URI begins with the string omissible+. The rest of this section
will concentrate on details that have not been described elsewhere.
The biggest change from [b10] is in the textDeriv function. Both in the reference validator and
in the previous normalizer implementation, this function must match its pattern argument against its entire text node
argument. Now a pattern is allowed to consume only a prefix of the current text node, so the Brzozowski derivatives
cannot be calculated as easily. One possible solution would be to calculate the derivative character by character, but
its performance would be unacceptable. We also considered introducing a lexical layer that separates all possible
syntactic markers from the rest of the text, but in the end we settled for a mixed derivative/continuation-passing
algorithm. The textDeriv function takes two continuations, one invoked in case the pattern consumes the entire text
node and the other in case there is some leftover text. This way each pattern is free to consume as much text as it can
match in a single try, and pass the rest to the continuation pattern.
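The two-continuation idea can be illustrated with a small Python sketch. It is a strong simplification written for this explanation and does not reproduce the Haskell textDeriv function: matchers here return whatever the chosen continuation returns (or None on failure) instead of a derivative pattern, and the names literal, text_until, and sequence are hypothetical.

```python
from typing import Callable, Iterable, Optional

OnWhole = Callable[[], Optional[str]]
OnLeftover = Callable[[str], Optional[str]]
Matcher = Callable[[str, OnWhole, OnLeftover], Optional[str]]


def literal(marker: str) -> Matcher:
    """Match `marker` at the start of the text; fail with None otherwise."""
    def match(text, on_whole, on_leftover):
        if not text.startswith(marker):
            return None
        rest = text[len(marker):]
        return on_leftover(rest) if rest else on_whole()
    return match


def text_until(markers: Iterable[str]) -> Matcher:
    """Consume text up to the earliest of the given (non-empty) markers."""
    known = list(markers)

    def match(text, on_whole, on_leftover):
        hits = [p for p in (text.find(m) for m in known) if p >= 0]
        rest = text[min(hits, default=len(text)):]
        return on_leftover(rest) if rest else on_whole()
    return match


def sequence(first: Matcher, second: Matcher) -> Matcher:
    """Run `first`, then hand any leftover text to `second`."""
    def match(text, on_whole, on_leftover):
        # Simplification: if `first` uses up the whole node we only report
        # that; a real derivative would remember that `second` is pending.
        return first(text, on_whole,
                     lambda rest: second(rest, on_whole, on_leftover))
    return match


opening = sequence(literal("**"), text_until(["**", "//"]))


def report(text: str) -> Optional[str]:
    return opening(text, lambda: "whole node consumed",
                   lambda rest: "leftover: " + rest)


print(report("**fat //slanted"))  # leftover: //slanted
print(report("**fat"))            # whole node consumed
print(report("fat"))              # None
```

Each matcher consumes as much as it can in one step and never inspects the input character by character, which is the property the paragraph above is after.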
This technique unfortunately does not implement the interleave patterns properly. If their semantics from the RELAX NG
specification were carried over to the text nodes literally, it would imply that an interleave pattern should match any
interleaving of the character sequences matched by its two branches. This semantics would be very difficult to implement
efficiently, but more importantly, it would probably be useless in practice. Instead, textDeriv implements the
interleave pattern as an alternation: either of its branches may be matched first, followed by the other. This semantics is
unfortunately not composable. At this time we must recommend against the use of interleave in concrete syntax
definitions. The semantics of interleave across multiple XML elements and text nodes is not affected by this problem.
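The following Python sketch shows one possible reading of this alternation semantics and of why it does not compose. The combinators are hypothetical simplifications: a matcher takes a string and returns the unconsumed rest, or None on failure.

```python
from typing import Callable, Optional

Matcher = Callable[[str], Optional[str]]


def literal(marker: str) -> Matcher:
    return lambda text: text[len(marker):] if text.startswith(marker) else None


def sequence(first: Matcher, second: Matcher) -> Matcher:
    def match(text: str) -> Optional[str]:
        rest = first(text)
        return None if rest is None else second(rest)
    return match


def interleave_as_alternation(left: Matcher, right: Matcher) -> Matcher:
    """Accept `left` then `right`, or `right` then `left`; nothing else."""
    in_order = sequence(left, right)
    swapped = sequence(right, left)

    def match(text: str) -> Optional[str]:
        rest = in_order(text)
        return rest if rest is not None else swapped(text)
    return match


pair = interleave_as_alternation(literal("ab"), literal("cd"))
print(pair("abcd"), pair("cdab"), pair("acbd"))  # prints two empty strings, then None

# Nesting exposes the lack of composability: three items interleaved at the
# pattern level could come in six orders, but only four are accepted here.
nested = interleave_as_alternation(
    literal("a"), interleave_as_alternation(literal("b"), literal("c")))
accepted = [s for s in ("abc", "acb", "bac", "bca", "cab", "cba")
            if nested(s) == ""]
print(accepted)  # ['abc', 'acb', 'bca', 'cba']
```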
Another significant hurdle to overcome in the adaptation of RELAX NG to the task of parsing text is its text
pattern. Having been designed for the validation of XML documents, RELAX NG allows the text pattern to match any
arbitrary contiguous region of text. The boundaries of this region are determined by the surrounding markup tags. Since
we cannot count on these hard boundaries, we must keep track of all syntactic markers that can appear instead of element
tags. These markers are divided into two sets, the alternate set and the follow set. The former contains all
markers that can begin an alternative to the current pattern, while the latter contains all markers that can appear
after the end of the current pattern.
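As a hedged illustration of how the two sets might be used, the Python sketch below scans a text node for the earliest marker and reports which set it came from. The class MarkerContext, the function scan_text, and the tie-breaking rule (alternate set first, then alphabetical order) are our assumptions and are not taken from the implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class MarkerContext:
    alternate: FrozenSet[str]  # markers that can begin an alternative
    follow: FrozenSet[str]     # markers that can appear after the pattern


def scan_text(source: str, context: MarkerContext
              ) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (text before the earliest marker, set name, marker)."""
    best = None
    for kind, markers in (("alternate", context.alternate),
                          ("follow", context.follow)):
        for marker in sorted(markers):
            position = source.find(marker)
            if position >= 0 and (best is None or position < best[0]):
                best = (position, kind, marker)
    if best is None:
        return source, None, None
    position, kind, marker = best
    return source[:position], kind, marker


# Text inside a bold element of the example schema: italic is an
# alternative to plain text, and the closing bold marker follows.
inside_bold = MarkerContext(frozenset({"//"}), frozenset({"**"}))
print(scan_text("fat //slanted// text**", inside_bold))
# ('fat ', 'alternate', '//')
print(scan_text(" text** example.", inside_bold))
# (' text', 'follow', '**')
```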
The same approach is applied to data and dataExcept patterns: they are bounded by the next following
marker. They consume the longest possible prefix, recognized by the data type, of the text preceding the marker.
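The rule for data patterns can be sketched in Python as follows. This is an illustration under stated assumptions: match_data is a hypothetical name, and the built-in str.isdigit stands in for a real datatype validator such as one for an integer type.

```python
from typing import Callable, Iterable, Optional, Tuple


def match_data(source: str,
               is_valid: Callable[[str], bool],
               markers: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Consume the longest prefix accepted by `is_valid` that lies
    entirely before the earliest marker; return None if there is none."""
    hits = [p for p in (source.find(m) for m in markers) if p >= 0]
    limit = min(hits, default=len(source))
    for end in range(limit, 0, -1):
        if is_valid(source[:end]):
            return source[:end], source[end:]
    return None


print(match_data("2012//Montreal", str.isdigit, ["//"]))  # ('2012', '//Montreal')
print(match_data("42nd//street", str.isdigit, ["//"]))    # ('42', 'nd//street')
print(match_data("abc", str.isdigit, []))                 # None
```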
Whitespace is for the most part handled the same as all other text. The only two exceptions
are that the whitespace
consumption does not affect the alternate set and follow set of syntactic markers,
and that any amount of whitespace can
precede an explicit element tag. The latter feature follows the behaviour of the standard
RELAX NG validator, which
ignores whitespace between elements.
Results and future directions
The presented RELAX NG extension could be applied to many RELAX NG schemata and used
to shorten their instances. Whether
it should be applied to any particular schema depends mostly on outside factors like the target
audience and
document corpus. There are also, however, several technical factors that must be taken
into consideration.
- Syntactic markers can only be used to infer element tags without any specified attributes. This shortcoming is partly
  a consequence of the inability to specify fixed attribute values in RELAX NG, and could potentially be remedied by
  future extensions.
- While a schema extended with syntactic markers and omissible element tags can replicate most common uses of the SGML
  SHORTREF feature, it is a fundamentally different mechanism. A SHORTREF can expand to any general entity, which is
  free to include multiple elements with specified attributes and arbitrary content. A syntactic marker serves only to
  tell the parser which omissible elements should be inferred, and these inferred elements are the only possible
  addition to the parsed output.
- SGML derives some benefit from being a large and integrated specification. In particular, we can offer no equivalent
  to the SGML usemap declaration, which can activate an arbitrary set of shortrefs in any position in the document, or turn
  them all off. Since our input is well-formed XML, we could instead introduce special processing instructions that
  affect the parser's behaviour. The main obstacle currently is that the RELAX NG infrastructure normalizes the XML
  input, removing all processing instructions prior to validation and parsing. The CDATA marked sections are also
  normalized away, which presents an even more serious problem because the parser may infer elements within them.
- The current performance of the parser is sufficient for authoring documents with syntactic markers and occasional
  one-off conversion to a fully tagged instance, but it would impose a significant overhead in a repeatedly invoked
  markup-processing pipeline. The worst-case performance of any parser implementation will depend on the details of the
  schema; since RELAX NG does not impose LL(1) or similar constraints, neither do we.
- A judicious use of syntactic markers can ease XML document authoring in a text editor. Their benefits would be
  diminished if used with an XML editor; they could even degrade the experience in this context.
- There is currently no support for automatic inference of the desired element nesting level, as Wiki markup does, for
  example, with the indentation of the list item bullets. To allow an element to be nested within itself, the schema must specify
  a different syntactic marker for each element nesting level. Alternatively, one can always nest explicit element tags.
- On the positive side, the concrete-syntax schema can be as modular as a regular, abstract-syntax RELAX NG schema. It
  is possible to experiment with multiple different concrete syntaxes for the same abstract syntax, for example, or
  vice versa.
- The parser translates an XML document from concrete to abstract syntax. There is currently no tool support for
  performing a reverse translation. This would be a problem for any deployment scenario which allows a document to be
  edited in both its explicitly tagged and its concrete-syntax variants.
As a proof of concept, the present paper has been written in concrete syntax and translated
to the abstract syntax
conforming with the target schema. The concrete-syntax schema extension is given in
Appendix A.
The sample schema extension modifies seven elements: code, emphasis, listitem, para, programlisting, quote,
and title. Their tags are made omissible in all contexts where they can occur, with the exception of emphasis, which
must be explicitly tagged inside programlisting and inside an inferred emphasis. Each of the seven elements is also
given a concrete syntax with different terminal symbols. Authored with the full use of these extensions, the present
paper contains a total of 141 element tags, mostly of elements with required attributes. Once parsed into an
explicitly tagged XML instance, it gains an additional 284 element tags.
Another example in Appendix B presents a small extension of the modularized RELAX NG schema for XHTML 1.0[c08]. We hope to prepare more concrete syntax extensions like these for other XML schemata
in the
future.
Related work
The tool presented herein treats the RELAX NG schema as an abstract syntax description,
and sprinkles it with some
extensions for describing the concrete syntax of the language. There have been other
tools[p09] [q11] using the same approach of starting with the abstract syntax and extending it
with concrete syntax annotations. The abstract syntax notation in these related works
is tool-specific, since they don't
use XML as the abstract syntax tree.
On the other hand, there are numerous reports[b00] [c03] [m04] [r05] that focus on using XML as the target abstract syntax tree (AST) notation of a
parser for some concrete syntax. To perform their parsing, however, they use parser-generators
such as ANTLR[p95] and other traditional parsing tools, so they specify their concrete syntax in the
formalism those tools
require. Those that use an XML schema at all use it only to validate the generated
AST.
Appendix A. Concrete syntax schema extension for Balisage submissions
<?xml version="1.0" encoding="UTF-8"?>
<grammar ns="omissible+http://docbook.org/ns/docbook"
         xmlns:explicit="http://docbook.org/ns/docbook"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:terminal="http://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Terminal_symbols"
         xmlns:non-syntactic="http://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Nonterminal_symbols"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- The balisage-1-3a.rng schema included below is semantically equivalent to the original Balisage
       schema, but slightly refactored with the following definitions added for reuse:
       - code.content
       - emphasis.content
       - para.content
       - programlisting.content
       - quote.content
       - title.content
  -->
  <include href="balisage-1-3a.rng">
    <define name="programlisting.content">
      <ref name="programlisting.content.explicit"/>
    </define>
  </include>
  <define name="title" combine="choice">
    <element name="title">
      <ref name="title.attlist"/>
      <ref name="title.content"/>
      <ref name="paragraph_separator"/>
    </element>
  </define>
  <define name="para" combine="choice">
    <element name="para">
      <ref name="para.attlist"/>
      <ref name="para.content.non-recursive"/>
      <ref name="paragraph_separator"/>
    </element>
  </define>
  <define name="programlisting" combine="choice">
    <element name="programlisting">
      <ref name="programlisting.attlist"/>
      <ref name="programlisting_open_marker"/>
      <ref name="programlisting.content.explicit"/>
      <ref name="programlisting_close_marker"/>
    </element>
  </define>
  <define name="listitem" combine="choice">
    <element name="listitem">
      <ref name="listitem.attlist"/>
      <ref name="listitem_marker"/>
      <oneOrMore>
        <ref name="para.level"/>
      </oneOrMore>
    </element>
  </define>
  <define name="code" combine="choice">
    <element name="code">
      <ref name="code.attlist"/>
      <ref name="code_marker"/>
      <ref name="code.content"/>
      <ref name="code_marker"/>
    </element>
  </define>
  <define name="emphasis" combine="choice">
    <element name="emphasis">
      <ref name="emphasis.attlist"/>
      <ref name="emphasis_marker"/>
      <ref name="emphasis.content.non-recursive"/>
      <ref name="emphasis_marker"/>
    </element>
  </define>
  <define name="quote" combine="choice">
    <element name="quote">
      <ref name="quote.attlist"/>
      <ref name="quote_marker"/>
      <ref name="quote.content"/>
      <ref name="quote_marker"/>
    </element>
  </define>
  <!-- inlined emphasis.content, but with only explicit nested emphasis -->
  <define name="emphasis.content.non-recursive">
    <zeroOrMore>
      <choice>
        <text/>
        <ref name="link"/>
        <ref name="citation"/>
        <ref name="emphasis.explicit"/>
        <ref name="footnote"/>
        <ref name="trademark"/>
        <ref name="email"/>
        <ref name="code"/>
        <ref name="superscript"/>
        <ref name="subscript"/>
        <ref name="quote"/>
        <ref name="xref"/>
      </choice>
    </zeroOrMore>
  </define>
  <!-- emphasis element with explicit tags -->
  <define name="emphasis.explicit">
    <element name="explicit:emphasis">
      <ref name="emphasis.attlist"/>
      <ref name="emphasis.content"/>
    </element>
  </define>
  <!-- para.content minus the block-level elements which can recursively nest a paragraph -->
  <define name="para.content.non-recursive">
    <zeroOrMore>
      <choice>
        <text/>
        <ref name="citation"/>
        <ref name="code"/>
        <ref name="email"/>
        <ref name="emphasis"/>
        <ref name="equation"/>
        <ref name="inlinemediaobject"/>
        <ref name="link"/>
        <ref name="subscript"/>
        <ref name="superscript"/>
        <ref name="trademark"/>
        <ref name="quote"/>
        <ref name="xref"/>
      </choice>
    </zeroOrMore>
  </define>
  <!-- programlisting.content with only the explicit emphasis -->
  <define name="programlisting.content.explicit">
    <zeroOrMore>
      <choice>
        <text/>
        <ref name="emphasis.explicit"/>
        <ref name="superscript"/>
        <ref name="subscript"/>
      </choice>
    </zeroOrMore>
  </define>
  <define name="emphasis_marker">
    <element name="terminal:emphasis_marker">
      <value type="string">''</value>
    </element>
  </define>
  <define name="paragraph_separator">
    <element name="terminal:paragraph_separator">
      <value type="string">

</value>
    </element>
  </define>
  <define name="programlisting_open_marker">
    <element name="terminal:programlisting_open_marker">
      <value type="string">{{{
</value>
    </element>
  </define>
  <define name="programlisting_close_marker">
    <element name="terminal:programlisting_close_marker">
      <value type="string">
}}}</value>
    </element>
  </define>
  <define name="listitem_marker">
    <element name="terminal:listitem_marker">
      <value type="token">*</value>
    </element>
  </define>
  <define name="code_marker">
    <element name="terminal:code_marker">
      <value type="string">`</value>
    </element>
  </define>
  <define name="quote_marker">
    <element name="terminal:quote_marker">
      <value type="string">"</value>
    </element>
  </define>
</grammar>
Appendix B. Concrete syntax extension of XHTML schema
<grammar ns="omissible+http://www.w3.org/1999/xhtml"
         xmlns:explicit="http://www.w3.org/1999/xhtml"
         xmlns:terminal="http://en.wikipedia.org/wiki/Terminal_and_nonterminal_symbols#Terminal_symbols"
         xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="xhtml/xhtml-strict.rng"/>
  <define name="head" combine="choice">
    <element name="head">
      <ref name="head.content"/>
    </element>
  </define>
  <define name="title" combine="choice">
    <element name="title">
      <text/>
    </element>
  </define>
  <define name="body" combine="choice">
    <element name="body">
      <ref name="Block.model"/>
    </element>
  </define>
  <define name="p" combine="choice">
    <element name="p">
      <ref name="paragraph_separator"/>
      <ref name="Inline.model"/>
    </element>
  </define>
  <define name="ol" combine="choice">
    <element name="ol">
      <oneOrMore>
        <ref name="ol.li"/>
      </oneOrMore>
    </element>
  </define>
  <define name="ul" combine="choice">
    <element name="ul">
      <oneOrMore>
        <ref name="ul.li"/>
      </oneOrMore>
    </element>
  </define>
  <define name="hr" combine="choice">
    <element name="hr">
      <ref name="hr_marker"/>
    </element>
  </define>
  <define name="em" combine="choice">
    <element name="em">
      <ref name="emphasis_marker"/>
      <ref name="em.content.non-recursive"/>
      <ref name="emphasis_marker"/>
    </element>
  </define>
  <define name="ol.li">
    <element name="li">
      <ref name="ol_item_marker"/>
      <ref name="li.content.non-recursive"/>
    </element>
  </define>
  <define name="ul.li">
    <element name="li">
      <ref name="ul_item_marker"/>
      <ref name="li.content.non-recursive"/>
    </element>
  </define>
  <define name="em.content.non-recursive">
    <zeroOrMore>
      <choice>
        <text/>
        <ref name="abbr"/>
        <ref name="acronym"/>
        <ref name="br"/>
        <ref name="cite"/>
        <ref name="code"/>
        <ref name="dfn"/>
        <ref name="kbd"/>
        <ref name="q"/>
        <ref name="samp"/>
        <ref name="span"/>
        <ref name="strong"/>
        <ref name="var"/>
        <ref name="em.explicit"/>
      </choice>
    </zeroOrMore>
  </define>
  <define name="li.content.non-recursive">
    <zeroOrMore>
      <choice>
        <text/>
        <ref name="Inline.class"/>
        <ref name="address"/>
        <ref name="blockquote"/>
        <ref name="div"/>
        <ref name="pre"/>
        <ref name="Heading.class"/>
        <ref name="dl"/>
        <ref name="p.explicit"/>
        <ref name="ol.explicit"/>
        <ref name="ul.explicit"/>
      </choice>
    </zeroOrMore>
  </define>
  <define name="em.explicit">
    <element name="explicit:em">
      <ref name="em.attlist"/>
      <ref name="Inline.model"/>
    </element>
  </define>
  <define name="p.explicit">
    <element name="explicit:p">
      <ref name="p.attlist"/>
      <ref name="Inline.model"/>
    </element>
  </define>
  <define name="ol.explicit">
    <element name="explicit:ol">
      <ref name="ol.attlist"/>
      <oneOrMore>
        <ref name="li"/>
      </oneOrMore>
    </element>
  </define>
  <define name="ul.explicit">
    <element name="explicit:ul">
      <ref name="ul.attlist"/>
      <oneOrMore>
        <ref name="li"/>
      </oneOrMore>
    </element>
  </define>
  <define name="emphasis_marker">
    <element name="terminal:emphasis_marker">
      <value type="string">*</value>
    </element>
  </define>
  <define name="paragraph_separator">
    <element name="terminal:paragraph_separator">
      <value type="string">

</value>
    </element>
  </define>
  <define name="line_separator">
    <element name="terminal:line_separator">
      <value type="string">
</value>
    </element>
  </define>
  <define name="ol_item_marker">
    <element name="terminal:ol_item_marker">
      <value type="token">
# </value>
    </element>
  </define>
  <define name="ul_item_marker">
    <element name="terminal:ul_item_marker">
      <value type="token">
* </value>
    </element>
  </define>
  <define name="hr_marker">
    <element name="terminal:hr_marker">
      <value type="token">
----</value>
    </element>
  </define>
</grammar>
References
[b59]
Backus, J.W.,
The Syntax and Semantics of the Proposed International Algebraic Language of Zürich
ACM-GAMM Conference,
Proceedings of the International Conference on Information Processing, UNESCO,
1959, pp.125-132.
[b64]
Brzozowski, J. A. 1964. Derivatives of Regular Expressions. J. ACM 11,
4 (Oct. 1964), 481-494.
doi:https://doi.org/10.1145/321239.321249.
[b00]
Greg J. Badros. 2000.
JavaML: a markup language for Java source code.
Computer Networks 33, 1-6 (June 2000), 159-177.
doi:https://doi.org/10.1016/S1389-1286(00)00037-2.
[b07]
Mark Bergsma, 2007. Wikimedia architecture
http://www.nedworks.org/~mark/presentations/kennisnet/Wikimedia%20architecture%20(kennisnet).pdf
[b10]
Mario Blažević, 2010. Grammar-driven Markup Generation.
In Proceedings of Balisage: The Markup Conference 2010.
Balisage Series on Markup Technologies, vol. 5 (2010).
http://www.balisage.net/Proceedings/vol5/html/Blazevic01/BalisageVol5-Blazevic01.html.
doi:https://doi.org/10.4242/BalisageVol5.Blazevic01.
[c01]
James Clark and Makoto Murata. RELAX NG Specification.
http://relaxng.org/spec-20011203.html, 2001. ISO/IEC 19757-2:2003.
[c02]
James Clark. An algorithm for RELAX NG validation
http://www.thaiopensource.com/relaxng/derivative.html
[c02c]
James Clark. RELAX NG compact syntax, Committee Specification 21 November 2002, OASIS
http://relaxng.org/compact-20021121.html
[c08]
James Clark. Modularization of XHTML in RELAX NG
http://www.thaiopensource.com/relaxng/xhtml/
[c03]
James R. Cordy, 2003.
Generalized Selective XML Markup of Source Code Using Agile Parsing.
In Proceedings of the 11th IEEE International Workshop on Program Comprehension (IWPC '03).
IEEE Computer Society, Washington, DC, USA, 144-
[j04]
Rick Jelliffe. From Wiki to XML, through SGML.
http://www.xml.com/pub/a/2004/03/03/sgmlwiki.html
[m62]
John McCarthy,
Towards a Mathematical Science of Computation,
Proceedings of IFIP Congress 1962,
pages 21-28,
North Holland Publishing Company, Amsterdam
[m04]
J.I. Maletic, M. Collard, and H. Kagdi,
Leveraging XML technologies in developing program analysis tools.
IEEE Digest 2004, 80 (2004), doi:https://doi.org/10.1049/ic:20040255.
[p95]
Parr, T. J. and Quong, R. W. ANTLR: A predicated-LL(k) parser generator.
Software: Practice and Experience,
volume 25, issue 7, 1995. John Wiley & Sons, Ltd.
doi:https://doi.org/10.1002/spe.4380250705
[p09]
Jaroslav Porubän, Michal Forgáč, and Miroslav Sabo, Annotation Based Parser Generator.
Proceedings of the International Multiconference on Computer Science and Information
Technology, 2009, pp. 707–714
[q11]
Luis Quesada, Fernando Berzal, and Juan-Carlos Cubero,
A Tool for Model-Based Language Specification.
Department of Computer Science and Artificial Intelligence, CITIC, University of Granada,
http://arxiv.org/abs/1111.3970v1
[r05]
Raihan Al-Ekram and Kostas Kontogiannis. 2005.
An XML-Based Framework for Language Neutral Program Representation and Generic Analysis.
In Proceedings of the Ninth European Conference on Software Maintenance and Reengineering
(CSMR '05).
IEEE Computer Society, Washington, DC, USA, 42-51.
doi:https://doi.org/10.1109/CSMR.2005.10
[s05]
Sperberg-McQueen, C. M. Applications of Brzozowski derivatives to XML schema processing.
In Extreme Markup Languages 2005, page 26, Internet, 2005. IDEAlliance.
[s86]
Standard Generalized Markup Language (SGML)
International Organization for Standardization ISO 8879:1986
[w93]
Sam Wilmott, Beyond SGML.
Exoterica Technical Report ETR-9, 1993.
http://developers.omnimark.com/etcetera/etr09/