Why Do We Have Problems?
The naming problems (that is, keeping the semantics of a profile unchanged but
changing the displayed value) have a fairly obvious basic cause: Values
are handled directly, instead of addressing the basic semantics of the
profile. They inevitably change over time, but a simple product name
change may be just that, a mere name change, meaning that the semantics remain
unchanged. Yet, the profiling information that is available does not reflect
this.
As with any changing content, profile values should be version handled, yet
they can't be when handled directly as strings.
The scoping problems offer further revelations:
-
We confuse semantics with values. Changed semantics may or may not
result in a changed value; filtering should be based on semantics rather
than representations.
-
The semantics evolve over time, as do the values, but the values are
only there to represent the semantics.
In the car example, D5 is used for both scopes because
for the manufacturer's aftersales organisation, the engine variant is
the same, regardless of the components used. In other words, we happen
to have two different versions of the basic semantics but the same value
to represent them.
-
Because we confuse semantics and presentation, we can either describe
the changes in presentation or describe the changes in semantics, but
not both.
-
A change in a profile's semantics should mean a new version of the
profile but not necessarily new values.
Or, in so many words, we confuse semantics and current values, using them
interchangeably and frequently changing the wrong one. We need to separate the
two.
Abstraction Layers
The solution is to separate semantics from presentation, like this:
Table I
Semantics | Presentation
D5 old | D5
D5 new | D5
Or, if changing profiles according to localisation, like this:
Table II
Semantics | Presentation
Platform X, GB | Vauxhall
Platform X, DE | Opel, Saab
Platform X, SE | Opel, Saab
And so on. In the former example, we have a basic name for the semantics
(D5) and two versions, both represented by the same value. In the latter, we
have three localisations of the basic platform name (X): GB, DE and SE.
Interestingly, the localisations of the platform use three different values,
Vauxhall, Opel and Saab. In this case, this represents the fact that the same
basic platform is used to create three separate vehicle brands.
Obviously, profile, version and localisation may all be required to completely
describe the correlation between the semantics and every intended representation
of the profile, like so:
PROFILE-VERSION-LOCALISATION
The different versions and localisations could then be assigned values:
Table III
Profile | Values
D5.1-GB, D5.1-DE, D5.1-SE | D5
D5.2-GB, D5.2-DE, D5.2-SE | D5
X.1-GB | Vauxhall
X.1-DE | Saab, Opel
X.1-SE | Saab, Opel
Note that the table represents incomplete semantics rather than a real-life
problem. More is required to determine which value to use and when.
If the core semantics change, the corresponding values may or may not change; if changed values are desired, the corresponding semantics must change.
The core profile, the intended semantics of the filtering condition, should be
uniquely and persistently named. That name should be version handled and localised
as needed. So, I wonder, is there a convenient way to separate semantics from
presentation?
Use URNs to Name Filters
I'm partial to URNs when it comes to uniquely identifying things. I'd have used
URNs to name my kids, had I been allowed to.
It's easy to define a URN namespace for unique names. And if you control the
scope, they can also be persistent. For URN-based profiling, something like this
should do:
PROFILE:LANG-COUNTRY:VERSION
PROFILE, of course, is the core profile, the semantic filter concept,
LANG-COUNTRY the localisation and VERSION a specific milestone. Combined, they
should describe the examples above, but PROFILE can be further broken down if
needed. For example, Platform X in the above table could be subdivided to solve
the semantic problems: X:OPEL, X:SAAB, etc.
A semantically identical profile used for different markets requiring different
presentation (values) is solved like so:
Table IV
URN | Values
URN-X:sv-SE:12 | V1
URN-X:en-GB:12 | V2
The values (V1 for Sweden, V2 for the UK) are different because the target
localisation varies, but the core profile (URN-X) is the same, as is the version
(12). The values V1 and V2 are therefore equivalent to each other.
Here's the introductory XML example using URNs as profiles:
<doc profile="urn:x-profile:a:sv-SE:12">
<p>Information common to products A, B, and C.</p>
<p profile="urn:x-profile:a:sv-SE:12">Information about product A.</p>
<p profile="urn:x-profile:b:sv-SE:7">Information about product B.</p>
<p profile="urn:x-profile:c:sv-SE:3">Information about product C.</p>
</doc>
A variable might be included like so:
<p>Information about product <phrase profile="urn:x-profile:a:sv-SE:12"/>.</p>
As the phrase element is a placeholder for variable content, the URN
needs to be processed accordingly so that the right values are used when publishing.
This construct, of course, can still result in a linguistic nightmare.
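To make the URN layout concrete, here is a minimal Python sketch that splits a profile URN into its parts. It assumes the layout seen in the examples above (namespace and core profile, then LANG-COUNTRY, then VERSION, separated by colons); the class name `ProfileUrn` and its field names are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileUrn:
    """A profile URN split into the parts discussed in the text."""

    base: str     # namespace plus core profile, e.g. "urn:x-profile:a"
    locale: str   # LANG-COUNTRY, e.g. "sv-SE"
    version: str  # milestone, kept as a string, e.g. "12"

    @classmethod
    def parse(cls, urn: str) -> "ProfileUrn":
        # Split from the right so that the profile part may itself
        # contain colons (for example a subdivided profile like X:OPEL).
        parts = urn.rsplit(":", 2)
        if len(parts) != 3 or not parts[0].lower().startswith("urn:"):
            raise ValueError(f"not a profile URN: {urn!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.base}:{self.locale}:{self.version}"


urn = ProfileUrn.parse("urn:x-profile:a:sv-SE:12")
print(urn.base, urn.locale, urn.version)
```

Splitting from the right is a design choice rather than a requirement: it keeps the localisation and version easy to find even when the core profile is broken down further.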
Can representing profiles with URNs solve the problems we've outlined?
-
If a profile is updated, either when changing the values or their
scope, a system that can fully resolve the URNs will support both the
old and new profiles. A new document can use the new values because it
uses a later URN version while a legacy document can keep on using the
old values because it uses the older URN version.
-
As a consequence, no processing of legacy documents beyond resolving
URNs is necessary.
-
It is still easy to string match profiles when publishing, even if
localisation is required.
-
It is also easy to publish a legacy document that uses old URNs with
new values by preprocessing the old URNs.
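The points above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any existing system: the `VALUES` table, its contents and the function names are hypothetical, and versions are assumed to be plain integers.

```python
# Hypothetical value table: full profile URN -> display value.
VALUES = {
    "urn:x-profile:a:sv-SE:11": "A (old name)",
    "urn:x-profile:a:sv-SE:12": "A",
}


def resolve(urn: str) -> str:
    """Return the value recorded for exactly this URN version."""
    return VALUES[urn]


def latest(urn: str) -> str:
    """Rewrite a URN to the highest known version of the same profile and locale."""
    prefix = urn.rsplit(":", 1)[0]
    candidates = [u for u in VALUES if u.rsplit(":", 1)[0] == prefix]
    # Assumption: versions are integers, so they can be compared numerically.
    return max(candidates, key=lambda u: int(u.rsplit(":", 1)[1]))


legacy = "urn:x-profile:a:sv-SE:11"
print(resolve(legacy))          # a legacy document keeps its old value
print(resolve(latest(legacy)))  # or is preprocessed to use the new one
```

A legacy document needs nothing beyond `resolve`; publishing it with new values is a matter of running its URNs through something like `latest` first.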
Processing
Editor
To make URNs practical, the writer will need help to identify and insert a
profile (while URNs are unique, they are not necessarily human-readable).
Similarly, when editing existing profiled nodes, the profiles must be easily
identifiable.
The problem, of course, is that a string like
urn:x-cassis:r1:cos:xplatform:000359:sv-SE:0.12 is not very
descriptive. Identifying it requires asking the CMS, which might prove
cumbersome if one ever wanted to work offline.
A cop-out solution is to use strictly human-readable URNs, but problems such
as identifying the variations in scope in the D5 example above
(see section “Changing Scope”) require more.
Perhaps better and certainly easier to process is to insert descriptive
throwaway attributes containing current profile values when checking out or
opening a document in the editor. Such an attribute, say, values,
would be for convenience only and be stripped from the document at
check-in:
<p profile="urn:x-profile:a:sv-SE:12" values="A">Information about product A.</p>
A more powerful alternative requiring a bit more processing is to use a
mapping document listing any required profile-and-value pairs for any checked
out or open documents, like so:
<maps>
...
<pair>
<profile>urn:x-profile:a:sv-SE:12</profile>
<values>A</values>
</pair>
<pair>
<profile>urn:x-profile:a:en-GB:12</profile>
<values>B</values>
</pair>
...
</maps>
Or some variation thereof. A mapping document might also provide the basis for
a profiling GUI, listing the available profiles and their versions in some
human-readable form, an immediate advantage being that once populated, the
mapping document would give access to the available profiles without requiring a
server connection.
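As a rough illustration of how little processing such a mapping document needs, the following Python sketch loads one into a dictionary using the standard library. The element names follow the fragment above; the function name `load_mapping` and the sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

# A small, complete mapping document in the format sketched in the text.
MAPPING = """
<maps>
  <pair>
    <profile>urn:x-profile:a:sv-SE:12</profile>
    <values>A</values>
  </pair>
  <pair>
    <profile>urn:x-profile:a:en-GB:12</profile>
    <values>B</values>
  </pair>
</maps>
"""


def load_mapping(xml_text: str) -> dict[str, str]:
    """Build a URN -> values lookup from a mapping document."""
    root = ET.fromstring(xml_text)
    return {
        pair.findtext("profile", "").strip(): pair.findtext("values", "").strip()
        for pair in root.iter("pair")
    }


lookup = load_mapping(MAPPING)
print(lookup["urn:x-profile:a:sv-SE:12"])
```

Once loaded, the lookup can back a profiling GUI or an offline editor session without any further server access.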
I've used a similar approach with a mapping document when matching URNs for
checked-out or open documents with their temporary URLs in the editor:
<Repository>
<RepositoryName>CosTI</RepositoryName>
<Map>
<UrnUrlPair>
<Urn>urn:x-cassis:r1:cos:00002730:sv-SE:0.7</Urn>
<Url>C:\Users\arin\Documents\condesign\cassis\ti\xmetal\2880321bb5d24b08a95e2854bccf859b\prox-för-cassis.xml</Url>
<Writable>false</Writable>
<EditUrl />
</UrnUrlPair>
</Map>
<ShowMetadataDialog>true</ShowMetadataDialog>
</Repository>
Expanding this to include profiling would be relatively easy.
Variable Text and Localisation
Variable text in the editor can be inserted using either technique above: a
throwaway values attribute or a separate mapping document will do
the trick. The former alternative requires less processing while the latter
gives access to more features. Localised values, for example, would require the
mapping document.
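The throwaway-attribute technique might look something like this in Python. It is a sketch only: the `VALUES` lookup and the function names `annotate` and `strip` are hypothetical, and a real editor integration would hook these into its check-out and check-in events.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup, for example populated from the CMS or a mapping document.
VALUES = {"urn:x-profile:a:sv-SE:12": "A"}


def annotate(root: ET.Element) -> None:
    """Check-out: add a values attribute next to every known profile."""
    for el in root.iter():
        value = VALUES.get(el.get("profile", ""))
        if value is not None:
            el.set("values", value)


def strip(root: ET.Element) -> None:
    """Check-in: remove the convenience attribute again."""
    for el in root.iter():
        el.attrib.pop("values", None)


doc = ET.fromstring(
    '<doc><p profile="urn:x-profile:a:sv-SE:12">Information about product A.</p></doc>'
)
annotate(doc)
print(ET.tostring(doc, encoding="unicode"))
strip(doc)
print(ET.tostring(doc, encoding="unicode"))
```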
Combining Profiles
URNs (and indeed any type of abstraction layer) can help simplify complex
profiles, such as the logical expressions mentioned in section “Boolean Logic”. Instead of having to
process the expression in an attribute, the expression can be represented using
another URN, like so (with apologies for the pseudo-code):
URN-EXPRESSION = URN1 AND URN2 AND NOT(URN3 OR URN4)
The replacement URN represents the expression and is used instead of it when
processing. Of course, to be more than a theoretical exercise in neat ways of
doing the unneeded, the situations in which boolean expressions can occur must
be clearly defined. Such situations are common when describing complex modular
products and their many variants; such products are frequently sold as
individuals, requiring individualised documentation. A closer look at those
situations is outside the scope of this document, but the point I want to make
here is nevertheless an important one: rather than processing
2*(3+2), process 10. An abstraction layer is
simply some suitable representation of semantics.
Thus, a writer might use a shortcut URN to represent a group of profiles
comprising several URNs. Such a user-defined URN could be paired
with descriptive metadata to help identify it and other URNs created for similar
purposes. The right systems support could easily provide the user with a listing
of the underlying profiles.
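A shortcut URN of this kind could be resolved along the following lines. The sketch is in Python and makes several assumptions: the combinator names (`profile`, `all_of`, `any_of`, `not_`), the `SHORTCUTS` registry and the `applies` function are all hypothetical, and the URN names are the placeholders from the pseudo-code above.

```python
from typing import Callable, FrozenSet

# An expression takes the set of active profiles and says whether it holds.
Expr = Callable[[FrozenSet[str]], bool]


def profile(urn: str) -> Expr:
    return lambda active: urn in active


def all_of(*exprs: Expr) -> Expr:
    return lambda active: all(e(active) for e in exprs)


def any_of(*exprs: Expr) -> Expr:
    return lambda active: any(e(active) for e in exprs)


def not_(expr: Expr) -> Expr:
    return lambda active: not expr(active)


# URN-EXPRESSION = URN1 AND URN2 AND NOT(URN3 OR URN4)
SHORTCUTS = {
    "URN-EXPRESSION": all_of(
        profile("URN1"),
        profile("URN2"),
        not_(any_of(profile("URN3"), profile("URN4"))),
    ),
}


def applies(urn: str, active: FrozenSet[str]) -> bool:
    """Evaluate a shortcut URN, or fall back to plain membership."""
    expr = SHORTCUTS.get(urn)
    return expr(active) if expr is not None else urn in active


print(applies("URN-EXPRESSION", frozenset({"URN1", "URN2"})))
print(applies("URN-EXPRESSION", frozenset({"URN1", "URN2", "URN3"})))
```

The document itself only ever carries the shortcut URN; the expression lives in one place and can change without touching the content.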
Base Profiles
A complete profile includes localisation and version information, but
sometimes it is useful to process the base profile regardless of language,
country or version. This is easily done by defining wildcard behaviour:
URN:*:*
This basically ignores localisation and version; each wildcard matches every
possible value. With the URN semantics well defined (I use EBNF for mine) this
should be easy.
Other useful variations here might define processing for, say, the latest
version of a profile. A stylesheet treating URN:sv-SE:* as the
latest is not hard to do but will, of course, require access to the
corresponding values, either at runtime or when populating a mapping
document.
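A minimal Python sketch of the match-everything wildcard behaviour follows. It assumes the colon-separated layout used in the examples and that only the localisation and version parts may be wildcards; the function name `urn_matches` is hypothetical. Treating a version wildcard as "latest" would need access to the known versions and is not covered here.

```python
def urn_matches(pattern: str, urn: str) -> bool:
    """Compare a profile URN against a pattern with optional '*' parts."""
    p_parts = pattern.rsplit(":", 2)
    u_parts = urn.rsplit(":", 2)
    if len(p_parts) != 3 or len(u_parts) != 3:
        return False
    # The base profile must be identical; locale and version may be '*'.
    if p_parts[0] != u_parts[0]:
        return False
    return all(p in ("*", u) for p, u in zip(p_parts[1:], u_parts[1:]))


print(urn_matches("urn:x-profile:a:*:*", "urn:x-profile:a:sv-SE:12"))
print(urn_matches("urn:x-profile:a:sv-SE:*", "urn:x-profile:a:en-GB:12"))
```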
Assertions
Sometimes, filtering profiled content causes structural problems in the
resulting document, with required elements missing. Consider this admittedly
simplistic example:
<doc profile="A">
...
<warning>
<p profile="A">Some content.</p>
</warning>
...
</doc>
If a warning must always contain at least one p, the
above will result in an invalid warning if published in context
B rather than A. This is an easy mistake to
make, and more complex nodes could easily end up being invalid without the user
noticing, especially in modularised documents, resulting in the problem
remaining undiscovered until the document is published.
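The failure is easy to reproduce. The Python sketch below filters the example for a given publishing context and then looks for warning elements left without a p. It is an illustration only: the function names are hypothetical, the publishing context is passed in explicitly rather than read from the root element, and profile attributes are assumed to hold whitespace-separated values.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<doc profile="A">
  <warning>
    <p profile="A">Some content.</p>
  </warning>
</doc>
"""


def apply_profile(root: ET.Element, context: str) -> None:
    """Remove every descendant whose profile does not name the context."""
    for parent in list(root.iter()):
        for child in list(parent):
            prof = child.get("profile")
            if prof is not None and context not in prof.split():
                parent.remove(child)


def empty_warnings(root: ET.Element) -> list:
    """Return the warning elements that no longer contain a p."""
    return [w for w in root.iter("warning") if w.find("p") is None]


doc = ET.fromstring(SOURCE)
apply_profile(doc, "B")
print(len(empty_warnings(doc)), "invalid warning(s) after filtering for B")
```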
As these problems will only appear later, they can be difficult to spot. They
can be caught by using schematron (an ISO standard; see id-idso-sch) assertions
to validate a document for problems and missing content after applying profiles.
Such tests can be automated and used to validate the profiled nodes only. Here's
a schematron fragment for checking if the warning contents match the publishing
context:
<!-- Profiling status for node -->
<pattern>
<rule context="warning">
<assert test="p/@profile">No profiling information.</assert>
<report test="p/@profile">Profiling present.</report>
</rule>
</pattern>
<!-- Match -->
<pattern>
<rule context="warning">
<report test="contains(/*/@profile,p/@profile)">Profile matches
publishing context.</report>
</rule>
</pattern>
<!-- No match -->
<pattern>
<rule context="warning">
<assert test="contains(/*/@profile,p/@profile)">Profile does not
match publishing context.</assert>
</rule>
</pattern>
Note that complex schematron documents can be automatically generated if the
possible profiles are known and the possible changes are defined in a
schema.
It might be possible to use XML Schema 1.1 assertions, but since an assertion
on an element cannot refer to siblings or ancestors (id-xsdassertions),
the assertion would have to be made on descendants only, like so:
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified">
<xs:element name="doc">
<xs:complexType>
<xs:sequence
maxOccurs="unbounded">
<xs:element
name="warning">
<xs:complexType>
<xs:sequence
maxOccurs="unbounded">
<xs:element
name="p">
<xs:complexType
mixed="true">
<xs:attribute
name="profile"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute
name="profile"/>
</xs:complexType>
</xs:element>
</xs:sequence>
<xs:attribute
name="profile"/>
<xs:assert
test="contains(@profile,.//*/@profile)"/>
</xs:complexType>
</xs:element>
</xs:schema>
This might result in some rather complex expressions if the required assertion
needed to go beyond the basics illustrated above. I have not explored this
further at the time of writing.
Publishing
Publishing documents that include URN profiles remains easy; the URNs can be
processed as strings, using string matching, so the filtering of nodes should
not be a problem. Processing a translated document that uses untranslated
profiles might prove tricky, however. Here is an example of a document
originally profiled in Swedish but now translated to English:
<doc profile="urn:x-profile:a:en-GB:12">
<p>Information common to products A, B, and C.</p>
<p profile="urn:x-profile:a:sv-SE:12">Information about product A.</p>
<p profile="urn:x-profile:b:sv-SE:7">Information about product B.</p>
<p profile="urn:x-profile:c:sv-SE:3">Information about product C.</p>
</doc>
None of the profiled p elements is
included in the resulting publication. This, of course, could be the desired
result, but more likely is that the profiles need to be preprocessed. One way
could be to prep the file going to translation, replacing any language/country
information in the URNs before translation. More flexible is to define the exact
preprocessing according to need. For one thing, if the profiled node is not
relevant in the target localisation, the profile should remain unchanged.
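One possible shape for such a preprocessing step is sketched below in Python. It assumes the colon-separated URN layout from the examples and a set of URNs known to exist for the target localisation; the function name `relocalise` and the `available` set are hypothetical.

```python
def relocalise(urn: str, target_locale: str, available: set) -> str:
    """Swap the locale in a profile URN if the target localisation defines it."""
    base, _locale, version = urn.rsplit(":", 2)
    candidate = f"{base}:{target_locale}:{version}"
    # A profile that is not relevant in the target localisation stays unchanged.
    return candidate if candidate in available else urn


# Hypothetical: only product A has an en-GB localisation.
available = {"urn:x-profile:a:en-GB:12"}
print(relocalise("urn:x-profile:a:sv-SE:12", "en-GB", available))
print(relocalise("urn:x-profile:b:sv-SE:7", "en-GB", available))
```

With the sample data, the first URN is rewritten to its en-GB form while the second is returned as it was, so the node for product B would still be filtered out of an English publication.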
Note
It might be better to include every applicable profile localisation
directly in the above example, rather than replacing the original one during
preprocessing, as suggested by a reviewer of this paper. Or, if the profile
was always applicable, leave out the localisation altogether by using a
wildcard convention (such as profile="urn:x-profile:a:*:12")
with suitable assertions when preprocessing. More complex localisation
requirements could be similarly handled (sv-SE and de-DE, but not
en-GB, etc.) using more complex assertions.
Also, the translators should be made aware of any processing requiring exact
values (most notably when using profiles for variable text in content); the
profile values in a localisation are far more important to
the translator than their corresponding URNs. The latter, then, need to be
mapped to any relevant values, including values resulting from localisation or
from some special processing (e.g. if the latest version of a profile is
preferred), before the original document is translated. The values can be placed
in a mapping document provided to the translators, but they'd almost certainly
prefer preprocessed documents where text variables such as the phrase
element in section “Variable Text” include their values rather
than the URNs:
<p>Information about product <phrase profile="A B C">A, B and C</phrase>.</p>
Note
This will not solve the grammatical problem. It simply helps translators
by showing the actual values rather than the URNs.
The Grammatical Problem Solved
The following sentence using a text variable will potentially cause problems
if the number of applicable profiles varies:
-
A single profile, say A, is uncomplicated to use in
a variable: A is the latest-generation
diesel engine for the environmentally conscious driver.
-
A variable that might result from possibly multiple matching
profiles is more difficult: B and C are
high-performance turbo engines for the
demanding racing driver.
<p>The <phrase profile="A B C">is the latest generation diesel engine
for the environmentally conscious driver.</phrase></p>
Brute force solutions involving marking up inline content to identify
grammatical constructs might be manageable if only two profiles need to be
handled and Boolean constructs are accepted, for example by using expressions
such as profile="(A AND NOT(B)) OR (B AND NOT(A))" for singular and
profile="A AND B" for plural form, but even this will quickly
become unmanageable for the writer.
Far more useful is to add an abstraction layer that defines the
types of profiles, for example, diesel engines or turbo engines. A mapping
document might define a group of profiles for the purpose, like so:
<group>
<profile>urn:x-profile:abc</profile>
<included>
<profile>urn:x-profile:a</profile>
<profile>urn:x-profile:b</profile>
<values>D5</values>
</included>
...
</group>
Here, all variants are called D5 but the value could just as
well be D Series Diesel Engine or something else. The point is
that the abstraction is needed to a) group the participating profiles into a
meaningful semantic group while b) keeping either singular
or plural form, but not both, regardless of the number of exact profiles
used.
A different but useful way to solve the problem is to count the context
profiles in the root (one or more) and include markup to handle only the
grammatically relevant differences. Singular might be marked up as
<wrap context="s">is</wrap> and plural as
<wrap context="p">are</wrap> or similar.
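The counting approach could be processed roughly as follows. This Python sketch assumes that the context profiles are whitespace-separated in the root's profile attribute and that the wrap markup looks as suggested above; the function names `drop` and `select_number` and the sample sentence are hypothetical.

```python
import xml.etree.ElementTree as ET


def drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove child from parent while keeping the text that follows it."""
    index = list(parent).index(child)
    tail = child.tail or ""
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


def select_number(root: ET.Element) -> None:
    """Keep singular wraps for one context profile, plural wraps otherwise."""
    count = len(root.get("profile", "").split())
    wanted = "s" if count == 1 else "p"
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "wrap" and child.get("context") != wanted:
                drop(parent, child)


doc = ET.fromstring(
    '<doc profile="B C"><p>B and C <wrap context="s">is</wrap>'
    '<wrap context="p">are</wrap> turbo engines.</p></doc>'
)
select_number(doc)
print("".join(doc.itertext()))
```

The writer only marks up the words that actually differ between singular and plural; everything else in the sentence stays untouched.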