Note: Acknowledgements
My thanks go to all those friends in cooking and markup who contributed with suggestions and food. I am particularly indebted to Liam Quin and Tony Graham for their help in clarifying those bits of CSS which I had misunderstood or just failed to see.
Recipe data and ingredient syntax
In the earlier paper, the background and rationale were presented for storing recipe data in XML, specifically the ingredient data disaggregated into attributes on an element for the ingredient. In this paper, the algorithms devised in XSLT to recreate both the List of Ingredients and the references to those ingredients in the Method have informed the creation of CSS rules to recreate the same output, so that the generation of a separate print format is rendered unnecessary.
The conventions for how lists of ingredients are presented in an English-language recipe are relatively straightforward:
[ quantity ] [ units ] [ modifiers [ … ] ] item [ form ] [ , treatment ] [ command ]
for example
40 g red chili powder, sieved
In some recipe-book styles, the order is reversed (eg chili powder, 40 g, sieved) but this could be achieved by a change in the order of code blocks in the handler module; it is sufficiently unusual not to warrant a configuration switch at this stage.
In ℞, the disaggregated data for each ingredient is stored as attributes, eg:
<ingredient xml:id="chili_powder" quantity="40" unit="g" colour="red" form="powder" spice="chili" treatment="sieved"/>
<ingredient xml:id="oj" fruit="orange" part="juice" quantity="2"/>
<ingredient xml:id="potatoes" quantity="1.25" unit="lb" vegetable="potato"/>
<ingredient xml:id="flour" quantity="2.667" unit="cup" basic="flour"/>
<ingredient xml:id="tabasco" quantity="2" unit="dash" spice="Tabasco"/>
The attributes can be summarized as:
- quantity / unit
- quality / part / color / size / container
- item (in categories, summarized below)
- form (how it comes, if different from nature)
- treatment (additional prep before use)
- comments / alternatives
The item itself is described in one of the 14 attributes (also listed in the earlier paper) related to the identity of the ingredient: such as fish, meat, dairy, fruit, vegetable, spice, herb, basic (eg store-cupboard), etc. These are by their nature mutually exclusive and therefore subject to a principal validity constraint that only one of them can be present in any given ingredient. It may even be present in isolation: for example, spice="salt" is perfectly acceptable as an ingredient, on its own, with no indication in other attributes about unit or quantity or color or anything else. The only other compulsory attribute on an ingredient is @xml:id, used for correlation between the List of Ingredients and the mention of ingredients in the Method.
There are a few other attributes, listed in detail in the earlier paper, which are used to enable finer editorial control, rather than to change the way in which ingredients are presented to the cook. These are not discussed here.
Lists
The majority of these ‘category’ attributes are declared as enumerated (token) lists to avoid accidental miskeying by authors and editors, and to prevent unnecessary duplication and ambiguity, which is common in some recipe books. This does not, of course, protect against genuine accidents, such as the use of Kg instead of g: human editorial checking will always be required, although warning levels could be included in code.
These lists are not hard-coded into the schema: the DTD format used for the original experiment provided a simple mechanism for including them, so they are stored as external plain text lists with each entry followed by a vertical bar delimiter. Authors and editors can therefore add new values where required by using a plain-text editor, without specialist XML knowledge, assuming that they follow some simple rules, and that some in-house controls exist to periodically collate the additions for editing and distribution.
In the following sections, the procedures and their assertions can be tested by examination of the raw data in any of the test files at http://xml.silmaril.ie/recipes/ by clicking on the Print icon, which will display the underlying XML file, formatted with CSS.
Quantities and units
The first task was to deal with quantities and their units. The @quantity attribute holds a number, sometimes an integer, but also sometimes a floating-point value expressing a common fraction in decimal form. The currently supported fractions are those for which a single Unicode code point exists, such as half, the thirds, quarters, fifths, sixths and eighths. For example, 1¼ lb potatoes or 2⅔ cups flour must be given as
<ingredient xml:id="potatoes" quantity="1.25" unit="lb" vegetable="potato"/>
<ingredient xml:id="flour" quantity="2.667" unit="cup" basic="flour"/>
There were two challenges for CSS in handling quantities and units:
- Values using (eg) ⅓s or ⅙s must be given rounded to exactly three places of decimal (0.333, 0.167). While this was straightforward to handle in XSLT with xsl:choose, CSS has no such facility.
- With quantities more than one, standard units (eg lb, oz, g, pt, ℓ, etc) do not get pluralized, but many of the common (non-standard) units (eg bunch, capful, cube, dash, handful, pinch, sprig, etc) usually require pluralization.
Processing for quantities and units
- Pre-empt juices with a prefix of Glass of (if ‘glass’ is specified as a unit) or Juice of (where no unit is given);
- Output the integer portion of the quantity; add the fractional part as a vulgar fraction;
- If the size is specified and the unit is absent or is one of the non-standard units, give the size here;
- If the unit-weight is a number (ie no trailing units), output the multiplication sign and the value;
- Output inch sign or curly-ℓ sign or the unit glass if required (except in the case of juices, which were handled earlier);
- Pluralise the common units if the quantity is more than one, inserting the required e for units such as dash before adding the terminal s;
- If a container was specified, output that value here.
Examples:
<ingredient xml:id="oj" fruit="orange" part="juice" quantity="2"/>
<ingredient xml:id="potatoes" quantity="1.25" unit="lb" vegetable="potato"/>
<ingredient xml:id="flour" quantity="2.667" unit="cup" basic="flour"/>
<ingredient xml:id="tabasco" quantity="2" unit="dash" spice="Tabasco"/>
Juice of 2 oranges
1¼ lb potatoes
2⅔ cups flour
2 dashes Tabasco
Note that here and elsewhere, white-space is used liberally to separate components, as the surplus space is elided by the browser.
Modifiers
Many ingredients are described with some form of modification: we speak in a List of Ingredients of 500 g bread flour rather than just flour, or 8 red chilis rather than just chilis; but we normally use the item name alone in references in the Method because they have already been described. In most cases the modifier is positioned as a prefix on the formatted item, after the quantity and units.
There are two special cases here, the attributes @part (for parts of an animal or plant) and @form (for the form in which an ingredient comes). These may occur in combination with the categories (eg handful of parsley stalks denominates stalks as part of the plant). These must be positioned before or after the item category according to usage.
The @colour attribute (note UK spelling) is any textual color information to identify the item accurately, and may go before or after the item; the @size attribute is an enumerated list of common terms (big, large, medium, small, little, giant, tiny, etc) for the same purpose. The @container and @unit-weight attributes enable quantities related to cans, boxes, tins, and other packets, such as 2 × 400 g cans chickpeas where the quantity refers to the number of cans, not the count of ingredients.
Processing for the modifiers
- @size comes first, when it applies to the ingredient (eg large onions), not the @unit (small bunches);
- Insert of when there is no @quantity and the item is not a @part or @dairy or @basic or a @spice or a @herb, or the @quantity is not a number and the item is a @spice or @herb;
- Use a @colour first in the case of flour;
- Then the @quality;
- Then the @colour when the item is not flour;
- @treatment can come before the item when it acts as a prefix (eg ground, grated, shredded, etc).
Examples:
<ingredient xml:id="flour" quantity="500" unit="g" colour="white" quality="bread" basic="flour"/>
<ingredient xml:id="pork-belly" quantity=".5" unit="lb" part="belly" meat="pork"/>
<ingredient xml:id="parsley" unit="handful" part="stalk" herb="parsley"/>
<ingredient xml:id="chickpeas" quantity="2" unit-weight="400" unit="g" container="can" vegetable="chickpea"/>
500 g white bread flour
½ lb pork belly
handful of parsley stalks
2 × 400 g cans chickpeas
Items
Each ingredient item is categorized using one of the 14 mutually exclusive attributes. The selection can be refined by the use of the @part attribute mentioned above, where an item is a part of a greater whole; and by the use of the @form attribute, where an item such as peanuts actually comes in the form of a butter. This avoids overloading the main attributes with many variants.
<ingredient quantity="2" unit="Kg" part="leg" meat="lamb"/>
<ingredient quantity="2" unit="tbsp" nut="peanut" form="butter"/>
Item categories
- @meat, a list of meats, eg beef, chicken, lamb, pork, etc
- @fish, a list of seafood, eg salmon, hake, prawn, lobster, etc
- @dairy, a list of dairy products, eg milk, cheese, cream, yoghurt, etc
- @fruit, a list of fruits
- @alcohol, a list of drinks
- @herb, a list of herbs
- @vegetable, a list of vegetables, including pulses
- @bean, a list of beans (pulses)
- @nut, a list of nuts
- @seed, a list of seeds
- @pasta, a list of types of pasta, noodles, etc
- @spice, a list of spices
- @basic, a catch-all list of common store-cupboard ingredients which have no other category, eg flour, oil, yeast, etc
- @sprinkles, a list of edible decorative items, eg Streusel, grated chocolate, toasted almonds
- @prep, text for any class of ready-prepared ingredient (eg a can of soup)
As these are by definition mutually exclusive, and can be constrained as such in a schema or constraint processor, the output can just concatenate the values of all of them because the extra spaces will be elided.
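The concatenation can be sketched in CSS as follows. This is an illustrative fragment rather than the rule used in the stylesheet itself: it relies only on the standard behaviour that attr() yields an empty string for an absent attribute, and that runs of spaces collapse under normal white-space handling.

```css
/* Sketch: at most one category attribute is present on any ingredient,
   so all the other attr() calls contribute empty strings and only
   surplus spaces are left over. */
ingredient::after {
  white-space: normal;   /* collapse the surplus spaces */
  content: attr(meat) " " attr(fish) " " attr(dairy) " " attr(fruit) " "
           attr(alcohol) " " attr(herb) " " attr(vegetable) " " attr(bean) " "
           attr(nut) " " attr(seed) " " attr(pasta) " " attr(spice) " "
           attr(basic) " " attr(sprinkles) " " attr(prep);
}
```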
As explained in the earlier paper, there are some existing industrial taxonomies of foodstuffs, but none suitable for culinary use. The current set of categories is merely pragmatic, based on the natural separation of ingredients into groups, and could easily be changed; but in discussion with recipe editors there has been no significant dissent.
Pluralization
With a very few exceptions, the values in the enumerated attributes are stored in the singular. Pluralization where the quantity is more than one is required in two places: the common units (not the standard units), and the name of the item itself. Pluralization of the name of the item is also required in references to the ingredient. In most cases in English, adding s is sufficient, but there is a hard-coded list of exceptions that require a preceding e (tomato, potato), and a preliminary check for names ending in erry (berry, cherry, sherry) which may require the replacement of the y with ie.
The code for pluralizability in the XSLT uses the following logic:
- Quantity is greater than one or non-numeric; OR the unit is a standard unit except tsp, tbsp, and dsp; OR it’s nuts or seeds; BUT NOT parts, meat, fish, dairy, basic, spices, pasta, herbs, alcohol, sprinkles, or prepared ingredients;
- OR the quantity is numeric but there are no units (this would be the case for items used whole); BUT NOT a few individual uses such as asparagus or crayfish which take no plural form;
- AND NOT a few individual uses which do take a plural (peppercorn, chive, biscuit, breadcrumb, icecube).
There is also an exclusion list of names which take no plural form. In CSS, a different approach was taken which is explained below.
Treatment, comments and alternatives
Once the hard data of quantity, units, and the item itself have been output, the conventional format typically uses various forms of typography for additional information: comma, slash, vertical bar, semicolon, italics, and various forms of brackets.
These were unproblematic in both XSLT and CSS, as they could simply be appended to the ingredient if specified.
Reproduction of ingredient names in references
Throughout the Method in a recipe, the names of the ingredients occur in mixed content along with what to do with them. The earlier paper presented solutions to some requirements of recipe editors: a) errors or inconsistencies in spelling or naming ingredients; b) omission of an ingredient which was listed; c) inclusion of an ingredient which was not listed; and d) the order of List of Ingredients must be the order in which ingredients are used. This paper offers some solutions to the problem of implementing the same cross-references in XML/CSS as are created in the XHTML.
To use an ingredient in the Method the author or editor inserts an ingref element with an IDREFS attribute called @i, which must contain one or more of the @xml:id values assigned to the ingredients. [This element was called ing in the earlier paper, but has been renamed.]
The XSLT handles this by taking each value in turn and constructing a semantically and contextually valid word or phrase from the data attributes in the ingredient, using the same logic for the placement of prefix and suffix information if needed, and using the same logic for the expression of plurals. CSS cannot of course do this unaided.
Implementing the CSS
CSS implementation in browsers can be uneven, but the features needed for the current task appear to be supported in Chrome, Firefox, Safari, MSIE, Edge and their spawn.
The original site used for testing these recipes was an extension of the author’s XML FAQ pages, as a demonstration of the possibility of formatting XML with CSS alone. Even without the lessons learned from ℞ development, it was indeed possible to format the recipes, as shown in [Figure 1]. However, at that stage, ingredients were plain character data, not attributes.
With the ℞ markup replacing the unmarked text with data in attributes, the challenge was essentially to formulate the CSS so that the information was presented in the same way as the formatted XHTML generated by XSLT.
The formatting of the hierarchy and pool elements (titling, preamble, and paragraphs) in CSS is conventional and unproblematic. The two requirements were: a) to make use of the attributes on an ingredient element now declared EMPTY; and b) to cause the relevant name[s] to appear in cross-references to those elements in the absence of wider CSS support for indirect reference via ID/IDREF links.
The current solution to the first requirement makes very extensive use of CSS variables, harvesting the data needed from the attributes, and then exposing it in a single rule that operates on an ingredient:after selector. This technique makes use of the specificity and order features of CSS: a more specific rule will take precedence over a less specific one, but between rules of the same specificity, a later rule will supersede an earlier one.
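As a minimal sketch of this harvest-then-expose pattern (not the stylesheet itself), the placement of @colour relative to @quality could be handled as below. The custom property names --colour-first and --colour-last are hypothetical, chosen only for this illustration.

```css
/* Base rule (one type selector): by default the colour follows the quality. */
ingredient {
  --colour-first: "";
  --colour-last: attr(colour);
}

/* More specific (type selector plus attribute selector), so it overrides
   the base rule wherever it appears: for flour the colour comes first. */
ingredient[basic="flour"] {
  --colour-first: attr(colour);
  --colour-last: "";
}

/* The single rule that exposes the harvested values in syntactic order. */
ingredient::after {
  white-space: normal;
  content: attr(quantity) " " attr(unit) " "
           var(--colour-first) " " attr(quality) " " var(--colour-last) " "
           attr(basic);
}
```

Applied to the flour example given earlier (quantity 500, unit g, colour white, quality bread), this ordering gives 500 g white bread flour once the surplus spaces have collapsed.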
Consideration was given to the adoption of the CSS Within method due to Liam Quin, in which nonce CSS is stored in additional elements in the XSLT code, and emitted by a separate process. However, the current architecture is based on the assumption that each recipe occupies its own XML document, and shares a single CSS file for display. In other circumstances CSS Within could be a useful tool.
Ingredient setup
Before going into the individual detail of ingredients, the ingredients container for the List of Ingredients (not itself a part of ℞) provides a good example of this mechanism, shown in [Figure 2].
The element has @serves and @makes attributes to hold the number of expected servings and/or the number of individual items that the recipe provides. The harvesting of the data relies on the specificity of the selector, here and elsewhere:
- Along with the basic styling, the rule for ingredients specifies a variable (custom property) $numbers which is set to null.
- If the element has a @makes attribute, it is used to set that variable to a dash, the word ‘makes’, and the value of the attribute.
- Exactly the same happens mutatis mutandis when the element has a @serves attribute.
- However, if the element has both attributes, the $numbers variable is set to both values.
- Finally, before the output of the first ingredients element starts, the resulting variable value is output as part of a heading-style display.
- (This is in fact heralded by the value of another variable $ingred which the root element rule has previously set to ‘Ingredients’ or its equivalent in a recognized language.)
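The steps above can be sketched roughly as follows. This is an approximation written for illustration, not a copy of the figure: CSS custom properties are spelled with a leading double hyphen, so --numbers and --ingred stand in for the variables written $numbers and $ingred, and the exact punctuation and the French heading are assumptions.

```css
/* Root element rule: the heading word, with one assumed translation. */
:root          { --ingred: "Ingredients"; }
:root:lang(fr) { --ingred: "Ingrédients"; }

/* Basic styling plus the null default. */
ingredients { display: block; --numbers: ""; }

/* One attribute selector each: same specificity, more specific than the default. */
ingredients[makes]  { --numbers: " - makes " attr(makes); }
ingredients[serves] { --numbers: " - serves " attr(serves); }

/* Two attribute selectors: wins over either single-attribute rule. */
ingredients[makes][serves] {
  --numbers: " - makes " attr(makes) ", serves " attr(serves);
}

/* Heading-style display emitted before the list content. */
ingredients::before {
  display: block;
  font-weight: bold;
  content: var(--ingred) var(--numbers);
}
```

The rule with both attribute selectors does not depend on source order, because its higher specificity lets it win wherever it is placed.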
The unadorned ingredient rule in fact sets a large number of values to null ([Figure 3]), so that using them in the final formatting rule will not throw an error due to their absence. The majority are there to handle pluralization exceptions; the others to handle a few special cases, and the textual values of comments and alternatives.
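A cut-down sketch of such a base rule is shown here. Only --quant corresponds to a variable named in this paper; the other property names are hypothetical placeholders for the much longer list in the real rule.

```css
/* Every variable used by the final formatting rule gets a default here. */
ingredient {
  display: block;
  --quant: attr(quantity);  /* raw quantity, refined by later rules */
  --plural: "";             /* plural ending for the item name */
  --unitplural: "";         /* plural ending for the unit */
  --prefix: "";             /* special cases placed before the quantity */
  --comment: "";            /* textual value of any comment */
  --alt: "";                /* textual value of any alternative */
}
```

The defaults matter because a var() reference to a property that was never set, and that has no fallback, makes the whole content declaration invalid at computed-value time, so nothing at all would be rendered for that ingredient. Writing var(--plural, "") in the formatting rule would be another way to guard against this.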
Attribute handling with specificity
Fortunately, only a few illustrative examples of the attribute handling need be shown here, as the mechanism is identical for all of them, with only the data and exceptions varying. In each case, the unadorned rule sets a variable for the most common plural form of the item, usually null or ‘s’. There are then more specific rules for individual values known to require a different plural, should they occur in quantities more than one; followed by a rule bound to the @quantity which sets the pluralization back to null for quantities of one (see [Figure 4]). Many values require no pluralization (eg flour, sugar, oil, vinegar, etc) except in very specialist circumstances, and therefore need no specification.
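The three layers (category default, individual exceptions, reset for a quantity of one) might look like this for spices. It is a sketch under stated assumptions: --plural is a hypothetical property name, and peppercorn is assumed here to be a @spice value.

```css
/* Category default: most spices take no plural (salt, sugar, pepper). */
ingredient[spice] { --plural: ""; }

/* Individual values that do pluralise. These have the same specificity
   as the default, so they must come after it in the stylesheet. */
ingredient[spice="chili"],
ingredient[spice="peppercorn"] { --plural: "s"; }

/* A quantity of exactly one resets the ending. Two attribute selectors
   give this rule higher specificity than the exceptions above. */
ingredient[spice][quantity="1"] { --plural: ""; }
```

Attribute selectors compare strings and cannot test for a quantity greater than one, which is presumably why the plural is the default and the singular is the exception.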
For vegetables, the default is a plural ‘s’, with exceptions for those items taking no plural form ([Figure 5]) but also for quantities of one and less. An exception is given for canned tomatoes which always occur in quantities greater than one.
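A sketch of how the vegetable rules could be arranged is given below. The selectors for quantities below one and for canned tomatoes are guesses at one workable encoding, and --plural is again a hypothetical property name.

```css
/* Default for the category: plural in s. */
ingredient[vegetable] { --plural: "s"; }

/* Values needing a preceding e; same specificity, so placed later. */
ingredient[vegetable="potato"],
ingredient[vegetable="tomato"] { --plural: "es"; }

/* Values taking no plural form. */
ingredient[vegetable="asparagus"] { --plural: ""; }

/* Quantities of one and less: exactly 1, or a decimal written as 0.x or .x */
ingredient[vegetable][quantity="1"],
ingredient[vegetable][quantity^="0."],
ingredient[vegetable][quantity^="."] { --plural: ""; }

/* Canned tomatoes are always plural. This has the same specificity as the
   quantity rules above, so it relies on coming after them. */
ingredient[vegetable="tomato"][container="can"] { --plural: "es"; }
```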
Note that these values are pragmatic: that is, the rules are there because they have occurred in the test suite, whereas no fractional values have been encountered for any basic ingredient [yet].
An exception was introduced for fruit, as the English spelling of some plurals requires changing the word-ending rather than simply adding to it, for example ‘cherry’ to ‘cherries’. This meant harvesting the attribute value into a separate variable $fruitname so that the stem could be changed. This is then reflected in the output in [Figure 10], which references the variable, not the attribute. Consideration was also given to the use of parts of fruits (eg zest, juice, etc) which modifies the plural; and to the use of containers.
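The stem change can be sketched as follows, with --fruitname standing in for $fruitname and --plural as a hypothetical name for the ending. Parts of fruits and containers are left out of this illustration.

```css
/* Harvest the name into a variable so that its stem can be replaced. */
ingredient[fruit] { --fruitname: attr(fruit); --plural: "s"; }

/* Names in -erry: swap the stem so that the default s gives "cherries". */
ingredient[fruit="cherry"] { --fruitname: "cherrie"; }

/* A single fruit: drop the ending ... */
ingredient[fruit][quantity="1"] { --plural: ""; }

/* ... and restore the singular stem where it was changed. */
ingredient[fruit="cherry"][quantity="1"] { --fruitname: "cherry"; }

/* The output references the variable, not the attribute. */
ingredient[fruit]::after {
  content: attr(quantity) " " var(--fruitname) var(--plural);
}
```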
(The eagle-eyed reader will notice that tomatoes occur in both fruit and vegetable lists — the categorisation is an ongoing work as mentioned earlier. Sugar is currently still a spice, as it always was, historically speaking.)
We saw earlier ([Figure 3]) that the variable $quant was set to the value of the @quantity attribute by the base rule for the ingredient. However, as three-digit decimals need converting, we provide rules for all the common ones available as Unicode vulgar fractions ([Figure 7]).
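One possible shape for such rules is sketched below, on the assumption that each value encountered is matched as an exact string; the real rules may be organised differently. Here --quant stands in for $quant.

```css
/* Base rule: use the attribute value as it stands. */
ingredient { --quant: attr(quantity); }

/* Exact string matches, with and without a leading zero. */
ingredient[quantity="0.5"],
ingredient[quantity=".5"]    { --quant: "½"; }
ingredient[quantity="0.25"],
ingredient[quantity=".25"]   { --quant: "¼"; }
ingredient[quantity="0.333"] { --quant: "⅓"; }
ingredient[quantity="0.167"] { --quant: "⅙"; }

/* Mixed numbers need their own rules, as no arithmetic is available. */
ingredient[quantity="1.25"]  { --quant: "1¼"; }
ingredient[quantity="2.667"] { --quant: "2⅔"; }
```

Because the comparison is textual, a value keyed as 0.33 or 0.3333 would match none of these rules and would be displayed as a raw decimal, which is why thirds and sixths must be rounded to exactly three places.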
Units provide a slightly different requirement, as they may need to be bound to both value and quantity ([Figure 8]). As noted in the earlier paper, pluralization may need to be applied both to the ingredient item and to the unit by which it is measured, and the rules are not identical.
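A sketch of the unit handling follows, using units named earlier in this paper. The property name --unitplural is hypothetical, and fractional quantities below one are not covered here.

```css
/* Default for every ingredient: standard units (lb, oz, g) never change. */
ingredient { --unitplural: ""; }

/* Bound to the value: common units that take a plain s ... */
ingredient[unit="cup"],
ingredient[unit="sprig"],
ingredient[unit="handful"] { --unitplural: "s"; }

/* ... and those that need the extra e. */
ingredient[unit="dash"],
ingredient[unit="bunch"],
ingredient[unit="pinch"] { --unitplural: "es"; }

/* Bound to the quantity: exactly one, or none given, means no plural. */
ingredient[unit][quantity="1"],
ingredient[unit]:not([quantity]) { --unitplural: ""; }

ingredient::after {
  content: attr(quantity) " " attr(unit) var(--unitplural);
}
```

With the Tabasco example (quantity 2, unit dash) the unit is rendered as dashes, while the unquantified handful of parsley keeps its singular unit.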
Maintenance of the CSS
Additional rules are easily added as more recipes are added with ingredients or quantities or units not hitherto encountered.
Periodic maintenance is also relatively straightforward: frequencies for each attribute are trivially extractable from the recipes at each site update, and a brief visual inspection is enough to identify any values requiring a new rule. In [Figure 9] it is easily seen that the majority of values are not meaningfully pluralizable and need no additional rule.
(The lxprintf utility is part of the LT-XML 2 toolkit available from the Language Technology Group at Edinburgh University.)
Rendering
The final stage is the rendering of all the harvested values, using the ::after selector on the ingredient rule (the element itself is by definition EMPTY), as in [Figure 10]. This emits the relevant (unprocessed) attribute values in the correct syntactic order, with plurals where defined.
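A much-reduced sketch of such a rendering rule is given here, covering only three of the item categories. Apart from --quant, the property names are hypothetical, and the defaults are repeated so that the fragment stands alone.

```css
ingredient {
  display: block;
  --quant: attr(quantity);
  --unitplural: "";
  --plural: "";
  --treat: "";
}

/* The comma is only emitted when there is a treatment to follow it. */
ingredient[treatment] { --treat: ", " attr(treatment); }

ingredient::after {
  white-space: normal;   /* surplus spaces collapse */
  content: var(--quant) " "
           attr(unit) var(--unitplural) " "
           attr(colour) " " attr(quality) " "
           attr(vegetable) attr(spice) attr(basic) var(--plural) " "
           attr(form)
           var(--treat);
}
```

For the chili powder ingredient shown at the start of this paper, this ordering yields 40 g red chili powder, sieved.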
The chocolate mousse recipe shown in [Figure 1] has been edited to place the ingredient values in attributes (and to remove the qualitative judgments), and the ingredients shown in [Figure 11] now render correctly in both the (fancy) XHTML and in the CSS-formatted XML.
Ingredient references
CSS has no facilities for the re-use of data from elsewhere in the document (barring the limited in-scope ‘carry-forward’ mechanism of custom-property variables). It is therefore not possible to perform the kind of extended lookup done in the XSLT rendering, using the ID[s] in the @i IDREFS attribute of the ingref reference element type to access all the data of each ingredient from a context location deep inside the Method. Instead, we cheat.
Use the ID, Cook
The ID of each ingredient is not exposed in the List of Ingredients, nor in the other calculations about the sequence of usage, nor in the detection of the presence or absence of ingredients (see details in the parallel paper) except as error messages in a log file. It was therefore open to require the ID for each element to be the actual word that would be needed in the ingredient reference, and for the CSS to reproduce it as-is. That is, for 500 g onions, the @xml:id value would be ‘onions’.
This involves no extra effort for the cook, author, or editor except in a few rare cases where several different types of the same ingredient are used, for which we are developing some simple rules (below).
In the test suite, there are over 800 ingredient IDs, many the same between recipes (flour is most often just ‘flour’), but we are only concerned here with IDs within a recipe, where they must be unique. Of these, 560 have @xml:id values which are the same as the name of the item itself (as with flour above), and because there is only one item of that type in the recipe, no ambiguity can occur, and no plural form is needed, eg
<ingredient xml:id="pork" meat="pork" ... />
<ingredient xml:id="sugar" spice="sugar" ... />
In the case of a plural, the pluralized form must be used for the ID:
<ingredient xml:id="eggs" quantity="3" part="egg" ... />
<ingredient xml:id="chilis" quantity="6" colour="red" spice="chili" ... />
The CSS for the ingref element referencing a single ingredient therefore just needs to reproduce the value of the @i attribute: its existence and validity are already guaranteed by the XML parser.
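In its simplest form this needs a single generic rule, sketched here on the assumption that the ingref element is otherwise empty:

```css
/* Reproduce the referenced ID as the visible ingredient name. */
ingref::after { content: attr(i); }
```

With i="onions" the reference reads onions. An ID containing an underscore, or an @i holding several IDs, would be reproduced literally by this rule and so needs further handling.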
Multiple ingredients
Where it is necessary to distinguish between multiple similar ingredients (eg three different types of sugar), a compound word for the ID must be used, with an underscore as the separator. Of the remaining 240 values identified above, just under 100 needed underscores for this purpose. The underscore can also be used where two words would normally be used to describe the ingredient: note that in this case the list value of the attribute uses a hyphen separator while the ID value uses an underscore. This too is a post hoc validation test.
<ingredient xml:id="parsley" unit="bunch" size="small" quality="chopped" herb="parsley"/>
<ingredient xml:id="mint" unit="bunch" size="small" quality="chopped" herb="mint"/>
<ingredient xml:id="tomato_puree" quantity="2" unit="tbsp" basic="tomato-purée"/>
<ingredient xml:id="salt" quantity="½" unit="tsp" spice="salt"/>
<ingredient xml:id="white_pepper" quantity="½" unit="tsp" colour="white" quality="ground" spice="pepper"/>
<ingredient xml:id="vinegar" quantity="1" unit="tbsp" quality="wine" basic="vinegar"/>
<ingredient xml:id="oil" quantity="2" unit="tbsp" quality="olive" basic="oil" alt="rapeseed oil"/>
In XSLT, the underscore can trivially be converted to a space with translate(), but no such feature exists in CSS, so we cheat again.
Because every recipe gets processed by the ℞ XSLT, it was simple to include code to output a new CSS result document containing the required custom CSS for that recipe only:
ingref[i="tomato_puree"]:after { content:"tomato puree"; }
ingref[i="white_pepper"]:after { content:"white pepper"; }
A final problem arose in handling multiple ingredients referenced from the same ingref element. Again this is trivial in XSLT (comma-separation, with ‘and’ before the last one) but not in CSS alone. However, as that logic was already in the XSLT, using it to output to the same per-document custom CSS made it possible to write the rule:
ingref[i="garlic onion"]:after { content:"garlic and onion"; }
ingref[i="parsley mint tomato_puree salt white_pepper vinegar oil"]:after { content:"parsley, mint, tomato purée, salt, pepper, vinegar, and oil"; }
ingref[i="chicken ice"]:after { content:"chicken and icecubes"; }
In doing this, a significant concern was that the authors or editors should not have to concern themselves with the details of how the recipe gets rendered, only that when they reference multiple ingredients at a single point, it ‘just works’.
Results and conclusions
The system has now been tested successfully on 70 or so recipes but needs more, especially those using unusual ingredients, strange quantities, weird measures, and other outliers. At the moment it is regarded as an unstable beta: it works, but new data will certainly expose areas needing more work.
The ℞ code has been kept separate from the general XSLT creating the XHTML file body, so that the handlers for the ingredients occupy a single 900-line XSLT file, with three named templates for generating the ingredients, testing the order, and creating the ingredient references. This means the system can be implemented in any XML vocabulary, as it operates only on current-context, and makes no reference to any element type names or attributes from the surrounding XML environment. The only restriction is that the naming of the ingredient attributes themselves is still hard-coded, on the assumption that where ℞ is being implemented, no such attributes will yet exist.
The only additional code remaining outside this module at the moment covers the creation of the per-recipe custom CSS described in [section “Multiple ingredients”], but this will be moved into the core handler soon.
This project is an experiment to test whether it is possible to create a system which applies sanity checks to recipes (ordering, presence, absence, etc) and which helps guarantee that ingredients listed get used, and that ingredients referenced are actually listed.
This has proved to be perfectly possible, as demonstrated in the earlier paper [Flynn 2020], and has now been shown to be extensible to cover more than the 70 or so recipes tested. The drawback is that the ingredients must be stored in fully disaggregated form, a task which neither authors nor editors have the time to do in the absence of an editing interface capable of handling the multi-choice selection needed. It may be possible to apply a trained algorithm to textual ingredient descriptions and perform an initial disaggregation for the editor to work with.
A by-product of this method is that the creation of a ‘plain-and-simple’ print version of each recipe is now simplified, and needs no additional transformation of the XML, only the periodic updating of the CSS.
Future work involves migration testing, to identify the effort needed to implement the system on a virgin XML publishing structure; and to resolve a few outstanding difficulties in CSS.
References
[Bos 2016] Bos, Bert; Çelik, Tantek; Hickson, Ian; and Lie, Håkon Wium (Eds). ‘Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification: W3C Recommendation 07 June 2011, edited in place 12 April 2016 to point to new work.’ W3C, Boston, MA (2016). URI: https://www.w3.org/TR/CSS2/. (Latest editor’s draft, URI: http://dev.w3.org/csswg/css2/).
[Flynn 2020] Flynn, Peter. ‘Cooking up something new: An XML and XSLT experiment with recipe data.’ Presented at Balisage: The Markup Conference 2020, Washington, DC, July 27–31 2020. In Proceedings of Balisage: The Markup Conference 2020. Balisage Series on Markup Technologies, vol. 25 (2020). DOI: https://doi.org/10.4242/BalisageVol25.Flynn01.