Note: Acknowledgements
My thanks go to all those friends in cooking and markup who contributed with suggestions and food.
Background
There is a conventional formality to the way in which recipes are presented in western cultures which has been common since the middle of the nineteenth century. Before that, ‘receipts’ (as they were then known, from the Latin for ‘Take…’) were largely narratives, so you had to read them all the way through and note down what ingredients you would need.[1] This was true from the earliest clay tablets [Anon 2016] through the Greek and Latin recipes of the Classical period [Vehling 1936] to the end of the manuscript era with the first large-scale cookbook, The Forme of Cury; and from the subsequent rise of the printed cookbook from the 1470s [Sitwell 2012], including the extensive body of household manuscript cookery books and ephemera (see, for example, Figure 2) that continued to flourish until the end of the 18th century [Masters 2013], to the conventional modern style which was pioneered by Eliza Acton (1845) and popularised by Isabella Beeton (1861) in the UK and Fannie Farmer (1896) in the USA. This style has a structure something like this:
-
Title and/or Description, sometimes with a picture
-
Number of portions (sometimes)
-
List of Ingredients (quantities, materials, treatments)
-
Method of preparation (steps)
-
Comments or serving suggestions
It is nowadays also common to provide an extended narration after the Description, perhaps explaining where the recipe originated, or what changes have been made, but this is a matter of taste and style, and not an essential component. The key components remain the List of Ingredients and the steps of the Method.
There has been some interesting work on encoding recipes, particularly in the historical field (and therefore by default using TEI) [Knauf 2017][Klug 2017], but these are typically done to enable the recipe[s] to be identified within a much larger corpus, not for the purposes of analysing the ingredients or method, so they do not tend to use markup down to the level proposed here.
Ingredients
Ingredients are usually given in the order in which they get used in the steps, but sometimes they may be in order of importance (for example, a recipe for a beef stew could start with the beef, even though the onions may be the first thing you begin the cooking with); and sometimes they may be grouped, especially in complex recipes (all spices together, or all ingredients for a sauce together).
The convention for ingredients is to give the quantity, units, and item in that order (eg ‘3 Kg onions’), but some authors or editors give the item first (‘Onions, 3 Kg’). It is not important, except that from a publishing point of view it needs to be done the same way in each recipe to avoid confusing the reader.
<ingredient xml:id="onions" quantity="3" unit="Kg" size="small" colour="red" vegetable="onion" treatment="peeled and chopped fine"/>
Other material about quality, size, shape, and treatment may be interspersed: the example above would be needed for ‘3 Kg small red onions, peeled and chopped fine’.
The borderline between the preparation or treatment being
attached to the ingredient, or being mentioned in the steps of
the method is sometimes hard to determine: both are common,
and the discussion above about style and consistency applies
here also.
Measurement
Measurements have their own cultural conventions. In modern western cultures there are three common ‘standards’:
-
In most European-influenced cultures, metric units are standard (grams, kilos, liters).
-
In the UK and some of its former spheres of influence, metric units are the norm for published recipes, but Imperial units are still often used domestically (ounces, pounds, pints, with 20 fluid ounces to the pint).
-
In the USA, measurements are given by volume (cups, pints, with 16 fluid ounces to the pint) but also (in larger quantities) in pounds and quarts; the word ‘ounce’ is used to mean ‘fluid ounce’, as an ounce weight is rarely used. Canada and Australia officially use the metric system but many people still habitually use Imperial or US measurements. New Zealand uses metric measures but has a standard metric cup (250ml).
However, all these cultures tend to use similar measures for very small quantities, subject to some minor differences related to eating and drinking habits (see Figure 3):
-
tea-spoon or coffee-spoon (tsp, cuillère à café or càc, etc: about 5ml), although in cultures where tea-drinking predominates, a coffee-spoon is smaller than a tea-spoon, about 3ml
-
dessert-spoon (dsp, cuillère à dessert or càd, etc: about 10ml), common in UK and French-speaking cultures only, so far as I have been able to determine
-
table-spoon or soup-spoon (tbsp, cuillère à soupe or càs, etc: about 15ml, but a tbsp is 20ml in Australia); in the UK, a soup-spoon is the same size as a dessert-spoon (although a different shape): ‘[e]veryone knows how big a table-spoon is: it will just go into your mouth, though not if you have nice manners.’ [Freeling 1972]
-
Other spoons exist, of course: mustard-spoons and salt-spoons, for example, but I am not aware of any standard capacities. Extensive internationalisation would be needed for more widespread applicability: while the sizes appear not to vary much, the names and abbreviations are of course different.
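As a rough illustration of how these conventions might be held as data, the following Python sketch records the approximate capacities quoted above. The table names, the function name, and the region codes are hypothetical and are not part of the system described in this paper.

```python
# Approximate capacities in millilitres, as quoted in the text above.
SPOON_ML = {"tsp": 5, "dsp": 10, "tbsp": 15}

# Regional exceptions mentioned in the text (region codes are illustrative).
REGIONAL_ML = {("tbsp", "AU"): 20}

# Fluid ounces to the pint in the Imperial and US conventions.
FL_OZ_PER_PINT = {"UK": 20, "US": 16}


def spoon_ml(unit, quantity=1, region=None):
    """Return the approximate volume in ml of `quantity` spoonfuls."""
    per_spoon = REGIONAL_ML.get((unit, region), SPOON_ML[unit])
    return quantity * per_spoon


print(spoon_ml("tbsp", 2))        # 30
print(spoon_ml("tbsp", 2, "AU"))  # 40
```

Names and abbreviations in other languages would need a further mapping onto these keys.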
Errors
It is not uncommon for recipes published in books and magazines and on the web to contain mistakes that can confuse even experienced cooks. This may be caused by many factors, including writing or typing up the recipe in a hurry; changing it while you experiment, and forgetting to update it; failing to get it edited professionally before publishing; working from illegible or out-of-date sources; misunderstanding a translation; or not testing the recipe — and doubtless many others including plain ordinary typographic errors.
Errors in recipes are an annoyance to readers when a dish fails; they are an embarrassment to their authors; they are damaging to the reputation of the publishers; and occasionally they can be the cause of serious financial loss, if a book has to be withdrawn because of them. It is therefore in everyone’s interests that recipes be as correct as possible. This research is an attempt to see if markup can contribute to a solution.
Complete omission of an ingredient (both from the list and from the method) is an editorial and testing problem, easily fixed online but not in print [Cloake 2011]. This class of error is not susceptible to treatment in software as the relevant data is by definition entirely absent in the first place, so there is nothing for a program to do anything with.
Cloake (2011) also quotes an example of the much more common problem of omitting the ingredient in one place but not the other:
Nigella’s Feast […] contains a recipe for a chocolate orange cake that includes a direction to ‘cream together the butter and sugar’ — which would come as a nasty surprise to the prospective baker, given no butter is mentioned in the ingredients. (When chocolatier Paul A Young tried both versions, he concluded the butter was a red herring — the cake turns out much better without it.)
Mismatched quantities can also confuse the cook. Jacob (2016) describes an error where the list of ingredients specified four cups (of shredded sharp cheddar cheese), but the method only used half a cup.
In an earlier article, Jacob (2010) identified seven classes of error (14 if we include a later list of seven more). Most of these are editorial problems which are important but out of scope for this research. The key concerns here are (using [Jacob 2010]’s original numbering for the first and second lists):
-
Ingredients out of order (1/1)
-
Missing ingredient (1/2)
-
Wrong amounts (1/3)
-
Making every step a separate number (2/6)
In item 2, [Jacob 2010] groups together the errors of omission and of commission (a listed ingredient which does not get used; and a step referring to an ingredient that is not listed), but we would argue that these are technically two separate classes of error.
Testing is probably the most essential part of recipe development, but for this very reason, each cycle of testing means changes to the recipe. Hart, in an article on writing cookery books, emphasises that while there are things a good editor will catch, it’s up to the cook/author to get it right to start with [Hart 2012].
An additional class of error is the inconsistent use of names, that is, using a different name for an ingredient in the List of Ingredients to the one used in the Method. This can occur where different cultures name things differently, and either lack of editorial oversight or authorial absent-mindedness results in both names being used for the same things in different places (‘spring onions’ and ‘scallions’ is one example that might need explaining out of its cultural context).
These problems are not new. Burros (1997) was blunter about it:
The prevalence of errors in cookbooks is the publishing world’s dirty little secret. The problem is likely to get worse as an industry mired in economic doldrums resorts to cost-cutting, practically guaranteeing less editing and testing before publication.
The publishing industry has indeed continued to get worse, and it is now a rare publisher who can offer to copyedit and proofread a manuscript, and the online publishing business has regrettably mirrored the worst practices of its print forebears. Burros (1997) goes on to explain the division of blame between publisher and author, both of whom feel the other could do more, and concludes that
[i]t is a haphazard system — further complicated by typesetting errors and editing that too often fails to eliminate confusion.
Elsewhere she refers to ‘human error or computer gremlins’, which is where the present research comes in.
Scope
With rare exceptions, published recipes nowadays are either on the web, which means HTML in one form or another; or in print, which means typesetting to PDF. The source format for new recipes is likely to be a Microsoft Word file, or a blog entry (perhaps Markdown), or an email message, or possibly still a typescript or manuscript. They may be original to an author (even though something similar may have existed for centuries elsewhere, unknown to the author), or they may have been copied or converted in many ways from recipes passed between friends and family, and they may of course also have been pirated: copyright notwithstanding, photocopies of recipes from magazines and books are legion, and it is not hard to do an OCR from a scan.
The scope for errors is enormous: the author’s own experience includes an edit of a typescript which listed half a pint of milk, originally typed (on a typewriter, from a handwritten recipe) as ‘1/2 pt milk’. This became ‘1 or 2 pints milk’ in the editing (by someone unfamiliar with the lack of a ½ sign on an old typewriter), but it was corrected at proofing stage to ‘½ pt milk’, and then the defective software used by the typesetter could only manage ‘□ pt milk’.
Recipe management software is available industrially, but tends to focus on very large volume production in the food industry and the automation of mixing and cooking equipment. However, a Belgian software company, youmeal.io, produces kitchen-oriented food analysis products for the catering and restaurant industry, and emphasises that using correct food data is of primary importance. They quote a study of their own claiming that 50% of technical sheets for compound products were incomplete or incorrect.
A software solution to at least some of the problems above was considered to be potentially of use to the cookery author, editor, or publisher, as well as to cooks who want to write up their own recipes in a way that will pass the test of time, but many other problems will continue to rely on humans for a solution. (Historical recipes are interesting for the lack of detail as well as for the actual food: some of them read like recipes from a professional cook’s manual such as Le Répertoire de la Cuisine [Saulnier 1982] where for brevity the reader is assumed already to know everything from experience; others are virtually unusable because not all the relevant ingredients are mentioned, so expert guesswork is needed.)
From the errors discussed earlier, a candidate list of topics emerged, based on susceptibility to solution by software:
-
Ingredient referred to in method was never listed
-
Ingredient listed was never referred to in method
-
Ingredients out of order
-
Bogus quantities (eg too big or too small)
-
Mismatched quantities (different between the list of ingredients and the step of the method)
-
Inconsistent naming of ingredient between list and step
-
Steps too small (ie too many of them)
Of these, the control on bogus quantities was seen as unimplementable without a data history and suitable limits, which places it outside the scope of this experiment. The step size problem is also not easily susceptible to machine judgement. Both these classes were therefore dropped at this stage.
Schematron was suggested by two reviewers, and could be used to calculate ‘reasonable’ measurements and highlight deviations, as well as to identify ingredient item conflicts, but in the time available this was not possible.
The objective, therefore, was to see if adding markup to the ingredients and steps could be used at or before the rendering stage to limit the remaining classes of error without creating too much work for the author or editor.
It was seen as important for potential solutions that they could be implemented in any programming language, and that the data could be stored in a number of different ways, so while this implementation is in XML and XSLT, the data structure (50 lines) and the code (600 lines) are both small and should be easy to reimplement. The choice of XML was based on a number of considerations: a) many publishers already use XML as part of their workflow; b) it is commonplace in web systems; and c) a recipe is essentially narrative text (still), even if it is presented in the form of two lists, and XML was designed for dealing with mixed content (plain text mixed with special meanings). XML editing software also has controls which can be used on elements and attributes and references to them, early in the workflow, as well as at the point where output is created.
Taking this as a starting-point, some common XML markup features could readily be seen as having potential use: for example the built-in ID/IDREF checks could be used to test for the presence or absence of ingredients in the steps and vice versa; and enumerated (token list) attributes could be used to represent the options for different categories of ingredients. This would improve the accuracy of reproducing the textual form of the ingredients; allow for finer-grained checking; and enable indexing for book publication and for online searching.
During initial development it became apparent that a sufficiently accurate categorisation of the ingredient metadata could provide a solution to error class 6 by [re]generating the textual form of each ingredient programmatically from the categorised data.
Implementation
The implementation proceeded in two phases: developing and testing the ID/IDREF mechanism, used for error classes 1, 2, 3, and 5 in the list in §5, and developing the categorisation for ingredients, used in class 6.
Identity checks
Some initial tests showed that detecting the use of an ingredient was trivial. Given a schema that makes xml:id a REQUIRED attribute on an ingredient element, a conditional using an XPath statement such as count(idref(@xml:id))=0 is sufficient to determine if the ingredient is not referenced anywhere else in the recipe. Note that at this level, it does not control for where such a reference ought to occur, nor whether it would be meaningful in context: those are still tasks for a human editor or proofreader.
The reverse is simpler and even less controlled: if the references from the steps to ingredients are done using an element with an IDREF attribute, then standard validation techniques will throw an error on any such references that have no matching ID, even before regular processing starts.
As a first stage, therefore, we can use two declarations, one for the ingredients and one for references to them:
<!ELEMENT ingredient (#PCDATA)>
<!ATTLIST ingredient xml:id ID #REQUIRED>
...
<!ELEMENT ing EMPTY>
<!ATTLIST ing i IDREFS #REQUIRED>
The first element would occur as part of the content model for the list of ingredients, and the second element would be valid in mixed content in the steps of the method, as the reference to the ingredient[s] being used. In fact, if this system is to be implemented in an existing schema/DTD (as opposed to the nonce schema used for testing), only the attributes are required: the names of the element types could be anything.
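Since the mechanism is meant to be portable, the two identity checks can be approximated outside XSLT. The following is a minimal Python sketch, not the author's implementation: the recipe root element, the sample data, and the function name are assumptions. Python's standard parser does not validate, so the dangling-reference check that a validating parser would perform is done by hand here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical test recipe: 'salt' is never used, 'butter' is never listed.
RECIPE = """<recipe>
  <ingredients>
    <ingredient xml:id="flour">brown flour</ingredient>
    <ingredient xml:id="sugar">muscovado sugar</ingredient>
    <ingredient xml:id="salt">salt</ingredient>
  </ingredients>
  <method>
    <step>Add the <ing i="flour sugar"/> and mix well.</step>
    <step>Fold in the <ing i="butter"/>.</step>
  </method>
</recipe>"""


def check_identity(root):
    """Return (listed but never referenced, referenced but never listed)."""
    listed = [el.get(XML_ID) for el in root.iter("ingredient")]
    # The i attribute is of type IDREFS: a space-separated list of IDs.
    referenced = [ref for ing in root.iter("ing")
                  for ref in ing.get("i", "").split()]
    unused = [i for i in listed if i not in referenced]
    unlisted = [r for r in referenced if r not in listed]
    return unused, unlisted


unused, unlisted = check_identity(ET.fromstring(RECIPE))
print(unused)    # ['salt']
print(unlisted)  # ['butter']
```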
Consistency
The ID/IDREF link used in section “Identity checks” can also be used to reproduce the name of the ingredient at the point of reference, instead of requiring it to be entered manually during composition, unless some special wording is required. In effect, if we write
<ingredients>
  <ingredient xml:id="flour">brown flour</ingredient>
  <ingredient xml:id="sugar">muscovado sugar</ingredient>
</ingredients>
...
<method>
  ...
  <step>Add the <ing i="flour sugar"/> and mix well.</step>
</method>
it is straightforward to write code which will produce
3. Add the brown flour and muscovado sugar and mix well.
This makes use of the binding between ingredient and mention which addressed the missing ingredients problem. However, merely reproducing the name of the linked ingredient does not solve the problem of the wrong ingredient being accidentally referenced, and in many cases the full name is not required (eg just ‘flour’ and ‘sugar’ are enough). Proofreading and recipe-testing are still important to prevent this.
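The expansion of references into names can be sketched as follows. This is an illustrative Python equivalent under assumed names (the recipe wrapper and both functions are hypothetical); it omits step numbering and any special wording.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

RECIPE = """<recipe>
  <ingredients>
    <ingredient xml:id="flour">brown flour</ingredient>
    <ingredient xml:id="sugar">muscovado sugar</ingredient>
  </ingredients>
  <method>
    <step>Add the <ing i="flour sugar"/> and mix well.</step>
  </method>
</recipe>"""


def join_names(names):
    """Join names as 'a', 'a and b', or 'a, b and c'."""
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def render_step(step, names_by_id):
    """Rebuild the step text, replacing each <ing> with the listed names."""
    parts = [step.text or ""]
    for child in step:
        if child.tag == "ing":
            refs = child.get("i", "").split()
            parts.append(join_names([names_by_id[r] for r in refs]))
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(RECIPE)
names = {el.get(XML_ID): el.text for el in root.iter("ingredient")}
rendered = [render_step(step, names) for step in root.iter("step")]
print(rendered[0])  # Add the brown flour and muscovado sugar and mix well.
```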
Order
A test for the order or sequence of ingredients could be encoded into the handling of the mixed-content element type (ing in the example in section “Identity checks”), but that handling would then have to take account of any earlier references to the same ingredient, which encumbers the coding. It is preferable to do this at another stage, for example in the handling of the container of the steps of the Method.
For each grouped unique occurrence of descendant ID values (that is, in the steps of the Method), the position within the Method is compared with the position of the matching ingredient in the List of Ingredients.[2]
This means (using the example in section “Consistency”) that
<step>Add the <ing i="sugar flour"/> and mix well</step>
would throw an error because the flour is listed as an earlier ingredient than the sugar.
While it is conventional to list the ingredients in order of their mention, it is by no means universal; but where ingredients are grouped (for example into component parts of the recipe), then there are usually also multiple matching Method steps, and within them the rule of order-of-mention appears to be observed.
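One way to approximate this order test is to record the first mention of each ingredient in the Method and flag any whose position in the List of Ingredients is earlier than that of an ingredient already mentioned. The Python sketch below makes that assumption about what counts as an error; the helper names and the recipe wrapper are hypothetical, and grouped ingredient lists are not handled.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

LISTING = """<ingredients>
  <ingredient xml:id="flour">brown flour</ingredient>
  <ingredient xml:id="sugar">muscovado sugar</ingredient>
</ingredients>"""


def make_recipe(step_xml):
    """Wrap a step in a (hypothetical) recipe containing the listing above."""
    return ET.fromstring(
        "<recipe>" + LISTING + "<method>" + step_xml + "</method></recipe>")


def order_errors(root):
    """Return IDs first mentioned after an ingredient that is listed later."""
    listed = [el.get(XML_ID) for el in root.iter("ingredient")]
    first_mentions = []
    for ing in root.iter("ing"):              # document order
        for ref in ing.get("i", "").split():
            if ref in listed and ref not in first_mentions:
                first_mentions.append(ref)
    errors, highest = [], -1
    for ref in first_mentions:
        position = listed.index(ref)
        if position < highest:
            errors.append(ref)                # listed earlier, used later
        else:
            highest = position
    return errors


bad = make_recipe('<step>Add the <ing i="sugar flour"/> and mix well</step>')
good = make_recipe('<step>Add the <ing i="flour sugar"/> and mix well</step>')
print(order_errors(bad))   # ['flour']
print(order_errors(good))  # []
```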
Categorisation
It became apparent that the disaggregation of the ingredient data could lead to the generation of the human-readable ingredient items both in the List of Ingredients and in the mentions in the Method. There is a formality here too, in the way in which ingredients are expressed, and there are conventions which vary by culture. It is possible to say ‘100 g walnuts, chopped fine’ as well as ‘100 g finely-chopped walnuts’: both mean the same thing, although in English there is an implicit presumption in the first form that you take whole walnuts and chop them fine yourself; and in the second, that you buy the walnuts ready-chopped. While these variants are largely stylistic, published collections of recipes try to standardise on one way of saying things in order not to confuse the readers, especially if they are likely to be beginners and unfamiliar with the conventions.
It therefore became an additional task to equip the system with the ability to store the ingredient data as separate identities for units, quantities, different classes of foodstuffs, qualities, treatments, etc, so that the ingredients list could be generated in an acceptable format, especially across many recipes following a pattern. A side-benefit is that it could also result in the consistent use of names between ingredients and method. The categorisation of the ingredients required considerably more work, and remains open to much discussion.
Many categorisations or classifications are based on nutrition or source, both of which would require specialist knowledge to enter as data. Wikipedia suggests Dairy, Fruits, Grains/Beans/Legumes, Meat, Confections, Vegetables, and Water [Northamerica1000 2020], based largely on work by Nestlé (2013), which is closer to how a cook would think of ingredients. Bearing in mind that a categorisation for this purpose needs to be useful for decision-making (‘Is this recipe vegetarian?’, ‘Is there alcohol in this recipe?’, ‘Does it contain nuts?’), a few changes were made to this scheme:
-
the Meat category was split into Meat and Fish (to cover seafood)
-
Nuts were separated out from other Vegetable materials, as was Pasta
-
Confections was ignored as a separate category (sugar is subsumed under Spices)
-
store-cupboard ingredients were given their own category of Basic (although there could be much dispute over what one person has in this category compared with another person)
Five additional categories were Herbs; Spices; Alcohol; Toppings, which covers edible decoration; and Prep, intended for ready-prepared ingredients usually bought pre-packaged.
This leaves unsolved some problems of categorisation which are not dealt with elsewhere because traditional food classifications omit items such as chocolate (technically a ready-prepared item, although humorists would have it a food group in its own right). In the current settings, chocolate is a store-cupboard item but chocolate-chips are a topping.
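To show how the categories might support the decision-making questions above, the following Python sketch tests for the presence of category attributes on the ingredient elements. The attribute names are those described in the Markup section; the sample data and the function name are hypothetical, and the vegetarian test is deliberately naive (it cannot see animal products hidden in a store-cupboard or ready-prepared item).

```python
import xml.etree.ElementTree as ET

# Hypothetical ingredient list using the category attributes.
INGREDIENTS = """<ingredients>
  <ingredient xml:id="wal" quantity="100" unit="g" nuts="walnut"
              treatment="chopped fine"/>
  <ingredient xml:id="lj" quantity="2" unit="tsp" fruit="lemon" part="juice"/>
  <ingredient xml:id="ff" quantity="2" unit="fl.oz" dairy="fromage-frais"/>
</ingredients>"""


def uses_any(root, categories):
    """True if any ingredient carries one of the given category attributes."""
    return any(ing.get(category) is not None
               for ing in root.iter("ingredient")
               for category in categories)


root = ET.fromstring(INGREDIENTS)
is_vegetarian = not uses_any(root, ("meat", "fish"))
contains_nuts = uses_any(root, ("nuts",))
contains_alcohol = uses_any(root, ("alcohol",))
print(is_vegetarian, contains_nuts, contains_alcohol)  # True True False
```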
Markup
The current system provides for the following attributes on the ingredient element:
-
@xml:id, a unique ID for the ingredient
-
@quantity, a number, possibly including a decimal fraction (but restricted to the half, quarters, eighths, thirds, and fifths, as these can be represented in text with existing Unicode fractions)
-
@unit, a list of standardised abbreviations (dl, dsp, fl.oz, g, Kg, lb, l, ml, oz, pt, tbsp, tsp) plus common measures such as cup, can, dash, drop, handful, etc
-
@unit-weight, text for describing a standard size of one of the common measures, like a 400 g can
-
@container, text for the name of the container of the @unit-weight
-
@size, a list of adjectives, eg large, medium, small, etc
-
@colour, a colour name used for description, like red apple
-
@quality, any adjective describing a pre-existing condition, eg dry, smooth, unsalted, etc (not a @treatment, see below)
-
Items (the material ingredients); these are mutually exclusive (with the exception of @part):
  -
  @meat, a list of meats, eg beef, chicken, lamb, pork, etc
  -
  @fish, a list of seafood, eg salmon, hake, prawn, lobster, etc
  -
  @part, a list of body parts or products, eg breast, kidney, wing, egg, seed, etc
  -
  @dairy, a list of dairy products, eg milk, cheese, cream, yoghurt, etc
  -
  @fruit, a list of fruits
  -
  @alcohol, a list of drinks
  -
  @herb, a list of herbs
  -
  @vegetable, a list of vegetables
  -
  @nuts, a list of nuts
  -
  @pasta, a list of types of pasta, noodles, etc
  -
  @spice, a list of spices
  -
  @basic, a list of common store-cupboard ingredients, eg flour, oil, yeast, etc
  -
  @toppings, a list of edible decorative items, eg Streusel
  -
  @prep, text for any class of ready-prepared ingredient
-
@treatment, an adjective such as chopped, ground, melted, etc (something done to the foodstuff)
-
@note, a digit, for use in referring to footnotes (deprecated)
-
@comment, any text
-
@symbol, a symbol or emoji, provision for bullet labelling
-
@alt, text describing an alternative for substitution if the exact foodstuff is not available
-
@status, an enumerated list, optional or required, so that optional ingredients can be identified
These are used to describe the foodstuff in a way that avoids the need for extensive typing in most cases, as the enumerated list values can be selected from a menu. It was regarded as important that the actual names of items should not be subject to typing errors on each occasion of entry.
<ingredients>
  <ingredient xml:id="avo" quantity="4" size="large" quality="very ripe" treatment="chopped fine" vegetable="avocado"/>
  <ingredient xml:id="toms" quantity="2" size="medium" treatment="chopped just as fine" vegetable="tomato"/>
  <ingredient xml:id="oil" quantity="1" size="hefty" unit="dash" note="1" quality="pimento" basic="oil"/>
  <ingredient xml:id="lj" quantity="2" unit="tsp" fruit="lemon" part="juice"/>
  <ingredient xml:id="garlic" quantity="1" unit="clove" size="fat" vegetable="garlic"/>
  <ingredient xml:id="ff" quantity="2–4" unit="fl.oz" dairy="fromage-frais" comment="or double [heavy] cream if not on a diet" alt="Sour cream is also good here"/>
  <ingredient xml:id="salt" spice="salt"/>
  <ingredient xml:id="pep" spice="pepper"/>
</ingredients>
A set of rules was developed in XSLT which implements the grammatical precedence of the attribute descriptive values (described below). This results in a list such as:
4 large very ripe avocados, chopped fine
2 medium tomatoes, chopped just as fine
1 hefty dash pimento oil¹
2 tsp lemon juice
1 fat clove garlic
2–4 fl.oz fromage frais (or double [heavy] cream if not on a diet). Sour cream is also good here.
Footnotes in ingredient lists are extremely rare and largely inadvisable, so they are not provided for; the one in this example was implemented manually.
In tests, all the classes of ingredient could be represented without the need for character data content. However, much more extensive testing would be needed to ensure the coverage of the enumerated lists, and to tighten up the rules on how the wording is generated.
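The restriction of @quantity to fractions with a Unicode form suggests a simple conversion on output. The following Python sketch is one possible approach, assuming decimal input such as 1.25; the table and function names are hypothetical, and values that are not plain numbers (ranges such as 2–4) are passed through unchanged.

```python
from fractions import Fraction

# Unicode vulgar fractions for halves, quarters, eighths, thirds and fifths.
GLYPHS = {
    Fraction(1, 2): "½", Fraction(1, 4): "¼", Fraction(3, 4): "¾",
    Fraction(1, 8): "⅛", Fraction(3, 8): "⅜", Fraction(5, 8): "⅝",
    Fraction(7, 8): "⅞", Fraction(1, 3): "⅓", Fraction(2, 3): "⅔",
    Fraction(1, 5): "⅕", Fraction(2, 5): "⅖", Fraction(3, 5): "⅗",
    Fraction(4, 5): "⅘",
}


def format_quantity(value):
    """Render a quantity string with a Unicode fraction where one exists."""
    try:
        # Snap values such as 0.33 to the nearest small fraction.
        number = Fraction(value).limit_denominator(8)
    except ValueError:
        return value                      # eg a range like '2–4'
    whole, part = divmod(number, 1)
    if part == 0:
        return str(whole)
    glyph = GLYPHS.get(part)
    if glyph is None:
        return value                      # no suitable glyph: leave as typed
    return (str(whole) if whole else "") + glyph


print(format_quantity("0.5"))   # ½
print(format_quantity("1.25"))  # 1¼
print(format_quantity("2–4"))   # 2–4
```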
The lists mentioned in the attributes are plain text files, one value per line, ending in a vertical bar (the standard delimiter for enumerated attributes), so for example the test file meat.list currently says:
beef|
chicken|
duck|
ham|
lamb|
pork|
turkey|
As they are plain text files, they can be customised to the author’s desire, and can be as long or as short as needed provided they follow the rules for enumerated list items (compounds need a hyphen, not a space, like fromage-frais; this is removed in the XSLT on output). There is no limit on the number of items or their order (alphabetic order was used purely for convenience), and they don’t need to be one per line: any additional spacing is entirely optional.
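Reading such a file from another language is straightforward. The Python sketch below parses the bar-delimited format and applies the hyphen-to-space rule on output; the function names are hypothetical, and the file content is given inline rather than read from disk.

```python
def parse_list(text):
    """Split a .list file on vertical bars, ignoring spacing and line breaks."""
    return [value.strip() for value in text.split("|") if value.strip()]


def display_name(value):
    """Turn an enumerated token such as 'fromage-frais' into display text."""
    return value.replace("-", " ")


# Layout is free: one value per line or several on a line are equivalent.
meat = parse_list("beef|\nchicken|\nduck|\nham|\nlamb|\npork|\nturkey|\n")
dairy = parse_list("milk| cheese| fromage-frais|")

print(meat)                    # ['beef', 'chicken', 'duck', 'ham', 'lamb', 'pork', 'turkey']
print(display_name(dairy[2]))  # fromage frais
```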
Rules using categorisation
From inspection of existing recipes, it was possible to come up with a first conjecture on the order and precedence for expressing the ingredients in natural language, using the data in the attributes. Such a mechanism would require a much larger amount of data than was available for the rigorous regression testing needed before it could be widely used, but the current rules appear to work acceptably in many circumstances.
Quantity | This always comes first, except where it is implicit ( |
Size | Size is used as a prefix to the unit when the unit is common (eg large handful) |
Unit weight | This is used when the quantity refers to an ingredient that comes supplied in a measured container, like a 400 g can of tomatoes. If it follows a numeric quantity, it gets a multiplication delimiter (×) |
Container | This is only meaningful when |
Unit | Unit follows quantity (but may have been prefixed by size and unit weight). Common units are pluralised if the quantity is more than one or is non-numeric (intervention: ‘dash’ requires an ‘e’) |
Size | When the unit is standardised or absent, it is applied to the ingredient, not the unit (eg medium eggs) |
Quality | This is a predetermined feature of the ingredient like |
Colour | Any colour; accepted as-is |
Treatment | The actions |
Ingredient | There are currently ten groups as described earlier. These are based on observation, and are largely pragmatic or conjectural: a) alcohol; b) basic (ie store-cupboard items); c) dairy; d) fruit; e) herb; f) meat; g) pasta; h) spice; i) toppings (decorative sprinkles); and j) vegetable. Order is not significant, as they must be mutually exclusive for any given ingredient. The lists can be tailored ad infinitum. If a value contains a hyphen, replace it with a space. This enables the use of hyphenated compounds like baking-powder, and two-word names like soy-sauce (the case where retention of the hyphen is needed is unresolved). Pluralisation of ingredients is a little more tricky than for quantities: if the quantity is more than one, or it is non-numeric, or the unit is a standardised unit (excluding tsp, tbsp, and dsp), and the ingredient is not among the values for meat, dairy, spice, pasta, basic, or herb (excluding spinach, seed, rice, and garlic), then pluralise it, adding an |
Part | If the ingredient is a part of a greater whole, like a flower, seedpod, kidney, skin, or egg, use it as-is, and pluralise it if the quantity is more than one or the unit is lb or Kg. |
Treatment | The remaining actions (ie not |
Alternative ingredients, if any, are added verbatim in parentheses; footnote marks are added if given; the [optional] indicator is added if required, and any comments are added in another set of parentheses.
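The flavour of these rules can be conveyed by a much-reduced Python sketch. It is not the author's XSLT: it covers only quantity, size, unit, quality, colour, item, part, and a trailing treatment, and it simplifies the pluralisation rule (no plural when a part is present or a spoon unit is used). The function names and the small table of irregular plurals are assumptions; unit weight, container, notes, comments, and alternatives are left out.

```python
STANDARD_UNITS = {"dl", "dsp", "fl.oz", "g", "Kg", "lb", "l", "ml",
                  "oz", "pt", "tbsp", "tsp"}
SPOONS = {"tsp", "tbsp", "dsp"}
ITEM_ATTRS = ("meat", "fish", "dairy", "fruit", "alcohol", "herb", "vegetable",
              "nuts", "pasta", "spice", "basic", "toppings", "prep")
UNCOUNTED = {"meat", "dairy", "spice", "pasta", "basic", "herb"}
IRREGULAR = {"tomato": "tomatoes", "dash": "dashes"}  # illustrative only


def pluralise(word):
    return IRREGULAR.get(word, word + "s")


def describe(attrs):
    """Generate ingredient text from a dict of attribute values."""
    quantity, unit, size = attrs.get("quantity"), attrs.get("unit"), attrs.get("size")
    category, item = next((c, attrs[c]) for c in ITEM_ATTRS if c in attrs)
    item = item.replace("-", " ")
    # 'Many' means more than one, or a non-numeric quantity such as a range.
    many = quantity is not None and not (quantity.isdigit() and int(quantity) <= 1)
    words = [quantity] if quantity else []
    if unit:
        common = unit not in STANDARD_UNITS
        if size and common:               # size prefixes a common unit
            words.append(size)
            size = None
        words.append(pluralise(unit) if common and many else unit)
    if size:                              # otherwise size applies to the item
        words.append(size)
    words += [attrs[key] for key in ("quality", "colour") if key in attrs]
    countable = (many and unit is None) or (unit in STANDARD_UNITS - SPOONS)
    if countable and category not in UNCOUNTED and "part" not in attrs:
        item = pluralise(item)
    words.append(item)
    if "part" in attrs:
        words.append(attrs["part"])
    text = " ".join(words)
    if "treatment" in attrs:
        text += ", " + attrs["treatment"]
    return text


print(describe({"quantity": "4", "size": "large", "quality": "very ripe",
                "treatment": "chopped fine", "vegetable": "avocado"}))
# 4 large very ripe avocados, chopped fine
print(describe({"quantity": "1", "unit": "clove", "size": "fat",
                "vegetable": "garlic"}))
# 1 fat clove garlic
```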
At the time of writing, smaller, experimental, changes are being made, principally to accommodate syntactic needs revealed as more recipes are encoded. Two of the more common are the selective elision of adjectival @part and @colour values in references, where only the substantive is required; and the need for grouping, as in ‘add the spices’, which at the moment will cause omission of the order and reference tests.
Handling of conflicts
In examining the syntax of ingredient description compared with those of references in the method, it was clear that there were places where additional information was needed in the references, for example to distinguish between two or more sugars, or group them together or to highlight the fact that an ingredient needed to be referred to by more than just name at this stage.
As a palliative measure, a @mod attribute was added to the ing element type. This is an enumerated attribute whose values are the names of all the control attributes on the ingredient element type; that is, all the descriptive ones but not the actual food-item attributes: @quantity, @unit, @unit-weight, @container, @size, @colour, @quality, @treatment.
Using this on the example in Appendix A, we could write
<ing i="sugar" mod="quality"/>
which would result in ‘dark brown sugar’. This does not solve the problem of (hopefully edge) cases where identifying an ingredient accurately would need more than one such qualifier.
A related requirement is to disambiguate multiple related ingredients, such as all-purpose flour and whole-wheat flour. Currently, the XSLT code checks for the existence of one or more other ingredients with the same item name, and checks if they all have at least one of the control attributes in common (set to different values, like @quality). If so, the attribute value is used as a prefix on the items to make the reference.
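Both mechanisms can be sketched together in Python. This is an illustration of the logic as described, not a transcription of the XSLT: the sample ingredients are hypothetical, and restricting the automatic prefix to a preferred order of descriptive attributes (so that a differing quantity is never used as the prefix) is an assumption of the sketch.

```python
ITEM_ATTRS = ("meat", "fish", "dairy", "fruit", "alcohol", "herb", "vegetable",
              "nuts", "pasta", "spice", "basic", "toppings", "prep")
# Assumed preference order for the automatic prefix.
PREFERRED = ("quality", "colour", "size", "treatment")

# Hypothetical ingredients, keyed by xml:id, as dicts of attribute values.
INGREDIENTS = {
    "ap": {"quantity": "200", "unit": "g", "quality": "all-purpose", "basic": "flour"},
    "ww": {"quantity": "100", "unit": "g", "quality": "whole-wheat", "basic": "flour"},
    "sugar": {"quantity": "50", "unit": "g", "quality": "dark brown", "spice": "sugar"},
}


def item_name(attrs):
    """Return the value of whichever food-item attribute is present."""
    return next(attrs[c] for c in ITEM_ATTRS if c in attrs)


def reference_name(ref, ingredients, mod=None):
    """Name an ingredient at the point of reference in a step."""
    attrs = ingredients[ref]
    name = item_name(attrs).replace("-", " ")
    if mod:                                   # explicit @mod on the reference
        return attrs[mod] + " " + name
    rivals = [other for key, other in ingredients.items()
              if key != ref and item_name(other) == item_name(attrs)]
    if rivals:
        for key in PREFERRED:
            values = [entry.get(key) for entry in [attrs] + rivals]
            # All must carry the attribute, each with a different value.
            if None not in values and len(set(values)) == len(values):
                return attrs[key] + " " + name
    return name


print(reference_name("ww", INGREDIENTS))                      # whole-wheat flour
print(reference_name("sugar", INGREDIENTS))                   # sugar
print(reference_name("sugar", INGREDIENTS, mod="quality"))    # dark brown sugar
```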
Results and conclusions
Testing
The testing of ingredient and reference co-presence was shown to be trivial using the ID/IDREF mechanism in XML, which covers error classes 1 and 2.
The testing of ingredient order for error class 3 was not as trivial, but relatively straightforward to implement in XSLT. No attempt was made to implement any other order, such as quantity or semantic relevance.
The potential mismatch in quantities between ingredient list and step (error class 5) was not tested: in the sample recipes used, there were no occurrences of partial quantities being used in one step, with the remainder used in another. There were indeed recipes using a single ingredient type in two or more places, but in those cases the quantities were given as separate ingredient items. An aggregate quantity test is needed where an ingredient is divided (a practice decried by Jacob (2010)).
The naming (and regeneration of names) was by far the most complex matter. The reconstruction of ingredient listings from the disaggregated data is non-trivial, and a comprehensive solution would involve extension of the current system well into the future in order to handle the infinite number of ways that recipe authors will have of expressing themselves. However, for practical purposes, it appears (although this was not quantified) that most recipes can be represented accurately, in the sense that the need to add new ingredient items to the lists diminished rapidly as testing proceeded. The current system appears to handle correctly the generation of items for the list of ingredients and their matching references in the method (error class 6), but it is in no way comprehensive and needs much more testing with a greater range of ingredients.[3]
There was considerable conflict over the assignment of a few items to lists: should garlic be under vegetables or spices? Are beans a sufficiently large class to warrant their own list? Are nuts? It is simple enough to edit the files and change the classes, but some agreed standard would make it more useful.
Benefits and drawbacks
The benefits of a system checking these errors would include greater reliability, accuracy, and consistency; three things that publishers insist on from their contributors, whatever about the utility to personal web recipe sites.
Identifying the ingredient data in a form a computer can manage also has a benefit separate from these quality control aspects: it might make that hoary old chestnut ‘recipe search’ actually work for once, both in the sense of locating a recipe using specific ingredients, distinct from whatever the title says, as well as in the sense of letting cooks find out exactly what they can make with the ingredients in the quantities on hand.
I leave to others the dubious usefulness of having your recipe selection trigger your fridge into ordering the missing ingredients. While it is perfectly possible, the effort in maintaining the metadata after every midnight snack is probably not worth the candle.
The most obvious drawback in the system as it currently stands is that implementing it requires some form of programming in a target system. Cooks, and cookery authors and contributors, are not part of the target market for XML systems: although implementation in an XML editor should be straightforward, they are not going to buy an editor for recipes, and they won’t be using Emacs.
Commonplace editors like Microsoft Word can certainly be coerced into providing prompted or drop-down categorisation, although embedding the error-checking logic currently implemented in XSLT would require more effort. Web-based systems running JavaScript are perhaps more likely targets, as would be WordPress plugins. Unless someone makes me an offer I can’t refuse, the current code will be released under a suitable public licence later in the year.
Conclusions
In general, this work satisfied the requirements and demonstrated that a limited amount of data checking can eliminate (or at least, signal) five of the seven classes of errors described.
However, the need to have an authorial or editorial interface written to handle data input (encoding) accurately means that wider implementation would need to rely on demand, unless there is sufficient interest in a collaborative, possibly open-source, implementation.
Encoding would still remain a time-consuming operation, even with sophisticated software, because of the need to apply domain expertise, which in turn would require relatively experienced users (cooks, collectors, publishers). Given the fairly strict formatting of published recipes, however, it might be possible to write a semantic and syntactic filter to identify at least quantity, units, and name from published recipes. This has not been investigated in the current iteration.
The work on the category lists confirms the well-known principle that data should be stored at the lowest practicable level of disaggregation because it can always be aggregated for implementation, whereas data stored aggregated can never be broken back down into its components. It also confirms the long-held, if anecdotal, belief in systems design that time spent planning the data model shortens the overall development time: if the data model is right (that is, it matches reality), most requirements tend to click into place; if the data model is wrong, the entire project may be irretrievably damaged from the start.
However, the corollary is that if you do get the data model right, you will still need to front-load enough data for it to be workable as a model before you start to develop it into a full system. In the current circumstances, nowhere near enough recipes have been tested, so the front-loading is a potential point of failure, and for this reason the current system remains experimental and open to more widespread testing and updating.
Appendix A. Worked example
This is an example of a partly-edited recipe from the author’s collection, with unresolved issues (at the time):
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe id="cashewscones">
  <nav/>
  <info>
    <title>Butterscotch and Cashew Drop-scones</title>
    <author>Anon</author>
    <copyright year="2019" web="https://www.teatimemagazine.com/" contrib="Ann Marie O’Connell">Tea Time Magazine</copyright>
  </info>
  <intro>
    <para>Anna mentioned this online and I asked her for the recipe. The original was from Tea Time Magazine (Jan/Feb 2019, but is not in their archive¹). She notes that it works fine with all white whole-wheat flour, and she also added large-crystal raw sugar as a topping, instead of an egg glaze, because of the additional caramel notes.</para>
  </intro>
  <ingredients>
    <ingredient xml:id="plainflour" quantity="1.5" unit="cup" quality="all-purpose" basic="flour"/>
    <ingredient xml:id="wwflour" quantity=".5" unit="cup" quality="whole-wheat" basic="flour"/>
    <ingredient xml:id="sugar" quantity="0.333" unit="cup" quality="dark brown" treatment="packed" spice="sugar"/>
    <ingredient xml:id="bp" quantity="1" unit="tbsp" basic="baking-powder"/>
    <ingredient xml:id="salt" quantity=".5" unit="tsp" spice="salt" comment="use ¼ tsp if the cashews are already salted"/>
    <ingredient xml:id="butter" quantity=".5" unit="cup" quality="unsalted" dairy="butter" treatment="chilled and diced"/>
    <ingredient xml:id="chips" quantity=".5" unit="cup" treatment="slightly heaping" topping="butterscotch-chips" alt="any preferred chips"/>
    <ingredient xml:id="cashews" quantity=".5" unit="cup" quality="toasted" treatment="slightly heaping" vegetable="cashew"/>
    <ingredient xml:id="cream" quantity=".5" unit="cup" quality="heavy" dairy="cream"/>
    <ingredient xml:id="egg" quantity="1" size="large" treatment="beaten" part="egg"/>
  </ingredients>
  <method>
    <step>
      <para>Preheat oven to 400°F.</para>
    </step>
    <step>
      <para>Combine together <ing i="plainflour wwflour sugar bp salt"/> in medium bowl.</para>
    </step>
    <step>
      <para>Add the <ing i="butter"/>; using fingertips, rub to form coarse meal.</para>
    </step>
    <step>
      <para>In separate bowl, whisk the <ing i="milk"/> and the <ing i="egg"/>.</para>
    </step>
    <step>
      <para>Gradually add the <ing i="milk egg"/> mix to the flour mixture, keeping back 1 tsp of the egg mix to use for glazing.</para>
    </step>
    <step>
      <para>Toss or knead it to thoroughly moisten it and form a clumpy dough (add more milk if too dry).</para>
    </step>
    <step>
      <para>Mix in the <ing i="chips"/>.</para>
    </step>
    <step>
      <para>Drop the dough by ¼ cupfuls onto a nonstick or lightly greased baking sheet at least 1 inch apart, to give 8–10 drop-scones. (You can line a regular pan with aluminum foil instead of greasing it.)</para>
    </step>
    <step>
      <para>Brush the remaining <ing i="egg"/> on top as a glaze.</para>
    </step>
    <step>
      <para>Bake for about 20 minutes or until golden brown.</para>
    </step>
  </method>
  <para>You can also use a mini-scone baking pan, like the Nordic Ware cast-aluminum one, which gives you 16 triangular scones.</para>
  <para>If you use the “freeze the portioned dough” technique, they will need to bake 3–5 minutes longer.</para>
  <para>¹ Possibly because the ingredients didn’t match the method in several places.</para>
</recipe>
Running the current XSLT code produces the following log:
Processing cashewscones.xml using xml2html.xsl to cashewscones.html
Using parameters
8. Unused ingredient "slightly heaping ½ cup toasted cashews"
9. Unused ingredient "½ cup heavy cream"
Checking 1. @plainflour
Checking 2. @wwflour
Checking 3. @sugar
Checking 4. @bp
Checking 5. @salt
Checking 6. @butter
Checking 7. @milk
Ingredient "" (milk) is listed 1st but mentioned 7th
Checking 8. @egg
Ingredient "1 large egg, beaten" (egg) is listed 10th but mentioned 8th
Checking 9. @chips
Ingredient "slightly heaping ½ cup butterscotch chips (or any preferred chips)" (chips) is listed 7th but mentioned 9th
4. No ingredient matching ID "milk"
5. No ingredient matching ID "milk"
The amended and functional recipe is available on the author’s web site at http://xml.silmaril.ie/recipes/cashewscones.html.
References
[Acton 1845] Acton, Eliza (1845) Modern Cookery for Private Families. Longman, London, 644pp.
[Anon 2016] Anon (2016) ‘Recipes’. In Archaeology May/June 2016, Archaeological Institute of America, Palm Coast, FL.
[Vehling 1936] Vehling, Joseph Dommers (1936) Cookery and Dining in Imperial Rome. Walter M Hill, Chicago, IL, 301pp.
[Beeton 1861] Beeton, Isabella (1861) [Mrs] Beeton’s Book of Household Management. S.O. Beeton Publishing, London, 1112pp.
[Burros 1997] Burros, Marian (1997) ‘Cookbook Follies’. In New York Times September 1997.
[Cloake 2011] Cloake, Felicity (2011) ‘Cookbook errors’. In The Guardian September 2011.
[Farmer 1896] Farmer, Fannie Merritt (1896) The Boston cooking-school cook book. Little, Brown, & Co, Boston, MA, 620pp. URI:https://d.lib.msu.edu/fa/8#page/2/mode/2up (retrieved 7 February 2020).
[Freeling 1972] Freeling, Nicolas (1972) The Cook Book. Hamish Hamilton, London, 154pp. ISBN:0879238623.
[Hart 2012] Hart, Alice (2012) ‘How to write your first cookbook’. In The Guardian July 2012.
[Sitwell 2012] Sitwell, William (2012) ‘A history of cookbooks’. In The Bookseller June 2012, Bookseller Media Ltd, London.
[Jacob 2010] Jacob, Dianne (2010) 7 Most Common Recipe Writing Errors. Author’s web site, Oakland, CA. URI:https://diannej.com/2010/7-most-common-recipe-writing-errors/ (retrieved 14 December 2019).
[Jacob 2016] Jacob, Dianne (2016) When a Reader Found a Cookbook Error. Author’s web site, Oakland, CA. URI:https://diannej.com/2016/reader-finds-cookbook-recipe-error/ (retrieved 18 December 2019).
[Knauf 2017] Knauf, Torsten (2017) Definition der TEI-basierten culinary editions Markup Language (cueML), Bewertung von Verfahren für die automatische Extraktion von Zutatenlisten aus Rezepten und die Auszeichnung des Praktischen Kochbuchs für die gewöhnliche und feinere Küche von Henriette Davidis (1849). URI:https://shaman-apprentice.github.io/MyMasterThesis/ (retrieved 11 February 2020).
[Klug 2017] Klug, Helmut (2017) ‘Cooking Recipes of the Middle Ages’. URI:https://static.uni-graz.at/fileadmin/gewi-zentren/Informationsmodellierung/PDF/Laurioux__Klug_-_Scientific_Proposal_ANR-FWF_-_full.pdf (retrieved 11 February 2020).
[Masters 2013] Masters, Kristin (2013) ‘The Incredible Treasures of Manuscript Cookbooks’. In ILAB July 2013, International League of Antiquarian Booksellers, Geneva.
[Nestlé 2013] Nestlé, Marion (2013) Food Politics. University of California Press, Berkeley, CA. ISBN:9780520275966.
[Saulnier 1982] Saulnier, Louis (1982) Le Répertoire de la Cuisine. Leon Jaeggi & Sons Ltd, Ashford, UK, 239pp. ASIN:B00I637XDK.
[Shane 2020] Shane, Janelle C (2020) AI recipes are bad (and a proposal for making them worse). AI Weirdness, Lafayette, CO. URI:https://aiweirdness.com/post/190569291992/ai-recipes-are-bad-and-a-proposal-for-making-them (retrieved 9 February 2020).
[Shane 2020] Shane, Janelle C (2020) AI + Vintage American cooking: a combination that cannot be unseen. AI Weirdness, Lafayette, CO. URI:https://aiweirdness.com/post/190721709472/ai-vintage-american-cooking-a-combination-that (retrieved 8 February 2020).
[Northamerica1000 2020] Northamerica1000 (2020) Food group. Wikipedia, The Free Encyclopedia, San Francisco, CA. URI:https://en.wikipedia.org/w/index.php?title=Food_group&oldid=939771878 (retrieved 19 February 2020).
[1] Freeling’s The Cook Book is possibly one of the last from a modern author in Europe to use the narrative style throughout [Freeling 1972].
[2] My thanks to Michael Kay for his suggestions on how to achieve this most efficiently.
[3] As an edge case, the system was tested with a few AI-generated recipes courtesy of [Shane 2020][Shane 2020] where a neural net created recipes without reference to feasibility or edibility (and much else!). However, once they had been coded to the above standard, they tested correctly, with all the errors being picked up.