How to cite this paper
Durusau, Patrick, and Sam Hunting. “Spreadsheets - 90+ million End User Programmers With No Comment Tracking or Version
Control.” Presented at Balisage: The Markup Conference 2015, Washington, DC, August 11 - 14, 2015. In Proceedings of Balisage: The Markup Conference 2015. Balisage Series on Markup Technologies, vol. 15 (2015). https://doi.org/10.4242/BalisageVol15.Durusau01.
Balisage: The Markup Conference 2015
August 11 - 14, 2015
Balisage Paper: Spreadsheets - 90+ million End User Programmers With No Comment Tracking or Version
Control
Patrick Durusau
Sam Hunting
Independent Consultant; blogger; gardener.
Long-time contributor to the topic map standards process and topic maps fan.
Copyright © 2015 by the authors. Used with permission.
Abstract
Stephen Gandel's Damn Excel! How the 'most important software application of all time' is ruining the
world is NOT an indictment of Excel. It is an indictment of the inability to:
- link spreadsheets to emails
- track relationships between cells and between cells and formulas
- track comments on spreadsheet errors
- have versioning for cell contents/formulas
Without those capabilities, spreadsheets are dangerous to their authors and others.
How dangerous, you ask? A short list of horror stories would include: 2013, the "London
Whale," JPMorgan Chase, lost £250 million; 2013, error in the calculation of international
government debt-to-GDP ratios; 2012, JPMorgan Chase lost $6.2 billion due to a spreadsheet
formula error; 2011, MF Global collapsed, in part due to the use of spreadsheets to
track assets and liabilities; 2010, US Federal Reserve, spreadsheet error in the calculation
of $4 billion in Consumer Revolving Credit. The EuSpRIG Horror Stories page has a generous sampling of more spreadsheet horror stories.
Statistically speaking, F1F9 estimates:
88% of all spreadsheets have errors in them, while 50% of spreadsheets used by large
companies have material defects, resulting in loss of time and money, damaged reputations,
lost jobs and disrupted careers.
If that weren't bad enough, other research indicates that 33% of all bad business
decisions are traceable to spreadsheet errors. That's right, 33% of all bad business
decisions. That's a business case looking for a solution. Yes?
Disclaimer: Topic maps, even a legend based on the Topic Maps Reference Model, ISO/IEC 13250-5
(2015), cannot magically prevent fraud, stupidity or human error. Topic maps can enable
the modeling of relationships within spreadsheets, create comment tracking on errors/spreadsheets,
even when the comments are in emails, and explore the subject identities and merging
practices required for content-level versioning of spreadsheets. Our goal is to empower
you to detect fraud, stupidity and human error.
On the technical side, we will analyze real-world spreadsheets, determine subjects
to be represented and how to identify them, create a legend that will constrain the
representatives of subjects (using ordinary XML tools), create a topic map of an actual
spreadsheet and review our results against the known requirements to improve auditing
of spreadsheets. The auditing process itself will be shown to be auditable.
Table of Contents
- The Spreadsheet Use Case
  - Evidence-Based Requirements
  - How To Capture Spreadsheet Semantics
- Interlude on Designing an Assistive System for Spreadsheets
  - Introduction
  - First Requirement: Capture (Not Impose) User Semantics
  - Second Requirement: Immediate ROI
  - Third Requirement: Simplicity
  - Fourth Requirement: Subject Identifier and Subject Locator
  - Fifth Requirement: Merging/Sharing, A User Choice
  - Sixth Requirement: No Default Data Model, No Merging with Side Effects
- Enron Topic Map?
- Issues and Future Research
- Conclusion
The Spreadsheet Use Case
If the thought of over 90 million end user programmers with no comment tracking or
version control is scary, bear in mind that estimate is for the United States alone.
End user programmers include bankers, lawyers, clerks, mid-level managers, small and
large business owners, people with and without spreadsheet training and others. Given
the universal presence of spreadsheet programs, it is easy to credit claims that spreadsheets
control the spending of trillions of dollars every year. All while being used poorly.
Beyond the spreadsheet horror stories in the abstract, consider that in 2003, a spreadsheet
error at Fannie Mae made the mortgage guarantor look $1.3 billion more profitable than it actually was
(Gandel 2013). Did that cause the subprime mortgage crisis? Hard to say it was the sole cause, but it certainly didn't help.
F1F9 (UK) uses imagery from The Dirty Dozen in a publication on spreadsheet errors.
In case you don't catch the reference, see: The Dirty Dozen (movie), which was set in England, just prior to D-Day. (They need a younger advertising firm.)
Evidence-Based Requirements
Horror stories abound about spreadsheets but are too specific to support requirements
for a general solution. Fidelity Magellan Fund overstating capital gains by USD 2.6
billion due to the omission of a minus sign, or JP Morgan Chase's loss of £250 million
by having a model understood by one person, for example. (F1F9 2015) Several spreadsheet corpora have been developed in recent years to support an evidence-based
approach to spreadsheet usage and errors.
Felienne Hermans has played an important role in the development of methods for the analysis
of spreadsheets, including the adaptation of the notion of code smells to spreadsheet smells. Felienne points out in her dissertation (Hermans Dissertation) that while electronic spreadsheets are recent, the notion of a spreadsheet-like
form isn't; her example, a Babylonian tablet, contains a copy-paste error as well.
In case your Babylonian is rusty, Felienne highlights the error in What Archimedes has to say about spreadsheets.
If that still doesn't ring a bell, you can review The Babylonian tablet Plimpton 322 (Bill Casselman, University of British Columbia), Plimpton 322 (David E. Joyce, Clark University), and Words and Pictures: New Light on Plimpton 322, Eleanor Robson (MAA publication). The professional literature on Plimpton 322 is
extensive, but those resources will get you started.
Felienne has published and spoken widely on spreadsheets and is the leader of the Spreadsheet Lab, which organizes the yearly conference Software Engineering Methods in Spreadsheets.
Felienne Hermans and Emerson Murphy-Hill (Hermans 2014), extracted spreadsheets and emails that reference them from the Enron data set. A quick overview of this spreadsheet corpus reveals:
Table I. Enron Spreadsheet Summary

Number of spreadsheets analyzed: 15,770
Number of spreadsheets with formulas: 9,120
Number of worksheets: 79,983
Maximum number of worksheets: 175
Number of non-empty cells: 97,636,511
Average number of non-empty cells per spreadsheet: 6,191
Number of formulas: 20,277,835
Average number of formulas per spreadsheet with formulas: 2,223
Number of unique formulas: 913,472
Number of unique formulas per spreadsheet with formulas: 100
Table II. Summary of Enron Excel Spreadsheet Defects

Erroneous formulas with dependents: 29,324
Formulas depending on erroneous cells: 9.6 (on average)
Maximum number of errors in one file: 83,273
Calculation chain length of five or longer: 41,367
Calculation chain length greater than seven: 9,471 (seven being the maximum storage of short-term memory)
Table III. Summary of Enron Emails Related to Spreadsheets

Emails with spreadsheets attached: 44,214
Emails sending or talking about spreadsheets: 68,970
Emails discussing errors: 4,140
Emails describing modifications of spreadsheets: 14,084
Emails as documentation: observed but no count
The first requirement that comes to mind is the need to track comments (sender, receiver, subject) per spreadsheet, and more particularly to tie them to
any portion of the spreadsheet that is the subject of the email. That sounds obvious, but when Jen Underwood wrote of thirty-four (34) needed
improvements in Excel in Is Excel the Next Killer BI App?, linking spreadsheets to emails wasn't mentioned.
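As a sketch only, using the same ad-hoc XML style as the topic example in the Second Requirement below, a comment-tracking topic might look like the following; every element name and value here is hypothetical:

<topic>
  <type>comment-on-spreadsheet</type>
  <!-- hypothetical placeholders; each topicref points to another topic -->
  <email>topicref</email>
  <sender>topicref</sender>
  <receiver>topicref</receiver>
  <spreadsheet>topicref</spreadsheet>
  <cellrange>Sheet1!B2:B14</cellrange>
  <comment>Q3 totals appear to double-count the hedging positions.</comment>
</topic>

Because the comment is a first-class subject, later emails, corrections and audits can attach further statements to it.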
In fact, What is Spreadsheet Risk? examines causes of spreadsheet errors (beyond formula miscalculations) and has this
non-exclusive list:
- Too many unknown and unorganized spreadsheet files or tabs
- Poor naming conventions of files and tabs
- Excessive and undocumented data linkages between files and sheets
- Poorly designed and presented spreadsheets
- Spreadsheet evolution (start off simple and become increasingly complicated)
- Spreadsheet ownership (ownership changes without the transfer of knowledge)
- Manual data input (including cell by cell typing and copy & pasting large datasets)
- Too much data in a spreadsheet (remember the 64,000 row limit in Excel?)
- No versioning or change tracking
- No double-checking processes
- Overly and needlessly complicated formulas
- Plain old human error
There is no, repeat no, intersection between that list of spreadsheet risks and Underwood's
list of needed product improvements for Excel.
Out of those twelve (12) spreadsheet risks, nine (9) involve or could be solved by
identifying subjects and their relationships to other subjects:
- Too many unknown and unorganized spreadsheet files or tabs
- Poor naming conventions of files and tabs
- Excessive and undocumented data linkages between files and sheets
- Poorly designed and presented spreadsheets
- Spreadsheet evolution (start off simple and become increasingly complicated)
- Spreadsheet ownership (ownership changes without the transfer of knowledge)
- No versioning or change tracking
- No double-checking processes
- Overly and needlessly complicated formulas
Those nine (9) spreadsheet risks have a common underlying issue. In order to correct
any or all of those problems, the intended semantics of a spreadsheet would have to
be known. Unfortunately, the semantics of spreadsheets are not discoverable on the
face of the spreadsheets.
In her discussion of Excel errors in the Enron dataset, Hermans points out that semantic
errors are beyond her reach, saying:
It is impossible to determine what cells in the set are erroneous, because
we cannot possibly know what was the intention of the formula. This rules out
finding semantic errors....
Even though as outside observers we can't capture the semantics of spreadsheets, that
doesn't mean no one understands (or thinks they understand) the semantics of spreadsheets.
Perhaps we should ask the users of spreadsheets about their semantics.
User Semantics
Bear in mind that "user of spreadsheet" does not equate to "manager of users of spreadsheets."
Managers have their own semantics, which can be captured, but should not be substituted
for the semantics of their staff who use spreadsheets. Unless you want to repeat the
$170 million Virtual Case File debacle at the FBI. Inappropriate gathering of requirements was only one of the issues,
but an important one.
Fortunately, research does exist on the importance of and techniques for capturing
user semantics for spreadsheets.
How To Capture Spreadsheet Semantics
In a recent summary and extension of work capturing user semantics, the authors of Kohlhase, Andrea 2015 write:
A taxonomy of the errors [6] shows that a significant portion of errors (87%, as
calculated in [4]) are semantic. In this research, a semantic error is one that
is committed when users have a wrong concept that may be correctly or incorrectly
put into practice. These arise from misunderstanding the real-world, wrong translation
of the real-world to the spreadsheet representation, or a misunderstanding of
the spreadsheet’s internal logic [4]. As semantic errors are made on an individual
document base, there is neither hope for a best-practice guide to train avoiding
them nor for a general software update to help out. Semantic errors pose a
more serious threat for wide-impact spreadsheets since more and more individual communication
errors might aggregate over the span of distribution.
It has been proposed [4], [7] that a key reason in committing semantic errors is
a missing higher-level abstraction of the data. Tables, with their grid
framework, expose details and allow manipulation of underlying data. Therefore,
spreadsheets, as a computer-supported realization of tables, turn one’s attention
to data on a micro-level, failing to provide the big picture. Generally,
schematic diagrams or pictures abstract away and integrate the data, presenting
it holistically....
Ouch! Eighty-seven percent (87%) of spreadsheet errors are semantic! That signals that topic maps
have a role to play in a solution for spreadsheets, but that becomes even clearer
when you read:
Note that crossing a community boundary leaves the entire context - the
circumstances and settings in which a document is created and obtains its
specific meaning - behind. Researchers in the field of Human-Computer Interaction
(HCI) have focused in recent years on the context-of-use of software systems: user
experience issues often only arise in the concrete context in which a product is used.
Our approach for tackling the readability issue of spreadsheets is motivated
by this insight. Therefore we ask: what is the context of a spreadsheet document
and which role does it play for comprehension of spreadsheets? For an answer,
consider the following distinct contexts:
- the context of the data itself,
- the information context (implicit knowledge) of the author or the reader,
- the event context of the author (the intention of the document as communication tool) or the reader (the expectation towards the usefulness of the document),
- the effect context (e.g. decision making based on the document).
Note that the clear distinction between authors and readers is only an analytical
one. We are well aware that authors turn into readers after a short while even
for their own documents and, vice versa, that the motivation of readers might consist
in searching for copy-able parts to author their very own spreadsheets. Nevertheless,
the context can be clearly distinguished where wide-impact, local boundary-leaving
spreadsheets are concerned.
Context isn't an unfamiliar idea in topic map circles. In the Topic Maps Data Model, the term "scope" is used to convey the context in which a statement is valid.
In relevant part it reads:
5.3.3 Scope
All statements have a scope. The scope represents the context within which a statement is valid. Outside the context represented
by the scope the statement is not known to be valid. Formally, a scope is composed
of a set of topics that together define the context. That is, the statement is known
to be valid only in contexts where all the subjects in the scope apply.
Note on Scope
The statement: "Formally, a scope is composed of a set of topics that together define
the context." is true for topic maps governed by the Topic Maps Data Model. Under Topic Maps Reference Model you can redefine scope to be disjoint if that is appropriate for your use case.
Returning to Kohlhase, Andrea 2015, the authors find there are twelve dimensions to be explored for context:
- Statement
- Rephrasing
- Definition
- By-Example
- Evaluation
- Significance
- Purpose
- Organization
- Provenance
- Formula
- History
- Other
The authors ultimately conclude:
In general, the readers missed out on a lot of context dimensions, therefore
making the case for assistance systems for spreadsheet comprehension.
Everyone appears to agree that:
Spreadsheet Woes
- Spreadsheets have numerous errors
- Spreadsheets have semantic errors
- Spreadsheets lack documentation
- Spreadsheets have complex relationships
- Spreadsheets have complex semantics
- Spreadsheet errors can damage enterprises and economies
From the examples and literature, a case has been made for
...assistive systems for spreadsheet comprehension
but questions remain about how to fashion such a system.
Interlude on Designing an Assistive System for Spreadsheets
Introduction
Beyond the specific problems of spreadsheets, there are general principles that should
inform a topic map based solution. The principles discussed below are not particularly
original nor are they the only principles you should take into account in your design
work. Where they prove useful, use them. Where they prove unhelpful, ignore them.
The measure of being useful or not is in the satisfaction of your current customer.
With semantics, what other measure would you use?
First Requirement: Capture (Not Impose) User Semantics
You may have missed these lines in an earlier block quote from Kohlhase, Andrea 2015:
As semantic errors are made on an individual document base [sic], there is neither
hope for a best-practice guide to train avoiding them nor for a general software
update to help out. Semantic errors pose a more serious threat for wide-impact spreadsheets
since more and more individual communication errors might aggregate over the span
of distribution.
If you think about it, selling a uniform means of identification (for universal merging),
or any other aspect of modeling, in an environment defined by diversity, is a demonstration
of madness. If you doubt that claim, consider the long dying process now underway
for the Semantic Web. You have seen the working group products on CSV, as the W3C attempts to find a popular data format. After fifteen years of puff pieces
in Scientific American and elsewhere, and billions of dollars spent, there remains no Semantic Web, at least
not as promised.
Second Requirement: Immediate ROI
In addition to the issue of replacing diversity with uniformity, the Semantic Web
also offered delayed gratification, as in long delayed gratification. That was because in large part, the value-add
of the Semantic Web could not be realized until a critical mass of users was obtained.
No critical mass = no value add.
Take as a starting point a hypothetical topic to represent the column headers in
a spreadsheet:
<topic>
  <type>spreadsheet-heading</type>
  <spreadsheetoforigin>topicref</spreadsheetoforigin>
  <name>name</name>
  <column>value</column>
  <datatype>topicref</datatype>
  <author>topicref</author>
  <origin>topicref</origin>
  ...
</topic>
As each topic is created, the user's work product is of immediate value to the user
and others. For the user, it is documentation of a column header in their spreadsheet.
There could be different or additional properties, represented as elements and their
values under each topic. For the office manager, a collection of topic maps with such
topics provides documentation on column headers across spreadsheets used in their
office.
Third Requirement: Simplicity
Many of us were in the car together when the Topic Maps Data Model declared:
A subject can be anything whatsoever, regardless of whether it exists or has any other
specific characteristics, about which anything whatsoever may be asserted by any means
whatsoever. In particular, it is anything about which the creator of a topic map chooses
to discourse....
A topic is a symbol used within a topic map to represent one, and only one, subject,
in order to allow statements to be made about the subject. A statement is a claim
or assertion about a subject (where the subject may be a topic map construct). Topic
names, variant names, occurrences, and associations are statements, whereas assignments
of identifying locators to topics are not considered statements.
Which quickly leads to the observation: a topic can represent anything, but if you
want to talk about an occurrence or an association, you have to create the occurrence
or association first and then reify the occurrence or association with a topic. Say
that quickly five or more times.
A simpler representation would be to create topic elements that enforce the use of
<type> with default values for occurrence or association and which have models for
those subjects.
As a consequence, users would have this rule: if you want to talk about a subject of any kind, it MUST be represented by a topic.
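A hedged sketch of that rule in practice, with hypothetical element names: an association is itself a topic, so statements can be attached to it without a separate reification step:

<topic>
  <type>association</type>
  <!-- roles played by other topics; the values are hypothetical references -->
  <role name="spreadsheet">topicref</role>
  <role name="email">topicref</role>
  <!-- a statement about the association itself -->
  <comment>Disputed; see the follow-up email thread.</comment>
</topic>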
Fourth Requirement: Subject Identifier and Subject Locator
One of the more useful ideas that came out of the Topic Maps Data Model was the distinction between Internationalized Resource Identifiers (IRIs) used as subject identifiers and as subject locators. As you know:
A subject identifier is a locator that refers to a subject indicator....
A subject indicator is an information resource that is referred to from a topic map in an attempt to
unambiguously identify the subject represented by a topic to a human being. Any information
resource can become a subject indicator by being referred to as such from within some
topic map, whether or not it was intended by its publisher to be a subject indicator.
A subject locator is a locator that refers to the information resource that is the subject of a topic.
The topic thus represents that particular information resource; i.e., the information
resource is the subject of the topic.
— Topic Maps Data Model 5.3.2 Identifying Subjects [The order of the first two paragraphs is switched in this presentation. Why subject
indicator preceded subject identifier in the original isn't recalled.]
The distinction between subject identifiers and subject locators is too useful to
be limited to IRIs. IRIs were, after all, part of the topic maps effort to be "universal" and
therefore mergeable across the universe of topic maps. As pointed out earlier, we
were all in the car together, so that is a statement of fact, not an assignment of blame.
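In the ad-hoc XML used above, the distinction might be carried by two elements, and nothing requires their values to be IRIs: a file path or a cell address could serve as a locator. The element names and values below are hypothetical:

<topic>
  <type>spreadsheet</type>
  <!-- subject locator: the subject of this topic IS this resource -->
  <subjectlocator>attachments/q3-forecast.xls</subjectlocator>
  <!-- subject identifier: refers to a resource that tells a human what the subject is -->
  <subjectidentifier>http://example.com/docs/q3-forecast.html</subjectidentifier>
</topic>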
Fifth Requirement: Merging/Sharing, A User Choice
As mentioned in Immediate ROI, there should be no default merging rules unless and until you understand the semantics
of your spreadsheet data, especially across spreadsheets, where contexts change.
Depending upon your role, auditor, end-user, author, etc., you can choose to share
merging rules for your topic map, or not. Or process your topic map to suppress information
it contains that might enable merging, or limit merging in some way.
That isn't to say that some topics won't have merging rules, or even multiple merging
rules that are applied under different circumstances. It may help to remember that
merging is the act of clustering topics on some criteria and then processing that cluster
for presentation. Cluster analysis is widely used in exploratory data mining and should have a place in exploring the
semantics of spreadsheets.
Rather than defaulting to a presumed set of common goals, the better way to promote
topic maps for spreadsheets is to enable users to choose their own goals.
Sixth Requirement: No Default Data Model, No Merging with Side Effects
Once you forsake a default data model and merging with side effects, a world of tools
becomes available for processing your spreadsheet topic map.
The language of choice for spreadsheet topic map legends (at least in XML) should
be RELAX-NG. There are other languages in which subject identity, relationship models and even
merging rules could be written.
RELAX-NG offers a compact syntax, so you have no reason to develop and debug a syntax of your own for
topic map authoring. In addition to RELAX-NG constraints, additional constraints can be imposed via Schematron, which saves you from developing a constraint language as well.
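As a sketch, assuming the hypothetical <topic> markup from the Second Requirement, a RELAX-NG grammar (shown here in the XML syntax; the compact syntax is equivalent) and a small Schematron rule might read:

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="topic">
      <element name="type"><text/></element>
      <element name="spreadsheetoforigin"><text/></element>
      <element name="name"><text/></element>
      <element name="column"><text/></element>
      <element name="datatype"><text/></element>
      <element name="author"><text/></element>
      <element name="origin"><text/></element>
      <!-- the elements elided by "..." in the example would be declared here -->
    </element>
  </start>
</grammar>

<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="topic">
      <!-- a constraint expressed more naturally in Schematron than in RELAX-NG -->
      <assert test="normalize-space(type) != ''">Every topic must declare a non-empty type.</assert>
    </rule>
  </pattern>
</schema>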
Work by the XQuery Working Group and XSLT Working Group continues to advance processing capabilities for XML.
Enron Topic Map?
When we first started this paper, semantic analysis of the Enron data loomed large.
After reading the current literature on spreadsheet semantics, we discovered it isn't
possible to obtain user semantics in the absence of users. Seems obvious in hindsight
but only in hindsight.
In lieu of a topic map of the semantics of the Enron data, the demonstration portion
of this presentation will include a series of "what if" examples of how speculated
semantics and relationships could be modeled, along with a basic RELAX-NG schema,
to be released at the conference.
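One such "what if", in the same hypothetical XML style: a formula speculated to be an association between the cells it reads and the cell it occupies:

<topic>
  <type>formula</type>
  <expression>=SUM(B2:B14)</expression>
  <!-- hypothetical roles: the range the formula reads, the cell it occupies -->
  <role name="input">Sheet1!B2:B14</role>
  <role name="output">Sheet1!B15</role>
</topic>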
Issues and Future Research
There are a number of issues, only alluded to or not mentioned at all in this paper,
that bear on making customized topic maps for spreadsheets a successful approach. For example:
Future Research Issues
- Association types with fixed role models defined by functions in Recalculated Formula (OpenFormula)
- Exploration practices for Spreadsheets
- Modeling Complex Formulas as Associations of Associations
- Detection of topics playing multiple roles in Complex Formulas
- Modeling Patterns in Spreadsheet Formulas
- Best Practices for Capturing User Semantics
- and others
Your assistance with these issues, and suggestions of others we have overlooked, would be greatly appreciated.
Conclusion
In many ways this has been a difficult paper to write. Very difficult. One of the
primary reasons was fighting, at every stage of writing, the urge to prescribe how users should model
semantics correctly. Every start at a schema would quickly devolve into making
decisions that users should be making, at least if the schema is to reflect their
semantics and not those of your authors.
It has taken six (6) years but I now have a deeper appreciation for Tommie Usdin's
Standards considered Harmful. If you missed Balisage 2009, then you missed the presentation. There is one passage
that captures the essence of the presentation:
I think you probably also want to start thinking about a distinction I talk about
a lot when we’re talking about document modeling: what is true versus what is useful.
When you start looking at a set of documents, you can find a lot of things that are
true about them and that you could identify and spend an awful lot of money on. The
question is how many of those things are useful. It would be possible, for example,
when marking up business documents for a subject retrieval system to identify the
parts of the document on your subject taxonomy and to identify the documents themselves
and the chapters of them and the sections of them and the paragraphs and the sentences
and the words and what was the language of origin of each of the verbs in each of
the sentences. Is this likely to be useful? Is it conceivable that there might be
somebody someplace who would find that useful? Yes. If your goal is to make a corporate
procedure library easily available to the telephone help desk, is it likely that knowing
which word is a verb is going to be helpful? Probably not. So, if you’re supporting
a telephone help desk, maybe you don’t need to get into the linguistic analysis of
the sentences. That’s what I’m talking about: about not supplying, not spending money
to do something that it is possible that some unknown future person might want. Stick
to what you’re supposed to be doing. There may be a text markup standard that specifies
how to mark the parts of speech of each word in documents and their language of origins.
If there is, and knowing or manipulating this information is related to one of your
project goals, then and only then, is that standard relevant to your project.
For our present context, I would rephrase the question to be: "is it useful for understanding
a particular spreadsheet?"
If you ask your next client if they prefer you to capture your semantics for their
spreadsheets, their semantics for their spreadsheets, or the semantics of others for
their spreadsheets, which one do you think they would choose?
References
[Abreu 2014] Abreu, Rui et al. FaultySheet Detective: When Smells Meet Fault Localization http://conferences.computer.org/icsme/2014/papers/6146a625.pdf, doi:https://doi.org/10.1109/ICSME.2014.111
[Asavametha 2013] Asavametha, Atipol, Detecting bad smells in spreadsheets http://hdl.handle.net/1957/30672
[Badame 2012] Badame, Sandro, Danny Dig, Refactoring meets Spreadsheet Formulas http://dig.cs.illinois.edu/papers/refactoringSpreadsheets.pdf, doi:https://doi.org/10.1109/ICSM.2012.6405299
[Barik 2015] Barik, Titus, et al., FUSE: A Reproducible, Extendable, Internet-scale Corpus of Spreadsheets
http://people.engr.ncsu.edu/ermurph3/papers/icse-msr-15.pdf, doi:https://doi.org/10.1109/MSR.2015.70
[Clermont 2003] Clermont, Markus, A Scalable Approach to Spreadsheet Visualization http://www.isys.uni-klu.ac.at/PDF/2003-0175-MC.pdf
[Cunha 2012] Cunha, Jácome, et al., Towards a Catalog of Spreadsheet Smells http://haslab.uminho.pt/jacome/files/iccsa12.pdf, doi:https://doi.org/10.1007/978-3-642-31128-4_15
[Cunha 2013] Cunha, Jácome, et al., Spreadsheet Engineering http://dsl2013.math.ubbcluj.ro/files/saraiva_lecture.pdf
[Delft Spreadsheet Lab] Delft Spreadsheet Lab http://spreadsheetlab.org/
[Dou 2014] Dou, Wensheng, Cheung, Shing-Chi, Wei, Jun, Is Spreadsheet Ambiguity Harmful? Detecting
and Repairing Spreadsheet Smells due to Ambiguous Computation (ACM digital library)
http://hdl.handle.net/1783.1/66737, doi:https://doi.org/10.1145/2568225.2568316
[F1F9 2015] Capitalism's Dirty Secret: A Research Report into the Uses and Abuses of Spreadsheets
http://www.f1f9.com/blog/spreadsheets-play-a-critical-role-in-business-decision-making-globally
[Fowler 1999] Fowler, Martin, Refactoring http://martinfowler.com/tags/refactoring.html
[Gandel 2013] Gandel, Stephen, Damn Excel! How the 'most important software application of all time'
is ruining the world http://fortune.com/2013/04/17/damn-excel-how-the-most-important-software-application-of-all-time-is-ruining-the-world/
[Hermans 2011a] Hermans, Felienne, What Archimedes has to say about spreadsheets https://www.youtube.com/watch?v=yda2cm7D_cQ
[Hermans 2011b] Hermans, Felienne, Infotron Analyzer Screencast https://www.youtube.com/watch?v=GkGr9UoebEk
[Hermans 2011c] Hermans, Felienne, Martin Pinzger, Arie van Deursen, Breviz: Visualizing Spreadsheets
using Dataflow Diagrams http://arxiv.org/pdf/1111.6895.pdf
[Hermans 2012] Hermans, Felienne, Detecting and Visualizing Inter-worksheet Smells in Spreadsheets
http://www.slideshare.net/Felienne/detecting-and-visualizing-interworksheet-smells-in-spreadsheets
[Hermans 2013] Hermans, Felienne, Analyzing & visualizing spreadsheets (overview of dissertation)
http://www.slideshare.net/Felienne/an-overview-of-my-phd-research
[Hermans 2013b] Hermans, Felienne, Software Engineering [spreadsheets] http://www.slideshare.net/devnology/slides-felienne-hermans-symposium-ewi
[Hermans 2013c] Hermans, Felienne, Spreadsheets: The Ununderstood Dark Matter Of IT (video) https://www.youtube.com/watch?v=wbiVK6HKHHg
[BumbleBee, blog post] Hermans, Felienne, BumbleBee, a tool for spreadsheet formula transformations (blog)
http://www.felienne.com/archives/2964
[BumbleBee, paper] BumbleBee, a tool for spreadsheet formula transformations (http://dl.acm.org/citation.cfm?id=2661673)
http://files.figshare.com/1222815/paper.pdf
[Hermans Dissertation] Hermans, Felienne, Analyzing & visualizing spreadsheets (dissertation), doi:https://doi.org/10.6084/m9.figshare.658936
[Hermans 2014] Hermans, Felienne and Emerson Murphy-Hill. Enron’s Spreadsheets and Related Emails:
A Dataset and Analysis http://people.engr.ncsu.edu/ermurph3/papers/icse-seip-15.pdf, doi:https://doi.org/10.6084/m9.figshare.1222882
[Hermans 2014b(video)] Hermans, Felienne, Spreadsheets for Developers (video) https://www.youtube.com/watch?v=0CKru5d4GPk
[Hermans 2014b] Hermans, Felienne, Spreadsheets for Developers (slides) http://www.slideshare.net/Felienne/spreadsheets-for-developers
[Hermans 2014c] Hermans, Felienne, Spreadsheets are graphs too: Using Neo4j as backend to store spreadsheet
information http://www.slideshare.net/Felienne/felienne-neo-online
[Hermans 2014d] Hermans, Felienne, Improving Spreadsheet Test Practices http://www.slideshare.net/Felienne/improving-spreadsheet-test-practices
[Hermans 2014e] Hermans, Felienne, Testing and Refactoring Spreadsheets http://www.slideshare.net/eusprig/felienne-hermans-at-eusprig-2014
[SEMS 2014] Hermans, Felienne, ed., Software Engineering Methods in Spreadsheets http://ceur-ws.org/Vol-1209/
[SEMS 2015] Hermans, Felienne, ed., Software Engineering Methods in Spreadsheets http://ceur-ws.org/Vol-1355/
[Hermans Blog] Hermans, Felienne, Felienne Hermans blog http://www.felienne.com/
[Topic Maps Data Model] Topic Maps Data Model, ISO/IEC 13250-2 http://www.isotopicmaps.org/sam/sam-model/
[Topic Maps Reference Model] Topic Maps Reference Model, ISO/IEC 13250-5 http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=40757
[Hofer 2014] Hofer, Birgit, et al., Tool-supported fault localization in spreadsheets: Limitations
of current evaluation practice http://ceur-ws.org/Vol-1209/paper_1.pdf
[Hofer] Hofer, Birgit, Code Smells by Birgit Hofer http://www.ist.tugraz.at/_attach/Publish/SelectedTopics3/CodeSmells.pdf
[Infotron] Infotron http://www.infotron.nl/
[Jannach 2014] Jannach, Dietmar, et al., Avoiding, Finding and Fixing Spreadsheet Errors - A Survey
of Automated Approaches for Spreadsheet QA http://ls13-www.cs.tu-dortmund.de/homepage/publications/jannach/Journal_JSS_2014.pdf, doi:https://doi.org/10.1016/j.jss.2014.03.058
[Khokhar 2014] Khokhar, Tariq (World Bank media contact, hot on spreadsheets), Three things I learned at the 2014 Open Knowledge Festival http://blogs.worldbank.org/opendata/three-things-i-learned-2014-open-knowledge-festival
[Kohlhase, Andrea 2008] Kohlhase, Andrea, and Michael Kohlhase, Compensating the Semantic Bias of Spreadsheets
http://omdoc.org/pubs/kohkoh-lwa08.pdf
[Kohlhase, Andrea 2015] Kohlhase, Andrea, Michael Kohlhase, Ana Guseva, Context in Spreadsheet Comprehension
http://ceur-ws.org/Vol-1355/paper8.pdf
[Kohlhase, Michael 2009] Kohlhase, Michael, An Open Markup Format for Mathematical Documents http://omdoc.org/pubs/omdoc1.2.pdf
[Kulesz 2013] Kulesz, Daniel, Jan-Peter Ostberg, Practical Challenges with Spreadsheet Auditing
Tools http://www.iste.uni-stuttgart.de/fileadmin/user_upload/iste/se/research/publications/download/dk_jpo_EuSpRiG2013_preprint.pdf
[Recalculated Formula (OpenFormula)] OpenFormula http://standards.iso.org/ittf/PubliclyAvailableStandards/c066375_ISO_IEC_26300-2_2015.zip
[Rajalingham 2008] Rajalingham, Kamalasen, David R. Chadwick, Brian Knight, Classification of Spreadsheet
Errors http://arxiv.org/pdf/0805.4224.pdf
[RELAX-NG] RELAX-NG http://standards.iso.org/ittf/PubliclyAvailableStandards/c052348_ISO_IEC_19757-2_2008(E).zip
[Schematron] Schematron http://standards.iso.org/ittf/PubliclyAvailableStandards/c040833_ISO_IEC_19757-3_2006(E).zip
[Shallcross 2015] Shallcross, Mike, Sniff out spreadsheet “smells” with new Rainbow 9.0 http://www.themodelanswer.com/news/sniff-out-spreadsheet-smells-rainbow-9-0/
[Shueh 2014] Shueh, Jason, Why Spreadsheets Stink — and 4 Ways to Improve Them http://www.govtech.com/data/Why-Spreadsheets-Stink-4-Ways-to-Improve-Them.html
[Sohon Blog] Roy, Sohon, All things soft and some hard too (Sohon Roy) https://sohonroy.wordpress.com/
[Stiel 2014] Stiel, Bjoern, Version Control for Spreadsheets: A fresh take on an old problem http://www.slideshare.net/eusprig/spread-git
[Stolee 2011] Stolee, Kathryn T., Sebastian Elbaum, Anita Sarma, End-User Programmers and their
Communities: An Artifact-based Analysis http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1215&context=cseconfwork, doi:https://doi.org/10.1109/ESEM.2011.23
[Usdin 2009] Usdin, Tommie, Standards considered harmful http://www.balisage.net/Proceedings/vol3/html/Usdin01/BalisageVol3-Usdin01.html, doi:https://doi.org/10.4242/BalisageVol3.Usdin01
[Walkenback] Walkenbach, John, The Spreadsheet Page (Excel) http://spreadsheetpage.com/