How to cite this paper
Gryk, Michael Robert. “Human Readability of Data Files.” Presented at Balisage: The Markup Conference 2022, Washington, DC, August 1 - 5, 2022. In Proceedings of Balisage: The Markup Conference 2022. Balisage Series on Markup Technologies, vol. 27 (2022). https://doi.org/10.4242/BalisageVol27.Gryk01.
Balisage: The Markup Conference 2022
August 1 - 5, 2022
Balisage Paper: Human Readability of Data Files
Michael Robert Gryk
Associate Professor
Department of Molecular Biology and Biophysics, UCONN Health (US)
Doctoral Student
University of Illinois, Urbana-Champaign (US)
Dr. Michael R. Gryk is Associate Professor of Molecular Biology and Biophysics at
UCONN Health.
At UCONN, Michael co-leads a technical research and discovery component of the NMRbox
BTRR Center,
the mission of which is to foster the computational reproducibility and scientific
data re-use of bioNMR data.
Michael is also the associate director of the BioMagResBank, the international repository
for bioNMR research
data. He is also a doctoral student at the School of Information Sciences at the University
of Illinois,
Urbana-Champaign, where his broad research interests are in provenance, workflows,
digital curation and
preservation, reproducibility, and scientific data re-use. Michael is also a participant
of the W3C
Invisible Markup group.
Copyright ©2022 by the author. Used with permission.
Abstract
In this era of big data and FAIR data, data formats must be machine interpretable. XML, among other standards, satisfies this requirement. Yet many standardization initiatives cite human readability as a second, key property in data format development. Examples include the development of STAR in the field of structural biology, W3C PROV for provenance, and even the continuing development of XML. This raises two questions: what is meant by human readability, and can this property be measured for a given data format or compared between competing standards?
The broad topic of readability is considered with attention to the various aspects of written text which either foster or counter readability. Drawing on efforts in the educational system, a metric is proposed for estimating the relative human readability of structured data within an archival file format. Comparison is made between the same data represented in various formats, including JSON and XML, to help judge whether these standards have accomplished their simultaneous goals of machine interpretability and human readability.
Table of Contents
- Introduction and Motivation
- Background
  - STAR
  - NMR-STAR
  - PROV
  - YAML / JSON
  - XML
- Readability
- Data Readability
  - Data Identifiers
  - Data Values
  - Syntactical Characters
  - Readability Formula
- Discussion
- Acknowledgments
- Appendix A. An example STAR file: 93.8% Human Readability
- Appendix B. An example JSON serialization of the STAR file: 72.5% Human Readability
- Appendix C. An example XML serialization of the STAR file: 40.5% Human Readability
Introduction and Motivation
I have been motivated to write about this topic after witnessing several years of
conversations between
colleagues, mentors and students. It is impossible to count the number of discussions
which contained statements
such as: "I prefer STAR to JSON because it is easier to read." Interestingly, sometimes
a discussion within a different
group would prompt the exact opposite preference, suggesting there is a subjective
element to readability.
I admit to having my own bias. I favor XML over JSON, STAR, etc. due to many factors, including its structure, elegance, longevity and the tremendous technology stack built around XML. Perhaps because of this bias, I had not considered trying to measure the readability of data within XML versus other formats. I was simply another participant in water cooler conversations: "I prefer XML, full stop."
Last year's submission to Balisage [Gryk, 2021] provided me the opportunity to roll up my sleeves and really engage with the STAR data structures. That contribution was an effort to separate the structure of STAR from its syntax (serialization). An extra benefit of that exercise was the development of not only an XML serialization of STAR, but also a series of transformations (XSLTs) for round-tripping the XML representation back to STAR, for converting to JSON, and even for creating a spreadsheet representation of the data. Generating, testing and troubleshooting these various serializations gave me a tacit sense of which formats are more readable than others.
There are two important qualifications to add at this point, both of which will be discussed again later. First, the readability of data is somewhat dependent on the data. The STAR format example in this manuscript is used in the field of structural biology, and the pairing of data with format may introduce yet another bias into the conversation. Second, the terms format and serialization have been used somewhat interchangeably up until now, but it will be important to distinguish between the readability of data as presented in a particular textual layout and the ability of a format to be converted into a textual layout which is readable.
Background
Mark Twain is often cited for the quip: "Everybody talks about the weather, but nobody does anything about it." A similar thing can be said of the human readability of data formats. Everyone talks about human readability, but hardly anyone defines precisely what they mean by it or which properties make one format more readable than another.
Of course, in a different era the topic in front of us would not be what is meant by human readable. Documents such as books, magazines and pamphlets are written and printed specifically for human consumption. Fifty years ago, the important topic would have been how to make documents machine readable. The acronym MARC emphasized MAchine Readable Cataloguing rather than human readable cataloguing. The FAIR data principles likewise require that scientific data be machine interoperable, and FAIR data repositories support this requirement by providing data in machine interpretable formats.
Nevertheless, human readability is frequently cited alongside machine readability
as a desirable property for data formats
and as a major design consideration. A few examples should drive home the point.
STAR
The STAR file was introduced in 1991 to support scientific data and this format is
still in use in the structural biology
communities. Hall identifies the following requirements for the STAR format [Hall, 1991]:
A Universal Archive File should be simple to read and to access. — Bullet point 4 of requirements
The file is easy to read visually, or by machine. — Bullet point 5 of the properties of a STAR file
To facilitate access to a STAR File the names of data items must be defined to be
as descriptive as possible.
NMR-STAR
A variant of the STAR format (called NMR-STAR) has been used by the BioMagResBank [Ulrich, et al., 2008] since the 1990s for archiving data related to the field of biomolecular nuclear magnetic resonance spectroscopy. The decision to use STAR as opposed to other available standards was made in part out of concern for human readability.
ASN.1, XML, and SGML formats did not meet the need for easy human readability and
efficient manual editing with
common text editing software.
— BMRB Internal Whitepaper
PROV
The W3C supports a set of standards for recording and reporting provenance, particularly within the context of the World Wide Web. Part of this family of standards is a specialized notation for provenance, PROV-N. Once again, human readability was a major design consideration.
A key goal of PROV is the specification of a machine-processable data model for provenance.
However, communicating provenance between
humans is also important when teaching, illustrating, formalizing, and discussing
provenance-related issues. With these two requirements
in mind, this document introduces PROV-N, the PROV notation, a syntax designed to
write instances of the PROV data model according to the
following design principles:
- Technology independence. PROV-N provides a simple syntax that can be mapped to several technologies.
- Human readability. PROV-N follows a functional syntax style that is meant to be easily human-readable so it can be used in illustrative examples, such as those presented in the PROV documents suite.
- Formality. PROV-N is defined through a formal grammar amenable to be used with parser generators.
— https://www.w3.org/TR/2013/REC-prov-n-20130430/
YAML / JSON
YAML is a human-friendly data serialization language for all programming languages.
— https://yaml.org/
YAML 1.2 was revised so that JSON is an official subset of YAML. However, when comparing YAML with JSON, once again, the topic often returns to readability:
In practice, however, the two formats look different, as the YAML specification puts
more emphasis on human readability by adding
a lot more syntactic sugar and features on top of JSON.
— https://realpython.com/python-yaml/
XML
Even the design criteria for XML refer to human readability.
The design goals for XML are: ...
6. XML documents should be human-legible and reasonably clear.
— https://www.w3.org/TR/REC-xml/
Readability
The topic of this paper is the human readability of data as contained within scientific data formats. It is useful at this point to consider readability more generally, as much work has been done in developing metrics for measuring the readability of common texts. These include the Flesch Reading Ease [Flesch, 1948], the Fry Reading Formula [Fry, 1968], and the Simple Measure of Gobbledygook (SMOG) Formula [McLaughlin, 1969].
The Flesch Reading Ease formula is as follows:
Reading Ease = 206.835 - 1.015 * (average words per sentence) - 84.6 * (average syllables per word)
A larger number is considered easier to read, a smaller number is more difficult to
read. As an example, let's consider:
I do not like green eggs and ham. I do not like them, Sam I am.
— Seuss, 1960
The total number of words is 16 and the total number of sentences is 2, so the average number of words per sentence is 8. Since all of the words have a single syllable, the average number of syllables per word is 1. The overall Reading Ease is 206.835 - (1.015 * 8) - (84.6 * 1) = 206.835 - 8.12 - 84.6 = 114.115. This is considered to be very easy to read, up to a fourth grade reading level.
Let's contrast that with the following:
The broad topic of readability is discussed in this manuscript with attention to the
various aspects of
written text which either foster or counter readability. Drawing on efforts in the
educational system, a metric is
proposed for estimating the relative human readability of structured data within an
archival file format.
— Draft of Abstract
In this case, we have 50 words in 2 sentences, or 25 words per sentence. We have a total of 96 syllables for the 50 words, or an average of 1.92 syllables per word. The overall Reading Ease is 206.835 - (1.015 * 25) - (84.6 * 1.92) = 206.835 - 25.375 - 162.432 = 19.028. This is considered to be very difficult to read, at a college reading level.
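Both worked examples can be checked with a few lines of Python. This is a minimal sketch: it takes the word, sentence and syllable counts as inputs rather than counting syllables automatically (reliable syllable counting is a separate problem), and the counts used below are the hand counts given in the text.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from pre-counted words, sentences and syllables."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hand counts from the two examples above.
seuss = flesch_reading_ease(words=16, sentences=2, syllables=16)
draft = flesch_reading_ease(words=50, sentences=2, syllables=96)
print(f"{seuss:.3f}")  # 114.115
print(f"{draft:.3f}")  # 19.028
```

Keeping counting separate from scoring makes it easy to swap in a different tokenizer or syllable counter without touching the formula.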
Data Readability
How can we construct a formula similar to the Flesch formula for measuring the human readability of a data format? The first consideration is defining the contents of a data file in as general terms as possible. A data file generally contains data values which are associated with data identifiers. These data items are structured within the file using some type of syntactical characters such that the data file can be parsed by a machine. Machine readability is assumed as a prerequisite for a data file format. Let's consider how each of these three components (data identifiers, data values and syntax) affects human readability. In the end, I will propose that simply counting the number of identifier characters, value characters and syntactic characters can be used to define a general formula for the human readability of scientific data. I do not claim that this is the best or the only way to measure human readability; with this exercise I hope to start a conversation on defining and measuring readability rather than to end it.
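A character-counting measure of this kind can be sketched in a few lines. The sketch does not reproduce the paper's formula; it assumes one plausible reading, namely that readability is the share of non-whitespace characters belonging to identifiers or values. The segment labels and function names are hypothetical, and each serialization is split into segments by hand. The example is the single pair Entry.ID / 5208 from the appendices.

```python
IDENTIFIER, VALUE, SYNTAX = "identifier", "value", "syntax"


def count_chars(segments: list[tuple[str, str]]) -> dict[str, int]:
    """Tally non-whitespace characters for each segment kind."""
    totals = {IDENTIFIER: 0, VALUE: 0, SYNTAX: 0}
    for kind, text in segments:
        totals[kind] += sum(1 for ch in text if not ch.isspace())
    return totals


def readability(segments: list[tuple[str, str]]) -> float:
    """Assumed metric: share of non-whitespace characters that are identifiers or values."""
    totals = count_chars(segments)
    content = totals[IDENTIFIER] + totals[VALUE]
    return content / (content + totals[SYNTAX])


# The pair Entry.ID / 5208, segmented by hand in each serialization.
star_pair = [(SYNTAX, "_"), (IDENTIFIER, "Entry.ID"), (VALUE, "5208")]
json_pair = [(SYNTAX, '"'), (IDENTIFIER, "Entry.ID"), (SYNTAX, '" : "'),
             (VALUE, "5208"), (SYNTAX, '"')]
xml_pair = [(SYNTAX, '<datum key="'), (IDENTIFIER, "Entry.ID"), (SYNTAX, '">'),
            (VALUE, "5208"), (SYNTAX, "</datum>")]

for name, pair in [("STAR", star_pair), ("JSON", json_pair), ("XML", xml_pair)]:
    print(name, count_chars(pair)[SYNTAX], f"{readability(pair):.1%}")
# STAR 1 92.3%
# JSON 5 70.6%
# XML 21 36.4%
```

The syntax counts (1, 5 and 21) agree with the pair row of Table III. Whole-file scores also depend on the mix of blocks, pairs and loops, so a single pair is not expected to reproduce the percentages in the appendix titles.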
Data Identifiers
Identifiers provide a name for the underlying data. In principle, the identifiers
only need to be unique within whatever
scope they are used. For example, the following table illustrates the same concept
being expressed using different identifier
conventions.
Table I
Verbose identifier names.
| Examples |
| a = l * w |
| area = length * width |
| rectangle.area = base.length * side.width |
All three of the examples convey the same concept, that the area is given by the length multiplied by the width.
However, the examples differ in their verboseness or their descriptive value. In the
first example, a reader needs to
either know that the identifiers refer to area, length and width, or be able to infer
that relationship. In the third
example, additional qualifiers are used which can be helpful in cases where there
are multiple formulas for calculating
the area of rectangles, triangles and parallelograms.
As a general rule, the more characters an identifier uses, the more descriptive it tends to be.
Of course, this is just a generalization which helps justify counting characters as
a measurement of readability, similar
to the counting of syllables as a measure of the complexity of a word. "Shunt" may
be less readable than "today"
irrespective of the number of syllables. "I" might be just as readable as "the author
of this manuscript", even though
the latter has more characters.
At the other extreme of verboseness are fixed width file formats, such as the original pdb format of the Protein Data Bank [PDB format, 2012]. In the case of fixed width formats, there need not be any identifiers or any syntactical characters either. The documentation specifies how the various data values are ordered in the file (similar to binary data representations). These formats may still maintain a degree of human readability; however, edge cases where fixed width values bleed into each other can be onerous. (An example of this for the pdb format is the boundary between the chain identifier and the residue number. The residue number occupies the 4 columns immediately after the chain ID. In the vast majority of cases, residue numbers have at most three digits, so there is whitespace between the chain ID and the residue number. However, once residue number 1000 is reached, there is no intervening whitespace. This is a common tripping point for folks writing parsers for the old pdb format.)
Data Values
Just as data identifiers can benefit from verboseness, so can data values. However, in the case of scientific data values, there is a stronger impetus for quantitative values, which can tip the scale toward machine readability rather than human readability [Wrightson, 2005]. An example is given below.
Table II
Comparing Oranges to Oranges. Various methods of recording data about color. From a data perspective, the numerical quantities are more precise. However, the textual description is the most human readable.
| Type | Value | Audience |
| Textual Description | Orange | Human (General) |
| Wavelength | 600 nm | Human (Scientist) and Machine |
| RGB: Hexadecimal | #FFBE00 | Machine |
| RGB: Decimal | 255, 190, 0 | Machine |
| CMYK | 0%, 25%, 100%, 0% | Machine |
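The machine-oriented rows of Table II encode the same color and can be derived from one another mechanically. The sketch below recovers the decimal RGB and CMYK rows from the hexadecimal code. It assumes the simple uncalibrated RGB-to-CMYK conversion (print workflows normally apply color profiles instead), and the function names are illustrative.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' into decimal (r, g, b) channels in the range 0-255."""
    code = code.lstrip("#")
    r, g, b = (int(code[i:i + 2], 16) for i in (0, 2, 4))
    return r, g, b


def rgb_to_cmyk(r: int, g: int, b: int) -> tuple[int, int, int, int]:
    """Naive RGB -> CMYK as whole percentages (no color profile applied)."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)
    if k == 1:  # pure black; avoids dividing by zero below
        return 0, 0, 0, 100
    c, m, y = ((1 - x - k) / (1 - k) for x in (rf, gf, bf))
    return round(c * 100), round(m * 100), round(y * 100), round(k * 100)


rgb = hex_to_rgb("#FFBE00")
print(rgb)                # (255, 190, 0)
print(rgb_to_cmyk(*rgb))  # (0, 25, 100, 0)
```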
The overarching purpose of a data file is to convey the data values using whatever representation the scientific community agrees is the most correct, precise, or important. Therefore, it is not my intention to suggest that data values should be transcoded from machine readable values that are useful to the community into something more human readable but less useful, as in replacing 600 nm with orange. Nevertheless, as Ann Wrightson pointed out in 2005, much XML is not human readable because the values stored within XML files are intended for machines, not humans [Wrightson, 2005].
As in the preceding section regarding data identifiers, it is a simple generalization that the more characters used to represent a data value, the more human readable it can be. For example, "true" and "false" are more readable than "1" and "0". In fact, a simple machine encoding of a Boolean value can be used to represent true/false, on/off, up/down, and various other mutually exclusive properties for which a more verbose description could assist human readability. Counterexamples would include lengthy numerical codes used as proxies, such as the ISBN "978-0-385-12167-5" in place of the book title, "The Shining".
Syntactical Characters
The final component of scientific data formats is the set of characters and expressions used to define the syntax. These are useful for both human readability and machine parsing. However, in this section I argue that a less verbose syntax leads to easier human readability. (A more verbose syntax often leads to easier machine parsing, as can be seen in programming languages such as ALGOL 68, where every "if" is closed with a "fi" and every "do" is closed with an "od".)
Throughout the rest of this paper I will explicitly refer to three serializations of the STAR file format: the original serialization, which is part of the STAR definition [Hall, 1991]; an XML representation of the STAR format [Gryk, 2021]; and a JSON serialization generated from the XML serialization through an XSLT. It should be noted that the XML schema for STAR [Gryk, 2021] was designed specifically to support transformation into other serializations, and to support that goal the schema explicitly defines and tracks aspects of the STAR format. As pointed out by one of the reviewers of this paper, this makes the final comparison between the readability of XML and the other serializations a bit unfair, as the XML version defines more of the data structure within the schema.
To quickly summarize the STAR format, a data file consists of two types of top-level containers: data blocks, which have an associated identifier, and global blocks (named for their global scope). Within these containers a third type of container is allowed, called a save frame, which in some variations of STAR can be arbitrarily nested. Finally, the data itself is provided either as key/value pairs, for which the keys must be unique within the scope of the container, or as tabular data with explicit column names along with tabular data values. An important point, which will be revisited: STAR uses whitespace as the basic delimiter between these keywords, identifiers and values, but beyond that whitespace has no formal meaning. Because of this, tabular data can be formatted in very human readable ways, or the whitespace can be used to obstruct human readability.
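Because whitespace is only a delimiter, the layout of a STAR loop can change freely without changing its content. The simplified parser below illustrates this under stated assumptions. It handles a text holding exactly one loop_ with bare or quoted values. It ignores semicolon-delimited text fields, comments, and STAR's rule that a quote only closes a value when followed by whitespace. It is not a conforming STAR parser, and the function names are hypothetical.

```python
import re

# Simplified STAR tokens: a single-quoted value, a double-quoted value, or any run
# of non-whitespace characters. Assumption: semicolon-delimited text fields,
# comments, and STAR's exact quote-termination rules are not handled.
TOKEN = re.compile(r"'([^']*)'" r'|"([^"]*)"' r"|(\S+)")


def tokenize(text: str) -> list[tuple[str, bool]]:
    """Return (token, was_quoted) pairs; any run of whitespace is just a delimiter."""
    tokens = []
    for m in TOKEN.finditer(text):
        if m.group(1) is not None:
            tokens.append((m.group(1), True))
        elif m.group(2) is not None:
            tokens.append((m.group(2), True))
        else:
            tokens.append((m.group(3), False))
    return tokens


def parse_loop(text: str) -> tuple[list[str], list[list[str]]]:
    """Parse a text holding exactly one loop_ into (column names, rows)."""
    tokens = tokenize(text)
    if not tokens or tokens[0] != ("loop_", False):
        raise ValueError("expected the text to start with loop_")
    i, columns = 1, []
    # Column names are the unquoted tokens starting with "_" right after loop_.
    while i < len(tokens) and not tokens[i][1] and tokens[i][0].startswith("_"):
        columns.append(tokens[i][0][1:])  # drop the leading underscore
        i += 1
    values = [tok for tok, _ in tokens[i:]]
    if not columns or len(values) % len(columns):
        raise ValueError("value count is not a multiple of the column count")
    width = len(columns)
    rows = [values[j:j + width] for j in range(0, len(values), width)]
    return columns, rows


laid_out = """loop_
   _Datum.Type
   _Datum.Count
   _Datum.Entry_ID
   '1H chemical shifts'    354   5208
   '13C chemical shifts'   621   5208
   '15N chemical shifts'   168   5208
"""
one_line = ("loop_ _Datum.Type _Datum.Count _Datum.Entry_ID "
            "'1H chemical shifts' 354 5208 '13C chemical shifts' 621 5208 "
            "'15N chemical shifts' 168 5208")

print(parse_loop(laid_out) == parse_loop(one_line))  # True
columns, rows = parse_loop(one_line)
print(columns)  # ['Datum.Type', 'Datum.Count', 'Datum.Entry_ID']
print(rows[1])  # ['13C chemical shifts', '621', '5208']
```

Rows are recovered purely from the column count, which is why a STAR author is free to line values up in columns for readers, or not.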
A comparison of the syntactical characters required for each of the three serializations
is given below.
Table III
STAR, XML and JSON representations for the various STAR constructs. '...' is used to signify that additional content follows which belongs to a different STAR construct. For example, files are composed of data and global blocks, which in turn are composed of save frames, key/value pairs, and tables (called loops in STAR). The second line of each row indicates the number of extra syntactic characters required by each serialization format.
| Construct | STAR | XML | JSON |
| File | ... | <file>...</file> | {"file" : ...} |
| | 0 | 13 | 9 |
| Data Block | data_identifier ... | <data name="identifier">...</data> | {"data" : { "name" : "identifier", ...} |
| | 5 | 20 | 20 |
| Global Block | global_ ... | <global>...</global> | {"global" : ...} |
| | 7 | 17 | 11 |
| Save Frame | save_identifier ... save_ | <save name="identifier">...</save> | {"save" : { "name" : "identifier", ...} |
| | 10 | 20 | 20 |
| Pair | _key value | <datum key="key">value</datum> | "key" : "value" |
| | 1 | 21 | 5 |
| Table (loop) | loop_ _column1 _column2 value1 value2 | <loop><header><column key="column1"/><column key="column2"/></header><row><cell>value1</cell><cell>value2</cell></row></loop> | [["column1","column2"],["value1","value2"]] |
| | 5 + 1 per column | 56 plus multiple of rows and columns | 6 plus multiple of rows and columns |
The above table provides the general syntax for each of the STAR constructs, serialized as canonical STAR, XML, or JSON. For each STAR construct, the syntax is given on the first line, and the number of characters required to define that syntax is tabulated on the line below. Whitespace is not counted; for example, the XML pair count of 21 excludes the space between the element name and its attribute.
The first row emphasizes that STAR is itself a file format and implicitly uses the file as the top level container. In the case of XML or JSON, this root element is made explicit. Therefore, the XML and JSON representations require 13 and 9 additional characters, respectively, to define this construct.
The other container constructs in STAR have short keywords to define data blocks,
global blocks or save frames. The XML and
JSON representations are similarly short; however, they require a few more syntactical
characters.
The largest difference noted is for key/value pairs and tables. STAR has an extremely concise manner of representing identifiers: preceding them with a single underscore. This is much more syntactically efficient than XML and slightly better than JSON.
In the case of tabular data, it is impossible to express the difference between the serializations as a single number; the verboseness of both XML and JSON grows with the number of columns and the number of rows within the table.
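The growth of loop overhead with table size can be made explicit by counting the non-whitespace syntax characters in the Table III templates. The function below is an independent count made for illustration. Its constant terms need not match the "56 plus" and "6 plus" entries in the table, which may follow different bookkeeping. It also assumes that no STAR value needs quoting and that the JSON key "loop" used in Appendix B is not counted.

```python
def loop_syntax_chars(columns: int, rows: int) -> dict[str, int]:
    """Non-whitespace syntax characters needed for one loop (illustrative count)."""
    cells = rows * columns
    return {
        # "loop_" plus one leading underscore per column name; values unquoted.
        "STAR": 5 + columns,
        # <loop></loop> + <header></header> = 30; <column key=""/> = 15 per column;
        # <row></row> = 11 per row; <cell></cell> = 13 per cell.
        "XML": 30 + 15 * columns + 11 * rows + 13 * cells,
        # Outer [] and the header's [] = 4; each list of strings needs 2 quotes per
        # string plus (columns - 1) commas; each data row also needs ",[" and "]".
        "JSON": 4 + (3 * columns - 1) + rows * (3 + 3 * columns - 1),
    }


print(loop_syntax_chars(2, 1))  # {'STAR': 7, 'XML': 97, 'JSON': 17}
print(loop_syntax_chars(7, 6))  # {'STAR': 12, 'XML': 747, 'JSON': 162}
```

The 7-column, 6-row case corresponds to the Entry_author loop in the appendices. The STAR count depends only on the number of columns, whereas the XML and JSON counts include a per-cell term, so their overhead grows with the size of the table.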
Discussion
My goal in this paper is to explore the possibility of measuring the human readability of a scientific data file. The benefits of human readability are often cited as important design considerations for data standards, and it has been noted that different XML files can vary in their readability [Wrightson, 2005]. Estimating the readability of books has been attempted by multiple sources, with efforts focusing either on look-up tables of content or on simple mathematical counting of syllables and words as an indicator of the complexity of the prose. This paper takes the latter approach to define a formula for the readability of scientific data, using as a case study the STAR format used in the fields of chemistry and structural biology.
There are several obvious caveats to and critiques of this work. The first is the presence of comments within a data file. Both STAR and the XML serialization (Appendix A and Appendix C) allow comments, while JSON does not. Comments are explicitly intended to aid human comprehension. However, since they are not part of the machine parsable content (at least in STAR), it seemed unfair to count them as either improving or detracting from human readability.
A second important caveat concerns whitespace. STAR ranks best in human readability according to this formula, in large part because whitespace is used as the natural delimiter, just as it is in natural language. It is important to note that while Appendices B and C also use whitespace to aid human readability, almost none of that whitespace is actually required by those serialization formats. (The only required whitespace is to separate element names and attributes in XML.) STAR, on the other hand, requires whitespace and uses this requirement as its main mechanism for achieving its stated goal of ensuring the file is easy to read visually or by machine.
However, whitespace can be challenging when it serves both humans and machines, particularly because whitespace is invisible to humans but visible to machines. If a file format distinguishes between types and amounts of whitespace (as Python and YAML do), a single representation may be challenging to read both as a human and as a machine. In other words, the difference between a tab and three spaces may be significant to the machine but indistinguishable to the human, which, while not directly countering human readability, perhaps affects human/machine mutual understanding.
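The contrast can be seen in a few lines of Python, used here only as a convenient example of whitespace-sensitive syntax (YAML is not shown because Python's standard library has no YAML parser). Splitting on whitespace, in the STAR manner, treats a tab and a run of spaces identically. Python's compiler, by contrast, rejects a block whose lines are indented with a tab and with three spaces, even though the two lines may look aligned depending on the editor's tab width. The helper names are illustrative.

```python
def star_style_tokens(line: str) -> list[str]:
    """Whitespace of any kind and amount is just a delimiter (STAR-like)."""
    return line.split()


def python_accepts(source: str) -> bool:
    """True if Python's compiler accepts the indentation of the source."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:  # TabError is a subclass of IndentationError
        return False
    return True


print(star_style_tokens("_Entry.ID\t5208") == star_style_tokens("_Entry.ID   5208"))  # True

mixed = "if True:\n\tx = 1\n   y = 2\n"      # a tab, then three spaces
uniform = "if True:\n    x = 1\n    y = 2\n"  # four spaces on both lines
print(python_accepts(mixed), python_accepts(uniform))  # False True
```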
In summary, I hope that this discussion and proposed metric are useful in attempting
to more formally define the oft-cited concern
of human readability in data formats.
Acknowledgments
This work was supported in part by the National Institute of General Medical Sciences
of the National Institutes of Health under
Award Number GM-109046.
Appendix A. An example STAR file: 93.8% Human Readability
data_5208
#######################
# Entry information #
#######################
save_entry_information
_Entry.Sf_category entry_information
_Entry.Sf_framecode entry_information
_Entry.ID 5208
_Entry.Title
;
1H, 13C and 15N resonance assignments for the perdeuterated 22 kD palm-thumb
domain of DNA polymerase B
;
_Entry.Type .
_Entry.Version_type original
_Entry.Submission_date 2001-11-14
_Entry.Accession_date 2001-11-14
_Entry.Last_release_date 2002-05-07
_Entry.Original_release_date 2002-05-07
_Entry.Origination author
_Entry.NMR_STAR_version 3.1.1.61
_Entry.Original_NMR_STAR_version 2.1
_Entry.Experimental_method NMR
_Entry.Experimental_method_subtype .
_Entry.Details .
_Entry.BMRB_internal_directory_name .
loop_
_Entry_author.Ordinal
_Entry_author.Given_name
_Entry_author.Family_name
_Entry_author.First_initial
_Entry_author.Middle_initials
_Entry_author.Family_title
_Entry_author.Entry_ID
1 Michael Gryk . R. . 5208
2 Mark Maciejewski . W. . 5208
3 Anthony Robertson . . . 5208
4 Mary Mullen . A. . 5208
5 Samuel Wilson . H. . 5208
6 Gregory Mullen . P. . 5208
loop_
_Data_set.Type
_Data_set.Count
_Data_set.Entry_ID
assigned_chemical_shifts 1 5208
loop_
_Datum.Type
_Datum.Count
_Datum.Entry_ID
'1H chemical shifts' 354 5208
'13C chemical shifts' 621 5208
'15N chemical shifts' 168 5208
loop_
_Release.Release_number
_Release.Format_type
_Release.Format_version
_Release.Date
_Release.Submission_date
_Release.Type
_Release.Author
_Release.Detail
_Release.Entry_ID
1 . . 2002-05-07 2001-11-14 original author . 5208
save_
Appendix B. An example JSON serialization of the STAR file: 72.5% Human Readability
{"STAR-file" :
{"data" : {
"name" : "5208",
"save" : { "name" : "entry_information",
"Entry.Sf_category" : "entry_information",
"Entry.Sf_framecode" : "entry_information",
"Entry.ID" : "5208",
"Entry.Title" : "1H, 13C and 15N resonance assignments for the perdeuterated 22 kD palm-thumb\ndomain of DNA polymerase B",
"Entry.Type" : ".",
"Entry.Version_type" : "original",
"Entry.Submission_date" : "2001-11-14",
"Entry.Accession_date" : "2001-11-14",
"Entry.Last_release_date" : "2002-05-07",
"Entry.Original_release_date" : "2002-05-07",
"Entry.Origination" : "author",
"Entry.NMR_STAR_version" : "3.1.1.61",
"Entry.Original_NMR_STAR_version" : "2.1",
"Entry.Experimental_method" : "NMR",
"Entry.Experimental_method_subtype" : ".",
"Entry.Details" : ".",
"Entry.BMRB_internal_directory_name" : ".",
"loop" : [["Entry_author.Ordinal","Entry_author.Given_name","Entry_author.Family_name","Entry_author.First_initial","Entry_author.Middle_initials","Entry_author.Family_title","Entry_author.Entry_ID"],
["1","Michael","Gryk",".","R.",".","5208"],
["2","Mark","Maciejewski",".","W.",".","5208"],
["3","Anthony","Robertson",".",".",".","5208"],
["4","Mary","Mullen",".","A.",".","5208"],
["5","Samuel","Wilson",".","H.",".","5208"],
["6","Gregory","Mullen",".","P.",".","5208"]],
"loop" : [["Data_set.Type","Data_set.Count","Data_set.Entry_ID"],
["assigned_chemical_shifts","1","5208"]],
"loop" : [["Datum.Type","Datum.Count","Datum.Entry_ID"],
["1H chemical shifts","354","5208"],
["13C chemical shifts","621","5208"],
["15N chemical shifts","168","5208"]],
"loop" : [["Release.Release_number","Release.Format_type","Release.Format_version","Release.Date","Release.Submission_date","Release.Type","Release.Author","Release.Detail","Release.Entry_ID"],
["1",".",".","2002-05-07","2001-11-14","original","author",".","5208"]]
}
}
}
}
Appendix C. An example XML serialization of the STAR file: 40.5% Human Readability
<?xml version="1.0" encoding="UTF-8"?>
<STAR-file version="Hall_96" xmlns="BMRB.STAR" xmlns:xsi="star.xsd">
<data name="5208">
<!-- ###################### -->
<!-- Entry information # -->
<!-- ###################### -->
<save name="entry_information">
<datum key="Entry.Sf_category" >entry_information</datum>
<datum key="Entry.Sf_framecode" >entry_information</datum>
<datum key="Entry.ID" >5208</datum>
<datum key="Entry.Title" delimiter="semi-colon">\n1H, 13C and 15N resonance assignments for the perdeuterated 22 kD palm-thumb \ndomain of DNA polymerase B</datum>
<datum key="Entry.Type" >.</datum>
<datum key="Entry.Version_type" >original</datum>
<datum key="Entry.Submission_date" >2001-11-14</datum>
<datum key="Entry.Accession_date" >2001-11-14</datum>
<datum key="Entry.Last_release_date" >2002-05-07</datum>
<datum key="Entry.Original_release_date" >2002-05-07</datum>
<datum key="Entry.Origination" >author</datum>
<datum key="Entry.NMR_STAR_version" >3.1.1.61</datum>
<datum key="Entry.Original_NMR_STAR_version" >2.1</datum>
<datum key="Entry.Experimental_method" >NMR</datum>
<datum key="Entry.Experimental_method_subtype" >.</datum>
<datum key="Entry.Details" >.</datum>
<datum key="Entry.BMRB_internal_directory_name" >.</datum>
<loop>
<header>
<column key="Entry_author.Ordinal"/>
<column key="Entry_author.Given_name"/>
<column key="Entry_author.Family_name"/>
<column key="Entry_author.First_initial"/>
<column key="Entry_author.Middle_initials"/>
<column key="Entry_author.Family_title"/>
<column key="Entry_author.Entry_ID"/>
</header>
<row>
<cell>1</cell>
<cell>Michael</cell>
<cell>Gryk</cell>
<cell>.</cell>
<cell>R.</cell>
<cell>.</cell>
<cell>5208</cell>
</row>
<row>
<cell>2</cell>
<cell>Mark</cell>
<cell>Maciejewski</cell>
<cell>.</cell>
<cell>W.</cell>
<cell>.</cell>
<cell>5208</cell>
</row>
<row>
<cell>3</cell>
<cell>Anthony</cell>
<cell>Robertson</cell>
<cell>.</cell>
<cell>.</cell>
<cell>.</cell>
<cell>5208</cell>
</row>
<row>
<cell>4</cell>
<cell>Mary</cell>
<cell>Mullen</cell>
<cell>.</cell>
<cell>A.</cell>
<cell>.</cell>
<cell>5208</cell>
</row>
<row>
<cell>5</cell>
<cell>Samuel</cell>
<cell>Wilson</cell>
<cell>.</cell>
<cell>H.</cell>
<cell>.</cell>
<cell>5208</cell>
</row>
<row>
<cell>6</cell>
<cell>Gregory</cell>
<cell>Mullen</cell>
<cell>.</cell>
<cell>P.</cell>
<cell>.</cell>
<cell>5208</cell>
</row>
</loop>
<loop>
<header>
<column key="Data_set.Type"/>
<column key="Data_set.Count"/>
<column key="Data_set.Entry_ID"/>
</header>
<row>
<cell>assigned_chemical_shifts</cell>
<cell>1</cell>
<cell>5208</cell>
</row>
</loop>
<loop>
<header>
<column key="Datum.Type"/>
<column key="Datum.Count"/>
<column key="Datum.Entry_ID"/>
</header>
<row>
<cell delimiter="single-quote">1H chemical shifts</cell>
<cell>354</cell>
<cell>5208</cell>
</row>
<row>
<cell delimiter="single-quote">13C chemical shifts</cell>
<cell>621</cell>
<cell>5208</cell>
</row>
<row>
<cell delimiter="single-quote">15N chemical shifts</cell>
<cell>168</cell>
<cell>5208</cell>
</row>
</loop>
<loop>
<header>
<column key="Release.Release_number"/>
<column key="Release.Format_type"/>
<column key="Release.Format_version"/>
<column key="Release.Date"/>
<column key="Release.Submission_date"/>
<column key="Release.Type"/>
<column key="Release.Author"/>
<column key="Release.Detail"/>
<column key="Release.Entry_ID"/>
</header>
<row>
<cell>1</cell>
<cell>.</cell>
<cell>.</cell>
<cell>2002-05-07</cell>
<cell>2001-11-14</cell>
<cell>original</cell>
<cell>author</cell>
<cell>.</cell>
<cell>5208</cell>
</row>
</loop>
</save>
</data>
</STAR-file>
References
[Gryk, 2021]
Gryk, Michael R. Deconstructing the STAR File Format. Presented at Balisage: The Markup Conference 2021, Washington, DC, August 2 - 6, 2021. In Proceedings of Balisage: The Markup Conference 2021. Balisage Series on Markup Technologies, vol. 26 (2021). https://doi.org/10.4242/BalisageVol26.Gryk01
[Hall, 1991]
Hall, S.R. The STAR File: A New Format for Electronic Data Transfer and Archiving. J. Chem. Inf. Comput. Sci., 31, 326-333 (1991). https://doi.org/10.1021/ci00002a020
[Wrightson, 2005]
Wrightson, A. Semantics of Well Formed XML as a Human and Machine Readable Language: Why is some XML so difficult to read? Proceedings of Extreme Markup Languages 2005 (2005).
[Flesch, 1948]
Flesch, R. A new readability yardstick. Journal of Applied Psychology, 32, 221-233 (1948). https://doi.org/10.1037/h0057532
[Seuss, 1960]
Seuss. Green Eggs and Ham. New York, NY: Beginner Books, 1960.
[Fry, 1968]
Fry, Edward. A Readability Formula That Saves Time. Journal of Reading, 11, 513-516, 575-578 (1968).
[McLaughlin, 1969]
McLaughlin, G.H. SMOG Grading: A New Readability Formula. Journal of Reading, 12, 639-646 (1969).
[PDB format, 2012]
Protein Data Bank (original pdb format). https://www.wwpdb.org/documentation/file-format-content/format33/sect9.html
[Ulrich, et al., 2008]
Ulrich, E.L., Akutsu, H., Doreleijers, J.F., Harano, Y., Ioannidis, Y.E., Lin, J., Livny, M., Mading, S., Maziuk, D., Miller, Z., Nakatani, E., Schulte, C.F., Tolmie, D.E., Wenger, R.K., Yao, H. & Markley, J.L. BioMagResBank. Nucleic Acids Research, 36, D402-D408 (2008). https://doi.org/10.1093/nar/gkm957