Balisage Paper: A Linked-Data Method to Organize an XML Database for Mathematics Education
Alan Edward Bickel
Software Engineer
Big Ideas Learning, LLC / Larson Texts, Inc.
Alan Bickel is a software engineer at Big Ideas Learning, LLC / Larson
Texts, Inc. His current focus with Big Ideas Learning includes data and systems
architecture, system design, and application and web development. Tech stack
experience includes LAMP, Node.js, TypeScript, Express, Angular, Aurelia,
MongoDB, Phaser, and Apache Tomcat. Actively learning and loving the XML/RDF/eXist-db
ecosystem. Interests include:
- machine language translations for digital and print consumables
- embedded electronics engineering and development
- text-to-speech and accessibility-driven application development
- Path of Exile
Elisa E. Beshero-Bondar
Professor of Digital Humanities
Program Chair of Digital Media, Arts, and Technology
Penn State Erie, The Behrend College
Elisa Beshero-Bondar explores and teaches document data modeling with the XML family
of languages.
Until June 2020, she was a professor of English Literature and Director of the Center
for the Digital Text at
Pitt-Greensburg. She serves on the TEI Technical Council and is the founder and organizer
of the Digital Mitford project and
its usually
annual coding school. She experiments with visualizing data from complex document
structures like epic poems and with computer-assisted collation of differently encoded
editions of Frankenstein.
Her ongoing adventures with markup technologies are
documented on her development site at
newtfire.org.
Tim Larson
Director
Big Ideas Learning, LLC / Larson Texts, Inc.
In Timothy Roland “Tim” Larson’s four decades of professional experience, he has written in many formats
from page to screen, for all ages from early reader to adult, and in most media, including
interactive media and markup languages. He is a developer, producer, entrepreneur,
and occasional yacht crewman. Tim helped found
Grant Larson Productions, a film production company, and he is the chief architect of
Larson Texts, an educational publishing company. Many of his projects have won awards and achieved
market success. Tim is married to Mary Grant Larson, and together they have 2 children,
2 grandchildren, 2 dogs, and a multi-generation family farm in Pennsylvania, where
they still make hay in the summer and cider in the fall.
Copyright ©2021 Big Ideas Learning, LLC
Abstract
This paper presents work in progress to support fine-grained semantic relationships
between mathematical concepts and educational resources. Can RDF ontologies and XML
structure support a high-capacity database application for lesson planning, teaching,
assessment, and tutoring?
Table of Contents
- Contexts for designing an adaptive content delivery system
  - Overview: What we seek to design
  - Prior research and solutions in educational technology
  - Pedagogical objectives and market needs
  - A look at the correlation problem in context
- Our proposed solution to the competency alignment problem
  - Introducing the competency graph
  - Constructing RDF for mathematics education
  - RDF/XML and the development of the competency graph
- Implementation challenges
  - The challenge of resource identification and referencing
  - Exploring an XML database for content management supported by RDF/XML
- Conclusion
Contexts for designing an adaptive content delivery system
Overview: What we seek to design
Big Ideas Learning LLC (BIL)
creates and publishes math learning content for elementary, secondary, and
post-secondary courses primarily in the United States. The authors are helping BIL
to organize a new content delivery system that will serve its partner elementary and
secondary schools (K-12). BIL wants to move beyond the restrictions of relational
schemas to organize content using declarative methods. The design must accommodate
a large, diverse multimedia archive of
digitized materials representing textbooks, teacher
materials, tutorials, and assessments that the company has
published over the past four decades. It must also support ongoing creation of digital-first
learning
materials in multiple media formats. BIL’s goal is to atomize
these resources into learning objects, to allow for the rapid customization of
curriculum to suit more varied learning contexts. Those learning contexts may be based
on local curriculum, state standards, and adaptations of the
Common Core State Standards for Mathematics (CCSS)—or may even be individualized.
The authors are working with XML to organize, search, and remix BIL’s archive.
Making all the resources fully searchable and available digitally is a
long-range task. We seek to begin with an organizational structure based on a set of
Resource Description Framework (RDF) ontologies that associate the following kinds of
information:
- topic
- use (teaching, practicing, assessing, etc.)
- curriculum and standards (local, state, and national)
- relationships to other topics and materials
- client data (assessing competency and tracking usage)
The work involves matching internal BIL resources with external
requirements. BIL serves a growing base of over 5 million student users per year,
with custom alignment for 22 states.
Curriculum-to-standards alignment is complicated because there is no formal relationship
between district curriculum needs and state and CCSS standards. We are developing a set of
ontologies to benefit educators and districts by providing
formal ways to identify intersections with, and deviations from, those standards. This is
especially difficult for states that do not follow the CCSS.
In addition, math content providers need to align their contents to distinct
learning contexts for training and reviewing math skills.
The online interactive service that BIL hopes to provide will empower teachers, students,
and tutors to discover and
organize their learning plans. In addition to mainline plans, the system should support
projects and tasks for which gaps and problems are identified. It should also support
students who move between schools in different states, who may need to adjust quickly
to
topics they are not prepared for in their new school. A system that can adeptly assist
these customizations should also be able to track clients’ use of the system through
time to support customized recommendations.
The authors are planning a database storing RDF associations and data
pointers that correlate resources, standards, topics, related topics, and client data.
We are exploring the drafting of RDF in XML format, and organizing this using eXist-db.
We are also exploring XPath and XQuery for fine-grained searching, retrieving, and
visualizing networked data.
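As an early sketch of that exploration, the following XQuery lists the titles of all curriculum-type learning resources. The collection path and the LRMI namespace URI are assumptions for illustration, since the prototype's configuration is still in flux:
xquery version "3.1";
(: Sketch: list titles of all curriculum-type learning resources.
   Collection path and lrmi namespace URI are assumed for illustration. :)
declare namespace rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace dc   = "http://purl.org/dc/elements/1.1/";
declare namespace lrmi = "http://purl.org/dcx/lrmi-terms/";

for $res in collection("/db/bil/learningObjects")//lrmi:learningResource
where $res/lrmi:learningResourceType/@rdf:resource = "learningResource/type/curriculum"
return $res/dc:title/string()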
Prior research and solutions in educational technology
In the field, much attention has been dedicated to
intelligent
learning
management systems that respond to the needs of learners by assessing and delivering
bespoke content. The promise of the semantic web is emphasized by
Gottfried Vossen, Miltiadis Lytras, and Nick Koudas:
The fundamental social and political impact of the Semantic
Web . . . supports a shift of
social interaction patterns from ‘knowledge push’ to ‘knowledge pull’. This includes
the shift . . . from teacher-centric to learner-centric education.
Vossen et al. see this shift in education as well as health care, government, and business.
The linked open data of the semantic web intersects public and private sectors.
Applications in education, like those in health care as well as business, require
networks crossing between open public and
secure private domains. Y. Anistyasari et al. explored the
interoperation of learning management systems like Moodle to help students enroll
in
courses at multiple universities. Cross-enrollment management is based on a publicly
shared ontology of course information, permitting individualized calculations of tuition
for each school involved.
Other projects seek to design
intelligent
or adaptive
learning management systems that combine the use of RDF ontologies with individual
student data.
Monika Rani et al. observe that designing an LMS to have
meta-cognitive awareness
argues for the
application of machine-readable RDF over the use of a relational database, and that
reliance of LMS’s on such databases and client-server applications limits their capacity
to adapt flexibly to individual learners. They propose an RDF-based LMS for Computer
Science designed on two ontology categories: for domain and for task. They base the
domain ontology on a standard ontology for computer science concepts and software,
the
ACM Computing Classification System, and for the task ontology they apply VARK for
classifying
different kinds of learning styles (visual, aural, read/write, kinesthetic) to be
self-selected by the student who interacts with the system.
More pertinent to the BIL project is the work of Fernando Díez and Rafael Gil
on the Reasoning and Managing System (RAMSys), designed to supervise and support
students in writing geometry equations. This is a far narrower application than what
the
authors are designing for BIL but is relevant for its responsiveness to student input
and
its application of the OpenMath markup language for guiding and semantically checking
student input working with Mathematica software.
While there are neighboring use-cases for RDF ontologies informing learning management
systems, what BIL needs is more of a catalog of its resources, with ontologies to deliver
learning objects as needed to instructors as they design lessons and to
students as they seek tutoring. Perhaps the most similar to what BIL seeks to design
is the
model of the intelligent learning management system Multitutor, discussed
by Goran Šimić et al. in 2004. Multitutor was designed in Java with reliance on XML
to
store course descriptions, with the idea of making materials reusable in multiple
course
contexts. The system provided authoring tools for instructors to organize their own
courses and track students’ progress, and it involved administrator, teacher, and
student levels of access.
The system is designed to support changeable navigation possibilities to the
student. It provides the dynamic creation of the learning materials . . .
The tutor is the main part of the system architecture. It is the system coordinator,
dispatcher, and monitor at the same time. The pedagogical strategies are implemented
in the tutor. It analyzes the data of the student model (model of particular student)
and uses its teacher knowledge to require the proper learning contents. Tech expert
module maintains the references of domain knowledge and rule base. The reasoning
machine processes the request of the tutor and composes the learning content.
The content can include the text, the picture, or some other multimedia. In the
test phase the content is represented by the test sets or by the problems that
students have to solve. These contents the tutor sends back to the servlets.
Multitutor permits teachers to customize and organize the learning experience,
with the system brokering delivery of customized content to students. Optimally, the
system can respond
to a student’s need for review by connecting related materials relevant to student
competence with assessed skills and tasks.
The authors have begun experimentally drafting RDF/XML to incorporate existing
ontologies in order to describe resources and their interconnectedness. We present
this
paper at a moment when we face serious questions about how best to adapt existing
RDF
ontologies for education to ontologies describing mathematical concepts. While we
seek
to work with existing ontologies, we need to determine at what point and for what
purposes a new ontology will be required based on BIL’s needs and application. We
also
face serious concerns about how best to implement a functional and adaptive content
delivery service, and how much to deploy XML stack technologies in BIL’s existing
development workflow.
Pedagogical objectives and market needs
The marketplace for learning materials is changing. Classrooms continue to be more
connected and more digital-friendly. At the same time, the divide between urban and
rural, poor and affluent, diverse and homogeneous is more
pronounced in the digital learning environment than in the physical classroom. Market
stakeholders have a
duty to enable all teachers and all students.
BIL’s needs for next-generation digital classrooms require improvements to our
resource correlation and usage. Teachers, administrators, and the community need to
know
that their limited resources are used to benefit all their students. Learning materials
need to be accessible for all students and teachers. Technology must help to lower
the bar for entry, not raise it. One way to eliminate barriers may be to improve the
alignment of resources with standards, regardless of medium.
Historically, BIL’s digital content has been written, aligned, and correlated from
a
print-first perspective, meaning that standards alignment, remediation resources,
and
curriculum coordination occur in terms of the print page. This presents several
challenges in converting print resources to digital and/or interactive web content,
while adding limited value to the teacher. Nevertheless, much of this content has
demonstrated efficacy across decades of use and needs to be preserved if not
enhanced.
The first challenge is that many of BIL’s print resources are re-used across multiple
products and programs, which complicates proper alignment and correlation. Poor
information design leads to a mix of cloning and re-use. This becomes increasingly
difficult for resources like digital assessment questions, when the same assessment
question may be used in multiple products or included in a custom assessment created
by
a teacher.
The second challenge is to guide a teacher or student user to appropriate
remediation materials. In the current system, the best we can do is to direct the
user
back to the lesson that teaches a concept. While this may be appropriate for the simple
general case, it does little to precisely address a student’s needs.
The third challenge is to empower users to create custom curriculum content.
Historically, textbook publishers provide a canonical curriculum to users, with the
expectation that teachers will follow it as laid out in the print books. With the
increase in online learning, teachers expect the ability to tweak their curricula
to
meet the needs of their classrooms and individual students. It is a straightforward
task
to provide users the ability to add and remove lesson content. However, customizing
a
lesson risks invalidating its correlations to standards and curriculum (i.e., x lesson
teaches y required topic). If a teacher removes a component of one lesson, does it
still
teach to the state standard? Does it provide proficiency for a given measurable Learning
Objective? A successful customization tool must do more than enable remixing of the
print content. The customized plan must be meaningful, measurable, and accountable
to
the educational requirements it serves.
A look at the correlation problem in context
When we look at U. S. state mathematics standards to assess objectives and learning
paths, we quickly see that a major shortcoming of nearly every alignment mechanism is that the standards are composite in nature.
A single state standard often concatenates several individual skills, competencies,
and facts into one bullet point. Let’s look at an example of this in the
Common Core Mathematics Standards, Grade 2, since many state standards are, in fact,
simple variants of this standards program.
When looking at the domain Operations and Algebraic Thinking, we see
that a single standard, CCSS.MATH.CONTENT.2.OA.A.1, states that a student should be
able to “Use addition and subtraction within 100 to solve one- and
two-step word problems involving situations of adding to, taking from, putting
together, taking apart, and comparing, with unknowns in all positions.”
If we take a moment to decompose all the tasks that this single
standard covers, we see that there are a number of discrete skills that all combine to
achieve proficiency in this standard:
- Use addition within 100 to solve one- and two-step problems,
- Use subtraction within 100 to solve one- and two-step problems,
- Understand decomposition in order to put together,
- Understand decomposition in order to compare,
- Understand decomposition in order to take apart,
- Use symbols as variables in equations,
- Use symbols as variables in drawings.
While this breakdown might not be expressly outlined in pedagogy, it
nevertheless shows that in order for a student to master a single standard, they
actually need to master several smaller skills. At the end of the day, the teacher is
responsible for the student’s ability to pass an assessment of this standard,
whether or not the teacher is provided with distinct resources for each of these components.
When we apply this insight to the
generality
of our current alignment
and remediation mechanisms, it becomes clear that there is room for improvement in
our
digital offerings. For example, if we have a second-grade learner, little Bobby
DropTables, and they fail an assessment question aligned to this example standard
CCSS.MATH.CONTENT.2.OA.A.1, what can we offer in terms of remediation? Did they fail
the
question because they don’t understand addition within 100? Is it because they don’t
understand how to use symbols as variables in an equation? Or, is it because they
are
missing or forgetting some fundamental prior knowledge
skill or concept?
Currently, our digital offerings have little capability to offer such insight, and
it falls squarely on
the shoulders of both teacher and student users to perform this analysis, for each
student, for each standard, for each assessment. This ambiguity, coupled
with our disconnected remediation offerings, brings to the forefront the challenges
that we
wish to overcome when serving digital content to our users.
Our proposed solution to the competency alignment problem
Introducing the competency graph
One of the beautiful things about mathematics is that it is a progressive, cumulative
discipline. While it is true that many states provide their own distinctive state
mathematics standards,
they all cover the same material, varying primarily in cadence and progression. Whether
you live in Arkansas or California, you
calculate a percentage in the same way. Students in Puerto Rico and Illinois know
the
same Quadratic Equation. It is this immutability, the fundamentally progressive way
in
which mathematics is taught and learned, that enables us to propose our Competency
Graph.
The study of mathematics is in part the progressive attainment
of discrete skills. Certain competencies require certain other prior knowledge competencies.
Looking at the prior example, breaking down a single standard into its composite parts,
we get a
feel for the level of granularity that a competency graph can express.
Essentially, the competency graph is a low-level knowledge framework
that underpins a state standards set, or our internal classification system of measurable
learning objectives.
The immediate benefits of developing the competency graph can be expressed in two
distinct areas. The first is standards correlation: by mapping a
standards set to our competency graph, and also mapping our lesson content,
resources, and curriculum data to the same competency graph, we provide an accurate
alignment at a highly granular level. For example, if we map state standard
CCSS.MATH.CONTENT.2.OA.A.1 to competencies A, B, and C, and
also map a 3-page lesson to the same competencies, then we gain standards alignment
through the proxy of the competency graph. The subtle but important distinction
is the shift from “This lesson covers this standard because they are aligned with
each other” to “lesson A and standard X are aligned through their intersection of
competencies.” This is the basis for
supporting alignable custom curricula.
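A minimal sketch of this proxy alignment in RDF/XML might look as follows; the bil:teaches property and the competency identifiers are hypothetical placeholders (not part of a published namespace), and the lrmi:learningResource element anticipates the LRMI vocabulary discussed later in this paper:
<!-- The standard decomposes into competencies A, B, and C (hypothetical IDs). -->
<rdf:Description rdf:ID="standard/CCSS/MATH/CONTENT/2.OA.A.1">
  <bil:teaches rdf:resource="competency/A"/>
  <bil:teaches rdf:resource="competency/B"/>
  <bil:teaches rdf:resource="competency/C"/>
</rdf:Description>
<!-- The lesson maps to the same competencies; its alignment to the standard
     is inferred through the intersection, never asserted directly. -->
<lrmi:learningResource rdf:ID="learningResource/lesson/grade-2/addition-and-subtraction">
  <bil:teaches rdf:resource="competency/A"/>
  <bil:teaches rdf:resource="competency/B"/>
  <bil:teaches rdf:resource="competency/C"/>
</lrmi:learningResource>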
This new alignment perspective offers superior accuracy and flexibility. By aligning
directly to
individual resources, instead of to the
container
of the curriculum
(i.e., the Lesson), we gain the accuracy and granularity for remediation support that
serves our users’ market and pedagogical needs. We also realize a substantial increase
in alignment efficiency, because a lesson’s alignment to a given standards set
is now inherited through the competency graph, by virtue of the alignment of the contents
within the lesson.
Finally, we need to consider the analysis of prior
knowledge
requirements for competencies. Due to the linked nature of the competency graph,
with any given node being aware of its immediate prior knowledge dependencies, we
are
also positioned to query this information for remediation. If a student
misses an assessment question on the Quadratic Equation, not only can we provide
resource links to target the teaching of the competency, but also its prior knowledge
dependencies. By surfacing these knowledge dependencies to teachers and students at
point of use, we offer a valuable analytical tool that can be used to help diagnose
underlying issues. This becomes especially important in higher grade bands, where topics
become complex and there is a greater probability that an assessment failure is a
symptom of a student’s misunderstanding of a prior competency.
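A sketch of that remediation query in XQuery, assuming the extends relationship defined later in this paper is asserted as a property on competency instances, and assuming an acyclic graph; the bil namespace URI and collection path are placeholders:
xquery version "3.1";
(: Sketch: collect the transitive prior-knowledge closure of a competency
   by walking bil:extends references. Assumes an acyclic competency graph;
   the bil namespace URI and collection path are placeholders. :)
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace bil = "http://bigideaslearning.com/ns/bil#";

declare function local:prior-knowledge($id as xs:string) as xs:string* {
  let $competency := collection("/db/bil/elements")//*[@rdf:ID = $id]
  for $dep in $competency/bil:extends/@rdf:resource/string()
  return ($dep, local:prior-knowledge($dep))
};

distinct-values(local:prior-knowledge("competency/quadratic-equation"))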
Constructing RDF for mathematics education
Ontology vocabularies abound to express the organization of educational
materials for general delivery of content, assessment of skills, and indications of
prerequisite knowledge. However, we found ourselves unexpectedly lacking
RDF models for sequencing educational materials in
mathematics. Looking outside of RDF vocabularies for education, though, we found
some impressively complex ontologies of mathematics concepts, which could be
associated with educational concepts using the subject-predicate-object construction
of RDF triple-stores. One very detailed mathematics ontology we have
found so far is OntoMathPro, developed by a research group at Kazan Federal University
(Russia): https://ontomathpro.org/. The developers write, “We are going to
create an ecosystem of datasets and mashups around the ontology,” which
suggests use in modeling mathematical applications.
We think of RDF for mathematics education as a network that students, teachers, and
their assistants (both human and machine) will traverse in multiple directions, and
through which there is not just one simple linear path of progression. Structuring
these triple-stores gives us a basis for organizing math concepts with educational
content and curricular activities.
We have chosen to separate our local BIL data into several distinct buckets for the
purposes of prototyping. These categorizations are not entirely superficial, however.
Given the expected size of our data set, we sought value in separating each type of
data
into its own collection for maintenance and governance purposes. Our collections are:
- curriculum.rdf, which stores all linked-list style containers
  for representing curricula and lesson structure. Conceivably, we would house
  custom curriculum data in its own collection. Realistically, we could expect
  to break this out into multiple collections, possibly along program or
  product lines.
- elements.rdf, which defines all of our custom data types for
  competencies, learning resource types, and curriculum container types (such as
  lesson, section, and chapter),
  along with custom relationship types, depth-of-knowledge (DOK) alignment
  classes, and any other proprietary data types. This collection effectively
  houses the custom BIL namespace.
- learningObjects.rdf, which stores all instances of learning
  resources. Again, we will probably find ourselves in a position where we
  need to separate along resource types for maintenance and processing
  purposes (see the sketch after this list). To provide a little context into
  the data volumes we expect, our
  assessment question bank alone represents well over 1,500,000 entries. We can
  realistically expect two to three times that volume in ancillary, consumable
  resources after a few years of production, not to mention multimedia assets,
  interactive tools and widgets, as well as static content modules, which
  typically number between 500 and 1,000 per grade.
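The per-collection split also keeps bulk maintenance queries simple. As a sketch (with collection path and LRMI namespace URI assumed), profiling resource volumes by declared type might look like this:
xquery version "3.1";
(: Sketch: count learning resources per declared type, largest first. :)
declare namespace rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace lrmi = "http://purl.org/dcx/lrmi-terms/";

let $resources := collection("/db/bil/learningObjects")//lrmi:learningResource
for $type in distinct-values($resources/lrmi:learningResourceType/@rdf:resource)
let $count := count($resources[lrmi:learningResourceType/@rdf:resource = $type])
order by $count descending
return $type || ": " || $count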
The ontologies that we have chosen to implement alongside our custom BIL namespace
are SKOS (Simple Knowledge Organization System) and the LRMI (Learning Resource
Metadata Initiative) Metadata Specification. The SKOS namespace provides the base
concept class, upon which we can construct our competency
class, along with several semantically expressive labels, such as
prefLabel, editorialNote, altLabel, and hiddenLabel.
Availability of multiple label tags is important not only
for editorial and maintenance purposes, but also for providing alternate names for
competencies. Our labelling scheme will likely be leveraged in search functions, and it
should allow us a certain degree of flexibility to support custom nomenclature for
specific state customizations. Additionally, the canonical prefLabel allows
us to provide competency names in multiple languages. The following is an example of our
base competency definition:
<rdfs:Class rdf:ID="competency">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
<skos:prefLabel xml:lang="en-US">Competency</skos:prefLabel>
<skos:definition>Root competency class. This should be
extended by competency subclasses.</skos:definition>
</rdfs:Class>
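A hypothetical instance of this class shows how the label properties come together, including a Spanish prefLabel; the bil prefix and the competency ID are placeholders:
<bil:competency rdf:ID="competency/addition-within-100">
  <skos:prefLabel xml:lang="en-US">Addition within 100</skos:prefLabel>
  <skos:prefLabel xml:lang="es">Suma hasta 100</skos:prefLabel>
  <skos:altLabel xml:lang="en-US">Two-digit addition</skos:altLabel>
  <skos:hiddenLabel xml:lang="en-US">adding two digit numbers</skos:hiddenLabel>
  <skos:editorialNote>Hypothetical entry for illustration only.</skos:editorialNote>
</bil:competency>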
Another important vehicle provided by SKOS is the base
relationship class that we use to build our Prior Knowledge bridge. Even with SKOS’s
extensive collection of transitive and hierarchical relationships, we thought it
appropriate to define a custom extends relationship:
<rdfs:Class rdf:ID="extends">
<rdfs:subClassOf rdf:resource="http://www.w3.org/2004/02/skos/core#related"/>
<skos:prefLabel xml:lang="en-US">Extends</skos:prefLabel>
<skos:closeMatch rdf:resource="https://ceds.ed.gov/element/000869/#Prerequisite"/>
<skos:definition>
A semantic relationship to show that a concept, skill, or strategy
'builds upon' another competency. Implies a logical 'requirement',
and is disjoint with 'dc:requires', i.e., a competency either
'dc:requires' or 'bil:extends' another competency, but not both.
</skos:definition>
</rdfs:Class>
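Asserted as a property on a competency instance, the relationship might read as follows; the competency IDs echo the Grade 2 decomposition above and are hypothetical:
<bil:competency rdf:ID="competency/two-step-word-problems-within-100">
  <skos:prefLabel xml:lang="en-US">Two-step word problems within 100</skos:prefLabel>
  <!-- prior-knowledge dependencies: this competency builds upon these two -->
  <bil:extends rdf:resource="competency/addition-within-100"/>
  <bil:extends rdf:resource="competency/subtraction-within-100"/>
</bil:competency>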
The last major components we are using from the SKOS
namespace are the OrderedCollection class and the memberList property,
which allow us to store lists of links to other container resources. The LRMI namespace
provides us with learningResource and learningResourceType,
which we use to define our resource instances (colloquially referred to as Learning
Objects), as well as to provide structure for storing our curriculum data. We see here
an example of how we are storing curriculum data as a list-like container.
<lrmi:learningResource rdf:ID="learningResource/curriculum/NA/algebra-1">
<lrmi:learningResourceType rdf:resource="learningResource/type/curriculum"/>
<dc:title>Big Ideas Learning Algebra 1</dc:title>
<skos:OrderedCollection>
<skos:memberList>
<lrmi:learningResource rdf:resource="Select Resource Type/my-lesson-plan"/>
<lrmi:learningResource rdf:resource="learningResource/curriculum/NA/algebra-1/chapter/2"/>
</skos:memberList>
</skos:OrderedCollection>
</lrmi:learningResource>
In the above example, we note that the curriculum object
is an lrmi:learningResource, with a learningResourceType
attribute, which points to a definition in elements.rdf, and an
OrderedCollection container from the SKOS namespace, which contains
references to child container lists.
RDF/XML and the development of the competency graph
Early in the design phase, we needed a mechanism to regulate
and standardize both our data structures and the semantic relationships between them.
By
leveraging RDF vocabularies such as LRMI and SKOS, we were able to design an XML data
schema that provided consistent relationships, meaningful tag names, and highly
structured collections upon which we could apply validation to ensure consistency
and
homogeneity when creating our initial data sets.
In addition to its advantages for validation, structuring our data in RDF/XML allows
us to explore a number of relational representations not possible in a SQL environment.
This flexibility, coupled with the ability to use attributes such as rdf:resource as
pointers, or weak foreign-key references, allowed us
to organize our data into simple collections with shallow hierarchies in an
easily readable and highly queryable state.
RDF also allows us to express our data in a robust, sustainable vernacular that
requires little transformation between persistent data and its natural language
origin. The ability for us to capture contextual, lexical, and pedagogical metadata
in a human-readable format should empower our internal subject matter experts to
work in data much closer to the persistence layer, which, in turn, helps to increase
our data transparency and accuracy by reducing the amount of transformation that our
data must undergo between entry and storage.
Implementation challenges
The challenge of resource identification and referencing
Creating links to resources poses a serious challenge for the storage of our
curriculum data.
In our system, a learning object resource is simply a pointer to a digital asset, and
we have chosen to make each container’s member list a collection of pointers to resource
identifiers:
<lrmi:learningResource rdf:ID="learningResource/curriculum/NA/algebra-2">
<lrmi:learningResourceType rdf:resource="learningResource/type/curriculum"/>
<dc:title>Big Ideas Learning Algebra 2</dc:title>
<skos:OrderedCollection>
<skos:memberList>
<lrmi:learningResource rdf:resource="learningResource/curriculum/NA/algebra-1/chapter/1"/>
<lrmi:learningResource rdf:resource="learningResource/curriculum/NA/algebra-1/chapter/2"/>
</skos:memberList>
</skos:OrderedCollection>
</lrmi:learningResource>
And, for one of those resources listed within the
container, we have another entry, like this:
<lrmi:learningResource rdf:ID="learningResource/curriculum/NA/algebra-1/chapter/1">
<lrmi:learningResourceType rdf:resource="learningResource/curriculum/chapter"/>
<dc:title>Chapter 1: The 0th Chapter</dc:title>
<skos:OrderedCollection>
<skos:memberList>
<lrmi:learningResource rdf:resource="learningResource/curriculum/NA/algebra-1/chapter/1/lesson/1.1"/>
</skos:memberList>
</skos:OrderedCollection>
</lrmi:learningResource>
This implementation should allow us to store each
container as a free node, rather than being structurally and intrinsically bound to
a
single containing resource. This is important when we take customization support into
consideration: A single BIL lesson may be referenced by
n containers, and we do not want to have to search the entire collection
of customized curricula for every instance of this single lesson resource when changes
are made. Instead, we store it as an independent node and link to the resource
wherever it needs to be included. This way, should we need to make changes to the source
node, we update it in one place, and all references pull the updated data. As we look
at an entry for a single resource (not a container), it should be noted that there
are
two vital pieces of information that need to be captured here:
- The resource’s RDF ID: this is how the database knows to reference and
  locate a specific resource.
- The digital resource ID, which will be passed to a content server for
  retrieval within Big Ideas Learning platforms (a sketch of such an entry
  follows this list).
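A leaf resource entry capturing both identifiers might look like this sketch, where bil:assetId is a hypothetical property holding the digital resource ID that the content server resolves:
<lrmi:learningResource rdf:ID="learningResource/mediaElement/video/quadratic-equation-intro">
  <lrmi:learningResourceType rdf:resource="learningResource/type/video"/>
  <dc:title>Introducing the Quadratic Equation</dc:title>
  <!-- digital resource ID, passed to a content server for retrieval -->
  <bil:assetId>6077473635001</bil:assetId>
</lrmi:learningResource>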
Canonically, we recognize that an IRI represents a
unique, resolvable address
to a resource. We chose not
to use the physical resource ID as the RDF ID for the following reasons:
- Increased flexibility when integrating with our content servers. By providing a system
  ID instead of an absolute IRI, we eliminate the need to host content in a static location.
  For example, if we had hardcoded the absolute address of a video resource, moving it
  to a new cloud host would require us to update the IRIs on all affected video resources.
- Resources represented in RDF are just pointers. BIL houses its digital resources
  across multiple CDN delivery systems, so any application that integrates with
  our content base will do so through a content proxy.
  This layer of abstraction allows our applications to be agnostic to
  the content, since it is the responsibility of the content proxy to
  fetch and return the target resource.
- Semantic, human-readable resource IDs assist in data maintenance and
  governance. By implementing IRIs as semantic paths to each entity, we gain
  important contextual awareness of each IRI within the greater namespace. For
  example, a UUID such as 6afc32b7-8b73-4b42-bb03-08af18ab5655
  ensures uniqueness but
  neither provides nor receives context or purpose from the identifier itself.
  If we instead rely on a namespace hierarchy, such as
  Arjuna/LearningObject/MediaElement/Video/6077473635001, we
  can ensure uniqueness through validation of each segment of the hierarchy
  (i.e., unique values at any given depth), while retaining context and
  purpose through the IRI; a sketch of such a check follows this list. It
  becomes possible, then, to ensure that an entity
  has consistent, singular representation across multiple systems while
  retaining contextual value within the ID itself. In essence, by creating
  'resource namespaces' within our learning object data, we provide another
  measure of organization and control.
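A sketch of the segment-level uniqueness check mentioned above, written in XQuery with an assumed database root, reports any rdf:ID asserted more than once:
xquery version "3.1";
(: Sketch: report rdf:ID values that are not unique across the database. :)
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

let $ids := collection("/db/bil")//@rdf:ID/string()
for $id in distinct-values($ids)
where count($ids[. = $id]) gt 1
return "duplicate ID: " || $id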
This implementation decision is the product of much internal debate and research,
and
represents what we feel is the most stable, scalable solution to the challenge of
naming
and identifying resources. We welcome any insight or
observations into improving our resource identification and storage
mechanisms.
Exploring an XML database for content management supported by RDF/XML
One of the biggest challenges our team faces is determining the most appropriate
technology stack for a production-grade application. Early development and prototyping
have shown the XML database eXist-db to be more than capable in terms of data manipulation,
serving, and storage. However, we do face some significant constraints.
- Tech team background: XQuery's FLWOR expressions represent a significant shift
  from the engineering team's prior experience with data manipulation and processing.
  We learned from early prototyping that development velocity lags when designing
  and writing more complex filtering and querying, due to the team’s unfamiliarity
  with XQuery. This issue has been partially remedied by the development of a custom
  JavaScript API which allows our developers to interact with eXist-db in a context
  closely resembling the Fetch API.
- Hardening and scaling: While the challenges of maintaining, tuning, scaling, and
  securing a new database technology are not unique to eXist-db, the engineering team
  is not currently equipped to absorb all facets of securing and scaling a new database
  technology. Enterprise partnership and support would help to mitigate this concern
  as we seek a scalable database solution.
- Security: Authentication at the server level may present a challenge in implementing
  secure write operations from external client requests. This is an ongoing area of
  investigation.
RDF can be serialized in many different ways, and XML is one of the oldest. Nowadays
it is certainly more common
to see expressions of RDF in Turtle syntax or JSON-LD, yet RDF itself can be shared
in multiple formats as needed.
The authors have been writing and modeling RDF in XML (rather than Turtle or JSON-LD)
for the following reasons:
- Legibility: RDF written in well-formed XML is precise and legible in representing
  relationships among resources via attributes on the XML element tree. This should
  be easy for the core team to write and maintain as a central source of truth for
  the conceptual organization of the project.
- Validation: Maintaining the conceptual RDF framework at the core of the project
  requires validation more precise than simply checking for correct use of RDF
  vocabularies. Checking against the semantic web of linked data standards on its
  own can be served by the W3C validating services at
  https://www.w3.org/RDF/Validator/ or https://www.w3.org/2015/03/ShExValidata/.
  However, validation needs to be customized much more precisely to keep
  relationships simple, to control the use of appropriate namespaces, to delimit
  acceptable values and ranges, and to define valid datatypes where needed. For
  this purpose, we are exploring powerful validation tools such as Relax NG and
  Schematron (see the sketch after this list).
- Querying and transformation: We are exploring XPath, XQuery, and XSLT as tools
  for precise querying, as well as for serializing data in the syntaxes we need to
  interact with multiple web services.
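As flagged in the validation item above, a small Schematron sketch suggests the flavor of such customized checks; the LRMI namespace URI and the specific constraints are illustrative rather than production rules:
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
  <sch:ns prefix="rdf" uri="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
  <sch:ns prefix="lrmi" uri="http://purl.org/dcx/lrmi-terms/"/>
  <sch:pattern>
    <sch:rule context="lrmi:learningResource[@rdf:ID]">
      <!-- every resource must declare a type -->
      <sch:assert test="lrmi:learningResourceType/@rdf:resource">
        A learning resource must declare a learningResourceType.
      </sch:assert>
      <!-- IDs must live in the learningResource namespace hierarchy -->
      <sch:assert test="starts-with(@rdf:ID, 'learningResource/')">
        Resource IDs must begin with 'learningResource/'.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>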
Having begun work with RDF/XML for these reasons, we are aware that we can serialize
it as JSON or JSON-LD,
which gives us a wide range of considerations for how best to deploy a system based
on our abstract data model.
If our RDF/XML serves as an index and central nexus point for coordinating access
to
resources, the BIL tech team will need to query it regularly, and of course
RDF/XML can be transformed for querying into JSON, JSON-LD, or GraphQL. A web
service running XSLT 3.0 that mediates between JSON and XML might simplify validation
and maintenance,
and serialize the database outputs as needed for querying.
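A sketch of that mediation in XQuery 3.1 (which, like XSLT 3.0, can serialize maps and arrays as JSON); the collection path and namespace URIs are assumptions:
xquery version "3.1";
(: Sketch: expose one curriculum container's member list as JSON. :)
declare namespace rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace dc   = "http://purl.org/dc/elements/1.1/";
declare namespace skos = "http://www.w3.org/2004/02/skos/core#";
declare namespace lrmi = "http://purl.org/dcx/lrmi-terms/";

let $c := collection("/db/bil/curriculum")
          //lrmi:learningResource[@rdf:ID = "learningResource/curriculum/NA/algebra-1"]
return serialize(
  map {
    "id"      : string($c/@rdf:ID),
    "title"   : string($c/dc:title),
    "members" : array { $c//skos:memberList/lrmi:learningResource/@rdf:resource/string() }
  },
  map { "method" : "json", "indent" : true() }
)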
We close, then, with these questions:
- Is RDF/XML the best format for legible declarative expression of our data
  structure with robust schema validation? Or are JSON expression, validation,
  and querying comparable and sufficient for BIL’s requirements?
- Is an XML database actually necessary for us, even if we are expressing our
  abstract data model and structure in RDF/XML?
- If we continue to work with RDF/XML at the core of our system architecture,
  should we serialize it in a JSON output format for database implementation?
Thanks to XPath, XSLT, and XQuery 3 specifications, we know that we
can now transform XML to JSON, and JSON to XML, which gives us a wide range of
database options to consider implementing in the BIL technology stack. Going
forward, we need to evaluate these decisions based not only on a continually
evolving technology landscape, but also on the particular resources, technology
requirements, and implementation needs of BIL and the community it serves.