Balisage Paper: Efficient scripting
David A. Lee
Principal senior software engineer
Epocrates, Inc.
<dlee@epocrates.com>
David Lee has over 20 years of experience in the software industry and has been responsible for many major projects at small and large companies, including Sun Microsystems, IBM, Centura Software (formerly Gupta), Premenos, Epiphany (formerly RightPoint), and WebGain. As principal senior software engineer at Epocrates, Inc., Mr. Lee is responsible for managing data integration, storage, retrieval, and processing of clinical knowledge databases for the leading clinical information provider.
Key career contributions include real-time AIX OS extensions for optimizing transmission of real-time streaming video (IBM), secure encrypted EDI over Internet email (Premenos), porting the Centura Team Desktop system to Solaris (Gupta, Centura), optimizations of large enterprise CRM systems (Epiphany), and xmlsh, an open source scripting language for XML, of which he is the author.
Norman Walsh
Principal Technologist in the Information & Media group
Mark Logic Corporation
<ndw@nwalsh.com>
Norman Walsh is a Principal Technologist in the Information & Media group at Mark Logic Corporation where he assists in the design and deployment of advanced content applications. Norm is also an active participant in a number of standards efforts worldwide: he is chair of the XML Processing Model Working Group at the W3C where he is also co-chair of the XML Core Working Group. At OASIS, he is chair of the DocBook Technical Committee.
Before joining Mark Logic, he participated in XML-related projects and standards efforts at Sun Microsystems. With more than a decade of industry experience, Mr. Walsh is well known for his work on DocBook and a wide range of open source projects. He is the principal author of DocBook: The Definitive Guide.
Copyright © 2009 David A. Lee and Norman Walsh. Used by permission.
Abstract
The efficiency and performance of individual XML operations such as parsing, processing (XSLT, XQuery), and serialization, and the merits of different in-memory document representations, have been widely discussed. However, real-world use cases often involve many operations orchestrated using a scripting environment. The overhead of the scripting environment can often overshadow any performance gains in individual operations. In an exploration of real-world scripting, we compare the performance of several scripting languages and techniques on a set of typical XML operations, such as generating a table of contents and conditionally accessing non-XML files identified in XML documents. Based on the performance results, we suggest best practices for scripting XML processes. The scripting languages compared include DOS Shell (CMD.EXE), Linux Shell (bash), xmlsh, and XProc (Calabash). These are run (where possible) on multiple operating systems: Windows XP, Linux, and Mac OS.