Appendix: Database Software, Scripting Languages, and XML

XML

XML is a more recent technology with less of a track record than the database, but with a growing following in the digital humanities and among librarians and archivists who believe that it will stand the test of time—unlike database files. XML does not have the extensive built-in search features that database programs do, though with some additional fuss you can search XML documents in complex ways. XML is probably best for a historical website with a circumscribed, unchanging set of historical documents that are mostly text. For instance, the Virginia Center for Digital History used XML for their collection of primary documents (including poignant “runaway advertisements”) in their project The Geography of Slavery.8 XML is also a good choice for archives that want to share their contents with other related collections because the simple text format XML is written in is, like HTML, rapidly becoming a lingua franca on the web.9 For example, the Cornell University Library, the University of Michigan Library, and the State and University Library of Göttingen combined their historical collections of mathematical works using XML.10

Getting started with XML is in some respects much easier, and in others much harder, than using a database. On the one hand, you can use a rudimentary text editor like Notepad to create XML documents (though more sophisticated text editors are very helpful) because XML is simply text with added tags. On the other hand, compared to databases, working with XML is a highly unstructured affair. Indeed, you create much of that structure yourself, which is both its beauty and its peril. Although you need to define columns in database programs, many common types of definitions come preset: database programs have built-in formats for dates and times, for example, and ways to generate such information automatically. By contrast, in XML you have to define each of these elements yourself, although occasionally you can borrow a set of definitions, as with TEI (see Chapter 3). XML’s flexibility (its ability to tag any bit of text in a document any way you like, highlighting words or phrases as you would with a set of differently colored highlighters on paper) can easily breed unwieldy complexity if you are not careful.

Just as databases require scripting languages like ASP and PHP to pluck information from the database and place it into a web page, XML documents require translators to convert them into HTML for web viewing. Although the very same scripting languages can take care of this task of “parsing” the XML into its constituent parts and putting those parts into a web template, the World Wide Web Consortium has two technologies specially designed for this task: XSL and XSLT. The Extensible Stylesheet Language (XSL) provides a set of codes for formatting XML elements; the related Extensible Stylesheet Language Transformations (XSLT) converts an XML document with an associated XSL stylesheet into an HTML file that can be viewed in any web browser. With XML/XSL/XSLT (a confusing alphabet soup, to be sure), you can create a formatting template for your XML documents—say, boldfacing the names of authors within each document, taking the date of each letter and right-justifying it—and the server will convert each XML document on request into that format for your web visitors. No need to create separate HTML pages for your web archive; merely create your XML documents and a translator will take care of the rest.
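To make the scripting-language route concrete, here is a minimal sketch in Python, standing in for ASP or PHP, that parses one XML letter and drops its parts into an HTML template with the author in boldface and the date right-justified. The element names (letter, author, date, body), the placeholder letter, and the template are all invented for illustration; they are not drawn from any real collection.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical letter markup: element names and contents are placeholders.
LETTER_XML = """<letter>
  <author>A. Correspondent</author>
  <date>12 July 1893</date>
  <body>Placeholder text of the letter.</body>
</letter>"""

# A simple web template: the date is right-justified, the author is boldfaced.
HTML_TEMPLATE = """<div class="letter">
  <p style="text-align: right">{date}</p>
  <p><b>{author}</b></p>
  <p>{body}</p>
</div>"""


def letter_to_html(xml_text: str) -> str:
    """Parse one letter and place its constituent parts into the template."""
    letter = ET.fromstring(xml_text)
    parts = {
        # escape() keeps characters such as & and < safe inside HTML
        name: escape(letter.findtext(name, default=""))
        for name in ("author", "date", "body")
    }
    return HTML_TEMPLATE.format(**parts)


html_page = letter_to_html(LETTER_XML)
print(html_page)
```

On a real site the server would run something like letter_to_html on each request, so the XML files remain the only documents you maintain by hand.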

For example, a version of Frederick Jackson Turner’s The Frontier in American History prepared in XML and made available for the web might have the following pieces. First, although XML itself requires only that a document be well formed, a document that is to be checked for validity needs a Document Type Definition (DTD), in this case defining different parts of the text such as the overall title of the work, the chapter titles, and paragraphs and notes:

<!ELEMENT doc (title, chapter*)>

<!ELEMENT chapter (title, (para|note)*)>

<!ELEMENT title (#PCDATA)*>

<!ELEMENT para (#PCDATA | notenumber)*>

<!ELEMENT notenumber (#PCDATA)*>

<!ELEMENT note (#PCDATA)*>


XML tags are added to the raw text of Turner’s Frontier:

<!DOCTYPE doc SYSTEM "doc.dtd">

<doc>

<title>The Frontier in American History</title>

<chapter>

<title>Chapter 2: The First Official Frontier of the Massachusetts Bay</title>

<para>In “The Significance of the Frontier in American History,” I took for my text the following announcement of the Superintendent of the Census of 1890: “Up to and including 1880 the country had a frontier of settlement but at present the unsettled area has been so broken into by isolated bodies of settlement that there can hardly be said to be a frontier line. In the discussion of its extent, the westward movement, etc., it cannot therefore any longer have a place in the census reports.” Two centuries prior to this announcement, in 1690, a committee of the General Court of Massachusetts recommended the Court to order what shall be the frontier and to maintain a committee to settle garrisons on the frontier with forty soldiers to each frontier town as a main guard.<notenumber>1</notenumber> In the two hundred years between this official attempt to locate the Massachusetts frontier line, and the official announcement of the ending of the national frontier line, westward expansion was the most important single process in American history.</para>

<note>1. Massachusetts Archives, xxxvi, p. 150.</note>

</chapter>

</doc>

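Once a text is tagged this way, the “additional fuss” needed to search it is modest. The sketch below uses Python’s standard xml.etree.ElementTree module, one scripting option among many, on an abridged copy of the Turner document; the variable names and the helper function paragraphs_mentioning are our own inventions for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the tagged Turner text (first part of the paragraph omitted).
TURNER_XML = """<doc>
<title>The Frontier in American History</title>
<chapter>
<title>Chapter 2: The First Official Frontier of the Massachusetts Bay</title>
<para>[...] with forty soldiers to each frontier town as a main guard.<notenumber>1</notenumber> In the two hundred years between this official attempt to locate the Massachusetts frontier line, and the official announcement of the ending of the national frontier line, westward expansion was the most important single process in American history.</para>
<note>1. Massachusetts Archives, xxxvi, p. 150.</note>
</chapter>
</doc>"""

doc = ET.fromstring(TURNER_XML)

# Structural search: the title of every chapter.
chapter_titles = [title.text for title in doc.findall("./chapter/title")]

# Structural search: paragraphs that contain at least one footnote marker.
annotated = doc.findall(".//para[notenumber]")


def paragraphs_mentioning(root, phrase):
    """Full-text search limited to <para> elements (case-insensitive)."""
    phrase = phrase.lower()
    # itertext() also yields the text that follows child tags such as <notenumber>
    return [p for p in root.iter("para")
            if phrase in "".join(p.itertext()).lower()]


hits = paragraphs_mentioning(doc, "westward expansion")
print(chapter_titles, len(annotated), len(hits))
```

Because the search can be restricted to particular tags, a query for a word in paragraphs will not be cluttered by the same word appearing in titles or notes, something a plain-text search cannot promise.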

An XSL stylesheet specifies how to take each part of this XML document and, by mixing its pieces (as defined in the DTD) with HTML tags, turn them into a new document (technically, an XHTML document) that a web browser can understand:

<xsl:stylesheet version="1.0"

xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

xmlns="http://www.w3.org/TR/xhtml1/strict">

<xsl:strip-space elements="doc chapter"/>

<xsl:output

method="xml"

indent="yes"

encoding="iso-8859-1"

/>

<xsl:template match="doc">

<html>

<head>

<title>

<xsl:value-of select="title"/>

</title>

</head>

<body>

<xsl:apply-templates/>

</body>

</html>

</xsl:template>

<xsl:template match="doc/title">

<h1>

<xsl:apply-templates/>

</h1>

</xsl:template>

<xsl:template match="chapter/title">

<h2>

<xsl:apply-templates/>

</h2>

</xsl:template>

<xsl:template match="para">

<p>

<xsl:apply-templates/>

</p>

</xsl:template>

<xsl:template match="notenumber">

<sup>

<xsl:apply-templates/>

</sup>

</xsl:template>

<xsl:template match="note">

<p class="note">

<xsl:apply-templates/>

</p>

</xsl:template>

</xsl:stylesheet>


The resulting document looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>

<html xmlns="http://www.w3.org/TR/xhtml1/strict">

<head>

<title>The Frontier in American History</title>

</head>

<body>

<h1>The Frontier in American History</h1>

<h2>Chapter 2: The First Official Frontier of the Massachusetts Bay</h2>

<p>In “The Significance of the Frontier in American History,” I took for my text the following announcement of the Superintendent of the Census of 1890: “Up to and including 1880 the country had a frontier of settlement but at present the unsettled area has been so broken into by isolated bodies of settlement that there can hardly be said to be a frontier line. In the discussion of its extent, the westward movement, etc., it cannot therefore any longer have a place in the census reports.” Two centuries prior to this announcement, in 1690, a committee of the General Court of Massachusetts recommended the Court to order what shall be the frontier and to maintain a committee to settle garrisons on the frontier with forty soldiers to each frontier town as a main guard.<sup>1</sup> In the two hundred years between this official attempt to locate the Massachusetts frontier line, and the official announcement of the ending of the national frontier line, westward expansion was the most important single process in American history.</p>

<p class="note">1. Massachusetts Archives, xxxvi, p. 150.</p>

</body>

</html>


In a web browser, this XHTML document would render roughly like this:

The Frontier in American History

Chapter 2: The First Official Frontier of the Massachusetts Bay

In “The Significance of the Frontier in American History,” I took for my text the following announcement of the Superintendent of the Census of 1890: “Up to and including 1880 the country had a frontier of settlement but at present the unsettled area has been so broken into by isolated bodies of settlement that there can hardly be said to be a frontier line. In the discussion of its extent, the westward movement, etc., it cannot therefore any longer have a place in the census reports.” Two centuries prior to this announcement, in 1690, a committee of the General Court of Massachusetts recommended the Court to order what shall be the frontier and to maintain a committee to settle garrisons on the frontier with forty soldiers to each frontier town as a main guard.1 In the two hundred years between this official attempt to locate the Massachusetts frontier line, and the official announcement of the ending of the national frontier line, westward expansion was the most important single process in American history.

1. Massachusetts Archives, xxxvi, p. 150.
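For comparison, the same tag-by-tag rules can be approximated in a general-purpose scripting language. The following is only a rough sketch in Python, not an XSLT processor: the RULES table and the functions transform and to_xhtml are our own hypothetical names, the source text is abridged, and the namespace and XML declaration of the real output are omitted.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the tagged Turner text (most of the paragraph omitted).
SOURCE = """<doc>
<title>The Frontier in American History</title>
<chapter>
<title>Chapter 2: The First Official Frontier of the Massachusetts Bay</title>
<para>[...] as a main guard.<notenumber>1</notenumber> In the two hundred years [...]</para>
<note>1. Massachusetts Archives, xxxvi, p. 150.</note>
</chapter>
</doc>"""

# (parent tag, tag) -> (HTML tag, attributes), imitating the stylesheet's
# match patterns; a parent of None means "whatever the parent is".
RULES = {
    ("doc", "title"): ("h1", {}),
    ("chapter", "title"): ("h2", {}),
    (None, "para"): ("p", {}),
    (None, "notenumber"): ("sup", {}),
    (None, "note"): ("p", {"class": "note"}),
}


def transform(source, parent_tag, target):
    """Copy `source` into `target`, renaming elements according to RULES."""
    rule = RULES.get((parent_tag, source.tag)) or RULES.get((None, source.tag))
    if rule is None:
        # No rule (here: chapter), so just descend into the children.
        # Whitespace between them is dropped, much as xsl:strip-space does.
        for child in source:
            transform(child, source.tag, target)
        return None
    html_tag, attrs = rule
    out = ET.SubElement(target, html_tag, attrs)
    out.text = source.text
    for child in source:
        new_child = transform(child, source.tag, out)
        if new_child is not None:
            new_child.tail = child.tail  # text that follows the child tag
    return out


def to_xhtml(xml_text):
    """Build the html/head/body shell, then apply the rules to the document."""
    doc = ET.fromstring(xml_text)
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = doc.findtext("title")
    body = ET.SubElement(html, "body")
    for child in doc:
        transform(child, doc.tag, body)
    return ET.tostring(html, encoding="unicode")


xhtml = to_xhtml(SOURCE)
print(xhtml)
```

Even in this small case the script has to handle recursion and the text that trails each tag explicitly, work that an XSLT processor does for you behind the scenes.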

As you can see, compared to programming languages like ASP or PHP, XSL and XSLT are fairly straightforward text formats like XML and HTML; that is, they are written largely in plain English words, with few embellishments other than colons and angle brackets. Their syntaxes still require an extra effort to learn, and that effort should not be underestimated, but XSL and XSLT do not require you to master the sometimes complex mathematical elements, such as arrays, functions, and logic, that web programming languages like ASP and PHP add to their linguistic constructions. Nevertheless, if your head is spinning even slightly from this brief discussion of XML/XSL/XSLT, you will likely have to outsource the creation of these more complex documents and translators. Unfortunately, because databases have been around for so much longer than XML and especially XSLT, many more programmers know how to create websites with a database and scripting language than with these newer technologies. Furthermore, many more prepackaged (and often free) web tools using databases rather than XML currently exist. For instance, almost all forum software, which could use either XML or databases, is written for the latter. Tens of thousands of programmers who know how to create websites using the free database software MySQL and the programming language PHP are available; far fewer know XML and XSLT well. This situation will likely change in the coming years, and XML’s large following in the digital humanities provides a source from which to draw strength and, we hope, some advice about implementing this promising technology.

8 Tom Costa, The Geography of Slavery in Virginia: Virginia Runaways, ↪ link A.8.

9 Daniel J. Cohen, “History and the Second Decade of the Web,” Rethinking History 8 (June 2004): 297-98, ↪ link A.9.

10 Cornell University Library, Distributed Digital Library of Mathematical Monographs, ↪ link A.10.