Getting Started: The Nature of Websites, and What You Will Need to Create Yours

Databases and XML

For archival or gallery sites that have hundreds or thousands of artifacts or documents to display, for sites gathering history online, or for sites running online discussions, historians will likely need at least one of two advanced technologies, the database or XML. They will also need one of several non-HTML programming languages that often accompany these technologies. Both the database and XML are storage systems for materials arranged in a formal manner. They therefore help deal with caches of documents or other materials that exhibit elements of repetition (a set of notecards or letters, a series of comments about a topic, or a slate of encyclopedia entries). XML and databases have structures for containing critical bits of information about historical objects, such as the author of a document or the dates of a battle. Researchers using sites that employ these technologies can examine the highlighted information in extremely useful ways. For instance, a database or XML archive of a thousand letters to and from Ralph Waldo Emerson could enable very precise searches by a range of dates, correspondents, keywords such as “wonder” or “slavery,” or a combination of all three. Such searches allow for more penetrating historical analysis because they can answer questions such as, “To whom did Emerson write about abolitionism in the 1850s?” A thousand ordinary web pages containing the same letters wouldn’t be nearly as useful, because there would be no way of combing through them with such specificity.
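To make the idea of a structured search concrete, here is a minimal sketch in Python. It assumes each letter has already been stored with separate author, recipient, date, and text fields. The `Letter` class, the `search` function, and the sample records are all hypothetical. The records are invented placeholders, not real Emerson correspondence. The point is only that once the date and correspondent are defined as distinct pieces of information, a question like the one above becomes a simple filter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Letter:
    # Hypothetical fields; a real digital edition would define its own.
    author: str
    recipient: str
    sent: date
    text: str


def search(letters, start, end, keyword, author=None):
    """Return letters sent between start and end (inclusive) whose text
    contains keyword (case-insensitive), optionally limited to one author."""
    kw = keyword.lower()
    return [
        letter for letter in letters
        if start <= letter.sent <= end
        and kw in letter.text.lower()
        and (author is None or letter.author == author)
    ]


# Invented placeholder records, NOT real Emerson letters.
LETTERS = [
    Letter("Emerson", "Correspondent A", date(1854, 3, 1), "... on slavery and the law ..."),
    Letter("Emerson", "Correspondent B", date(1846, 5, 2), "... a note on slavery ..."),
    Letter("Correspondent C", "Emerson", date(1856, 7, 9), "... on slavery ..."),
    Letter("Emerson", "Correspondent D", date(1858, 1, 4), "... a sense of wonder ..."),
]

# "To whom did Emerson write about slavery in the 1850s?"
hits = search(LETTERS, date(1850, 1, 1), date(1859, 12, 31), "slavery", author="Emerson")
recipients = sorted({letter.recipient for letter in hits})
print(recipients)  # ['Correspondent A']
```

The 1846 letter falls outside the date range and the 1858 letter lacks the keyword, so only one recipient remains. With a thousand undifferentiated web pages, there would be no reliable "sent" field to filter on.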

In some respects, databases and XML are similar technologies. Each allows you to define categories of information such as “author” or “date” and then encode historical materials using those definitions. Yet they perform this task in very different ways. XML is much like an HTML document: plain text with tags surrounding words or passages. Here, the tags represent definitions such as author or date. Databases generally store their information in less readable files, which require a database program rather than a simple text editor to access them. They also require that the bits of information one wants to highlight be separated from the main document into distinct “columns” or “fields.” For example, in XML the date of one of Emerson’s letters would remain at the top of the full text of that letter, though wrapped in informational tags. A database version of the same letter, by contrast, would hold the main text of the letter in one column and the date in another. The advantage of XML for archival websites is therefore that it permits in situ definitions: the markup lives inside the text itself. If you want to number the paragraphs of Shakespeare’s plays and note where he coined new words, XML works extremely well. In short, XML is especially well suited to sites that focus on historical texts.
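The following sketch illustrates "in situ" definitions. It uses Python's standard `xml.etree.ElementTree` module to read a single letter in which the date stays at the top of the text, wrapped in a tag. In the same letter, each paragraph carries its own number. The tag and attribute names (`letter`, `date`, `when`, `p`, `n`, `persName`, `term`) are hypothetical choices for illustration. Scholarly projects often adopt an established vocabulary such as the TEI guidelines instead. The letter text is an invented placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; tag/attribute names and text are invented for illustration.
LETTER_XML = """
<letter>
  <date when="1854-03-01">March 1, 1854</date>
  <p n="1">Dear <persName>Correspondent A</persName>,</p>
  <p n="2">Placeholder text about <term>slavery</term> and the law.</p>
</letter>
"""

root = ET.fromstring(LETTER_XML.strip())

# The date remains in place in the document, but the tag and its
# machine-readable "when" attribute let software find it.
when = root.find("date").get("when")

# Paragraph numbers travel with the text via the "n" attribute;
# itertext() joins the paragraph's text with that of any nested tags.
paragraphs = {p.get("n"): "".join(p.itertext()) for p in root.findall("p")}

# Every tagged keyword, wherever it appears in the letter.
terms = [t.text for t in root.iter("term")]

print(when)           # 1854-03-01
print(paragraphs["2"])  # Placeholder text about slavery and the law.
print(terms)          # ['slavery']
```

Nothing has been pulled out of the letter. The text still reads in order from top to bottom, which is why this approach suits text-centered sites.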

The advantages of databases include the ease of updating entries (e.g., changing the dates of a hundred documents without editing each one) and of logging transactions (e.g., recording when a visitor last viewed a document). Perhaps most important is their native ability to search for matching records in a variety of ways, using a sophisticated query language called SQL. Such features make databases a natural technology for history sites that involve forums, the gathering and editing of historical materials, or the membership rolls of a historical society, as well as for many online archives. Databases have these innate, robust features and a long history of use (even before the web). As a result, sites using databases are far more prevalent than sites using XML. Recently, however, the plain-text simplicity of XML has led to its accelerating use for the easy sharing of information between websites. Both technologies require more specialized knowledge than would fit comfortably in this introduction to the mainstream web. That knowledge includes an understanding of the more complicated, non-HTML web programming languages needed to transfer stored documents onto web pages. We have therefore included a longer discussion of databases and XML in the appendix, and we cover how they work in greater depth in subsequent chapters.
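The three database advantages named above are updating many entries at once, logging transactions, and searching with SQL. The following minimal sketch shows each of them using Python's built-in `sqlite3` module and an in-memory SQLite database. The table names (`letters`, `views`), the column names, and the rows are hypothetical, and the letters are invented placeholders. Note how the date and the recipient live in their own columns, separate from the body of each letter.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE letters (
    id        INTEGER PRIMARY KEY,
    recipient TEXT,
    sent      TEXT,  -- ISO 8601 date string, e.g. '1854-03-01'
    body      TEXT
);
CREATE TABLE views (
    letter_id INTEGER REFERENCES letters(id),
    viewed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")

# Invented placeholder rows, NOT real Emerson letters.
conn.executemany(
    "INSERT INTO letters (recipient, sent, body) VALUES (?, ?, ?)",
    [
        ("Correspondent A", "1854-03-01", "... on slavery ..."),
        ("Correspondent B", "1846-05-02", "... on slavery ..."),
        ("Correspondent D", "1858-01-04", "... a sense of wonder ..."),
    ],
)

# Updating many entries at once: suppose every letter to Correspondent B
# was misdated by a year. One statement fixes all of them.
conn.execute(
    "UPDATE letters SET sent = date(sent, '+1 year') WHERE recipient = ?",
    ("Correspondent B",),
)

# Logging a transaction: record that a visitor viewed letter 1.
conn.execute("INSERT INTO views (letter_id) VALUES (?)", (1,))

# Searching: recipients of letters from the 1850s that mention slavery.
# ISO date strings sort correctly as text, so BETWEEN works on them.
rows = conn.execute(
    """SELECT DISTINCT recipient FROM letters
       WHERE sent BETWEEN '1850-01-01' AND '1859-12-31'
         AND body LIKE '%slavery%'
       ORDER BY recipient"""
).fetchall()
print(rows)  # [('Correspondent A',)]
```

A production site would typically use a server-based database and a web programming language to turn query results into pages. The principle of separate, queryable fields is the same, however.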