Preserving Digital History: What We Can Do Today to Help Tomorrow’s Historians

The Fragility of Digital Materials

If only digital preservation were as easy as changing the quality of the paper we print on, as publishers and archivists have done by using high-grade acid-free paper for documents deemed important enough for long-term preservation. Electronic resources are profoundly unstable, far more so than such paper records. On the simplest level, many of us have experienced the loss of a floppy's or hard drive's worth of scholarship. The foremost American authority on the longevity of various media, the National Institute of Standards and Technology (NIST), still cannot give a precise timeline for the deterioration of many of the formats we currently rely on to store precious digital records. A recent report by NIST researcher Fred R. Byers notes that estimates vary from 20 to 200 years for popular media such as the CD and DVD, and even the low end of these estimates may be attainable only under ideal environmental conditions that few historians are likely to reproduce in their homes or offices. Anecdotal evidence shows that the imperfect way most people store digital media leads to much faster losses. For example, a significant fraction of the audio CDs collected in the 1980s, one of the first digital formats to become widely available to the public, may already be unplayable. The Library of Congress, which holds roughly 150,000 audio CDs in conditions almost certainly far better than those of personal collections, estimates that between 1 and 10 percent of the discs in its collection already contain serious data errors.1

Moreover, nondigital materials often remain intelligible following modest deterioration, whereas digital sources such as CDs frequently become unusable at the first sign of corruption. Most historians (perhaps unconsciously) know this. We have gleaned information from letters and photographs discolored by exposure to decades of sunlight, from hieroglyphs worn away by centuries of wind-blown sand, and from papyri partially eaten by ancient insects. In contrast, a stray static charge or wayward magnetic field can wreak havoc on the media used to store "digital objects" (a catchall term that refers to everything from an individual image file or Word document to a complex website) that we might want to look at in the future. Occasionally the accidental corruption of a few bits out of the millions or billions of bits that make up a digital file renders that file unreadable or unusable. With some exceptions, digital formats require an exceedingly high degree of integrity in order to function properly. In an odd way, their perfection is also their imperfection: they are encoded in a precise fashion that allows for unlimited perfect copies (unlike, say, photocopied paper documents), but any loss of that perfection can mean disaster.

Yet this already troubling characterization of digital materials only scratches the surface of what we are up against in trying to save these bits. Historians, even those strongly committed to long-term preservation, can lose important digital resources in some very unsettling ways. The Ivar Aasen Centre of Language and Culture, a literary museum in Norway, lost the ability to use its large, expensive electronic catalog of holdings after the death of the one administrator who knew the two sequential passwords into the system. The catalog, an invaluable research tool stored in an encrypted database format, had taken four years to create and contained information about 11,000 titles. After desperately trying to break into the system themselves, the Centre's staff sent out an open call for help to computer experts and less above-board types, offering as a reward a round-trip flight to a Norwegian festival of literature and music. Within five hours a twenty-five-year-old hacker, Joakim Eriksson of Sweden, figured out that the first password needed to access the system was the administrator's last name spelled backwards. (The second password, equally suspect security-wise, was his first name spelled forwards.)2

Beyond the frightening possibilities of data corruption and loss of access, all digital objects require a special set of eyes (often unique hardware and accompanying operating system and application software) to view or read them properly. The absence of these associated technologies can mean the effective loss of digital resources, even if those resources remain fully intact. In the 1980s, for instance, the British Broadcasting Corporation (BBC) had the wonderful idea of collecting fragments of life and culture from across the U.K. into a single collection to honor the 900th anniversary of William the Conqueror's Domesday Book, which recorded eleventh-century life in over 13,000 settlements across England in the wake of William's invasion of 1066. Called the Domesday Project, the BBC endeavor eventually became the repository for the contributions of over a million Britons. Project planners made optimistic comparisons between the twentieth-century Domesday and its eleventh-century predecessor; in addition to dozens of statistical databases, there would be tens of thousands of digital photographs and interactive maps with the ability to zoom and pan. Access to this massive historical snapshot of the U.K. would take mere seconds, compared to tedious leafing through the folios of the Domesday Book.

Such a gargantuan multimedia collection required a high-density, fully modern format to capture it all, so the BBC decided to encode the collection on two special videodiscs, accessible only on specially configured Philips LaserVision players connected to a BBC Master Microcomputer or a Research Machines Nimbus. By the late 1990s, of course, the LaserVision, the BBC line of computers, and the Nimbus had all gone the way of the dodo, and this rich historical collection faced the prospect of being unusable, except on a few barely functioning computers with the correct hardware and software translators. "The problems of software and hardware have now rendered the system obsolete," Loyd Grossman, chairman of the Domesday Project, fretted in February 2002. "With few working examples left, the information on this incredible historical object will soon disappear forever." One imagines that the Domesday Book's modest scribes, who did their handiwork with quills on vellum that has survived nine centuries intact and perfectly readable, were enjoying a last laugh. Luckily, some crafty programmers at the University of Michigan and the University of Leeds figured out how to reproduce the necessary computing environment on a standard PC later that year, and so the Domesday videodiscs have gotten a reprieve, at least for a few more years or perhaps decades. But this solution came at considerable expense, a cost not likely to be borne for most digital resources that become inaccessible in the future. Though the U.S. Census Bureau can surmount a "major engineering challenge" to ensure continued access to the 1960 census, recorded on long-outdated computer tapes, an individual historian, local history society, or even a major research university will probably not foot similar bills for other historical sources.3

We could fill many more pages (of acid-free paper) with examples of such digital foibles, often begun with good intentions that in hindsight seem foolish. Digital preservation is a serious matter with many facets, yet unfortunately it has no foolproof solutions. As Laura McLemore, an archivist at Austin College, concludes pragmatically, "With technology in such rapid flux, I do not think enough information is available about the shelf life or future retrieval capabilities of current digital storage formats to commit to any particular plan at this time." The University of Michigan's Margaret Hedstrom, a leading expert on digital archiving, bluntly wrote in a recent report on the state of the art (co-sponsored by the National Science Foundation and the Library of Congress), "No acceptable methods exist today to preserve complex digital objects that contain combinations of text, data, images, audio, and video and that require specific software applications for reuse."4

It is telling that in our digital age, when ink-on-paper content represented a minuscule 0.01 percent of the world's information produced in 2003 (according to the University of California at Berkeley, with digital resources taking up over 90 percent of the nonprinted majority), the New York Times felt compelled to use an analog solution for its millennium time capsule, created in 1998–1999. The Times bought a special kind of disk, HD-Rosetta, pioneered at Los Alamos National Laboratory to withstand nuclear war. The disk, holding materials deemed worthy of thousand-year preservation by the editors of the Times magazine, was created by using an ion beam to carve letters and figures into a highly pure form of nickel. Etched nickel is unlikely to deteriorate for thousands, or even hundreds of thousands, of years, but just to be sure, the Times sealed the disk in a specially made container filled with the highly stable inert gas argon and surrounded the container with thermal gel insulation.5

Even skimping a bit on the argon and thermal gel, this is an expensive solution to most historians' preservation needs. Indeed, we believe that technological solutions, whether pricey ones from Los Alamos or the more germane computer repository systems we explore at the end of this chapter, matter less than the basic tenets of digital preservation, which are helpful now and are generally (though not totally) independent of the murky digital future. Archivists who have studied the problem of constant technological change realized some time ago that the ultimate solution to digital preservation will come less from specific hardware and software than from methods and procedures related to the continual stewardship of these resources.6 That does not mean that all technologies, file formats, and media are created equal; it is possible to make recommendations about such things, and where possible we do so below. But sticking to fundamentally sound operating principles in the construction and storage of the digital materials to be preserved is more important than searching for the elusive digital equivalent of acid-free paper.

1 Fred R. Byers, Care and Handling of CDs and DVDs: A Guide for Librarians and Archivists (Washington, D.C.: Council on Library and Information Resources, 2003), ↪link 8.1a; Peter Svensson, "CDs and DVDs Not So Immortal After All," Associated Press, 5 May 2004, ↪link 8.1b; Basil Manns and Chandru J. Shahani, Longevity of CD Media, Research at the Library of Congress (Washington, D.C.: Library of Congress, 2003), ↪link 8.1c; Eva Orbanz, ed., Archiving the Audio-Visual Heritage: A Joint Technical Symposium (Berlin: Stiftung Deutsche Kinemathek, 1988); Diane Vogt-O'Connor, "Care of Archival Compact Discs," Conserve O Gram, 19/19 (Washington, D.C.: National Park Service, 1996), ↪link 8.1d.

2 Charles Arthur, "The End of History," Independent (London), 30 June 2003, 4; Jonathan Tisdall, "Hackers Solve Password Mystery," Aftenposten Norway, 10 June 2002, ↪link 8.2.

3 University of Michigan School of Information, "CAMiLEON Project Cracks Twentieth-Century Domesday Book," news release, December 2002, ↪link 8.3a; John Elkington, "Rewriting William the Conqueror / Focus on a computer project to update the Domesday Book," Guardian (London), 25 April 1985; Arthur, "The End of History." See also CAMiLEON Project, "BBC Domesday," CAMiLEON Project, ↪link 8.3b; Margaret O. Adams and Thomas E. Brown, "Myths and Realities about the 1960 Census," Prologue: Quarterly of the National Archives and Records Administration 32, 4 (Winter 2000), ↪link 8.3c.

4 Laura McLemore quotation from WGBH, “Migration,” Universal Preservation Format, ↪link 8.4a; Margaret Hedstrom quotation from It’s About Time: Research Challenges in Digital Archiving and Long-Term Preservation (Washington, D.C.: National Science Foundation and the Library of Congress, 2003), 8, ↪link 8.4b. For more on the challenges libraries and archives currently face with the proliferation of digital artifacts and collections, see Daniel Greenstein, Bill Ivey, Anne R. Kenney, Brian Lavoie, and Abby Smith, Access in the Future Tense (Washington, D.C.: Council on Library and Information Resources, 2004), ↪link 8.4c, and Building a National Strategy for Preservation: Issues in Digital Media Archiving (Washington, D.C.: Council on Library and Information Resources and the Library of Congress, 2002), ↪link 8.4d.

5 How Much Information? 2003, ↪link 8.5a; "Built to Last," New York Times, 5 December 1999. The Long Now Foundation has done similar thinking about how to send information deeply into the future, ↪link 8.5b.

6 Hedstrom, It’s About Time,