Preserving Digital History: What We Can Do Today to Help Tomorrow’s Historians

The Future of Our Digital Past

If only robust, reliable storage were enough to ensure that our documents and other digital objects would be available and readable many years from now. As the Arts and Humanities Data Service at the University of Essex warns, “Backup is not preservation…in contrast to backup, a preservation version of the data is designed to mitigate the effects of rapid technology change that might otherwise make the data unusable within a few years.”32 Technological change has indeed become a troubling constant in our world, and one that greatly erodes the reliability and durability of the data and documents on which we rely as both historians and modern human beings. It is already difficult to open WordStar documents from the 1980s or even many WordPerfect documents from the 1990s. Such continual change poses the greatest challenge to the preservation of digital records for posterity, and although much work has been done on this subject, we cannot currently offer solace that all of the issues have been solved.

Although almost every commentator waxes eloquent about the new forms of access to historical documents opened up by digitization, most view digitization’s benefits for preserving the past skeptically. “The digital landscape looks bleak for preservation purposes,” concludes Paula De Stefano, the head of the Preservation Department at New York University. “So far,” agrees Abby Smith of the Council on Library and Information Resources, “digital resources are at their best when facilitating access to information and weakest when assigned the traditional library responsibility of preservation.” We very well may face a future in which it is hard to retrieve large segments of the past.33

As we saw in Chapter 3, one reason that, in Smith’s words, “digitization is not preservation” stems from the loss of information that comes in the move from analog to digital format. Digital copies can never be perfect copies—or at least not yet. Still, improved technology is making it easier to come closer to perfection, and recently the influential Association of Research Libraries (ARL, a coalition of major North American institutions) has acknowledged that digitization, though imperfect, presents some very real advantages as a preservation method. The main problem facing digitization as a preservation medium is that we have not yet figured out how to preserve digital objects—even cutting-edge programs like DSpace and Fedora are closer to repositories than true archives, which by definition ensure that valued objects are available in perpetuity (though these programs may certainly grow into this role with further improvement). If digital objects are truly and irreparably impermanent, we are not saving anything by moving from a fading analog map to digital bits that may not be readable in twenty years because of hardware and software changes.34

On the other hand, as the recent ARL position on digitization shows, librarians and archivists are growing more optimistic about making digital forms more permanent. Several national and international initiatives are currently formulating a digital preservation architecture that will facilitate digital libraries and archives and confront many of the anxieties about obsolescence and nonstandardization. And despite these anxieties, digitization sometimes offers the best (albeit not the perfect) preservation approach. The “purist” strategy for preserving the 10,000 deteriorating reels of taped oral history interviews held by the Marine Corps would require duplicating all the tapes. Instead, the Marines are putting the interviews on CDs and allowing the analog tapes to decay. Samuel Brylawski, the head of the Library of Congress Recorded Sound Section, quotes the assertion that “no one can prove any digital version will survive and be accessible beyond a few decades,” but then adds that “in the case of audio preservation … there is no proven analog preservation practice” and concludes that “digital preservation is here to stay.”35

Digitization is not yet the preservation silver bullet, but it can still extend the “preservation tool kit,” as preservationists Anne R. Kenney and Paul Conway explain. The most obvious value is in creating digital surrogates that can be provided to users as an alternative to allowing more wear and tear on the originals. You no longer need to consult the original Beowulf manuscript in the British Library because you can more easily access a digital copy, which also happens to reveal more of the text. After the Chicago Historical Society finishes digitizing its 55,000 glass plate negatives from the Chicago Daily News photo morgue, it intends to retire the originals from use, “thus ensuring their longevity.”36

But what about the longevity of the digital copies, or of digital materials in general? Caroline R. Arms of the Library of Congress’s Office of Strategic Initiatives has summarized the five general avenues the library and others in the long-term preservation business are exploring to ensure the continuity of such valuable and growing collections of digital materials. The first two avenues have to do with solving problems related to the storage of digital files; the last three have to do with the more difficult matter of being able to understand and read those files in the distant future. The first avenue is “better media,” or the search for media formats that will last longer and be more reliable throughout their lifetimes than existing media. The second approach is what Arms terms “refreshing bits”: the process of both constantly backing up data onto storage media and transferring it to new computers over time, all the while making sure that no corruption of files has occurred.
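The “refreshing bits” idea can be made concrete with a checksum, often called a fixity check. The following is a minimal sketch in Python, not a description of the Library of Congress’s actual practice: the function names (sha256_of, refresh_copy, verify) and the choice of SHA-256 are our own assumptions for illustration. It copies a file to a new location and confirms that the copy is bit-for-bit identical to the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the bits arrived unchanged.

    Hypothetical helper: returns the digest so it can be recorded
    alongside the file for later checks.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch copying {source} to {destination}")
    return after


def verify(path: Path, recorded_digest: str) -> bool:
    """Check a stored file against the digest recorded when it was archived."""
    return sha256_of(path) == recorded_digest
```

Recording the digest returned by refresh_copy and rerunning verify on a schedule is one simple way to notice silent corruption before the last good copy is overwritten.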

The third and fourth avenues are often considered rival methods for accessing digital files for which the original “interpreters”—that is, the hardware and software used to read them during their initial creation and early life—are now gone. “Migration,” as the name suggests, involves moving a file from one format to another to keep it up to date. So a file originally created in, and readable by, Microsoft Word 97 is updated to be a Word 2003 file, then a Word 2007 file, and so on. In this way, the latest version of the document is readable by the latest, current version of the software. “Emulation”—the approach taken with the seemingly doomed digital Domesday Project—takes the opposite tack: rather than changing the original file, you try on modern hardware and software to re-create the original digital environment needed to read a very old file—in other words, making a version of Microsoft Word 97 that works on Windows 2050. As Arms puts it, emulation involves “using the power of a new generation of technology to function as if it were the technology of a previous generation.” This sounds wonderful, especially because (at least in theory) a single Word 97 emulator could enable historians to read millions of once obsolete documents, unlike migration, which forces designated stewards to move each of those files through each intermediate file type over the decades. Although migration can be incredibly expensive—one estimate is that data migration of a digital library is equivalent to photocopying all of the books in a physical library of the same size every five years—it is a known, effective process. Businesses do it all the time, as does the federal government. But emulation remains unproven and untested. Keep in mind that in 2050 we will need to emulate not just the word processing program but elements of the operating system of yesteryear as well. And if emulation fails to work, we are in trouble.

The final avenue is truly a last resort, to be used only if the prior four methods fail: “digital archaeology,” in which it is no longer possible to read a file (because of an ancient file type or a lack of proper hardware or software), and thus we must settle for picking up pieces of the digital past and hope that they, and our ability to interpret them, prove sufficient.37
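The difference in effort between migration and emulation comes down to simple arithmetic, which the short Python sketch below illustrates. The function names and the figure of one million files are hypothetical, chosen only to match the “millions” of documents mentioned above. The sketch counts operations and says nothing about the real cost or difficulty of any single conversion or emulator.

```python
def migration_conversions(file_count: int, format_chain: list[str]) -> int:
    """Conversions needed to carry every file through each successive format."""
    steps = max(len(format_chain) - 1, 0)
    return file_count * steps


def emulators_needed(original_formats: set[str]) -> int:
    """Emulation needs one working emulator per original format, not per file."""
    return len(original_formats)


# Illustrative figures only: one million Word 97 files, two format upgrades.
chain = ["Word 97", "Word 2003", "Word 2007"]
print(migration_conversions(1_000_000, chain))  # 2000000
print(emulators_needed({"Word 97"}))            # 1
```

Migration work grows with both the number of files and the number of format generations, whereas emulation work grows only with the number of distinct original environments. The catch, as noted above, is that each emulator must also reproduce parts of the old operating system, and the approach remains unproven.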

Although historians should be aware of these various efforts to save digital materials for the long run and should be part of this crucial discussion about what will happen to the records of today and yesterday in a digital future, a large portion of this discussion and almost all of its implementation lies beyond our purview. Computer scientists, librarians, and archivists are the prime movers in this realm, and properly so, though they could certainly use our input as some of the most important end users of their products and as professional stewards of the future of the past. Most readers of this book will not become active participants in this work, but they should, to the degree possible, become engaged with the larger social and professional issues it raises. Keep an eye and an ear out for the themes and projects we have discussed, follow what they are trying to do, and connect with the librarians and archivists doing this critical work. We are still in the early stages of the creation of these archives, and many prototypes and methods will disappear like the failed dot-coms of the 1990s. While waiting for the dust to settle, historians should maintain their best individual preservation efforts by focusing on what we have emphasized earlier in this chapter.

At the present time, therefore, preservation of digital materials involves some modest, though not radical, steps. Until systems like DSpace or effective metadata schemas become essentially invisible—“click here to tag and archive your website”—our digital work faces an uncertain future. We cannot predict the likelihood that any of these projects will prove capable of easily accessioning websites whole and ensuring that such sites will be readable in fifty or a hundred years. We do know, however, that their overall success will have as much to do with social systems as technical ones. As Margaret Hedstrom perceptively notes, “The challenges of maintaining digital archives over long periods of time are as much social and institutional as technological. Even the most ideal technological solutions will require management and support from institutions that in time go through changes in direction, purpose, management, and funding.”38 Regardless of how sound the systems like DSpace are in the theory of library and computer sciences, they will need to be serviced and cared for over years, by definition, and thus stand at the whim of many human factors other than the technical elements their creators have labored over.

For now, you are the best preserver of your own materials. Pay attention to backing up, and try to create simple, well-documented, standardized code. After covering those basics, you might search for a preservation “partner,” an institution that would be interested in saving your website or its constituent materials after you can no longer provide the attention (or financial resources) it needs. Only then take stock of more advanced preservation systems and methods yourself, or maintain an ongoing dialogue with a digitally savvy archivist. As Louis Pasteur astutely observed, “Chance favors the prepared mind,” and the caprice of technological change as well as future efforts in digital preservation by smart librarians and computer scientists will likely reward the well-prepared website.

32 Arts and Humanities Data Service, “Planning Historical Digitisation Projects—Backup,” Planning Historical Digitisation Projects, ↪link 8.32.

33 Paula de Stefano, “Digitization for Preservation and Access,” in Preservation: Issues and Planning, ed. Paul N. Banks and Roberta Pilette (Chicago: American Library Association, 2000), 319; quotation from Abby Smith, Why Digitize? (Washington, D.C.: Council on Library and Information Resources, 1999), 3; Rosenzweig, “Scarcity or Abundance?”

34 Smith, Why Digitize? 3; Kathleen Arthur, Sherry Byrne, Elisabeth Long, Carla Q. Montori, and Judith Nadler, “Recognizing Digitization as a Preservation Reformatting Method,” Association of Research Libraries, ↪link 8.34a; Hartmut Weber and Marianne Dörr, Digitization as a Means of Preservation? (Amsterdam: European Commission on Preservation and Access, 1997), ↪link 8.34b.

35 Among the larger, more elaborate initiatives are the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP) (↪link 8.35a), the National Archives and Records Administration’s Electronic Records Archive (ERA) (↪link 8.35b), and the International Research on Permanent Authentic Records in Electronic Systems (InterPARES) (↪link 8.35c); Frederick J. Graboske and Christine T. Laba, “Field History, the War on Terrorism, and the United States Marine Corps Oral History Program” (paper presented at the International Oral History Association, Rome, Italy, 23–26 June 2004); Samuel Brylawski, “Review of Audio Collection Preservation Trends and Challenges” (paper presented at Sound Savings: Preserving Audio Collections, Austin, Texas, 24–26 July 2003), ↪link 8.35d.

36 Anne R. Kenney and Paul Conway, “From Analog to Digital: Extending the Preservation Tool Kit,” in Going Digital: Strategies for Access, Preservation, and Conversion of Collections to a Digital Format, ed. Donald L. DeWitt (New York: Haworth Press, 1998), 67–79; Matthew Cook, “Economies of Scale: Digitizing the Chicago Daily News,” RLG DigiNews 4.1 (15 February 2000), ↪link 8.36.

37 Warwick Cathro, Colin Webb, and Julie Whiting, “Archiving the Web: The PANDORA Archive at the National Library of Australia” (paper presented at Preserving the Present for the Future, Copenhagen, June 18–19, 2001). See also Diane Vogt-O'Connor, “Is the Record of the 20th Century at Risk?” CRM: Cultural Resource Management 22, 2 (1999): 21–24; Caroline R. Arms, “Keeping Memory Alive: Practices for Preserving Digital Content at the National Digital Library Program of the Library of Congress,” RLG DigiNews 4, 3 (June 15, 2000), ↪link 8.37a. For the argument in favor of emulation, see Jeff Rothenberg, Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation (Washington, D.C.: Council on Library and Information Resources, 1999), ↪link 8.37b. For other comparisons of digital preservation strategies, as well as what leading institutions, such as the U.S. National Archives and Records Administration (NARA) and the San Diego Supercomputing Center (SDSC), are doing to address the technological challenges, see NARA’s Electronic Records Archives home page, ↪link 8.37c; SDSC’s Methodologies for Preservation and Access of Software-dependent Electronic Records home page, ↪link 8.37d; Kenneth Thibodeau, “Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years,” The State of Digital Preservation: An International Perspective (Washington, D.C.: Council on Library and Information Resources, 2002), ↪link 8.37e; Raymond A. Lorie, “The Long-Term Preservation of Digital Information,” ↪link 8.37f; Preserving Digital Information, ↪link 8.37g.

38 Hedstrom, It’s About Time, 9.