Exploring History on the Web

Archival Websites

Professional archivists complain that many archival websites are not archives at all because they lack “provenance,” that is, a firm history of the custody of a coherent body of materials since their origin.12 Instead, their creators have assembled them, sometimes carefully and other times haphazardly, from diverse sources. But even archivists would consider most (though not all) of the one hundred collections in the Library of Congress’s American Memory website (a central component of its National Digital Library Program) “true” archives. Whatever you call them, taken together these sites are one of the History Web’s greatest achievements and one of its most popular destinations.

In the early 1990s, the library distributed optical disks of major collections to test locations around the country and discovered to its surprise that K-12 teachers and students eagerly embraced the digital gifts.13 In 1994, the library began moving these collections to the web. Less than a decade and more than $60 million later, American Memory had posted more than 8 million items. The collections cover every period of American history and almost every type of historical document in the library’s collections, including books and other printed texts, manuscripts, sheet music, maps, motion pictures, photographs and prints, and sound recordings.

Figure 5: In a surprising bit of understatement for the usually hyperbolic web, American Memory boasts of “7 million items from more than 100 historical collections.” The current count is more than 8 million items.

American Memory succeeds because it exploits two intrinsic advantages of the digital medium: accessibility and searchability (despite a cumbersome interface). Using the online version of the Washington papers, the historian Peter R. Henriques undercut the claims of those who insist on Washington’s religiosity by showing not only that Washington never referred to “Jesus” or “Christ” in his personal correspondence but also that his references to death were invariably “gloomy and pessimistic” with no evidence of “Christian images of judgment, redemption through the sacrifice of Christ, and eternal life for the faithful.”14 Historians around the globe, not just those with physical access to the Library of Congress, may now conduct such investigations, and with a speed impossible when searching entailed months of manual turning of pages.

The early success of American Memory and other pioneering web archives sent hundreds of other libraries and archives to work on getting their own collections online. In 1997, for example, the Bibliothèque Nationale de France began the Gallica project to put documents in various media from the Middle Ages to the early twentieth century online. Dozens of similar projects have given the History Web a global reach: PictureAustralia presents 600,000 images from twenty-one cultural agencies, the Digital Imaging Project of South Africa offers the text of 38 anti-apartheid periodicals, the International Dunhuang Project serves up 20,000 digitized images of Silk Road artifacts, and the Nagasaki University Library displays more than 5,000 hand-tinted photographs from the second half of the nineteenth century.15

Figure 6: Over the past century, artifacts initially found in the Dunhuang caves and other ancient Silk Road sites have been dispersed among museums and private collections around the world. The International Dunhuang Project has begun to reunite—virtually—tens of thousands of these artifacts on its website. The Chinese and English interfaces reflect the international character of the project.

Beyond its own collections, the Library of Congress played an important early role in spreading digital archives in the United States. With a $2 million grant from the midwestern telephone company Ameritech (now SBC), the library sponsored a competition from 1996 through 1999 to enable museums, historical societies, archives, and other libraries to create digital collections of primary resources. Twenty-two funded collections on such topics as Chicago anarchists, the Chinese in California, the Northern Great Plains, and the Florida Everglades now reside within American Memory.

Soon other funding sources as well as the sweat equity of individuals brought dozens of other major collections online. The Academic Affairs Library at the University of North Carolina at Chapel Hill—with support from Ameritech, the National Endowment for the Humanities (NEH), the Institute of Museum and Library Services, and the university itself—created Documenting the American South, which includes six digitization projects drawn primarily from the library’s Southern Collections, including massive holdings of first-person narratives and Southern literature. As with the Library of Congress, greatly expanded access to primary sources has proven to be the most significant contribution of Documenting the American South. Although the Academic Affairs Library conceived the project as a service to Southern studies scholars at the University of North Carolina and other colleges and universities, three-quarters of the users have turned out to be nonacademics.16

Though American Memory and Documenting the American South have found a worthy mission in giving general and student audiences access to materials previously limited to the scholarly community, other archival projects have focused more squarely on scholars, particularly because special collections, archives, and major research libraries have been at the forefront of some of the most important and largest digitization projects. For example, the Digital Library Production Service at the University of Michigan defines K-12 and community colleges as “low priority” audiences and instead focuses on research universities and their graduate schools. Making of America, which the library developed in collaboration with Cornell University with funding from the Mellon Foundation, provides a digital library of printed materials published between 1850 and 1876. The University of Michigan portion of the collection alone encompasses more than 11,000 volumes and more than three million pages. Like scholars using American Memory, those taking advantage of Making of America for their studies can find information formerly available in principle but not necessarily in practice due to the hindrance of flipping through reams of paper. Historian Steven M. Gelber reports that he located “a treasure trove of data in a matter of a couple of days” for his research on the origins of hobbies.17

The ability of digital searching to turn up previously hidden riches applies particularly to records that contain large amounts of detailed information with no easy way to find specific pieces of data. Genealogists, for example, have spent days and weeks poring over censuses and similar records seeking information on family members. Putting those records into digital form means not only saving the trek to distant archives but also gaining the chance to locate individual names with a quick word search. In April 2001, the Statue of Liberty–Ellis Island Foundation placed online a computer database of the passenger arrival records of more than twenty-two million immigrants who entered through the Port of New York and Ellis Island between 1892 and 1924. Web surfers immediately clogged the site, which was soon the number one destination from the Lycos search engine. In its first year of operation, the site received almost two million visitors. Similarly, volunteers working for the Church of Jesus Christ of Latter-day Saints digitized the records of the fifty-five million people listed in the 1880 United States Census and the 1881 Canadian Census and made them available for free at the church’s FamilySearch Internet Genealogy Service, which averages 3.4 million page views per day.18 Genealogy has long been a grassroots pursuit, but now it has become a cooperative effort whose results are shared among an international community.

Figure 7: The ability to look for ancestors in online immigration records instantly made the American Family Immigration History Center a popular stop on the History Web.

The fever to bring the primary sources of the past online that began in the mid-1990s has infected many people—especially scholars and teachers, but also students and amateur enthusiasts—who think differently about documents than librarians and archivists. Their passion generally focuses on a particular historical topic, and they want to make documents related to that topic available online—even if those artifacts don’t necessarily have a shared “provenance” and common association in the manner of a traditional archive. Instead, these website producers create their own virtual collections, often mixing published and unpublished materials in ways that “official” archives avoid.

One of the first and still one of the most impressive of this new genre of “invented archives” is Valley of the Shadow. Like most of the early work coming out of the University of Virginia’s Institute for Advanced Technology in the Humanities (IATH), Valley of the Shadow had its origins in a scholarly project. In 1991, Edward Ayers, a leading Southern historian, began work on a book that would compare the experience of two communities on either side of the Mason-Dixon Line during the Civil War, Augusta County in central Virginia and Franklin County in southern Pennsylvania. In 1992, Ayers and literary critic Jerome McGann, who created a massive site about the Pre-Raphaelite poet and painter Dante Gabriel Rossetti, became the institute’s first two fellows. With the aid of a large team of collaborators and several grants, Ayers began digitizing the collections that would underlie his book. Initially Ayers and McGann planned to put their new media archives on stand-alone computers or local networks, but when they saw Mosaic in the fall of 1993, they knew “everything had changed for our digital projects.”19

Figure 8: The Valley of the Shadow was one of the earliest sites on the History Web. The copyright notice time span indicates both its pioneering status and the many years it took to complete the project.

Valley of the Shadow has since developed into a massive compendium of documents about the two communities before, during, and after the Civil War, including tens of thousands of newspaper articles, 1,400 letters and diaries, full census records from 1860, 45 Geographic Information Systems (GIS) maps, and more than 700 photographs and images. The site offers at least an implicit interpretation of these materials rather than taking the hands-off approach of most archives, and this blurring between archive and historical argument perhaps makes Valley of the Shadow and similar sites more like edited collections of documents than traditional archives.

Jim Zwick, another early pioneer in “inventing” an online archive, brought an even more distinctive authorial voice to his efforts. In early 1995, as a Syracuse University graduate student, he began digitizing and posting a few documents on anti-imperialism, the subject of his dissertation. Like most historians, Zwick had assembled his own personal collection of sources, and he realized that these materials he had gathered for scholarly research could be made public through the web. Over time, Zwick’s efforts have expanded well beyond anti-imperialism to encompass 10,500 pages of historical documents that he personally digitized.20

Figure 9: Jim Zwick began his Anti-Imperialism website in 1995 and has kept it going for a decade without any major grants or institutional supports. In recent years advertising revenues have helped to defray some of the expenses.

Zwick’s Anti-Imperialism in the United States, 1898–1935, illustrates not only what a single scholar can accomplish with energy, passion, and a good scanner, but also how “invented archives” can shape popular historical understanding. Zwick combines the scholar’s enthusiasm for his subject with a commitment to the cause espoused by his historical subjects, and that perspective has shaped his assiduous digitizing of documents, such as his remarkable collection of more than fifty anti-imperialist responses to Rudyard Kipling’s poetic apologia for imperialism, “The White Man’s Burden.” As a result, the researcher who types the title of Kipling’s poem into Google gets Zwick’s compendium of critiques as the first hit rather than a site organized by a Kipling acolyte. Similarly, despite the current popular antipathy to Marxism, the first hit on a Google search on “Marx” is the Marxists Internet Archive, a site that seeks “to show the value of Marxism.”21

Ayers and Zwick approached the web as scholars. They created sites that grew out of their own research interests. But they quickly encountered a large student audience, eager to touch some pieces of the past, even if only virtually. This should tell us something about the latent interests and curiosity of a vast Internet audience and about the service that online historians can render to it.

Many others began their websites with pedagogy front and center. In 1995, Doug Linder, a professor at the University of Missouri–Kansas City Law School, began to post some background materials for students enrolled in his course on famous trials. Linder’s Famous Trials website gradually grew to thousands of documents (maps, trial transcripts, chronologies, appeals, newspaper accounts, etc.) on thirty-five trials, from Socrates in 399 b.c.e. to O. J. Simpson in 1995. The audience has also blossomed, now including high school, college, and law school students around the world. Despite the impressive scope of the site, Linder disavows any intention to offer a traditional archive. Such an archive would run counter to his “basic goal of providing a clear, concise, and reasonably balanced understanding of the trials.” He responds to critics with a sentence that summarizes the advantages of low-cost self-publishing on the web: “I’m not getting paid a penny for this—I put up the trials that I, with all of my idiosyncrasies, find interesting or compelling in some way.”22 An even larger but still homegrown and self-financed teaching archive is Paul Halsall’s Internet History Sourcebooks Project, which presents hundreds of public domain and copy-permitted historical texts for teaching organized into a family of sites, including the Internet Ancient History Sourcebook, the Internet Medieval Sourcebook, and the Internet Modern History Sourcebook.23

Most grassroots web archivists lack Zwick, Linder, and Halsall’s dedication, but they generally start with a similar teaching need or historical passion. Peter Bakewell and two colleagues at Emory University have posted a modest selection of primary sources on Colonial Latin America to “provide expanded access to limited documentary resources” for students in their courses. Eyler Robert Coates, Sr., a self-employed investor and consultant, wanted to publish a volume called Quotations from Chairman Jefferson that he saw as “freedom’s alternative to ‘Quotations from Chairman Mao Tse-Tung’” and began assembling quotations on more than two thousand handwritten cards. Then he learned to use a computer and in December 1995 launched Thomas Jefferson on Politics and Government: Quotations from the Writings of Thomas Jefferson, which became one of the best known and most visited Jefferson sites on the web, ranked number six by Google among sites related to the third president. Stefan Landsberger’s online archive of hundreds of Chinese propaganda posters grew out of his doctoral dissertation and his desire to share his own passionate collecting. In 1995, software consultant Omar Khan started Harappa: The Indus Valley and the Raj in India and Pakistan because it was the “cheapest way” to bring his “hobby” to “the widest possible audience.” It has turned into a major scholarly and teaching resource for those interested in South Asia.24

Not all the work of energetic, grassroots web archivists has remained noncommercial. In 1972, businessman John Adler, a history buff since his college days at Dartmouth, acquired a full run of Harper’s Weekly (1857–1916) for $10,000. Adler decided that indexing the popular illustrated weekly would make a nice retirement project. Ultimately his efforts turned into a commercial web-based archive, HarpWeek. Despite the high price (purchasers pay $9,900 for a five-year segment), Adler has only recouped about 40 percent of the $10 million he has invested in the project.25

Other for-profit projects, especially from large information companies like Canada’s Thomson Corporation and Michigan-based ProQuest, have made even more massive investments in digitizing the past. For example, Thomson, a “global e-information and solutions company” with close to $8 billion in annual revenues, offers Eighteenth Century Collections Online, which includes “every significant English-language and foreign-language title printed in Great Britain” in the eighteenth century—thirty-three million text-searchable pages and nearly 150,000 titles. Thomson Gale, a subsidiary of the Thomson Corporation, calls it “the most ambitious single digitization project ever undertaken” and boasts, “we own the 18th century.” Those who want their own share must pay handsomely. A university with 18,000 students can spend more than half a million dollars to acquire the full collection—a hefty price, albeit less than the cost of acquiring the original books.26

ProQuest, formerly the camera company Bell & Howell, goes head to head with Thomson in the fight to own the past. The half-billion-dollar corporation has launched a Digital Vault Initiative to convert more than 5.5 billion pages into “the world’s largest digital archival collection of printed works.” Already, its ProQuest Historical Newspapers offers almost eight million online pages containing the full runs of the New York Times, Los Angeles Times, Wall Street Journal, Christian Science Monitor, and Washington Post. Plans for an even more massive digitization project have emerged recently from search engine giant Google, which intends to digitize 15 million books and make them available for free online, possibly using advertising to underwrite the costs. If they succeed, Thomson’s boasts about “owning” the eighteenth century will look petty. Google will “own” the library.27

Massive corporate funding (and Google’s soaring stock price) gives the commercial digitizers a key advantage over public sector institutions like the Library of Congress and grassroots archivists like Zwick. They can easily bear the upfront costs of converting paper into marketable bits. Moreover, in addition to the costs of scanning and indexing, the copyright ownership of most of the intellectual products of the twentieth century means that only an entity that can sell access to the past can also afford to purchase the rights to it. Under the Copyright Term Extension Act of 1998 (see Chapter 7), almost everything published after 1923 remains covered by copyright in the United States until at least 2018. As a result, only companies with gated archives like ProQuest can offer the Times of London or the New York Times (and other newspapers) for most of the twentieth century. (Even Google faces sharp constraints on the display of copyrighted works; current plans call for them to display just limited passages and links to libraries, where the book can be borrowed, and Amazon, where it can be purchased.) Not until the twenty-second century will most of the history of the twentieth century find its way into free online archives.

12 See William J. Maher, “Society and Archives” (Presidential Address delivered at the 61st Annual Meeting of the Society of American Archivists, Chicago, 30 August 1997), ↪link 1.12a; “Cataloger’s Reference Shelf: Definition: Provenance,” The Library Corporation, ↪link 1.12b.

13 Caroline R. Arms, “Historical Collections for the National Digital Library: Lessons and Challenges at the Library of Congress,” D-Lib Magazine (April 1996), ↪link 1.13.

14 Roy Rosenzweig, “The Road to Xanadu: Public and Private Pathways on the History Web,” Journal of American History 88 (September 2001), ↪link 1.14.

15 Gallica 2000, ↪link 1.15a; Picture Australia, ↪link 1.15b; Digital Imaging Project of South Africa, ↪link 1.15c; International Dunhuang Project, ↪link 1.15d; “Japanese Old Photographs in Bakumatsu-Meiji Period,” Nagasaki University OldPicture Database, ↪link 1.15e.

16 Joe A. Hewitt, “Remarks,” DocSouth 1000th Title Symposium, Chapel Hill, N.C., 1 March 2002, ↪link 1.16.

17 Humanities Advanced Technology and Information Institute (HATII) and National Initiative for a Networked Cultural Heritage, The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials—Interview Reports (Washington, D.C.: National Initiative for a Networked Cultural Heritage, 2002), ↪link 1.17; Gelber quoted in Rosenzweig, “The Road to Xanadu.”

18 “The American Family Immigration History Center Fact Sheet,” American Family Immigration History Center, ↪link 1.18a; Statue of Liberty–Ellis Island Foundation, Annual Report, Year Ended March 31, 2003 (New York, 2003), 5; “Facts and Statistics,” FamilySearch Internet Genealogy Service, ↪link 1.18b; “Free Internet Access to Invaluable Indexes of American and Canadian Heritage,” Church of Jesus Christ of Latter-day Saints, ↪link 1.18c; ↪link 1.18d.

19 Edward Ayers, “Living in the Valley of the Shadow” (forthcoming book chapter in possession of authors); Edward Ayers, email to Roy Rosenzweig, 25 August 2003; Jerome McGann, Radiant Textuality: Literature After the World Wide Web (New York: Palgrave, 2001).

20 Jim Zwick, email to Roy Rosenzweig, 27 November 2000; Jim Zwick, email to Rosenzweig, 4 August 2003.

21 Jim Zwick, “‘The White Man’s Burden’ and Its Critics,” in Anti-Imperialism in the United States, 1898–1935, ↪link 1.21a; “Marxists Internet Archive History,” Marxists Internet Archive, ↪link 1.21b. (Zwick’s site was first when we wrote this in 2004, but it had dropped out of the top ranks by early 2005, possibly because of readjustments in Google’s ranking system.)

22 Douglas Linder, “Goals and Purposes of the Famous Trials Site,” Famous Trials, ↪link 1.22.

23 Paul Halsall, “Main Page,” Internet Modern History Sourcebook, ↪link 1.23a; Paul Halsall, “Medieval Sourcebook: Introduction,” Internet Medieval Sourcebook, ↪link 1.23b.

24 Peter Bakewell, “Culpepper Project Summary,” Culpepper/CTC Program in Teaching & Technology, ↪link 1.24a; Eyler Robert Coates, Sr., “Information on Eyler Robert Coates, Sr.,” Thomas Jefferson and His Writings, ↪link 1.24b; “Web Server Statistics,” Electronic Text Center—University of Virginia Library, ↪link 1.24c; Stefan Landsberger, Chinese Propaganda Posters, ↪link 1.24d; Omar Khan, Jim McCall, and Andrew Deonarine, Harappa: The Indus Valley and the Raj in India and Pakistan, ↪link 1.24e; Meeta Chaitanya Bhatnagar, “Omar Khan in Conversation,” HindustanTimes.com (15 December 2002), ↪link 1.24f.

25 Deborah Markham, “Retirement Project Puts Historic Publications on the Web,” Hamptons Roads Business, 24 March 2003, ↪link 1.25a. See also Randall Rothenberg, “HarpWeek Pitches U.S. History to Teens—and Marketers As Well,” Advertising Age, 18 October 1999, ↪link 1.25b.

26 Jeffrey Cymerint, interview, 1 August 2003; “Gale’s Biggest Digitization Project Ever Covers Eighteenth Century,” Gale Press Room, ↪link 1.26a; Barbara Quint, “Gale Group to Digitize Most 18th-Century English-Language Books, Doubles Info Trac Holdings,” Information Today, Inc. (17 June 2002), ↪link 1.26b.

27 Rosenzweig, “The Road to Xanadu”; “ProQuest Historical Newspapers Preview,” ProQuest Information and Learning, ↪link 1.27a; “Google’s Gigantic Library Project,” SPARC Open Access Newsletter, 81 (2 January 2005), ↪link 1.27b.