Preserving Digital History: What We Can Do Today to Help Tomorrow’s Historians

Technical Considerations

But what about the code you are documenting? What can you do to decrease the likelihood that it will cease to function properly in years to come because of unforeseen changes to programming languages, software, or hardware, or because your site has to move from one web server to another? Without the benefit of a crystal ball, you can still implement simple technical conventions to make your website easier to maintain and transport. For example, use relative URLs, in which references between files on your site give only the part of the address that differs from the referring file’s URL, rather than absolute URLs, which spell out the entire address, including “http://”, the domain name, and parent directories. Relative URLs make it much simpler to move sections of your site around or transfer your entire site to a new domain without links breaking.
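To see why relative URLs survive a move, it can help to watch how a browser resolves them. The following is a minimal sketch in Python using the standard library's urljoin function; the host names, paths, and the resolve helper are hypothetical and chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical addresses: the same page before and after a move to a new server.
old_page = "http://old-host.example/history/essays/index.html"
new_page = "http://new-host.example/archive/history/essays/index.html"

# Two ways the page might point at the same image.
relative_link = "../images/map.jpg"
absolute_link = "http://old-host.example/history/images/map.jpg"


def resolve(page_url, link):
    """Return the address a browser would request for a link found on page_url."""
    return urljoin(page_url, link)


# The relative link follows the page to wherever it now lives.
print(resolve(old_page, relative_link))
print(resolve(new_page, relative_link))

# The absolute link still points at the old server after the move.
print(resolve(new_page, absolute_link))
```

In this sketch the relative link resolves against the new host once the page has moved, while the absolute link keeps requesting the old one and breaks as soon as that server is retired.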

A more serious though perhaps more time-consuming approach to making your website’s code future-friendly is to use XHTML, a sibling of XML, rather than HTML. In a web browser, XHTML renders in exactly the same way as HTML, and converting a site from HTML to XHTML does not affect its functioning at all. XHTML is a version of HTML written to the more stringent XML specifications—and therefore sites that use XHTML can take advantage of the strengths of XML. One of these potent traits is the capacity for XML to withstand the rapid changes of computer technology and its potential to be viewable on hardware and with software that do not even exist yet. Institutions cognizant of digital preservation, such as the Smithsonian, seem to be moving toward XHTML because of this flexibility. Along with Cascading Style Sheets (CSS) and a couple of more programmer-oriented technologies, XHTML is one of a handful of emerging “web standards” that professional web developers consider the basis for stable websites that will render well on different machines and that will have the greatest life expectancy.19
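One practical consequence of XHTML being written to XML's rules is that any generic XML tool, not just a web browser, can read a page. The sketch below assumes Python's standard xml.etree.ElementTree parser; the two sample pages, the title text, and the helper names are invented for illustration and do not come from any real site.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A small, hypothetical XHTML page: every tag is closed and properly nested.
xhtml_page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Letters from the Front</title></head>
  <body>
    <p>First paragraph.<br /></p>
    <p>Second paragraph with an <img src="letter.jpg" alt="A letter" /></p>
  </body>
</html>"""

# Loose HTML of the kind browsers tolerate: unclosed <br> and <p> tags.
sloppy_html = "<html><body><p>First paragraph.<br><p>Second paragraph.</body></html>"


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def extract_title(markup):
    """Pull the page title out with a generic XML parser, no browser required."""
    root = ET.fromstring(markup)
    return root.findtext(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")


print(is_well_formed(xhtml_page))
print(is_well_formed(sloppy_html))
print(extract_title(xhtml_page))
```

The strict parser accepts the XHTML page and rejects the loosely written HTML, which suggests why well-formed pages are easier for future software to process, whatever that software turns out to be.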

In general, it makes sense to rely as little as possible on specific hardware and software. Those concerned about the longevity of their site should therefore eschew marginal or unproven technologies. When flashy digital formats seem alluring, consider whether they will be readable in five, ten, or twenty-five years. We were once dazzled by the visual technology of a small software company called JT Imaging and ended up using it to display large maps on one of our websites. Unfortunately, JT Imaging went out of business, and its software vanished with it.

Consider future users of your site as important as current ones. It would be unwise to store archival documents that you spent a lot of money digitizing in a cutting-edge but unpopular format with uncertain prospects. For example, some in the technological cognoscenti do not use Adobe Corporation’s popular PDF format, instead choosing a rival, far less popular format called DjVu, which is said to compress images of documents to a much smaller, yet still readable, size than PDF. At the time of this writing, however, Google shows over 150 million PDF and PDF-related documents on the Web and fewer than a million DjVu documents and related materials. Common sense dictates that the much larger user base for PDF means a greater likelihood that documents in this format will remain readable in the distant future.

Similarly, at the same time that you avoid getting entangled in specific, possibly ephemeral digital technologies, you should be as neutral as possible with regard to hardware and software platforms. Dependence on a particular piece of hardware or software is foolish because, like most hardware and software through history, the computer technology you depend on today will eventually disappear, more likely sooner rather than later. The BBC’s choice of the videodisc, specialized microcomputers, and a unique set of software instructions for its Domesday Project epitomizes this problem. This does not mean that you should avoid choosing specific hardware and software that meet your needs. Obviously you have to use some combination of hardware and software to create and serve your website, and it can be difficult to determine where the herd will go. When the Center for History and New Media (CHNM) decided five years ago that our old Macintosh server was no longer powerful enough to handle the volume of traffic we were receiving, we cautiously decided to buy a Linux server with Apache web server software. Apache now runs two-thirds of the world’s websites, but this dominance was in no way assured, given Microsoft’s aggressive push in the server market. But among other things, we liked how Apache could run on almost any computer with almost any operating system, making it easy to switch down the road if necessary. (Indeed, Apple now sells much more powerful servers, using the Apache program as its standard web server software.) We also chose the MySQL database program because its use of the Structured Query Language (SQL) standard for accessing information in databases meant that we would not have to recode our websites extensively if we changed to another database program later on. In general, try to use technologies that conform to open standards like SQL rather than proprietary standards. An international community of users is in charge of maintaining the open standard in a universal and transparent way, rather than a company whose profit motive is likely at odds with the principle of easily switching from one computer or program to another.
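The benefit of standard SQL can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses SQLite, which ships with Python, as a stand-in for MySQL, and the documents table, its rows, and the function name are hypothetical. The statements stick to plain SQL that most database programs accept.

```python
import sqlite3

# Plain, widely supported SQL statements; nothing here is specific to SQLite.
CREATE_TABLE = (
    "CREATE TABLE documents "
    "(id INTEGER PRIMARY KEY, title VARCHAR(200), year INTEGER)"
)
INSERT_ROWS = (
    "INSERT INTO documents (id, title, year) VALUES "
    "(1, 'Diary of a farmer', 1862), "
    "(2, 'Town census', 1880), "
    "(3, 'Union handbill', 1912)"
)
SELECT_BEFORE_1900 = (
    "SELECT title, year FROM documents WHERE year < 1900 ORDER BY year"
)


def titles_before_1900(connection):
    """Build a small sample table and query it through a database connection."""
    cursor = connection.cursor()
    cursor.execute(CREATE_TABLE)
    cursor.execute(INSERT_ROWS)
    cursor.execute(SELECT_BEFORE_1900)
    return cursor.fetchall()


# Only this line names a particular database program.
connection = sqlite3.connect(":memory:")
print(titles_before_1900(connection))
connection.close()
```

Because the function depends only on a connection object and on standard SQL, changing database programs would mostly mean changing the single line that opens the connection. In practice some details still vary between programs, such as the placeholder syntax used for query parameters, so a real migration would need testing.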

19 See ↪link 8.19a for W3C’s full documentation on the XHTML standard. For the importance of web standards like XHTML, see Dollar Consulting, Archival Preservation of Smithsonian Web Resources; Jeffrey Zeldman, Designing with Web Standards (Indianapolis: New Riders, 2003); Dan Cederholm, Web Standards Solutions (Berkeley, Calif.: Apress, 2004). As the W3C puts it, much more opaquely: “Alternate ways of accessing the Internet are constantly being introduced. The XHTML family is designed with general user agent interoperability in mind. Through a new user agent and document profiling mechanism, servers, proxies, and user agents will be able to perform best effort content transformation. Ultimately it will be possible to develop XHTML-conforming content that is usable by any XHTML-conforming user agent.” See ↪link 8.19b.