Recognizing Digitization as a Preservation Reformatting Method
Prepared for the ARL Preservation of Research Library Materials Committee
Kathleen Arthur, Head, Replacement & Reformatting, University of Chicago
Sherry Byrne, Preservation Librarian, University of Chicago
Elisabeth Long, Co-Director, Digital Library Development Center, University of Chicago
Carla Q. Montori, Head, Preservation Division, University of Michigan
Judith Nadler, Associate Director, University of Chicago
© ARL 2004
Cultural institutions serve the international community by building, protecting, preserving and ensuring continued access to diverse collections and resources. The challenges of preserving collections have been addressed in different ways over time. Libraries have used conservation to preserve the original artifact and reformatting strategies, such as microfilming and the creation of print facsimiles, to retain content, enhance access, and protect the original from excessive wear. Over the past several years, libraries have moved towards using digitization as an additional method for reformatting endangered and fragile paper-based materials to both preserve and provide access to library collections.
The Association of Research Libraries endorses digitization as an accepted preservation reformatting option for a range of materials. ARL encourages its members and others engaged in digital reformatting, and those interested in initiating these activities, to make an organizational and economic commitment to adhere to accepted standards and best practices, and to establish policies and the capacity to maintain digital products for the long term. ARL calls on the community of federal grants agencies, private foundations, and grants reviewers and panelists to give equal support to proposals that incorporate digital reformatting for preservation when these conditions are met.
Libraries need to employ a variety of reformatting strategies to meet the demands of preserving the many types of materials in our collections. Each reformatting method in use has strengths and weaknesses. Choices have to be made based on the characteristics of the originals, the capabilities of each reformatting process, the current and anticipated needs of the user, and cost. Evaluation is key: no one solution can fit all needs. Our preservation programs must be multi-faceted and responsive to be effective. The more preservation options libraries have at their disposal, the better collection managers can meet the current needs of collections as well as the expectations of users now and in the future.
The choice to use digitization, or any reformatting option for preservation, is not prescriptive – it remains a local decision. Institutions may choose to digitize only, to use one of the other reformatting methods, or to use a combination. Many approaches are possible, but digital reformatting should now be considered a valid choice among the various methods for preserving paper-based materials. (See Appendix 1: Comparison of Reformatting Technologies for more details.)
A number of positive outcomes result from deploying digitization as a reformatting strategy. Digitization increases the capture capability for many types of paper-based material, such as oversize and color items, for which there has been no effective reformatting strategy to date. Functionality, such as zooming capabilities, allows users to examine more closely fine details and produce a variety of outputs to suit different needs. Digital facsimiles better reproduce the navigational experience of a book than does the linear format of microfilm. Although the preservation of paper-based materials is the primary focus of this document, digitization also has the potential to capture information currently recorded on many other media and may be the only method to preserve this material.
When digital facsimiles of print materials are made accessible via the World Wide Web, the widest range of users has equal access to collections from any location whether they are on- or off-site. A virtual environment of digital files can combine content from many kinds of resources, including primary source material, and provide powerful opportunities to integrate materials seamlessly into instruction and course management systems for teaching and learning. Digitization allows users to create virtual collections that will support new and creative research made possible only in a digital environment. (See Appendix 2: Benefits of Digitization as a Preservation Reformatting Option for more details)
Ensuring high-quality image capture and providing for the long-term viability of digital objects is an admitted challenge, but the library profession has a long history of developing standards and best practices in order to support sustainable operations and facilitate inter-institutional collaboration. This tradition provides confidence that digital preservation challenges will be met.
Institutions and collaborative organizations have identified the issues and risks involved in creating and maintaining digital objects and have made significant progress in establishing what needs to be in place. There is already wide consensus and acceptance in the preservation community and among practitioners in the field about the creation of digital masters for paper-based materials. There are established guidelines for image capture and processing to ensure that images are of high quality and provide faithful representations of the original. Standards-based file formats and file compression practices are in use.
In the area of metadata, there are still challenges to overcome. Standards have been established for recording bibliographic descriptions. Work is in progress to specify element sets and formats for preservation and administrative metadata, to facilitate dissemination and maintenance of digital facsimiles over time. PREMIS, METS, and MODS are examples of ongoing metadata initiatives. Regular reports and open discussion as these standards are developed allow digitization programs to capture and record information that will be necessary once the standards are finalized.
Experience to date shows that files can be preserved and refreshed in the short term. While there are many challenges in managing digital objects over time, institutions are actively engaged in developing solutions to ensure integrity and authenticity, address media and technological obsolescence, and provide long-term accessibility. Efforts are also underway to identify system designs and business models for sustaining large collections of digital objects. In the meantime, practices are in place to ensure we are capturing and recording sufficient information and managing digital objects to keep them safe now. Strategies to keep master files safe for the short term include the use of high-quality and reliable storage media, multiple back-up systems, periodic testing, and a schedule to refresh data. These short-term strategies are a bridge to the emerging solutions that are being developed to ensure long-term availability and access. (See Appendix 3: Standards and Best Practices in Digital Reformatting for more details.)
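The periodic testing of master files mentioned above is commonly carried out as a fixity check: a checksum is recorded for each file when it is stored, and later recomputed and compared. This document does not prescribe a method, so the following is only a minimal sketch. The choice of SHA-256 and the function names `build_manifest` and `verify_manifest` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return (missing, changed) lists of relative paths."""
    root = Path(root)
    missing, changed = [], []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            missing.append(rel_path)       # file has disappeared
        elif sha256_of(target) != expected:
            changed.append(rel_path)       # content no longer matches
    return missing, changed
```

In such a scheme the manifest would be built once, when the masters are written, and kept apart from the files it describes. Running `verify_manifest` on each back-up copy at a regular interval would then flag files that need to be restored from another copy.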
At the national level, there is commitment to and funding for the development of infrastructures to support digital preservation. The Library of Congress (LC) has been charged to develop the National Digital Information Infrastructure and Preservation Program (NDIIPP); the National Archives and Records Administration (NARA) has undertaken creation of the Electronic Records Archives (ERA); and the Government Printing Office (GPO) has established its Legacy Digitization Project for scanning the collections of the Federal Depository Library Program. The National Library of Australia (NLA) has done extensive work within its Preserving Access to Digital Information (PADI) initiative, and Great Britain’s Joint Information Systems Committee (JISC) supports a variety of digital preservation initiatives, including the UK Data Archive. The Consultative Committee for Space Data Systems (CCSDS) has developed a widely accepted reference model for an Open Archival Information System (OAIS), which provides a conceptual framework and common terminology for describing the elements and functions of a digital archive. (See Appendix 4: Current State of Commitment to Long-Term Preservation of Electronic Resources for more details.)
Individual institutions and other organizations have also undertaken important work to develop models, build and test digital archives, create standards, and develop systems and software to support long-term digital preservation. The Global Digital Format Registry is being designed to capture and share information about a wide variety of file formats. Tools such as JHOVE are available to validate that files conform to their format standards. Systems such as LOCKSS are being developed for managing digital archives and for exchanging digital content, and strategies are being tested for migrating and checking digital data over time. Libraries continue to play a leadership role by recommending and testing standards and by actively contributing to broader efforts towards solutions for preservation of digital resources. (See Appendix 3 for more details.)
ARL has formed a Working Group on Digitizing Government Document Collections that commissioned a business plan and is making recommendations for a coordinating role for ARL and next steps. Furthering GPO’s goal of preserving content permanently and improving public access through derivative files, the project will establish a baseline for the creation of digital preservation quality master files, a minimum threshold that must be met by all participating institutions. It will also develop specifications and guidance for the creation of metadata, address requirements for preserving digital files, and develop a governance structure. This is a significant and long-term effort that will further consolidate and document standards, practices, and other guidance needed by those engaged in digital reformatting for preservation.
The time is right to adopt digitization as a reformatting strategy for preservation. “As more and more is born digital and a new generation of users grows up with digital as the default mode of delivery, resources that are not in digital form will be ‘orphaned’ over time because they are in ‘obsolete’ formats.” (Abby Smith, Council on Library and Information Resources (CLIR), e-mail message, March 29, 2004.) Concurrently, in response to the increasing amount of and growing library reliance on “born digital” materials, commercial and private sectors are giving more attention to creating environments in which these materials can be maintained for the long term.
The technical issues facing long-term preservation of born-digital materials are the same as those for materials converted into digital form. To ensure that preservation goals are incorporated into long-term solutions, we must be active participants in their development. Libraries cannot wait for these solutions to be completely settled before testing the waters, so we must be prepared for persistent technological change.
ARL supports digitization as a reformatting strategy for preservation. It will act as a catalyst in bringing communities together and will take a leadership role by providing a clearinghouse function for information, promoting the use of standards and best practices, and facilitating the implementation of these standards by institutions. ARL has great potential to fulfill this role not only for the GPO Project, but also for the larger preservation community engaged in the digital reformatting of a wide range of resources.
“Libraries are society’s stewards of cultural and intellectual resources. For libraries to continue fulfilling their stewardship role, they will have to approach preservation in a new way. It must be integrated into every aspect of the library’s work. Preservation must be considered at the highest levels of the institution and reconceived in the digital environment.” (From the Preface to The State of Preservation Programs in American College and Research Libraries: Building a Common Understanding and Action Agenda, Deanna Marcum, President, Council on Library and Information Resources, December 2002.)
Libraries need a variety of reformatting strategies to meet the preservation demands of the many types of materials in collections. Each reformatting strategy has strengths and weaknesses, so choices must be made based on the characteristics of the original material, the capabilities of each reformatting process, the needs of users, and cost. Digitization is now one additional reformatting strategy for preservation. The more options libraries have at their disposal, the more effectively collection managers can meet the current needs of collections and the expectations of users now and in the future.
Benefits of Digitization as a Preservation Reformatting Option
Benefits of using digitization as a reformatting method for paper-based materials include:
The preservation community has established technical specifications to capture the content of printed works containing text and most types of printed images. Capture specifications for these materials are widely agreed upon. Work is ongoing to develop capture specifications for color images. The larger community of researchers and publishers has established standard encoding and acceptable levels of accuracy to enable word searching and retrieval of full text. Much work has been done to establish descriptive, structural and administrative metadata elements to ensure the discovery, structure, and management of digital objects now and into the future. Testing is underway to develop strategies to address media preservation and technological obsolescence. In areas where standards have not been finalized, best practices are in place to ensure that digital objects are being managed in such a way that keeps them safe now and allows us to implement long-term strategies as they emerge.
FRAMEWORK FOR CONTINUED ACCESS TO DIGITAL OBJECTS
Digital reformatting as a preservation option is done within a framework that ensures long-term management of the digital objects will be carried out in a standard, sustained and institutionally supported environment.
IMAGE CAPTURE & QUALITY CONTROL FOR DIGITAL IMAGES
Established guidelines are in place for image capture and processing to ensure that images are of high quality and are faithful reproductions. Digital image capture will be accomplished using appropriate hardware and software capable of meeting the guidelines for capture of black and white text, grayscale and color images. The digital objects will be properly oriented, ordered, and named to reflect the presentation of the original volume and will include accompanying technical targets and relevant information about image creation.
All digital files created for purposes of preservation reformatting will undergo strict quality control inspection, which will include checks for bibliographic integrity (completeness, legibility, and placement of images) and compliance with technical specifications.
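Part of the completeness check described above can be automated when page images follow a predictable naming scheme. The sketch below assumes a hypothetical convention of eight-digit, zero-padded sequence numbers with a `.tif` extension (for example `00000001.tif`). Neither the convention nor the function name `check_page_sequence` comes from this document, and legibility and image placement would still require visual inspection.

```python
import re

# Assumed naming convention for illustration only, e.g. "00000001.tif".
PAGE_NAME = re.compile(r"(\d{8})\.tif")


def check_page_sequence(filenames):
    """Report naming problems in a volume's page-image files.

    Returns a dict with:
      'malformed': names that do not match the assumed pattern
      'missing':   sequence numbers absent between 1 and the highest seen
    """
    seen = set()
    malformed = []
    for name in filenames:
        match = PAGE_NAME.fullmatch(name)
        if match:
            seen.add(int(match.group(1)))
        else:
            malformed.append(name)
    highest = max(seen, default=0)
    missing = [n for n in range(1, highest + 1) if n not in seen]
    return {"malformed": malformed, "missing": missing}


report = check_page_sequence(
    ["00000001.tif", "00000002.tif", "00000004.tif", "cover.jpg"]
)
print(report)
```

A gap in the sequence only shows that a file is absent. Whether the corresponding page was skipped during capture or the file was misnamed still has to be confirmed against the original volume.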
FILE FORMATS FOR DIGITAL MASTER FILES
Digital files created as preservation master files will be saved in standards-based formats and data will be recorded to describe those formats.
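A very coarse first check that a stored master is in the expected format is to inspect its leading bytes. The sketch below assumes, purely for illustration, that masters are classic TIFF files; this document does not name a specific format. A classic TIFF begins with a two-byte byte-order mark (`II` or `MM`) followed by the number 42 in that byte order. The function name `looks_like_tiff` is hypothetical, and a check of this kind is no substitute for full validation with a dedicated tool.

```python
import struct


def looks_like_tiff(header):
    """Return True if the first four bytes match a classic TIFF header."""
    if len(header) < 4:
        return False
    order = header[:2]
    if order == b"II":        # little-endian ("Intel") byte order
        fmt = "<H"
    elif order == b"MM":      # big-endian ("Motorola") byte order
        fmt = ">H"
    else:
        return False
    (magic,) = struct.unpack(fmt, header[2:4])
    return magic == 42        # classic TIFF only; BigTIFF uses 43


def file_looks_like_tiff(path):
    """Read the first four bytes of a file and apply the header check."""
    with open(path, "rb") as handle:
        return looks_like_tiff(handle.read(4))
```

Recording the result of such a check, together with the format name and version, is one simple way to begin documenting the formats of master files.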
PRODUCTION OF WEB-ACCESSIBLE IMAGES, OBJECT STRUCTURE AND TEXT FILES
Specific formats and interfaces are based on the nature of the original material and the anticipated use of the digital files. Image files representing pages of text can be accompanied by structural metadata to provide basic navigation at the page level and to special features such as title pages, tables of contents and indices. Image files can be converted to text using optical character recognition (OCR) software or keyboarding, and the text can be encoded using standard schemas for web searching and retrieval. Standard file formats accessible from web browsers and standard encoding schemas will be used. Continued online access to digital objects will be supported through use of persistent identifiers.
Standards-based metadata provides descriptive information for discovery, structural information about the digital object, contextual information about the creation of the digital object, and administrative information to facilitate management over time.
Files created to standard and documented with appropriate metadata need to be managed within a long-term maintenance environment to remain accessible. Active management of digital files is necessary to handle the impermanence of optical and magnetic media and the rapid change in hardware and software configurations. Strategies include:
The effort to develop standards, guidelines, and best practices for using digital conversion as a preservation reformatting option is occurring within a much broader context. This provides additional confidence that the work that is already underway to build digital archives will be carried forward and finalized. There has been major commitment and funding in this area, including:
ANSI/NISO Z39.48-1992 (R1997) Permanence of Paper for Publications and Documents in Libraries and Archives (http://www.techstreet.com/cgi-bin/detail?product_id=36497)
ANSI/NISO/LBI Z39.78-2000 Library Binding (http://www.techstreet.com/cgi-bin/detail?product_id=229053)
CAMiLEON Project (http://www.si.umich.edu/CAMILEON, http://18.104.22.168/CAMiLEON/dh/ep5.html)
California Digital Library (http://www.cdlib.org/; http://www.cdlib.org/inside/diglib/)
Digital Evans Edition (Early American Imprints) (http://www.readex.com/scholarl/eai_digi.html)
DLF (http://www.diglib.org/; www.diglib.org/standards/bmarkfin.htm)
DLF/OCLC Registry of Digital Masters (http://www.diglib.org/collections/reg/reg.htm)
DPC (http://www.dpconline.org/; http://www.dpconline.org/graphics/intro/index.html)
Dublin Core (http://www.dublincore.org)
EEBO (Early English Books Online)
Elsevier Science Policy for Archiving Electronic Journals (http://www.elsevier.com/inca/publications/misc/ni2164.pdf)
Global Digital Format Registry (GDFR) (http://www.ifla.org/IV/ifla69/papers/128e-Abrams_Seaman.pdf)
Getty Research Institute (http://www.getty.edu/research/conducting_research/standards/introimages/homepage.html)
GPO Legacy Digitization Project (http://www.gpoaccess.gov/about/reports/preservation.pdf; 04232004_IST11.pdf)
ISO 14721:2003 Space Data and Information Transfer Systems -- Open Archival Information System – Reference Model (http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=24683&ICS1=49&ICS2=140&ICS3)
JISC Digital Curation Centre (http://www.jisc.ac.uk/digcentre_townmeeting.html)
JSTOR (http://www.jstor.org/; http://www.jstor.org/about/earchive.html)
NARA Technical Guidelines (in progress) and ERA (http://www.archives.gov/electronic_records_archives/index.html)
NLM Permanence Rating (http://www.nlm.nih.gov/pubs/reports/permanence.pdf; www.rlg.org/events/pres-2000/byrnes.html)
NSF Digital Government Research Program (http://www.digitalgovernment.org/)(http://www.rlg.org/preserv/diginews/diginews6-5.html; http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=24683&ICS1=49&ICS2=140&ICS3=)
OCLC Preservation & Digital Services; Digital Archive (http://www.oclc.org/services/preservation/default.htm; http://www.oclc.org/digitalarchive/)
PURLs – Persistent Uniform Resource Locators (http://purl.oclc.org/)
RLG (http://www.rlg.org; http://www.rlg.org/longterm/repositories.pdf; http://www.oclc.org/research/projects/pmwg/pm_framework.pdf)
SGML – Standard Generalized Markup Language (http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=16387&ICS1=35&ICS2=240&ICS3=30)
UK Data Archive (http://www.data-archive.ac.uk/)
XML – Extensible Markup Language (http://www.w3.org/TR/REC-xml/)