Recognizing Digitization as a Preservation Reformatting Method

 

Prepared for the ARL Preservation of Research Library Materials Committee

By:

Kathleen Arthur, Head, Replacement & Reformatting, University of Chicago
Sherry Byrne, Preservation Librarian, University of Chicago
Elisabeth Long, Co-Director, Digital Library Development Center, University of Chicago
Carla Q. Montori, Head, Preservation Division, University of Michigan
Judith Nadler, Associate Director, University of Chicago

June 2004

ARL 2004



Cultural institutions serve the international community by building, protecting, preserving and ensuring continued access to diverse collections and resources.  The challenges of preserving collections have been addressed in different ways over time.  Libraries have used conservation to preserve the original artifact and reformatting strategies, such as microfilming and the creation of print facsimiles, to retain content, enhance access, and protect the original from excessive wear.  Over the past several years, libraries have moved towards using digitization as an additional method for reformatting endangered and fragile paper-based materials to both preserve and provide access to library collections.

The Association of Research Libraries endorses digitization as an accepted preservation reformatting option for a range of materials.  ARL encourages its members, others engaged in digital reformatting, and those interested in initiating these activities to make an organizational and economic commitment to adhere to accepted standards and best practices, and to establish policies and the capacity to maintain digital products for the long term.  ARL calls on the community of federal grants agencies, private foundations, and grants reviewers and panelists to give equal support to proposals that incorporate digital reformatting for preservation when these conditions are met.

Context

Libraries need to employ a variety of reformatting strategies to meet the demands of preserving the many types of materials in our collections.  Each reformatting method in use has strengths and weaknesses.  Choices have to be made based on the characteristics of the originals, the capabilities of each reformatting process, the current and anticipated needs of the user, and cost.  Evaluation is key: no one solution can fit all needs.  Our preservation programs must be multi-faceted and responsive to be effective.  The more preservation options libraries have at their disposal, the better collection managers can meet the current needs of collections as well as the expectations of users now and in the future.

The choice to use digitization, or any reformatting option for preservation, is not prescriptive; it remains a local decision.  Institutions may choose to digitize only, to use one of the other reformatting methods, or to use a combination.  Many approaches are possible, but digital reformatting should now be considered a valid choice among the various methods for preserving paper-based materials.  (See Appendix 1: Comparison of Reformatting Technologies for more details.)

Benefits of Digitization as a Reformatting Strategy

A number of positive outcomes result from deploying digitization as a reformatting strategy.  Digitization increases the capture capability for many types of paper-based material, such as oversize and color items, for which there has been no effective reformatting strategy to date.  Functionality such as zooming allows users to examine fine details more closely and to produce a variety of outputs to suit different needs.  Digital facsimiles better reproduce the navigational experience of a book than does the linear format of microfilm.  Although the preservation of paper-based materials is the primary focus of this document, digitization also has the potential to capture information currently recorded on many other media and may be the only method to preserve this material.

When digital facsimiles of print materials are made accessible via the World Wide Web, the widest range of users has equal access to collections from any location, whether they are on- or off-site.  A virtual environment of digital files can combine content from many kinds of resources, including primary source material, and provide powerful opportunities to integrate materials seamlessly into instruction and course management systems for teaching and learning.  Digitization allows users to create virtual collections that will support new and creative research made possible only in a digital environment.  (See Appendix 2: Benefits of Digitization as a Preservation Reformatting Option for more details.)

Standards and Best Practices

Ensuring high-quality image capture and providing for the long-term viability of digital objects is an admitted challenge, but the library profession has a long history of developing standards and best practices in order to support sustainable operations and facilitate inter-institutional collaboration. This tradition provides confidence that digital preservation challenges will be met.

Reformatting

Institutions and collaborative organizations have identified issues and risks involved in creating and maintaining digital objects and have made significant progress in establishing what needs to be in place.  There is already wide consensus and acceptance in the preservation community and among practitioners in the field about the creation of digital masters for paper-based materials.  There are established guidelines for image capture and processing to ensure that images are of high quality and provide faithful representations of the original.  Standards-based file formats and file compression practices are in use.

Metadata

In the area of metadata, there are still challenges to overcome.  Standards have been established for recording bibliographic descriptions.  Work is in progress to specify element sets and formats for preservation and administrative metadata, to facilitate dissemination and maintenance of digital facsimiles over time.  PREMIS, METS, and MODS are examples of ongoing metadata initiatives.  Regular reports and open discussion as these standards are developed allow digitization programs to capture and record information that will be necessary once the standards are finalized.

Preservation of the Digital Object

Experience to date shows that files can be preserved and refreshed in the short term.  While there are many challenges in managing digital objects over time, institutions are actively engaged in developing solutions to ensure integrity and authenticity, address media and technological obsolescence, and provide long-term accessibility.  Efforts are also underway to identify system designs and business models for sustaining large collections of digital objects.  In the meantime, practices are in place to ensure we are capturing and recording sufficient information and managing digital objects to keep them safe now.  Strategies to keep master files safe for the short term include the use of high-quality and reliable storage media, multiple back-up systems, periodic testing, and a schedule to refresh data.  These short-term strategies are a bridge to the emerging solutions that are being developed to ensure long-term availability and access.  (See Appendix 3: Standards and Best Practices in Digital Reformatting for more details.)
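One way the "periodic testing" of master files mentioned above is often carried out is a fixity check: a checksum is recorded for each file when it is stored and recomputed later to detect loss or silent corruption.  The following is a minimal sketch only.  The choice of SHA-256, the directory layout, and the function names are assumptions made for illustration and are not prescribed by this document.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(master_dir: Path) -> dict[str, str]:
    """Record a checksum for every file under master_dir (hypothetical layout)."""
    return {
        p.relative_to(master_dir).as_posix(): sha256_of(p)
        for p in sorted(master_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(master_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum has changed."""
    problems = []
    for rel_path, expected in manifest.items():
        candidate = master_dir / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```

In such a scheme the manifest would be stored apart from the master files, and the verification step would be run on a schedule and again after each refresh or copy to new media.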

National and Local Commitments

At the national level, there is commitment to and funding for the development of infrastructures to support digital preservation.  The Library of Congress (LC) has been charged to develop the National Digital Information Infrastructure and Preservation Program (NDIIPP); the National Archives and Records Administration (NARA) has undertaken creation of the Electronic Records Archives (ERA); and the Government Printing Office (GPO) has established its Legacy Digitization Project for scanning the collections of the Federal Depository Library Program.  The National Library of Australia (NLA) has done extensive work within its Preserving Access to Digital Information (PADI) initiative, and Great Britain’s Joint Information Systems Committee (JISC) supports a variety of digital preservation initiatives including the UK Data Archive.  The Consultative Committee for Space Data Systems (CCSDS) has developed a widely accepted reference model for an Open Archival Information System (OAIS), which provides a conceptual framework and common terminology for describing the elements and functions of a digital archive.  (See Appendix 4: Current State of Commitment to Long-Term Preservation of Electronic Resources for more details.)

Individual institutions and other organizations have also undertaken important work to develop models, build and test digital archives, create standards, and develop systems and software to support long-term digital preservation.  The Global Digital Format Registry is being designed to capture and share information about a wide variety of file formats.  Tools such as JHOVE are available to validate that files conform to their format standards.  Systems such as LOCKSS are being developed for managing digital archives and for exchanging digital content, and strategies are being tested for migrating and checking digital data over time.  Libraries continue to play a leadership role by recommending and testing standards and by actively contributing to broader efforts towards solutions for preservation of digital resources.  (See Appendix 3 for more details.)

ARL has formed a Working Group on Digitizing Government Document Collections that commissioned a business plan and is making recommendations for a coordinating role for ARL and next steps.  Furthering GPO’s goal of preserving content permanently and improving public access through derivative files, the project will establish a baseline for the creation of preservation-quality digital master files, a minimum threshold that must be met by all participating institutions.  It will also develop specifications and guidance for the creation of metadata, address requirements for preserving digital files, and develop a governance structure.  This is a significant and long-term effort that will further consolidate and document standards, practices, and other guidance needed by those engaged in digital reformatting for preservation.

Time for Action

The time is right to adopt digitization as a reformatting strategy for preservation.  “As more and more is born digital and a new generation of users grows up with digital as the default mode of delivery, resources that are not in digital form will be ‘orphaned’ over time because they are in ‘obsolete’ formats.” (Abby Smith, Council on Library and Information Resources (CLIR), e-mail message, March 29, 2004.)   Concurrently, in response to the increasing amount of and growing library reliance on “born digital” materials, commercial and private sectors are giving more attention to creating environments in which these materials can be maintained for the long term. 

The technical issues facing long-term preservation of the born-digital are the same as those for materials converted into digital form.  To ensure that preservation goals are incorporated into long-term solutions, we must be active participants in their development.  Libraries cannot wait for these solutions to be completely settled before testing the waters.  Therefore, we must be prepared for persistent technological change. 

ARL supports digitization as a reformatting strategy for preservation.  ARL will act as a catalyst in bringing communities together and will take a leadership role by providing a clearinghouse function for information, promoting the use of standards and best practices, and facilitating the implementation of these standards by institutions.  ARL has great potential to fulfill this role not only for the GPO Project, but also for the larger preservation community engaged in the digital reformatting of a wide range of resources.

“Libraries are society’s stewards of cultural and intellectual resources.  For libraries to continue fulfilling their stewardship role, they will have to approach preservation in a new way.  It must be integrated into every aspect of the library’s work.  Preservation must be considered at the highest levels of the institution and reconceived in the digital environment.”  (From the Preface to The State of Preservation Programs in American College and Research Libraries:  Building a Common Understanding and Action Agenda, Deanna Marcum, President, Council on Library and Information Resources, December 2002.)



Appendix 1:
Comparison of Reformatting Technologies

Libraries need a variety of reformatting strategies to meet the preservation demands of the many types of materials in collections.  Each reformatting strategy has strengths and weaknesses, so choices must be made based on the characteristics of the original material, the capabilities of each reformatting process, the needs of users, and cost.  Digitization is now one additional reformatting strategy for preservation.  The more options libraries have at their disposal, the more effectively collection managers can meet the current needs of collections and the expectations of users now and in the future.

Microform Facsimile
Pros

Microform Facsimile
Cons

Printed Facsimile
Pros

Printed Facsimile
Cons

Digital Facsimile
Pros

Digital Facsimile
Cons

Appendix 2:
Benefits of Digitization as a Preservation Reformatting Option

Benefits of using digitization as a reformatting method for paper-based materials include:

Appendix 3:
Standards and Best Practices in Digital Reformatting

The preservation community has established technical specifications to capture the content of printed works containing text and most types of printed images.  Capture specifications for these materials are widely agreed upon.  Work is ongoing to develop capture specifications for color images.  The larger community of researchers and publishers has established standard encoding and acceptable levels of accuracy to enable word searching and retrieval of full text.  Much work has been done to establish descriptive, structural and administrative metadata elements to ensure the discovery, structure, and management of digital objects now and into the future.  Testing is underway to develop strategies to address media preservation and technological obsolescence.  In areas where standards have not been finalized, best practices are in place to ensure that digital objects are managed in a way that keeps them safe now and allows us to implement long-term strategies as they emerge.

FRAMEWORK FOR CONTINUED ACCESS TO DIGITAL OBJECTS

Digital reformatting as a preservation option is done within a framework that ensures long-term management of the digital objects will be carried out in a standard, sustained and institutionally supported environment.

PRE-SCANNING PREPARATION FOR DIGITAL CONVERSION OF PRINT MATERIAL

Work to prepare documents for reformatting is firmly grounded in long-established routines for microfilm reformatting preparation.  Steps taken will include examining the original to confirm that each volume is complete and checking that it has not already been preserved.

IMAGE CAPTURE & QUALITY CONTROL FOR DIGITAL IMAGES

Established guidelines are in place for image capture and processing to ensure that images are of high quality and are faithful reproductions.  Digital image capture will be accomplished using appropriate hardware and software capable of meeting the guidelines for capture of black and white text, grayscale and color images.  The digital objects will be properly oriented, ordered, and named to reflect the presentation of the original volume and will include accompanying technical targets and relevant information about image creation.

All digital files created for purposes of preservation reformatting will undergo strict quality control inspection, which will include checks for bibliographic integrity (completeness, legibility, and placement of images) and compliance with technical specifications.
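The completeness portion of such an inspection lends itself to an automated first pass before a person reviews legibility and placement.  The sketch below assumes a hypothetical naming convention, in which each page image is named with an eight-digit sequence number and a ".tif" extension, and assumes the expected page count is known from collation.  Neither assumption comes from this document.

```python
import re

# Assumed naming convention (hypothetical): eight-digit sequence number plus ".tif".
PAGE_NAME = re.compile(r"^(\d{8})\.tif$")


def check_page_sequence(filenames: list[str], expected_pages: int) -> dict[str, list]:
    """Report misnamed files and page numbers that are missing, duplicated, or out of range."""
    counts: dict[int, int] = {}
    misnamed = []
    for name in filenames:
        match = PAGE_NAME.match(name)
        if match is None:
            misnamed.append(name)
            continue
        number = int(match.group(1))
        counts[number] = counts.get(number, 0) + 1
    return {
        "misnamed": misnamed,
        "missing": [n for n in range(1, expected_pages + 1) if n not in counts],
        "duplicated": sorted(n for n, c in counts.items() if c > 1),
        "out_of_range": sorted(n for n in counts if n < 1 or n > expected_pages),
    }
```

A report with four empty lists indicates only that the file set is complete and consistently named.  Legibility, image placement, and compliance with capture specifications still require visual and technical inspection.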

FILE FORMATS FOR DIGITAL MASTER FILES

Digital files created as preservation master files will be saved in standards-based formats and data will be recorded to describe those formats.

PRODUCTION OF WEB-ACCESSIBLE IMAGES, OBJECT STRUCTURE AND TEXT FILES

Specific formats and interfaces are based on the nature of the original material and the anticipated use of the digital files.  Image files representing pages of text can be accompanied by structural metadata to provide basic navigation at the page-level and to special features such as title pages, tables of contents and indices.  Image files can be converted to text using optical character recognition (OCR) software or keyboarding and be encoded using standard schemas for web searching and retrieval of text.  Standard file formats accessible from web browsers and standard encoding schemas will be used.  Continued online access to digital objects will be supported through use of persistent identifiers.
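Page-level structural metadata of the kind described above can be pictured as an ordered list of page entries, each linking an image file to its place in the volume and, where relevant, to a special feature.  The following sketch is illustrative only.  The field names and the feature vocabulary are hypothetical and do not represent any particular metadata standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageEntry:
    sequence: int                        # order of the image within the volume
    image_file: str                      # file holding the page image
    printed_label: Optional[str] = None  # page number as printed, if any
    feature: Optional[str] = None        # e.g. "title page", "table of contents", "index"


def find_feature(pages: list[PageEntry], feature: str) -> Optional[PageEntry]:
    """Return the first page, in reading order, tagged with the given feature."""
    for page in sorted(pages, key=lambda p: p.sequence):
        if page.feature == feature:
            return page
    return None


def find_printed_page(pages: list[PageEntry], label: str) -> Optional[PageEntry]:
    """Return the first page, in reading order, whose printed page number matches label."""
    for page in sorted(pages, key=lambda p: p.sequence):
        if page.printed_label == label:
            return page
    return None
```

With entries like these, a viewer could offer "go to table of contents" or "go to page 12" without relying on image file names.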

METADATA

Standards-based metadata provides descriptive information for discovery, structural information about the digital object, contextual information about the creation of the digital object, and administrative information to facilitate management over time.

DIGITAL PRESERVATION

Files created to standard and documented with appropriate metadata need to be managed within a long-term maintenance environment to remain accessible. Active management of digital files is necessary to handle the impermanence of optical and magnetic media and the rapid change in hardware and software configurations.  Strategies include:

Appendix 4:
Current State of Commitment to Long-Term Preservation of Electronic Resources

The effort to develop standards, guidelines, and best practices for using digital conversion as a preservation reformatting option is occurring within a much broader context.  This provides additional confidence that the work that is already underway to build digital archives will be carried forward and finalized.  There has been major commitment and funding in this area, including:

Appendix 5:
Relevant Websites

ANSI/NISO Z39.48-1992 (R1997) Permanence of Paper for Publications and Documents in Libraries and Archives (http://www.techstreet.com/cgi-bin/detail?product_id=36497)

ANSI/NISO/LBI Z39.78-2000 Library Binding (http://www.techstreet.com/cgi-bin/detail?product_id=229053)

ARL (http://www.arl.org/preserv/)

CAMiLEON Project (http://www.si.umich.edu/CAMILEON, http://129.11.152.25/CAMiLEON/dh/ep5.html)

California Digital Library (http://www.cdlib.org/; http://www.cdlib.org/inside/diglib/)

CLIR (http://www.clir.org/)

Digital Evans Edition (Early American Imprints) (http://www.readex.com/scholarl/eai_digi.html)

DLF  (http://www.diglib.org/; www.diglib.org/standards/bmarkfin.htm)

DLF/OCLC Registry of Digital Masters (http://www.diglib.org/collections/reg/reg.htm)

DNEP (http://www.kb.nl/kb/resources/frameset_kb.html?/kb/menu/ken-arch-en.html)

DOI (http://www.doi.org/)

DPC (http://www.dpconline.org/; http://www.dpconline.org/graphics/intro/index.html)

D-Space  (http://www.dspace.org/)

Dublin Core  (http://www.dublincore.org)

EAD (http://www.loc.gov/ead/)

EEBO (Early English Books Online) 

EPrints  (http://www.eprints.org/)

Elsevier Science Policy for Archiving Electronic Journals (http://www.elsevier.com/inca/publications/misc/ni2164.pdf)

FEDORA (http://www.fedora.info/)

(GDFR) Global Digital Format Registry (http://www.ifla.org/IV/ifla69/papers/128e-Abrams_Seaman.pdf)

Getty Research Institute (http://www.getty.edu/research/conducting_research/standards/introimages/homepage.html)

GPO Legacy Digitization Project (http://www.gpoaccess.gov/about/reports/preservation.pdf;  04232004_IST11.pdf )

ISO 14721:2003 Space Data and Information Transfer Systems -- Open Archival Information System – Reference Model (http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=24683&ICS1=49&ICS2=140&ICS3)   

JISC (http://www.jisc.ac.uk/)

JISC Digital Curation Centre (http://www.jisc.ac.uk/digcentre_townmeeting.html)

JHOVE  (http://hul.harvard.edu/jhove/)

JSTOR  (http://www.jstor.org/; http://www.jstor.org/about/earchive.html)  

LOCKSS  (http://lockss.stanford.edu/)

MARC (http://www.loc.gov/marc/)

METS  (http://www.loc.gov/standards/mets/)

MODS  (http://www.loc.gov/standards/mods/)

NARA Technical Guidelines (in progress) and ERA  (http://www.archives.gov/electronic_records_archives/index.html)

NDIIPP (http://www.digitalpreservation.gov/index.php?nav=1)

NISO (http://www.niso.org/standards/resources/Z39_87_trial_use.pdf)

NLM Permanence Rating (http://www.nlm.nih.gov/pubs/reports/permanence.pdf; www.rlg.org/events/pres-2000/byrnes.html)

NSF Digital Government Research Program (http://www.digitalgovernment.org/)(http://www.rlg.org/preserv/diginews/diginews6-5.html; http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=24683&ICS1=49&ICS2=140&ICS3=)

OAI  (http://www.openarchives.org/)

OAIS (http://www.rlg.org/longterm/oais_schematics.html;

OCLC Preservation & Digital Services; Digital Archive  (http://www.oclc.org/services/preservation/default.htm; http://www.oclc.org/digitalarchive/)

PADI  (http://www.nla.gov.au/padi/)

PDF/A (http://www.aiim.org/pdf_a/)

PREMIS (http://www.oclc.org/research/pmwg/)

PURLS – Persistent Uniform Resource Locator (http://purl.oclc.org/)

RLG (http://www.rlg.org; http://www.rlg.org/longterm/repositories.pdf; http://www.oclc.org/research/projects/pmwg/pm_framework.pdf)  

SGML -Standard Generalized Markup Language  (http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=16387&ICS1=35&ICS2=240&ICS3=30)    

TEI (http://www.tei-c.org/)

UK Data Archive (http://www.data-archive.ac.uk/)

XML – Extensible Markup Language (http://www.w3.org/TR/REC-xml/)

Last Modified: July 22, 2004