This Document Is Superseded by: Western States Digital Imaging Best Practices

General Guidelines for Scanning

CDP Scanning Working Group
Spring 1999:

Technical Writer: Erin Rhodes

NOTE: Libraries, archives, museums, and educational institutions may freely use these guidelines, with appropriate credit given to the Colorado Digitization Program. Others should contact the Project Director for permission to copy or make other use.



Introduction

The goal of the Colorado Digitization Program is to provide the people of Colorado access to Colorado's unique and special collections in digital format through the collaborative effort of Colorado archives, libraries, museums, and historical societies. By providing these standards, the Colorado Digitization Program is not advocating that every organization buy their own equipment and scan in-house. Rather, we suggest that organizations investigate the many commercial vendors that provide high-quality imaging services to libraries, archives, and museums; or that organizations look to regional-based or inter-institutional cooperative ventures before beginning a digital imaging project or investing in equipment.

Although many digitization projects in Colorado are scanning for purposes of increased access to their collections, preservation is often a natural by-product of any digitization project. The Colorado Digitization Program advocates that organizations approach digitization projects with a "preservation mindset." This mindset implies:

  • Scanning at the highest resolution appropriate to the informational content of the originals
  • Scanning at an appropriate level of quality to avoid rescanning and re-handling of the originals in the future--scan once
  • Creating and storing a master image file that can be used to produce derivative image files and serve a variety of current and future user needs
  • Using system components that are non-proprietary
  • Using image file formats and compression techniques that conform to industry standards
  • Creating backup copies of all files on a stable medium
  • Creating meaningful metadata for image files or collections
  • Storing media in an appropriate environment
  • Monitoring and recopying data as necessary
  • Outlining a migration strategy for transferring data across generations of technology
  • Anticipating and planning for future technological developments

Although this document suggests the minimum standards, organizations should not just "do the minimum." Scanning at a higher resolution than this document recommends is encouraged.

Purpose

The purpose of this document is to offer guidance and to provide minimum scanning recommendations to Colorado institutions that are planning for or involved in digitization projects. It is aimed at institutions that have, or are building up, the equipment and expertise to scan in-house. This document addresses the more standard formats of text, photographs, maps, and graphic materials. If you are planning to scan primarily oversize materials, bound materials, or materials in non-standard formats and sizes, you might consider outsourcing these materials to imaging vendors. A list of vendors is supplied in Appendix C.

These recommendations have been developed in order to: (1) ensure a consistent, high level of image quality across collections; (2) encourage widespread and convenient access to digital images by supporting the use of standard or industry formats that are widely accepted; and (3) decrease the likelihood of rescanning in the future by promoting best practices for conversion of materials into digital format. The recommendations we make in this document are purposely broad enough to apply to a variety of institutions and collections, and attempt to synthesize different recommendations previously made for specific institutions or projects.

  • These standards are not intended to be used as the de facto standard for digital imaging, but rather as guidelines for image capture, presentation, and storage. Obviously, inherent or unique characteristics of different source materials necessitate different approaches to scanning, and conversion requirements for digital projects should be considered on a case-by-case basis (particularly for grant projects with specific requirements).
  • To create archival or high quality images for long-term access, it is important to scan once to serve as many purposes as possible. You may not have the resources in the future to go back and scan again. Derivative files can then be made from the digital master file.
  • We emphasize that we are providing the minimum scanning recommendations we feel are necessary for responsible conversion and for achieving an acceptable level of image quality. These minimum recommendations reflect the technical capabilities and limitations current at the time of writing.
  • Scanning at higher resolutions or pixel depth than recommended in this document is encouraged, depending on the characteristics of the source material and the intended use of the image. As a general rule of thumb, institutions should scan at the highest resolution appropriate to the nature of the source material.
  • We recommend that a sample from any collection be digitized prior to full implementation of the project in order to determine what resolution, file formats, storage mechanisms, and mode of access best suits the characteristics of the source materials and the intended uses of the digital surrogates.
  • We recommend that the original, or first generation (i.e., negative rather than print), of the source material be scanned to achieve the best quality image possible (unless the original or first generation is too fragile to be handled during scanning).
  • We recommend that you avoid scanning into proprietary formats for long-term imaging projects. Store images in standardized file formats and use system components that conform to open or non-proprietary standards. (Proprietary formats may include vendor-specific formats that are compatible only with specific software or on specific platforms. The CDP suggests that you use file formats that are compatible on most major platforms and are viewable in major Internet browsers).

Scope

What is addressed in this document:

  • Scanning and file format recommendations for:
    • Text, photographs, maps, and graphic materials
  • Suggested hardware configurations
  • Software considerations
  • Quality control, file naming, scanner and monitor calibration, targets and color bars, storing images, and recording and verification of CD-ROMs.

What is not addressed in this document:

Guidelines for workflow, metadata standards, and selection for digitization are addressed in other, complementary documents on the Colorado Digitization Program website.

Because technology and industry standards and preferences are constantly improving and changing, we view this as a continually evolving document. We welcome your comments and suggestions.


In-house or Outsource?

Every organization should carefully consider the pros and cons of outsourcing digitization projects or conducting them in-house. Following are some points to consider for both strategies:

In-house pros:

  • Development of digital imaging project experience by "doing it" (project management, familiarity with technology, etc.)
  • More control over the entire imaging process as well as handling and storage of originals
  • Requirements for image quality, access, and scanning can be adjusted as you go instead of defined up front
  • Direct participation in development of image collections that best suit your organization and users

In-house cons:

  • Requires large initial and ongoing financial investment in equipment, staff
  • Longer time needed to implement imaging process and technical infrastructure
  • Limited production level
  • Staffing expertise not always available
  • Institution must accept costs for network downtime, equipment failure, training of staff, etc.
  • Lack of standards and best practices

Outsourcing pros:

  • Pay for cost of scanning the image only, not equipment or staffing
  • High production levels
  • On-site expertise
  • Less risk
  • Vendor absorbs costs of technology obsolescence, failure, downtime, etc.

Outsourcing cons:

  • Organization has less control over imaging process, quality control
  • Complex contractual process: image specifications must be clearly defined up front, solutions to problems must be negotiated, communication must be open, and problems must be accommodated
  • Vendor may know more than the client, or may presume a level of understanding on the part of the library/museum/archives that it does not have
  • Lack of standards with which to negotiate services and to measure quality against
  • Originals must be transported, shipped, and then also handled by vendor staff
  • Possible inexperience of vendor with library/archival/museum/historical society communities

(See Kenney and Chapman, "Digital Imaging for Libraries and Archives" for further information).



Sources Consulted

There are many excellent models and resources available in print and online that have articulated different standards for scanning a variety of materials. The Colorado Digitization Program drew upon the following resources in developing these recommendations. We suggest that institutions review the following resources for further information on scanning practices and procedures prior to implementing a digitization project.

Additionally, we recommend Howard Besser's and Jennifer Trant's excellent online tutorial, "Introduction to Imaging," as a primer on the vocabulary and technology of an imaging project and the issues surrounding the construction of an image database. This tutorial can be found at: http://www.getty.edu/redirect/redirect_gri_intromages.html

Besser, Howard. Best Practices for Image Capture. Word Document. 1999.
http://sunsite.berkeley.edu/Imaging/Databases/Scanning/

Besser, Howard. Procedures and Practices for Scanning.
http://sunsite.berkeley.edu/Imaging/Databases/Scanning/

California Digital Library: Digital Object Standard: Metadata, Content and Encoding (May 18, 2001)
http://www.cdlib.org/about/publications/CDLObjectStd-2001.pdf

California Digital Library: Digital Image Format Standards (July 9, 2001)
http://www.cdlib.org/about/publications/CDLImageStd-2001.pdf

Fleischauer, Carl. Digital Formats for Content Reproductions. Library of Congress, July 13, 1998.
http://memory.loc.gov/ammem/formats.html

Kenney, Anne and Steven Chapman. Digital Imaging for Libraries and Archives. Ithaca, New York: Department of Preservation and Conservation, Cornell University Library, June 1996.

Macklin, Lisa A. and Sarah L. Lockmiller. Digital Imaging of Photographs: A Practical Approach to Workflow Design and Project Management. Chicago, Illinois: American Library Association, 1999.

Puglia, Steven and Barry Roginski. NARA Guidelines for Digitizing Archival Materials for Electronic Access. National Archives and Records Administration, January 1998.
http://www.nara.gov/nara/vision/eap/digguide.pdf

Technical Advisory Services for Images. An Introduction to Making Digital Image Archives. University of Bristol, United Kingdom.
http://www.tasi.ac.uk/advice/overview.html

The International Imaging Industry Association (I3A) is a consortium of image and imaging companies engaged in the development of digital imaging standards and technologies for the consumer market. The membership of more than 80 participants includes Eastman Kodak, Digimarc, Fuji, and Polaroid. According to their white paper, their metadata standards are based on XML formats and are designed to support the evolution of digital imaging technology over the next 5-10 years. Because of their near-future, consumer and home-use focus, these standards are not appropriate for libraries, museums, archives, and historical societies. http://www.I3A.org


System Components

The goals of any digitization project will have an influence on the equipment needed to accomplish the project. Scanning, organizing, tracking, editing, describing and indexing, displaying, and storing the images all require different equipment, as well as different levels of staff expertise. Choice of equipment depends primarily on the types of materials you plan to scan as well as the intended uses of the digital images. Several sources, such as computing magazines and equipment manufacturers, offer informative reviews of hardware, scanners, and software. We recommend that prior to purchasing system components, you review the current literature available both online and in print for the latest developments and reviews. Several such sources are listed at the end of this section.

Some things to consider when choosing specific components for your system:

  • What does your current technical infrastructure support (staff, networks)?
  • What is the level of staff expertise? (See Staffing Considerations)
  • How will the images ultimately be accessed (stand-alone station, Web, Intranet)?
  • How will the images be stored? Do you have sufficient storage capacity for storing potentially large image files? (See Storing Images)
  • Who are your users and potential users? How will they use the images? (See User Groups)
  • How much is your institution willing to spend (and continue to spend) on equipment costs?

Hardware

A powerful computer is the basis of any digitization project. A dedicated computer should be used specifically for the imaging project. Below is a list of suggested basic equipment:

  • PC/Macintosh
  • Large screen monitor (preferably 21")
  • Scanner or digital camera
  • CD-recorder
  • Back-up drive (DAT, zip® or jaz®)
  • Color/black and white printer (optional)
  • Network

Below, we suggest a set of ideal and minimum hardware specifications for a small-scale, in-house scanning operation. Note that we are not recommending specific brands of equipment, but rather, suggested configurations.

Suggested minimum configuration as of this writing:

  • 500 MHz (or highest current processing speed available)
  • 128 MB (or larger) RAM
  • 8-16 MB video RAM
  • 21" monitor
  • At least 9 GB (13 GB preferred) hard drive with expansion slot for additional hard drive
  • high-density removable drive (zip® or jaz®)
  • CD-ROM (or DVD) drive (48x maximum speed)
  • ISO 9660 compliant CD-ROM recorder

Suggested ideal configuration as of this writing:

  • 1-1.6 GHz (or highest current processing speed available)
  • 256 MB (or larger) RAM
  • 16 or 32 MB video RAM
  • 21" monitor
  • At least 9 GB (13 GB preferred) hard drive with expansion slot for additional hard drive
  • high-density removable drive (zip® or jaz®)
  • CD-ROM (or DVD) drive (48x maximum speed)
  • ISO 9660 compliant CD-ROM recorder

A second PC/Mac station for image editing functions is recommended. A separate PC/Mac used exclusively for scanning will increase production. The two stations can be configured as a network (using Windows NT or, on the Mac, AppleTalk, for example). Other networked configurations are possible.

PC/Macintosh

The scanning computer is critical to any digitization project. A computer with the highest processing speed available will have a tremendous impact on workflow and will facilitate faster production rates. Since image files are so large, an appropriate amount of RAM is also necessary for image scanning and editing. Where possible, save image files to a server before or after editing to avoid running out of local disk space for organizing and storing images in production. When images have undergone final editing and quality control, they can then be uploaded to a server and/or written to CD.

Monitor

A good monitor is critical to image manipulation and editing. It should have a large screen size, high resolution, a high refresh rate, be flicker-free, and it should support adequate video RAM in order to produce images that are a close representation of the original item scanned. A high quality PC and monitor are well worth the money they will require; don't go cheap at the expense of quality.

CD Recorder

If you plan on saving the images to CD (recommended), you will need an internal or external CD-recordable drive. It should be ISO 9660 compliant and should support a variety of formats. CD recorders are defined by recording and playback speed. An acceptable option at this time is 4x record by 6x playback. In addition, we recommend that you use CD-ROM recorders instead of DVD recorders and readers at this point in time, until DVD technology matures and becomes more widely accepted and supported. DVD technology promises backward-compatibility with CD-ROMs. Make sure that the CD recorder, as well as peripherals such as the scanner and backup mechanisms, are compatible with your computer.

Backup Mechanisms

We recommend that you back up image files before transferring them to CD or storing them on a network server. Acceptable backup options for the short term include internal or external tape (DAT) drives or a zip® or jaz® drive. In case of disaster, we recommend that you also keep a second, secure copy of the image files on physical media (i.e., CD or tape) offsite, especially if your only other copies are on the server or on a CD or tape stored onsite.
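
A backup is only useful if it matches the files it was made from. These guidelines do not prescribe a verification method; the sketch below assumes one common approach, comparing SHA-256 checksums, and uses only Python's standard library. The function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(master_dir, backup_dir):
    """Return relative paths of files that are missing from, or differ in, the backup."""
    master_dir, backup_dir = Path(master_dir), Path(backup_dir)
    problems = []
    for master in sorted(master_dir.rglob("*")):
        if not master.is_file():
            continue
        relative = master.relative_to(master_dir)
        copy = backup_dir / relative
        # A file fails verification if the copy is absent or its checksum differs.
        if not copy.is_file() or sha256_of(copy) != sha256_of(master):
            problems.append(relative.as_posix())
    return problems
```

Calling `verify_backup("masters", "backup")` (both directory names are placeholders) returns an empty list when every master file has an identical copy in the backup location.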

Printer

If you plan to print out quality copies of the images, you will want to consider a color and/or black and white printer capable of printing at least 1200 dpi, particularly if you plan to sell prints of digitized images (in which case, you may want to invest in a printer that can print on glossy paper).


Scanners

There are many types of scanners available, in many different sizes and levels of quality. Choosing a scanner depends on your project goals and on the format, type, and size of the media you intend to scan. Other factors to consider when choosing a scanner include optical resolution, bit depth, scan area size, scan time, and functionality for scanning different formats. These factors will affect the quality of the digital image and must be taken into consideration when purchasing a scanner.

Resolution

The resolution at which you scan is one of the factors that will determine the quality of your images. Resolution is often expressed as an array (the number of pixels across both dimensions of an image, or more simply the number across the long side, such as 3000 pixels), as dpi (dots per inch), or as ppi (pixels per inch). Higher dpi settings will generally yield a better digital image, because they place more pixels (and therefore more information) in an inch than lower dpi settings do. However, the higher the dpi, the larger the file size. You must take into account your server or computer storage capacity when determining resolution settings, and balance that against the goals of your project. Scanning at a high resolution is recommended if you are planning to convert an important collection into digital form to increase access and to build a virtual archive, generate "archival" images, or make prints of the digital image on a good printer. There is a threshold to resolution, however. After a certain point, increasing resolution will not yield a better image.

There are two different types of resolution: optical and interpolated. Optical resolution is the inherent resolution of the scanner, and is usually expressed as a pixel array (e.g., 1000 x 2000). The first number refers to the limit of the CCD array on your scanner (the short dimension), and the second number is determined by the movement of the CCD array across the long dimension of your scanner.

Interpolated resolution is calculated by software from a lower resolution image file. This is often performed during or after scanning. A higher optical resolution in a scanner is better than interpolated resolution. The specifications for the resolution at which you scan should represent actual optical resolution rather than values achieved by interpolation.
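
The relationship between dpi, the size of the original, and the resulting pixel array can be sketched in a few lines of Python. This is only an illustration: the 8 x 10 inch print and the 600 dpi optical limit are assumed values, not recommendations from this document, and the function names are hypothetical.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return the (width, height) pixel array for an original scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def is_interpolated(requested_dpi, optical_dpi):
    """True when the requested resolution exceeds what the scanner captures optically."""
    return requested_dpi > optical_dpi


# Assumed example: an 8 x 10 inch print on a scanner with 600 dpi optical resolution.
OPTICAL_DPI = 600
for dpi in (300, 600, 1200):
    width_px, height_px = pixel_dimensions(8, 10, dpi)
    kind = "interpolated" if is_interpolated(dpi, OPTICAL_DPI) else "optical"
    print(f"{dpi} dpi -> {width_px} x {height_px} pixels ({kind})")
```

Any setting flagged as interpolated is produced by software rather than captured by the scanner, so it should not be counted toward the resolution specified for a project.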

Bit Depth

Bit depth also has an effect on image quality. Bit depth is the number of bits of data representing each pixel in the image. A bit can have two values, 0 or 1. If an object is scanned at a bit depth of 8, each pixel can take one of 256 possible values (colors or shades of gray). A bit depth of 24 produces over 16 million colors. Bit depth also affects file size: as bit depth increases, the uncompressed file size increases in direct proportion. Scanners generally sample at a higher bit depth and then the final image output is sampled down to a lower bit depth. Sampling at a higher bit depth aids in reducing noise, extends the possible tonal range of the image, and allows the scanner to capture a larger density range without loss of detail.
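
The arithmetic behind these figures is simple enough to check directly. The sketch below estimates uncompressed image data only, ignoring file headers and any row padding a particular format may add. The 2400 x 3000 pixel array is an assumed example, and the function names are hypothetical.

```python
def color_count(bit_depth):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bit_depth


def uncompressed_size_bytes(width_px, height_px, bit_depth):
    """Approximate size of the raw pixel data, in bytes."""
    return width_px * height_px * bit_depth // 8


# Assumed example: a 2400 x 3000 pixel scan at three common bit depths.
for depth in (1, 8, 24):
    size_mb = uncompressed_size_bytes(2400, 3000, depth) / 1_000_000
    print(f"{depth}-bit: {color_count(depth):,} values per pixel, about {size_mb:.1f} MB")
```

Tripling the bit depth from 8 to 24 triples the size of the pixel data, which is why storage capacity should be considered alongside resolution and bit depth when planning a project.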

Scanner Types

Common Scanner Types:

  • Flatbed scanner
  • Slide scanner
  • Microfilm scanner
  • Drum scanner
  • Sheetfed scanner
  • Digital camera

Flatbed scanners are among the most popular scanners used in libraries and archives and are suitable for scanning papers, flat photographs, and other printed materials. Flatbeds can be purchased with an optional attachment called a transparent media adapter, which allows you to scan directly from slides or negatives. However, transparency adapters do not always produce as high a quality of image as a slide or film scanner. If you plan to scan predominantly transparent materials that are smaller than 4" x 5", you may want to consider a slide or a film scanner (there are some slide/film scanners that can handle larger transparent formats). Scanners that combine flatbed scanner capabilities and 35mm slide capabilities are also on the market. Some slide scanners can deliver a better dynamic range than flatbeds; however, the resolution may not be sufficient to create digital masters or meet the resolution requirements of some users.

If your collection contains predominantly oversized materials, you may want to consider outsourcing the scanning to an imaging vendor or purchasing a high-end digital camera that can capture oversize materials, which works much like a copystand setup. There are also flatbed scanners that handle originals that are 12" x 17", and some flatbed scanners can accommodate even larger sizes, although they tend to take up considerable space and produce enormous file sizes.

Some participants in the CDP Project have asked about drum scanners. In general, the CDP does not recommend them for formats of significant value or that are fragile or brittle in any way, as drum scanners can cause a great deal of stress to the document. The original is also taped to the rotating cylinder, so consider how this may also affect the document. Drum scanners are designed for the graphic arts community and, as such, provide an extremely high level of resolution. Drum scanners can scan transparent as well as reflective media, in grayscale and color.

Scanner Suggestions for Various Material Types

Single leaf, regular size, flat materials:

  • Flatbed scanner
  • Sheetfed scanner (if non-brittle)
  • Digital camera

Single leaf, oversized, flat materials:

  • Oversize flatbed scanner
  • Sheetfed scanner (if non-brittle)
  • Digital camera

Bound materials:

  • Digital camera with book cradle
  • Right angle, prism, or overhead flatbed scanner

Transparent media:

  • Slide scanner
  • Film scanner
  • Multi-format flatbed scanner
  • Digital camera

Not all scanners take the same amount of time to scan the same image at the same resolution. If high production levels are important, it will be necessary to look at the time it takes for both preview and full scan images of materials similar to what you intend to scan. In general, flatbed/slide scanners accommodate a higher production rate than digital cameras, but they also are limiting in the size and type of media formats they are able to scan.

All electronic devices suffer from "noise," which often appears on scans as blotchy or matte-like areas in the dark shadow parts of an image when these areas are lightened or have their contrast range increased. Noise can obscure details in the shadows. Higher quality scanners, with higher bit depths, will give better results, as they tend to use higher quality (lower noise) components.

A Word About Digital Cameras

At this point in time, we feel that commercially available, hand-held digital cameras are not suitable for archival scanning, with the exception of the high-end digital cameras (Kontron, Zeutschel, Leica) used by several larger institutions and imaging vendors. High-end digital cameras have no scanning limitations when it comes to size and shape, and can scan at an extremely high resolution (up to 15,000 pixels across the long dimension). They do have specific lighting requirements and demand a high level of operator skill. However, if you can afford one, a high-end, overhead digital camera offers great potential for scanning oversize materials, media in all formats, and bound materials (with the aid of a book cradle), and presents a lower risk to fragile materials by allowing face-up, contact-free scanning.

Digital Camera Reviews http://www.steves-digicams.com/digresources.html#reviews
Digital Photography Online http://www.digital-photography.org/default.html
Digital Camera Online http://www.digicamera.com/
Leica http://www.leica-camera.com/digi_img/digi_sys_e.htm


Scanner Software

There are two types of software that you will need for most digital imaging projects. The first is the scanning software that comes with the scanner. The second type of software is the image editing software, normally applied to the image after it has been scanned. Some software, such as Adobe Photoshop®, can serve as both the scanning software and the image editing software. The scanning software is usually limited in its functionality. You should choose scanning software that is at least capable of saving image files into standard formats such as TIFF, JPG, GIF, etc. This functionality will help production and also ensure a wide range of image delivery options. Software that converts image files from one format to another may also be useful.

To produce images of acceptable quality, it is important to invest in image editing software, which is normally used for "cleaning up" an image (removing dust spots, for example) and for correction (adjusting the level of brightness and contrast, for example). Image editing software should come with the capability to crop, deskew, and rotate; adjust brightness and contrast levels; sharpen (if needed); zoom in and out; accommodate different file formats; provide controls for gamma, black and white, and color (RGB); provide a histogram and look-up table; support compression types; and possess the capability for the user to create and save customized settings, among other functions.

The choice of image editing software is based on the level of image manipulation desired for your project and the level of expertise of staff. Some image editing software, such as Adobe Photoshop®, is very advanced, and may require some time and training to learn. Other software is more basic and allows for only limited operations, such as cropping and rotating, and is not difficult to master. Consider the range of operations you will normally need to perform. The cost of this software can range from free (freeware) to several hundred dollars. When considering cost, think about not only the price of the product but also how easy it is to use, and factor in additional costs for training accordingly.

In addition to considering the capability and usability of image editing software, make sure that your current technology can support the software. Do you have the appropriate amount of memory, hard drive space, processor power, and display capabilities (a 24-bit color display card is recommended for image editing work)?

The amount of image editing performed on the images should be defined in your project goals, possibly decided in consultation with the collection curator or an archivist or librarian who is knowledgeable about the materials being scanned. Some digitizing projects aim to create a "pleasing image" that may require a great deal of editing. Other projects may be more concerned with the fidelity of the digital image to the original (this may be important to scholars), and may require very minimal editing. Do you intend to match the digital image as closely as possible to the original? Are you more concerned with the photographer's/creator's intent when editing the digital image (i.e., high contrast; scanner operator makes decisions about tone and color values of the digital image)? Or are you more concerned with reconstructing the appearance of the original as it would have existed when first created (to digitally reconstruct deteriorated originals)? What constitutes a "good image" for the purposes of your project--a faithful reproduction or a pleasing image--should be defined prior to scanning.

Additional Software

Batch processing software, such as Equilibrium's DeBabelizer Pro http://www.equilibrium.com/ or ThumbsPlus http://www.cerious.com/featuresv4.shtml, takes a large set of files and automatically performs the same process on them. This type of software is useful for the generation of thumbnails and access images (making JPEGs and/or GIFs from TIFFs, for example), converting from one file format to another, or compressing files. Some image editing software includes batch processing capability; if not, this software might be worth considering.
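
To illustrate what a batch job does for each file, the sketch below plans derivative filenames and pixel dimensions for a set of TIFF masters, scaling each image to fit a maximum long side while preserving its aspect ratio. It performs no image conversion itself; that step is left to the batch processing software. The size labels, the 150 and 600 pixel limits, the sample filenames, and the naming pattern are all assumptions made for illustration.

```python
from pathlib import PurePath


def fit_within(width_px, height_px, max_side):
    """Scale dimensions so the long side is at most max_side, keeping the aspect ratio."""
    scale = max_side / max(width_px, height_px)
    if scale >= 1:
        return width_px, height_px  # never enlarge a smaller image
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


def plan_derivatives(master_name, width_px, height_px, sizes):
    """Map each size label to a (derivative filename, (width, height)) pair."""
    stem = PurePath(master_name).stem
    return {
        label: (f"{stem}_{label}.jpg", fit_within(width_px, height_px, max_side))
        for label, max_side in sizes.items()
    }


# Assumed derivative sizes and sample masters, for illustration only.
SIZES = {"thumb": 150, "access": 600}
MASTERS = [("photo0001.tif", 2400, 3000), ("map0002.tif", 5100, 3600)]

for name, width_px, height_px in MASTERS:
    for label, (filename, dims) in plan_derivatives(name, width_px, height_px, SIZES).items():
        print(f"{name} -> {filename} at {dims[0]} x {dims[1]} pixels ({label})")
```

Working from a plan like this keeps the master files untouched and makes the derivative sizes consistent across a collection.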


The following is not an exhaustive list of the hardware and software available as of this writing. By listing the following sources, we do not imply endorsement of any of the products or publications. They are listed simply as a guide to possible options.

Some General Sources for Hardware and Scanner Reviews

PC Magazine Online http://www.pcmag.com
PC Computing http://www.zdnet.com/pccomp/
PC World http://www.pcworld.com/
Computer Network http://home.cnet.com/
Apple http://www.apple.com/
Links to scanner manufacturers and reviews of scanners
http://www.steves-digicams.com/digvideo.html#scanners

Umax http://www.umax.com
Hewlett-Packard http://www.hp.com
Microtek http://www.microtek.com
Mustek http://www.mustek.com/imaging/
Canon http://www.usa.canon.com/consumer
Minolta http://www.minolta.com/
Xerox http://www.xerox.com/
Mekel http://www.mekel.com/

Some Sources for Image Editing Software

Adobe Photoshop http://www.adobe.com/
Runs on Macintosh, Windows, and Unix. Seen as the industry standard. A very powerful program that takes some time to learn. Very high functionality, suited to both the creation and manipulation of images. Plug-ins available. Good color support, capable of performing batch processing. Requires large amounts of RAM.

Adobe Photodeluxe http://www.adobe.com/
Scaled-down version of Photoshop. Used for editing photographic images. Functionality and support nowhere near as comprehensive as Photoshop. Easier to use, and provides "Guided Activities" to assist the user. Can only open one image at a time.

Ulead PhotoImpact http://www.ulead.com/
Suited to both the creation and manipulation of images. Has tutorials and multi-level user interfaces (basic, intermediate, advanced). High functionality. Good color support, capable of performing batch processing. PhotoImpact Album allows you to create an album of thumbnails for viewing/inspection.

JASC Paint Shop Pro http://www.jasc.com/
Shareware for the Windows platform. Includes most of the common editing features, although its tools are not as powerful as those of the two programs above. Includes batch processing capability.

Micrografx Picture Publisher http://www.micrografx.com/
Suitable for both the creation and manipulation of images. Range of capability compares with Photoshop. Has several unique tools.

Corel Photo-Paint http://www.corel.com/
Suitable for both the creation and manipulation of images. Easy-to-use interface, usual high-end editing tools.

Macromedia xRes http://www.macromedia.com/
Runs on both Macintosh and Windows. Suitable for both the creation and manipulation of images. Provides the usual high-end editing tools. Supports many file formats and includes batch conversion capability. Lack of speed in normal editing mode, but has a large file mode that can edit files much larger than available RAM.

ImageMagick http://www.imagemagick.org/
Freeware. Supports the editing and manipulation of images. Runs on Macintosh, Windows and Unix. Supports popular file formats, has limited functionality. Provides tools such as resize, crop, rotate, sharpen, and color management.



Databases

Although these guidelines do not focus on database options, there are many types of databases available that are suitable for digitization projects, with varying degrees of complexity, cost, and power. Some options include:
 
Listed roughly from less to more complexity, cost, and power:

  • Sprite (Perl)
  • SWISH-Enhanced
  • MySQL
  • Microsoft Access
  • Filemaker Pro
  • Oracle
  • SyBase

Further Training

Many organizations around the country offer workshops and training on digital imaging, and many conferences are held each year addressing imaging issues.

The Northeast Document Conservation Center http://www.nedcc.org/ hosts a "School for Scanning" several times a year for digital project managers. Information on the school can be found on their website.

AMIGOS Bibliographic Council http://www.amigos.org/ gives several workshops a year on digital imaging to institutions in the Southwest and at conferences. Workshops can be requested on certain topics and in Colorado as well.

Museums and the Web Annual Conference: Hosted by Archives and Museum Informatics http://www.archimuse.com/.

Online publications such as RLG Diginews and D-Lib list conferences on digitization around the country.

In addition, professional organizations often host workshops and preconferences on issues related to digitization, such as ALA, ALCTS, AAM.

Staffing

In reality, many digital imaging projects will not have dedicated staff working on the project, but will utilize existing staff from other areas in the organization, student assistants, or volunteers. It may benefit the project to look at "transferable skills" that staff members may already possess that would be useful in any digitization project. Sufficient time for training, and opportunities to receive further education and training, should also be provided.

Digitization projects require a combination of skills from a variety of staff with different areas of expertise. The following areas and skills may be important to any digitization project:

  • Technical skills/staff
  • Project management skills
  • Database development and administration skills
  • Cataloging staff/skills
  • Computer programming skills
  • Web design skills
  • Subject matter specialists (curators, archivists, scholars, librarians, faculty, etc.)
  • Preservation background
  • Photography background
  • Artistic/graphic design skills
  • Interest in fine arts, music, art history, humanities, sciences, history--any subject matter that the digitization project may encompass

Digitization projects, by nature, require a team approach, and bring together different sets of skills from different areas of the library perhaps more than any other project. Administration, technical services staff, cataloging staff, the information technology department, subject specialists, curators, librarians, preservation and conservation staff, faculty, and others may all be involved.



Recommended Standards for Creating Digital Images

This document provides the minimum scanning recommendations we feel are necessary for responsible conversion and for achieving an acceptable level of image quality. These minimum recommendations reflect technical capabilities and limitations at the time of writing. We recognize that all collections differ in the ways they are used and accessed and that institutions have differing purposes and clientele, which will likely affect how and why collections are digitized. These are not hard and fast recommendations for every collection and every institution. As a rule, the key to quality scanning is not to scan at the highest resolution possible but to scan at a level that matches the informational content of the original.

Decisions on image quality and resolution should be based on the needs of users, how the images will be used, and the nature of the materials you are scanning (dimensions, color, tonal range, format, material type, etc.). The quality of the original (such as the quality of the shooting or processing technique in the case of photographs) also has an impact on the resolution at which you scan and the resulting quality of the digital image.

Master Image File

Many digital imaging projects scan a high-quality "master" or archival image and then derive multiple versions in smaller sizes or alternative formats for a variety of uses. There are compelling preservation, access, and economic reasons for creating an archival-quality digital master image: it provides an information-rich, unedited, research quality surrogate, and ensures rescanning will not be necessary in the future. A high-quality master image will make the investment in the image capture process worthwhile. Since user expectations and technology change over time, a digital master must be available and rich enough to accommodate future needs and applications. The master image should be the highest quality you can afford; it should not be edited or processed for any specific output; and it should be uncompressed. Intensive quality control should be applied in creating master image files.

Derivative Image Files

Derivative files are created from the master digital image, and are used in place of it, usually for general Web access. Derivative files typically include an access image, which is sized to fit within the screen of an average monitor; and a thumbnail image, which is usually quite small and hotlinked to the larger access image. Derivative files are usually stored online. Master image files are very large and expensive to store online. Consider whether you have the server space to store these files. An alternative is to store these files on CD or DVD.

We recommend that three versions of an image be created: a master image, an access image, and a thumbnail image. A higher resolution access image may be created depending on the need to detect detail in the image.

Master Image

  • Represents as closely as possible the information contained in the original
  • Uncompressed
  • Unedited
  • Serves as long term source for derivative files
  • Can serve as surrogate for the original
  • High quality
  • Very large file size
  • Used for creating high quality print reproductions
  • Usually stored in the TIFF file format

Access Image

  • Used in place of master image for general web access
  • Generally fits within viewing area of average monitor
  • Reasonable file size for fast download time; does not require a fast network connection
  • Acceptable quality for general research
  • Compressed for speed of access
  • Usually stored in JPEG file format

Thumbnail Image

  • A very small image usually presented with the bibliographic record
  • Designed to display quickly online; allows user to determine whether they want to view access image
  • Usually stored in GIF or JPEG file formats
  • Not always suitable for images consisting primarily of text, musical scores, etc.; user cannot tell what content is at so small a scale

There are three types of scanning:

Bitonal – One bit per pixel representing black and white. Bitonal scanning is best suited to high-contrast documents such as printed text.

Grayscale – Multiple bits per pixel representing shades of gray. Grayscale is suited to continuous tone documents, such as black and white photographs.

Color - Multiple bits per pixel representing color. Color scanning is suited to documents with color information.

These three modes of scanning also require some subjective decisions. For example, a black and white typed document may have annotations in red ink. Although bitonal scanning is often used for typed documents, scanning in color may be preferable in this case, depending on how the image will be used. Manuscripts, older printed matter, and sheet music may be better served by scanning as continuous tone in grayscale or color to bring out the shade and condition of the paper and the marks inscribed on it.
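
The choice of mode has a direct effect on file size. The following is a minimal sketch of the arithmetic, assuming an uncompressed image with no file header overhead; the function name and the 8 x 10 inch, 300 dpi example are illustrative assumptions, not part of these guidelines.

```python
def uncompressed_size_bytes(width_px, height_px, bits_per_pixel):
    """Return the raw image data size in bytes (no header, no compression)."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical example: an 8 x 10 inch original scanned at 300 dpi
# is 2400 x 3000 pixels (inches multiplied by dots per inch).
width, height = 8 * 300, 10 * 300

for mode, bits in [("bitonal", 1), ("8-bit grayscale", 8), ("24-bit color", 24)]:
    size = uncompressed_size_bytes(width, height, bits)
    print(f"{mode}: {size / 1_000_000:.1f} MB")
```

Under these assumptions, the same page costs 24 times more storage in 24-bit color than in bitonal mode, which is one reason the mode should be matched to the informational content of the original.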



Formats

Recommendations have been developed for the following formats:

  • Text: printed matter, photocopies, typed or laser printed documents, may include some line drawings, illustrations, manuscripts, music scores, and blueprints. We are also including black and white and color photographic prints in this category for scanning purposes.
  • Photographs: Negatives and transparencies.
  • Maps
  • Graphic Materials: Line drawings, artistic illustrations, lithographs, watercolors, etc.

We have included recommendations for tonal depth, file format, compression, and spatial resolution. Tonal depth will be determined by the nature of the material you are scanning as well as the functionality of the scanner/camera you are using. File formats suggested are non-proprietary and meet the industry standard; however, some alternative file formats are briefly addressed. Compression, a process that reduces the size of images prior to storage and transmission in order to save space and time, may be either lossy or lossless, depending on the file format. Lossless compression results in a file from which the original image can be restored exactly, with no loss of information. In lossy compression, a certain amount of information is discarded during the compression process. Although the discarded information may be invisible to the human eye, a loss of quality occurs. Compression levels may vary from project to project. In general, we suggest that master image files remain uncompressed. Finally, we have included recommendations for spatial resolution. These recommendations are the minimum (or, in some cases, middle of the road) for resolution, and are expressed as either dpi (dots per inch) or as a pixel dimension. For all of the following formats, we cannot justify scanning at a resolution lower than 300 dpi, as storage has become so inexpensive.
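
The difference between lossless and lossy compression can be illustrated with a small sketch. It uses Python's standard `zlib` module for the lossless case. The lossy case is only a stand-in: it discards the four least significant bits of each pixel before compressing, which is not how JPEG works but shows the same principle of information being thrown away. The pixel data is made up for the example.

```python
import zlib

# Hypothetical 8-bit grayscale data: a gradient with slight variation.
pixels = bytes((i // 4 + (i % 3)) % 256 for i in range(4000))

# Lossless: decompressing returns exactly the original bytes.
packed = zlib.compress(pixels)
restored = zlib.decompress(packed)
lossless_identical = restored == pixels

# Lossy (illustrative stand-in only): drop the low 4 bits of every pixel,
# then compress. The discarded bits cannot be recovered afterwards.
quantized = bytes(p & 0xF0 for p in pixels)
packed_lossy = zlib.compress(quantized)
restored_lossy = zlib.decompress(packed_lossy)
lossy_identical = restored_lossy == pixels

# Largest difference between an original pixel and its lossy version.
max_error = max(abs(a - b) for a, b in zip(pixels, restored_lossy))

print("original bytes:      ", len(pixels))
print("lossless compressed: ", len(packed), "identical:", lossless_identical)
print("lossy compressed:    ", len(packed_lossy), "identical:", lossy_identical)
print("max pixel error after lossy step:", max_error)
```

The lossless round trip reproduces every byte, while the lossy version differs from the original by up to 15 gray levels per pixel in this example.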

Please see Appendix B for a quick reference table to the following formats.

Alternative format: PDF (Portable Document Format) from Adobe is an alternative file format for creating and displaying text-based files on the web. See http://www.adobe.com for further information. Adobe Acrobat software is required to create and manipulate PDF files. The Adobe Acrobat viewer is free to download, so users can view documents on their own computers.

Other considerations: Consider providing a transcription of textual materials, especially of handwritten manuscripts that may be difficult to read. Transcriptions can be of tremendous help to researchers looking at a text. Transcribed text, especially when it is encoded with a markup language, can greatly improve the researcher's ability to navigate and search long documents. There are several ways of presenting digital images of text. You may want to consider providing the text as an image file linked to a transcribed file, especially since the accuracy of OCR is not high; note, however, that rekeying is labor-intensive. OCR software is available from, among others, the Caere Corporation at http://www.caere.com/. If you choose to keep the text as page images only, you could also create a table of contents in HTML and have it link to the individual page images, for easier navigation. Another option is to encode the text using a markup language, such as SGML, to make the text document searchable. Participants in the American Memory Project at the Library of Congress, for example, use SGML in a DTD (Document Type Definition) based on the TEI (Text Encoding Initiative) Guidelines. Since SGML viewers are not yet freely available for viewing SGML over the Internet, an HTML version can be derived from the SGML version for widespread viewing online.
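
As one possible way to build the HTML table of contents mentioned above, the following sketch generates a simple page that links to individual page images. The function name, the file names, and the page labels are hypothetical; a real project would draw them from its own file naming scheme.

```python
from html import escape


def build_toc_html(title, pages):
    """Build a minimal HTML table of contents.

    pages: list of (label, image_filename) tuples in reading order.
    """
    items = "\n".join(
        f'    <li><a href="{escape(filename, quote=True)}">{escape(label)}</a></li>'
        for label, filename in pages
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical usage: a two-page document stored as JPEG access images.
print(build_toc_html("Diary & Letters", [("Page 1", "d0001.jpg"), ("Page 2", "d0002.jpg")]))
```
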

PHOTOGRAPHS

Master Image
  • Tonal depth: 8-bit grayscale/24-bit color or greater
  • Format: TIFF
  • Compression: Uncompressed
  • Spatial Resolution: 3000 to 5000 pixels across the long dimension

Access Image
  • Tonal depth: 8-bit grayscale/24-bit color
  • Format: JPEG
  • Compression: Depends; 7:1 - 10:1 for grayscale/10:1 - 20:1 for color
  • Spatial Resolution:
    • Resize image to 640 x 480 pixels
    • Or 1024 x 768 pixels
    • Or 1280 x 1024 pixels
    • Range from 1000 pixels to 5000 pixels across the long dimension for higher-resolution version

Thumbnail Image
  • Tonal depth: 4-bit grayscale/8-bit color
  • Format: GIF (or JPEG)
  • Compression: Native to GIF format
  • Spatial Resolution: Resize original to 150 - 200 pixels (+/-) across the long dimension; 72 dpi
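
The access and thumbnail sizes above are targets for the resized image, and the aspect ratio of the master should be preserved when deriving them. The sketch below computes derivative pixel dimensions only (it does not resample any image data); the function names and the sample master sizes are assumptions for illustration.

```python
def fit_long_dimension(width_px, height_px, target_long_px):
    """Scale so the longer side equals target_long_px, keeping the aspect ratio."""
    long_side = max(width_px, height_px)
    new_width = max(1, round(width_px * target_long_px / long_side))
    new_height = max(1, round(height_px * target_long_px / long_side))
    return new_width, new_height


def fit_within(width_px, height_px, max_width_px, max_height_px):
    """Scale to fit inside a bounding box (e.g. 640 x 480), keeping the aspect ratio."""
    scale = min(max_width_px / width_px, max_height_px / height_px)
    new_width = min(max_width_px, max(1, round(width_px * scale)))
    new_height = min(max_height_px, max(1, round(height_px * scale)))
    return new_width, new_height


# Hypothetical masters: one landscape, one portrait.
print(fit_long_dimension(4000, 3000, 200))   # thumbnail for a landscape master
print(fit_long_dimension(3000, 5000, 150))   # thumbnail for a portrait master
print(fit_within(4000, 3000, 640, 480))      # access image for a landscape master
print(fit_within(3000, 5000, 640, 480))      # access image for a portrait master
```

Note that a portrait original fitted into a 640 x 480 box ends up much narrower than 640 pixels, so a project may prefer to specify the long dimension rather than a fixed box.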

 

Alternative formats: Many imaging projects are using the proprietary Kodak PhotoCD format for storing their photographic images. More information on Kodak PhotoCD can be found at Kodak's site http://www.kodak.com/US/en/digital/products/photoCD.shtml and in an article in RLG Diginews, "Using Kodak PhotoCD for Preservation and Access," at http://www.rlg.org/preserv/diginews/diginews23.html#feature. Consider how to reprocess the image files into GIF/JPEG for direct Web access to the images.

Other emerging file formats include the Flashpix format, sponsored by the International Imaging Industry Association, http://www.I3A.org/. Flashpix is a technology that provides a multi-resolution, tiled file format that allows images to be stored at different resolutions for different purposes, such as editing or printing, in one file. To use Flashpix, you need the Openpix technology.

PNG (Portable Network Graphics), http://www.w3.org/Graphics/PNG/, is an image format designed to replace the GIF format. It offers a smaller file size than GIF but does not lose any information to compression. It is not yet widely supported.

Other considerations: Photographs can present many scanning challenges. We recommend scanning from the negative (or the earliest generation of the photograph) to yield a higher-quality image. Another consideration is whether to scan sepia-tone photographs as color or black and white images. We recommend scanning them as color images to create a better image, although this will greatly increase the file size.

Another consideration with photographs is whether to scan the backs of photographs as separate image files if there is significant information on the back of the photo (which may be of interest to users) that may not be included elsewhere. If a scanned image of the verso of the photograph is available, the digital image may serve as a more successful surrogate for the original.

MAPS

Master Image
  • Tonal depth: 8-bit grayscale/24-bit color or greater
  • Format: TIFF
  • Compression: Uncompressed
  • Spatial Resolution: 300 dpi

Access Image
  • Tonal depth: 8-bit grayscale/24-bit color
  • Format: JPEG
  • Compression: Depends; 20:1
  • Spatial Resolution:
    • 1200 pixels across the long dimension (large maps)
    • Resize image to 640 x 480 pixels (small maps)
    • Range from 1000 pixels to 5000 pixels across the long dimension for higher-resolution version

Thumbnail Image
  • Tonal depth: 4-bit grayscale/8-bit color
  • Format: GIF (or JPEG)
  • Compression: Native to GIF format
  • Spatial Resolution: Resize original to 150 - 200 pixels (+/-) across the long dimension (if thumbnail is applicable); 72 dpi
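
A compression ratio such as 20:1 can be turned into a rough file size estimate for planning purposes. The sketch below is simple arithmetic under the assumption that the ratio is measured against the uncompressed pixel data; actual JPEG file sizes depend on image content and encoder settings. The function name and the 1200 x 900 pixel example are illustrative.

```python
def estimated_compressed_bytes(width_px, height_px, bits_per_pixel, ratio):
    """Estimate file size for a given compression ratio (e.g. 20 for 20:1)."""
    raw_bytes = width_px * height_px * bits_per_pixel // 8
    return round(raw_bytes / ratio)


# Hypothetical access image of a large map: 1200 x 900 pixels, 24-bit color.
for ratio in (10, 20):
    size = estimated_compressed_bytes(1200, 900, 24, ratio)
    print(f"{ratio}:1 -> about {size / 1000:.0f} KB")
```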

 

Alternative formats: The MrSID (Multiresolution Seamless Image Database) format by LizardTech, Inc. http://www.lizardtech.com/products/mrsid/ allows for the compression, storage, and retrieval of large digital images. Files are stored in proprietary .sid format. The files are compressed with a "wavelet" compression algorithm that also provides a "zoom in" capability in the browser software, and provides little loss in image quality. LizardTech provides viewers to those who wish to download and manipulate .sid images, but the technology can be used to deliver a portion of the image requested as a standard JPEG, with no viewers required.



GRAPHIC MATERIALS

Master Image
  • Tonal depth: 8-bit grayscale/24-bit color or greater
  • Format: TIFF
  • Compression: Uncompressed
  • Spatial Resolution: 3000 pixels across the long dimension or 300 dpi

Access Image
  • Tonal depth: 8-bit grayscale/24-bit color
  • Format: JPEG
  • Compression: Depends; 7:1 - 10:1 grayscale/10:1 - 20:1 color
  • Spatial Resolution:
    • 1200 pixels across the long dimension (large originals)
    • Resize image to 640 x 480 pixels (small originals)
    • Range from 1000 pixels to 5000 pixels across the long dimension for higher-resolution version

Thumbnail Image
  • Tonal depth: 4-bit grayscale/8-bit color
  • Format: GIF (or JPEG)
  • Compression: Native to GIF format
  • Spatial Resolution: Resize original to 150 - 200 pixels (+/-) across the long dimension; 72 dpi

 

Standards for artwork are not well defined. Usually artwork imaging projects involve scanning from photographic surrogates such as 35mm slides, in which case recommendations for transparent photographs should be followed. For large format artwork, outsourcing to a vendor with an overhead digital camera or large flatbed scanner suitable for scanning large documents is recommended.

If you do choose to distribute master images over the web for access by users, you may want to consider digital watermarking or some kind of copyright/ownership mark, possibly embedded in the image itself, as master image files are of a quality that can be used for commercial reproduction. The access and thumbnail files are for web display only, and are not of a quality suitable for reproduction.

Some links to digital watermarking information:

Digimarc http://www.digimarc.com/



Other Considerations

Quality Control

A quality control program should be conducted throughout all phases of the digital conversion process. Inspection of final digital image files should be incorporated into your project workflow. Typically, master image files are inspected via CD batch or online for a variety of defects. Depending on your project, you may want to inspect 100% of the master images, or 10% of the files randomly, for example. We do recommend that quality control procedures are implemented and documented and that you have clearly defined the specific defects that you find unacceptable in an image. Images should be inspected while viewing at a 1:1 pixel ratio or at 100% magnification or higher.
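
If you inspect only a percentage of the master images, the sample should be drawn at random rather than by taking the first files in a batch. The following is a minimal sketch of such a selection, assuming file names are available as a list; the function name, the `percent` and `seed` parameters, and the sample file names are hypothetical.

```python
import math
import random


def select_qc_sample(file_names, percent=10, seed=None):
    """Return a random sample of file names for inspection.

    percent: share of files to inspect (10 means 10%); at least one file
    is selected whenever the batch is not empty.
    seed: optional value that makes the selection repeatable for auditing.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    names = list(file_names)
    if not names:
        return []
    sample_size = max(1, math.ceil(len(names) * percent / 100))
    rng = random.Random(seed)
    return sorted(rng.sample(names, sample_size))


# Hypothetical batch of 250 master files; inspect 10% of them.
batch = [f"img{i:04d}.tif" for i in range(250)]
to_inspect = select_qc_sample(batch, percent=10, seed=1999)
print(len(to_inspect), "files selected, for example:", to_inspect[:3])
```

Recording the seed alongside the batch number is one way to document which files were inspected.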

Quality is evaluated both subjectively by project staff (scanner operator, image editors, etc.) through visual inspection and objectively in the imaging software (such as using targets). The viewing environment for visual inspection of images is also important: monitors should be calibrated, and the room should be dark or at least free from bright lighting, sunlight, or glare.

Things to look for during visual inspection may include:

  • Image not the correct size
  • Image not the correct resolution
  • File name is incorrect
  • File format is incorrect
  • Image is in incorrect mode (e.g., color image has been scanned as grayscale)
  • Loss of detail in highlight or shadows
  • Excessive noise especially in dark areas or shadows
  • Overall too light or too dark
  • Uneven tonal values or flare
  • Lack of sharpness/Excessive sharpening
  • Pixellated
  • Presence of digital artifacts (such as very regular, straight lines across picture)
  • Moire patterns (wavy lines or swirls, usually found in areas where there are repeated patterns)
  • Image not cropped
  • Image not rotated or backwards
  • Image skewed or not centered
  • Incorrect color balance
  • Image dull or no tonal variation
  • Negative curve in the Look-Up Table
  • Clipping black and white values (in histogram)
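
Some of these checks, such as clipped black and white values, can also be tested objectively from the image histogram. The sketch below counts how many 8-bit pixel values sit at pure black (0) or pure white (255). The 1% threshold is an arbitrary assumption for illustration, not a recommendation of these guidelines, and the pixel data would in practice come from your imaging software.

```python
def clipping_report(pixels, threshold=0.01):
    """Report the share of 8-bit pixels at pure black (0) and pure white (255).

    threshold: fraction above which an end of the histogram is flagged
    (an assumed value; choose one that suits your materials).
    """
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1
    total = sum(histogram)
    if total == 0:
        raise ValueError("no pixel data supplied")
    black_fraction = histogram[0] / total
    white_fraction = histogram[255] / total
    return {
        "black_fraction": black_fraction,
        "white_fraction": white_fraction,
        "shadows_clipped": black_fraction > threshold,
        "highlights_clipped": white_fraction > threshold,
    }


# Hypothetical grayscale data: 5% pure black, 5% pure white, the rest mid-gray.
sample = [0] * 50 + [128] * 900 + [255] * 50
print(clipping_report(sample))
```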


CD Recording and Verification

CDs should be inspected to make sure image files open and display properly and that the correct batch has been recorded on the CD.

It is important to label each CD for ease of retrieval. If you use a felt-tip pen to label the CD, make sure it is water-based and does not contain alcohol, which can damage the protective layer of the disc. It is best to write information on the innermost, clear ring. Special adhesive labels are also available for labeling CDs, but the adhesive may have adverse effects on the CD over time. It is best to label only the jewel case or create an insert for it; however, it is easy for the CD to become separated from its case.

You may want to include on the CD or jewel case information such as: Name of your institution, name of collection, name of project or grant, a unique number for the disc, the beginning and ending file name on the disc, the file formats on the disc, the date the disc was created, the speed, brand, and model of the CD recorder, and relevant scanning information, such as the software used to scan the images, the brand and model of scanner used, and the resolution used to scan the images.

Keep an inventory of CDs and the files each one contains!
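
One simple way to keep such an inventory is a delimited text file that any spreadsheet or database can read. The sketch below writes inventory records as CSV using Python's standard `csv` module. The field names are a hypothetical subset of the label information suggested above, and the sample record is invented.

```python
import csv
import io

# Hypothetical inventory fields; extend with recorder and scanner details as needed.
FIELDS = ["disc_number", "collection", "first_file", "last_file",
          "file_formats", "date_created"]


def write_inventory(records, stream):
    """Write a header row followed by one row per disc to a text stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Example with an in-memory stream; a real project would open a file instead.
records = [{
    "disc_number": "CD-0001",
    "collection": "Example Collection",
    "first_file": "ex000001.tif",
    "last_file": "ex000030.tif",
    "file_formats": "TIFF",
    "date_created": "1999-05-01",
}]
buffer = io.StringIO()
write_inventory(records, buffer)
print(buffer.getvalue())
```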

File Naming

Decide on a file naming convention before starting the project. Each file name must uniquely identify one image. The file name may include the name of the collection or institution as well as the image number, plus the appropriate extension (.gif, .jpg, .tif). File names should be no longer than 8 characters and should not include spaces or symbols such as ?, /, or # (etc.).
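
File naming rules are easy to check automatically before files are recorded to disc. The sketch below assumes that the 8-character limit applies to the part of the name before the extension and that only letters, digits, hyphens, and underscores are allowed; both are interpretations for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

# Assumed rule: 1-8 letters, digits, hyphens or underscores, then .gif/.jpg/.tif
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]{1,8}\.(gif|jpg|tif)", re.IGNORECASE)


def is_valid_file_name(name):
    """Return True if the whole name matches the assumed naming rule."""
    return _VALID_NAME.fullmatch(name) is not None


def find_duplicates(names):
    """Return names that occur more than once, ignoring letter case."""
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)


for candidate in ["dpl00123.tif", "my photo.jpg", "photo#1.gif", "toolongname.tif"]:
    print(candidate, "->", "ok" if is_valid_file_name(candidate) else "rejected")
```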



Scanner and Monitor Calibration

Most scanner and image editing software provides a function for calibrating the scanner and/or monitor (including monitor brightness, contrast, and control of gamma settings). Scanner software is often used to match the tonal scale of what is being scanned, which may include black and white or color calibration. In general, scanner calibration should occur every time you scan a new media format or a new media size. Computer monitors can misrepresent the scanned image if not properly calibrated. Image characteristics, such as moire, wavy lines, dark or light spots, inaccurate resolution, etc. may be introduced if the monitor is not calibrated.

Suggested Monitor Settings:

  • Set to 24-bit color (millions of colors)
  • Set Gamma at 2.2 (1.8 for Macintosh)
  • Color temperature at 6500° K
  • Calibrate to sRGB (Standard Default Color Space for Internet)

Monitors should be calibrated regularly. There is specific software you can purchase to calibrate your monitor.

The Western History Photodigitization Project at the Denver Public Library has a web page that describes how to calibrate a monitor for image viewing (on the user end). This will help users adjust the brightness and contrast of their computer monitor so that digital images will look their best. If computer monitors are adjusted to a target, the digital images (if scanned properly) should provide a reasonably accurate depiction of the originals when viewed on the "average" computer monitor.



Targets and Color Bars

Targets are used to verify the tone and color reproduction of the materials you are scanning and are also used to measure system resolution (targets are about the scanning system and the accuracy of the system to reproduce correct tonal values, not about the materials that you are scanning).

Tone reproduction refers to the degree to which a digital image conveys the luminance ranges of the original. The ideal in tone reproduction is to match the brightnesses in the original with the brightnesses in the digital reproduction. This is not often achieved, since the digital image is different from the original, and viewing conditions are also different. What can be achieved, however, is an acceptable subjective tone reproduction that can give an approximation of the luminance ranges in the original.

Targets provide a means of controlling tone reproduction. Targets are a way of predicting image quality, and help ensure that the scanning system you are using is producing the best quality image it can and is operating at a consistent level of quality over time. Different targets for prints and transparencies exist. Targets must consist of the same material as the media being scanned (paper, film, etc.) and quality assessments should be performed on targets each time the scanner is calibrated. Targets usually contain patches of color, black and white, or shades of gray for verifying tone reproduction.

Some examples of targets include the Kodak Color Separation Guide, Grayscale Control Bar, AIIM Scanner Test Chart, IEEE Standard Facsimile Test Chart, and the RIT Alphanumeric Resolution Test Object target. To ensure color fidelity from scanner to monitor, the use of color targets and proper calibration of the monitor is recommended. Some color targets are the Macbeth Color Checker Rendition Chart and the PostScript IT8 Color Output Target.

Some digitization projects are also scanning a color bar along with the original, to be included in the final digital image, to aid users in verifying accuracy in color reproduction.

Please see NARA's Guidelines for Digitizing Archival Materials and Kenney and Chapman's Digital Imaging for Libraries and Archives for a discussion of calibration and targets in more detail.



Storing Images

Proper storage will help ensure access to and long-term maintenance of image collections. Storage media consists of the materials on which the digital images are written as well as the devices that record, read, and process the information. Choices for the storage of your images will depend on the technical infrastructure you have in place; however, careful consideration of storage choices will help make the investment in image capture and equipment worth the cost, time, and labor.

The CDP recommends that you consider multiple storage media for your digital collections, including adequate backup storage (which may also include offsite storage in case of disaster). Other considerations for storage media and systems include: capacity of the medium (how much it can store); speed (how quickly images can be written, read, retrieved); reliability (stability and longevity of the media); security (risks of the medium, safeguards built into the medium to protect data); scalability (planned growth rates); and costs (purchase costs, housing costs, training, maintenance, costs of access, cost of migration, etc.).

There are several types of storage media available for online, offline, near-line, and archiving purposes:

  • Magnetic disks, such as hard drives and removable or external hard drives
    • Online storage of indexing data and access/thumbnail images
    • Advantages: high speed, declining costs
    • Disadvantages: limited storage capacity, rapid technological change means rapid outdating of current system
  • CD-ROM
    • Most often used for long-term storage of master images or in use at stand-alone viewing stations
    • Advantages: reading and writing of CDs conform to a standard (ISO 9660); relatively stable media; low cost; suited to multimedia applications
    • Disadvantages: life expectancy?; limited storage capacity; complex to network
  • Tape
    • Used most often for backup of archival masters
    • Advantages: low cost; relatively stable medium; high capacity and portability
    • Disadvantages: life expectancy? must be stored in proper environmental conditions or media will disintegrate; sequential access to data; slow access
If material is to be made accessible online, additional equipment may be needed to integrate storage media into networks. General network maintenance and support will also be necessary for any digital collections that will be accessed via the Web. Staff who are trained in network administration will be an essential part of digital projects and system support. The most common "model" for digital projects is to store master images offline on CD-ROM and to make access and thumbnail versions available online 24 hours a day. Master images may be viewed upon request, in person, from the CD.
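
When planning offline storage of master images on CD-ROM, it helps to estimate how many discs a collection will need. The sketch below is a rough calculation that assumes a disc capacity of 650 MB (a common figure, but check your own media), ignores file system and header overhead, and uses a hypothetical master image of 3000 x 2400 pixels in 24-bit color.

```python
def images_per_disc(image_bytes, disc_bytes=650_000_000):
    """Whole number of images of a given size that fit on one disc."""
    return disc_bytes // image_bytes


def discs_needed(image_count, image_bytes, disc_bytes=650_000_000):
    """Number of discs required for a collection of equally sized images."""
    per_disc = images_per_disc(image_bytes, disc_bytes)
    if per_disc == 0:
        raise ValueError("a single image does not fit on one disc")
    return -(-image_count // per_disc)  # ceiling division


# Hypothetical uncompressed master: 3000 x 2400 pixels, 3 bytes per pixel.
master_bytes = 3000 * 2400 * 3
print("images per disc:", images_per_disc(master_bytes))
print("discs for 1000 masters:", discs_needed(1000, master_bytes))
```

Doubling the result gives a first estimate for a project that also keeps a backup copy of every disc offsite.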

Online storage: Refers to media that is access-ready. Retrieval is fast, often in seconds. Reliable medium for accessing information. Multiple users can access information simultaneously.
Problems: Authentication, security, reliability of data. Limited bandwidth. Network/website downtime.

Near-line storage: Refers to data that is accessed from a drive. Retrieval is fast, often in seconds; can be faster or slower than online storage. Retrieval can be slow if multiple users have requested the disc.
Provides more security and reliability, but more limited access.

Offline storage: Data stored on the shelf, must be retrieved by a person. Retrieval time can take minutes to hours. Low cost to store, more security and reliability, but limited access. Data not easily browseable.


Costs

It is difficult to predict just how much a digital imaging project is actually going to cost, and little hard data on the cost, cost effectiveness, and costs over time of digital projects is readily available. Generally, capture and conversion of data often comprises only 1/3 of the total costs, while cataloging, description, and indexing comprise 2/3 of the total costs. Upfront and ongoing costs can be significant, and economic advantage--and reality--may be better realized through collaborative initiatives or cooperative/regional digitization initiatives, where costs, resources, goals, and expertise can be shared. Initial investment in equipment, staff training, capture and conversion, handling, storing, and housing originals, producing derivative files, CD production, cataloging and building the image database system, and developing Web interfaces are all possible areas of cost for any digitization project. However, the costs of a project do not end after conversion. Some on-going costs that an institution must commit to include the costs of maintaining data and systems over time, including media migration costs and infrastructure costs.

© 1999 CDP

 

 

Last Updated: 2003-03-24