THE LIBRARY OF CONGRESS DIGITAL AUDIO PRESERVATION PROTOTYPING PROJECT*
Carl Fleischhauer
Project Coordinator
Office of Strategic Initiatives
Library of Congress
Introduction
The Digital Audio Preservation Prototyping Project was established
at the Library of Congress for several reasons. The underlying motive—not
always visible in our presentations—is that the time has come
to change our approach to reformatting recorded sound collections,
for reasons I will outline in a moment. The surface motive, the trigger
to action, is the planned move in 2005 by the Library’s Motion
Picture, Broadcasting, and Recorded Sound Division to a new facility
in Culpeper, Virginia. Substantial funding for the new National Audio-Visual
Conservation Center comes from David Woodley Packard (the son of David
Packard, co-founder of the Hewlett-Packard Corporation) and the Packard
Humanities Institute.
The project has embraced sample collections from two Library of Congress
divisions: the Motion Picture, Broadcasting, and Recorded Sound Division
(M/B/RS) and the American Folklife Center (AFC). In another talk at
the Sound Savings conference, the archivist Michael Taft described
the AFC Save Our Sounds effort, which is allied with the prototyping
effort. Overall, the project’s focus has been on reformatting
sound recordings, with an eye on moving into video. We want to reach
some useful conclusions next year, in time to apply the lessons in
the new building.
The prevalent practice for reformatting audio and video from the 1960s
and 1970s into the 1990s has been “copy to analog magnetic tape.” We
see four reasons to change. First, there is the matter of media life
expectancy. Magnetic tape (analog or digital) will not last as long as
the archetypal media used for reformatting: microfilm. Second, there is
the issue of quality loss as a result of making the copy.
Analog-to-analog copying introduces what is called generation loss.
This is tolerable with microfilm, when the time between re-reformatting
is long. But with audiotape the time between re-reformatting is
relatively short, and the adverse effects are troubling. Third, there
is the problem of device and media obsolescence. We are seeing a
virtual cessation of manufacturing of analog-tape media and analog-tape
recording devices. Finally, the digital era is here, and we need to
engage it, and not just to serve reformatting. The next generation of
content to reach our institutions will be digital to begin with, and
its preservation for the long term will depend upon techniques similar
or identical to those we establish to sustain digitally reformatted
content.
The prototyping project has also been motivated by the desire to model
new ways to provide access to researchers. The production of digital
masters makes it relatively efficient to produce service copies, e.g.,
compressed copies that can be accessed in our secure local area network
or streaming copies for the Web. At the Library of Congress, copyright
considerations and consideration of the prerogatives of folk communities
mean that we must limit access to much of our recorded sound collection,
i.e., many items cannot be placed on the public Web. But after the
collections have moved to Culpeper, we want our reformatted content
to continue to be accessible in reading rooms on Capitol Hill, and
the digital service copies that we place in the Library’s secure
storage systems will help us accomplish that goal. We are also exploring
ways to provide access more widely, perhaps to remote sites, legal
circumstances permitting.
One of the advantages of digital-file reformatting is the ability
to reproduce an entire object. For example, here is a description
of our digital reproduction of a sound recording made by the U.S.
Marine Corps in the Pacific during World War II. The 1945 original
was recorded on Amertape Recording Film, sprocketed 35mm film that
ran through a recorder that cut grooves in the surface. Within a year
or so, the Marine Corps copied the film to 16-inch transcription discs.
These have since deteriorated but they were used as the source for
our audio. (We hope to go back to the film at some point.) The digital
copy provides access not only to the audio but also to images of the
film box, the disc labels, and a content log sheet that had been packed
with the film. This virtual package is presented in an interface that
permits a researcher to play the audio, zoom in on the images, and
examine detailed technical metadata.
The preservation approach we are exploring has at its core a digital
object or information package that includes bitstreams,
i.e., the files that contain the audio and images, and metadata. These
packages will be managed in digital repositories, sophisticated
versions of the computer storage systems we are using today. CDs or
DVDs will not be used to store the content. It is worth saying that
content management—what happens inside the repository—has
at its heart a paradox. Digital content depends on specific information
technology systems to keep it alive and to render it for users. But
information technology systems are inherently obsolescent and will
be replaced in relatively short time periods and thus our content
must also be system independent. At any given moment, content lives
on this media—disks in this server, for example—and
is sustained by this information technology system, but the
content must transcend the lifespan of any given media and system.
Our preservation explorations have wrestled with four issues: selecting
the target format for reformatting, determining the quality of the
reformatted copy, shaping the information package and the importance
of metadata, and analyzing longevity in a “media-less” environment.
Selecting the Target Format
The first issue concerns the choice of bitstream structure and file type.
This entails striking a balance between six factors:
- Disclosure: Are specifications and tools for validating technical
integrity accessible to those creating and sustaining digital content?
Preservation depends upon understanding how the information is represented
as bits and bytes in digital files.
- Adoption: Is this format already used by the primary creators,
disseminators, or users of information resources? If a format is widely
adopted, it is less likely to become obsolete rapidly, and tools for
migration and emulation are more likely to emerge.
- Transparency: Is the digital representation or encoding
open to direct analysis with basic tools? Digital formats in which
the underlying information is represented simply and directly will
be easier to migrate to new formats. Encryption and compression inhibit
transparency.
- Self-documentation: In part, this is about the package
inside the package. Does the file format include metadata that explains
how to render the data as usable information or understand its context?
Self-documenting formats are likely to be easier to sustain over long
periods and less vulnerable to catastrophe than ones that are separated
from key metadata.
- “Fidelity” or support for high resolution: Does
the format “hold” high-resolution audio?
- Sound field support: Does the format represent stereo
and even surround sound?
What formats have we selected? For our audio masters, our bitstream choice
is pulse code modulated (PCM) sampling, uncompressed. This is the
type of bitstream used on audio compact discs and it meets the transparency
test. The file format we use is WAVE, from Microsoft, and it meets
the adoption, disclosure, and fidelity tests. By the way, we feel
that the “PCM-ness” of the bitstream is more important than
the “WAVE-ness” of the file; Macintosh users put their PCM
bitstreams into AIFF files to equal effect. We have not yet begun
using what is called the Broadcast WAVE Format, which would get a
higher score on the self-documentation test than ordinary WAVE. Meanwhile,
we are curious about one-bit-deep formats like the DSD structure on
Sony’s Super Audio CD (SACD), but this bitstream structure
is not yet widely adopted. DSD also gets occasional negative write-ups
in the trade press, so we are taking a wait-and-see position. Since
our reformatting is limited to mono and stereo material for the moment,
we can afford to put off addressing the matter of surround sound.
For audio service files, we use WAVE at lower resolution and MP3 compressed
files.
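The PCM-in-WAVE choice described above can be illustrated with a short sketch using only Python's standard-library `wave` module. The 16-bit, 44.1-kilocycle, mono settings are chosen for brevity and are illustrative only; they resemble a service copy rather than the professional master specification.

```python
import math
import os
import struct
import tempfile
import wave

def write_pcm_wave(path, freq_hz=440.0, rate=44100, seconds=0.1):
    """Write an uncompressed 16-bit mono PCM sine tone into a WAVE file."""
    frames = bytearray()
    for n in range(int(rate * seconds)):
        sample = int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * n / rate))
        frames += struct.pack('<h', sample)  # little-endian 16-bit PCM sample
    with wave.open(path, 'wb') as w:
        w.setnchannels(1)     # mono
        w.setsampwidth(2)     # 2 bytes = 16 bits per sample
        w.setframerate(rate)  # samples per second
        w.writeframes(bytes(frames))

path = os.path.join(tempfile.mkdtemp(), 'tone.wav')
write_pcm_wave(path)

# Read the file parameters back, as a validation tool might:
with wave.open(path, 'rb') as w:
    params = (w.getnchannels(), w.getsampwidth(),
              w.getframerate(), w.getnframes())
```

Note that the PCM samples themselves are unchanged whether they sit in a WAVE or an AIFF container, which is the sense in which the "PCM-ness" matters more than the "WAVE-ness."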
For our image masters, our bitstream choice is bit-mapped or raster,
also uncompressed. The file format we use is TIFF, another industry
standard, originally from Aldus and now from Adobe. Here, the
“bit-mapped-ness” of the bitstream is more important than the
“TIFF-ness” of the file. For image service files, we use JPEGs and
expect to switch to JPEG 2000 after this new format has been more
widely adopted.
The Quality of the Reformatted Copy
The second central issue has been the subject of several interesting
and instructive discussions by LC staff working on the prototyping
project. Our talks revolved around questions like, “What does high
resolution mean?” and “Why should we seek it?” In the end, our
decision-making turned on some unexpected factors, some of which are
beyond the reach of science and objective measurement.
With sound, the analysis of resolution starts with considerations
of sampling frequency, measured in cycles or kilocycles per second.
Roughly speaking, digitizing audio means taking the analog waveform
and representing it as a large number of points or dots—connect
the dots and you have your waveform back. The more dots, the better
you can redraw your sound wave; the more dots, the better you can
represent the fine parts of the curve that represent high frequency
sounds. This parameter can be compared to spatial resolution for images.
A digital image consists of row upon row of picture elements, pixels
for short, often called “dots.” The higher the number of
pixels, the higher the spatial resolution.
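The connect-the-dots picture can be made concrete with a small sketch (an illustration of the idea, not project code): the same 1-kilocycle tone sampled at two different rates simply yields more or fewer "dots" for the same stretch of time.

```python
import math

def sample_waveform(freq_hz, rate_hz, seconds=0.01):
    """Represent an analog sine wave as a list of discrete 'dots' (samples)."""
    n = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

# The same 1-kilocycle tone, sampled coarsely and finely:
coarse = sample_waveform(1000, 8_000)    # 8 dots per cycle of the wave
fine = sample_waveform(1000, 96_000)     # 96 dots per cycle of the wave
```

Connecting the `fine` dots redraws the curve far more faithfully, which is exactly the sense in which a higher sampling frequency preserves the fine, high-frequency parts of the waveform.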
The second key parameter is bit depth, which audio engineers sometimes
call “word length.” With audio, the more data you have for
each sample—the longer the word, so to say—the more accurate
the position of the sample in terms of amplitude. Greater bit depth
gives you a lower noise floor and lets you represent a greater dynamic
range, which can be especially helpful when transferring, say, field
recordings made in hard-to-control circumstances. Compact discs usually
have 16 bits (2 bytes) per sample, while many professional recording
systems offer 24 bits (3 bytes). The imaging analogy is that an image
at 24 bits per pixel can reproduce more colors than 8- or 16-bit representations
and thus offers the possibility of greater color fidelity.
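Although the paper does not state it, the standard rule of thumb behind these numbers is that each bit of word length adds roughly 6 dB of dynamic range; a minimal sketch:

```python
import math

def dynamic_range_db(bits_per_sample):
    """Theoretical dynamic range of linear PCM: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits_per_sample)

cd_range = dynamic_range_db(16)    # compact disc word length
pro_range = dynamic_range_db(24)   # professional 24-bit systems
```

The 16-bit figure is roughly 96 dB and the 24-bit figure roughly 144 dB; the extra margin is the engineers' "cushion" for wide or varying dynamic range.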
Everyone is convinced that it is a good idea to digitize audio at 24
bits per sample. Keen ears can hear the difference and, although we
have not done so, one could exploit test signals to compare distortion
and noise. And it was in the discussion of bit depth that one of the
“unmeasurable” factors was articulated: “You want a cushion of extra
data,” the engineers said, “just to protect you when you copy items
with a wide or varying dynamic range, or to give you elbow room to fix
things later in the event that an operator doesn’t do a perfect job.”
I have heard an analogous argument regarding imaging, especially when
reformatting photographic negatives. The proposal is to make a preservation
master image with a “flat” (low contrast) contrast curve
and 12- or 16-bits-per-channel instead of the customary 8. Then a
future user could manipulate the image to restore it or for a desired
aesthetic effect, and resave it at 8 bits deep. The outcome of this
process would be an image with a full set of tones at the 8-bit depth,
i.e., the histogram for the new 8-bit image would be free of gaps.
In contrast, if you started with an 8-bit-per-channel master, manipulated
it, and then resaved the copy at 8 bits, the resulting copy image
would lack some tones, i.e., the histogram would have gaps.
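This histogram-gap argument is easy to demonstrate with plain integers rather than real image data. In the sketch below, the "manipulation" is assumed to be a simple 1.5x linear contrast gain; the particulars are hypothetical, but the pattern holds for similar tone-curve changes.

```python
def stretch(value, gain, max_value):
    """Apply a linear contrast gain and clip to the representable range."""
    return min(max_value, int(value * gain + 0.5))

# Manipulate an 8-bit master and keep the result at 8 bits:
levels_8bit = {stretch(v, 1.5, 255) for v in range(256)}

# Manipulate a 16-bit master, then re-quantize the result to 8 bits:
levels_16bit = {stretch(v, 1.5, 65535) >> 8 for v in range(65536)}
```

The 8-bit path can reach fewer than 256 output levels, so its histogram has gaps; the 16-bit path fills all 256 levels, yielding the "full set of tones" described above.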
In contrast to the consensus we reached regarding the desirability of
greater bit depth for sound recordings, our conversations about
sampling frequency revealed differences of opinion. Some of us on the
administrative side imagined that the starter question would be: “What
is the range of sound frequencies that we might expect in this original
item?” Our idea was that we would set the frequency range of the
digital copy to more or less match the frequency range inherent in the
original item. What frequencies have been captured, for example, on a
78 rpm disc from the acoustic era? Perhaps 8 to 10 kilocycles per second? The
usual rule for digital sampling is to work at twice the highest
frequency you want to reproduce. Therefore something on the order of 20
kilocycles per second should capture the full range of frequencies on
the original 78. Or to take another example, suppose a collector used
an analog Nagra tape recorder to record folk music at 7.5 ips with a
Neumann condenser microphone. What is the highest frequency tone that we
might expect to hear when the tape is played back? Most engineers would
say that such a recording system is not likely to capture much sound
with frequencies above 14 or 18 kilocycles per second. Thus if we
digitally sample at 44 or 48 kilocycles, we ought to capture the full
range of frequencies.
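The twice-the-highest-frequency rule from the passage above can be stated as a trivial helper, with the two worked examples as inputs. (In practice engineers leave headroom above the strict minimum for anti-alias filtering, which this rule of thumb ignores.)

```python
def minimum_sampling_rate(highest_freq_hz):
    """Nyquist rule of thumb: sample at least twice the highest frequency."""
    return 2 * highest_freq_hz

# Acoustic-era 78 rpm disc, roughly 10 kilocycles at the top:
disc_rate = minimum_sampling_rate(10_000)   # 20,000 samples per second

# Analog field tape topping out near 18 kilocycles:
tape_rate = minimum_sampling_rate(18_000)   # 36,000, within 44.1k or 48k
```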
The engineers, however, did not want to work at 44 or 48 kilocycles, to
say nothing of 20. They advocated 96, with some eyeing 192. The
argument here—and really this argument takes both higher sampling
frequency and greater bit depth into account—largely concerns factors
that pertain to practical production matters or “downstream”
possibilities, and which are therefore not very susceptible to
objective testing. The following paraphrases capture some of the dialog:
- “There may be hard-to-hear harmonics that you won’t want
to lose.”
- “Copies with less noise and less distortion can more successfully
be restored when you come back to them.”
- “In the future we’ll have even better enhancement tools
and post-processing, so save as much information as you can.”
- “What if you need extra data to support certain types of resource
discovery?”
The last bullet refers to what are called “low-level” features
by the working group developing MPEG-7, an emerging metadata standard
associated with moving image work. In the world of sound, low-level
data would be used to support “find this melody” queries,
for processes that produce transcripts of spoken word content, or
to support a system executing the famous query “find me more
like this one.”
Thus the analysis of the inherent fidelity of the original did not
provide the steering effect some of us had expected. Meanwhile, we
also made test copies at higher and lower resolution and asked, “Can
you hear the difference?” But these informal A-B tests also fell
short of being conclusive. One engineer has proposed carrying out
some empirical tests on post-processing actions to confirm the idea
that the restoration of a recording (e.g., careful cleanup for publication
as a CD) would be more successful if the master was at high rather
than moderate resolution. But we have not yet carried out such an
experiment.
The outcome is that our team generally works at the upper limit of
available technology. We produce most of our audio masters at 96 kilocycles
and 24-bit word length. At this time, we make two service copies:
first, a down-sampled WAVE file at compact-disc specifications: 44.1
kilocycles and 16-bit words, and second, an MP3 file that is very
handy in our local area network. Meanwhile, we produce images of accompanying
matter, like disc labels, tape boxes, and documents. The master images
are at 300 pixels per inch (ppi), with a tonal resolution of 24 bits
per pixel.
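These specifications imply data rates that the paper leaves unstated; assuming stereo, uncompressed PCM, and ignoring container overhead, the arithmetic is straightforward:

```python
def pcm_bytes_per_minute(rate_hz, bits_per_sample, channels=2):
    """Uncompressed PCM data per minute, ignoring file-format overhead."""
    return rate_hz * (bits_per_sample // 8) * channels * 60

master = pcm_bytes_per_minute(96_000, 24)   # 34,560,000 bytes, about 33 MiB/min
service = pcm_bytes_per_minute(44_100, 16)  # 10,584,000 bytes, about 10 MiB/min
```

A 96-kilocycle, 24-bit stereo master is thus roughly three times the size of its compact-disc-specification service copy, which is part of what makes repository storage planning a serious matter.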
Our project development has highlighted two additional topics that
have to do with reproduction quality. The first has to do with practices,
including the use of professional equipment and professional workers.
On the equipment side, one key device is the analog-to-digital converter,
the device that actually samples the analog waveform and spits out
the bits. Professional converters are generally external to the computer
workstation (or digital audio workstation) and are superior to and
more costly than “pro-sumer” a-to-d devices, often installed
as a card in the desktop computer. We avoid cleanup tools when making
masters. And for mono discs in our collections, we copy using a stereo
cartridge to allow for future processes to “find the best groove
wall.”
On the human side, digitizing requires professional skills in both
the digital and analog realms. A professional worker must not only
be conversant with a-to-d converters and workstations, but must also
be a master of the art and science of playing back originals to the
best effect, no mean task when you confront instantaneous discs, cylinders,
wires, and sticky tapes. In the new center at Culpeper, we see these
professionals as our supervisors, contract overseers, and as experts
who perform the most difficult work.
As we plan the future, we would like to include apprentice workers
in the team, as well as outsource certain types of material. We have
so many items in need of reformatting that we are seeking ways to
increase efficiency. Elements that we hope will accomplish this include
sorting originals by “transfer efficiency” category, that
is, by putting groups together that have the same technical characteristics.
We would like to find and employ expert systems (automated tools)
to help us judge quality or at least spot anomalies to inspect later.
For some categories, we want to experiment with having a single operator
copy two or three items at once. I will note that some interesting
high-volume production tools are emerging from the PRESTO (Preservation
Technology for European Broadcast Archives) project organized by broadcasters
in Europe <http://presto.joanneum.ac.at/index.asp>.
At the same time, our team has been very interested to learn about
Carl Haber and Vitaliy Fadeyev’s cutting edge experiments at
the Lawrence Berkeley National Laboratory to use high-resolution imaging
to recover sound from discs and cylinders <http://www-cdf.lbl.gov/~av/>.
The second additional topic pertaining to high resolution concerns
the role of objective measurement. In imaging, this is related to
the use of targets and, in audio, to standardized sets of tones. The
outputs from targets or tone sets permit you to measure the performance
of the equipment used to produce an image or an audio file, and the
setup or adjustment of that equipment. The measurement of targets
and tones does not help you evaluate actual “content” images
or sounds directly.
In library and archival reformatting circles, the development of imaging
targets is farther along than practices for using audio tone sets.
I participated in an image-related contracting activity at the Library
in 1995 and, at that time, the appropriate targets, the availability
of measuring tools, and ideas about how to interpret the outcomes
were not at all mature. Recently, experts have wrestled with what
are called performance measures for digital imaging systems.
You can’t necessarily believe your scanner when it says 300 ppi,
we are told. Instead, we should measure what actually comes through
the system. For example, use modulation transfer function (MTF) as
a yardstick for delivered spatial resolution. But the process of implementing
performance measures for imaging has not yet reached its conclusion.
My impression is that the investigators working on this are not ready
to say what the MTF pass-fail points ought to be for, say, a system
used to digitally reproduce a typical 8x10-inch negative.
On the audio side, our work group has made sound recordings of the
standard ITU test sequences known as CCITT O.33. There is one for
mono and one for stereo, and both are 28-second-long series of tones
developed to test satellite broadcast transmissions. With appropriate
measuring equipment, recordings of the tones can be used to determine
the frequency response, distortion, and signal-to-noise ratio produced
in a given recording system. We have looked at the numbers but we
are not yet ready to say where the pass-fail points ought to be for
the equipment we might use. The recording industry may have more sophisticated
or more appropriate performance measures, not well known in our circles,
but I am sure that those of us working on the problem in the archive
and library community will get smarter (or better informed) with time.
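To make the measurement idea concrete, the sketch below computes a signal-to-noise ratio from synthetic samples: a full-scale 1-kilocycle test tone plus a much smaller interfering tone standing in for system noise. This is a deterministic toy, not the CCITT tone-set procedure itself, and the frequencies and amplitudes are invented for the example.

```python
import math

RATE = 48_000  # samples per second

def tone(freq_hz, amplitude, n=RATE):
    """One second of a sampled sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / RATE)
            for i in range(n)]

def power(samples):
    """Mean square power of a sample sequence."""
    return sum(s * s for s in samples) / len(samples)

signal = tone(1_000, 1.0)    # the test tone
noise = tone(3_000, 0.001)   # stand-in for system noise, 1/1000 the amplitude

# Signal-to-noise ratio in decibels: a 1000:1 amplitude ratio is 60 dB.
snr_db = 10 * math.log10(power(signal) / power(noise))
```

A real measurement would recover these figures from a recording of the standard tones played through the system under test, which is what allows the equipment, rather than the content, to be evaluated.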
Shaping the Information Package and the Importance of Metadata
The third central issue concerns the information package, a complex
multipart entity. As noted earlier, the package’s data takes
the form of audio, video, or image bitstreams, while its metadata
represents a familiar trio from digital library planning: descriptive,
administrative, and structural.
In our prototyping project, our main descriptive metadata is for the
object as a whole, and is often a copy of a MARC (MAchine Readable
Cataloging) record in the Library of Congress central catalog. The
copy is massaged to create a MODS XML record <http://www.loc.gov/standards/mods/>.
But our complex objects often benefit from additional descriptive
metadata for individual parts, e.g., song titles, artists for disc
sides or cuts, or names associated with a particular element within
a digital package. The descriptive metadata for these elements are
encoded in what MODS calls related items, a kind of “record
within the record.” One type is called a constituent related
item, and this fits our case very nicely.
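The "record within the record" idea can be sketched with Python's standard-library ElementTree. The element names follow MODS conventions (`mods`, `titleInfo`, `relatedItem` with `type="constituent"`), but namespaces, schema attributes, and most required fields are omitted, and the titles are invented; this is illustrative, not a valid MODS record.

```python
import xml.etree.ElementTree as ET

# Top-level description of the whole object (massaged from a catalog record):
mods = ET.Element('mods')
title = ET.SubElement(ET.SubElement(mods, 'titleInfo'), 'title')
title.text = 'Field recordings, reel 1'

# A "record within the record" describing one cut on the reel:
constituent = ET.SubElement(mods, 'relatedItem', type='constituent')
cut_title = ET.SubElement(ET.SubElement(constituent, 'titleInfo'), 'title')
cut_title.text = 'Ballad of the Boll Weevil'

xml_text = ET.tostring(mods, encoding='unicode')
```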
Our administrative metadata is extensive. For example, we include
a persistent identifier and ownership information, meant here not
in the copyright sense but rather to identify the party responsible
for managing this digital object. We include information about the
source item and any conservation treatment that may be applied, data
about the processes used to create the digital copy (sometimes called
digital provenance data), and technical details about the file
we have created. In the latter two categories, we have made use of
sets of data elements under discussion by working groups within the
Audio Engineering Society; our versions of these data sets are linked
from this Web page: <http://lcweb.loc.gov/rr/mopic/avprot/metsmenu2.html>.
Meanwhile, we have not made a big effort to collect true rights data
but we do categorize objects to permit access management at the Library.
Structural metadata records the relationships between parts of objects.
For example, when we reformat a long-playing record boxed set, we
produce sound files for all of the disc sides, as well as images of
the labels, the box, and the pages in the accompanying booklet. Thus
our digital reproduction will include several dozen files and these
are documented in the structural metadata. In the interface for end
users, this metadata supports the presentation of the package and
enables the user to navigate the various parts of the digital reproduction.
Although we have not implemented this in our prototyping, we know
that there is a need for an additional category of metadata to support
long-term preservation. This category is described in a helpful report
from the Research Libraries Group and OCLC (Online Computer Library
Center) titled Preservation Metadata and the OAIS Information Model
<http://www.oclc.org/research/pmwg/>.
Examples of digital preservation metadata include “fixity” information,
e.g., checksums to monitor file changes; pointers to documentation for
file formats; and pointers to documentation of the environment required
to render files.
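Fixity checking in particular reduces to a small amount of code: record a checksum when a file is ingested, recompute it later, and treat any difference as evidence that the bits have changed. A sketch using SHA-256 from Python's standard library (the function names are hypothetical):

```python
import hashlib

def fixity_digest(data: bytes) -> str:
    """SHA-256 checksum recorded as fixity metadata for a stored bitstream."""
    return hashlib.sha256(data).hexdigest()

def fixity_intact(data: bytes, recorded_digest: str) -> bool:
    """Recompute the checksum and compare it with the value recorded at ingest."""
    return fixity_digest(data) == recorded_digest

# At ingest, the repository records the digest alongside the file:
original = b'PCM audio bytes...'
recorded = fixity_digest(original)
```

A repository would run such comparisons on a schedule, flagging any file whose recomputed digest no longer matches the value stored in its metadata.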
We are encoding all of the metadata using the emerging Metadata Encoding
and Transmission Standard (METS) <http://www.loc.gov/standards/mets/>.
We worry about the extent of metadata that we wish to capture and
count on the pressures of actual production to have a winnowing effect.
Meanwhile, it is critical to continue the development of tools to
automate the creation of metadata, especially administrative metadata.
Longevity in a “Media-less” Environment
The fourth central issue highlights the importance of keeping digital
copies, a need that rivals and may even surpass the need to make the copies in the first place. This is where the repository comes
in, a topic of discussion rather than a point of action for our prototyping
project. Regarding the repository, our project and the planning for
the new National Audio-Visual Conservation Center intersect with Library-wide
digital planning—including a repository—being carried out
by the new National Digital Information Infrastructure and Preservation Program
(NDIIPP) <http://www.digitalpreservation.gov/>.
We anticipate that the design for the Library’s repository will
be consistent with the important Open Archival Information System
(OAIS) reference model, developed under NASA leadership and now an ISO standard <http://ssdoo.gsfc.nasa.gov/nost/isoas/>.
The OAIS model is the source of our packaging jargon. The model
articulates a content life cycle in which a producer sends a submission
information package to the repository, where it is ingested and
reshaped to make an archival information package, suitable
for long-term management. When an end user, called a consumer in the
model, requests a version of the object for viewing or listening,
the repository reshapes the content into a dissemination information
package for presentation. We anticipate that the center at Culpeper
will play the role of producer, preparing submission information packages
for the Library’s repository.
During this period when the Library's repository is under development,
we place our carefully named files in UNIX file systems established
in the Library’s storage area network. Although less sophisticated
than the planned repository, the storage area network has an active
backup system in place, a system that has sustained the eight million
files from our American Memory program for several years. We keep
planning improvements in our practices. For example, we now segregate
our masters and service files so that a higher level of protection
can be applied to the masters. For now, the METS metadata is stored
as individual XML files. In effect, we are storing virtual information
packages, “ready to submit.”
There is a policy implication here. Keeping digital content requires
a significant information technology infrastructure, meaning both
systems and people. That may be fine for larger organizations but
what about smaller or independent libraries and archives? Small sound
archives are clearly not in a position to mount this level of IT infrastructure.
What are they to do? There are two dimensions to this issue. Some
future-oriented discussions in the NDIIPP context have suggested that
there should be many libraries and archives—thought of as those
who organize, catalog, and provide access to content—served by
a few repositories—the keepers of the bits. This suggestion raised
follow-up questions: How might such a many-few structure be established?
Who would pay for what?
As these longer-term policy questions are being considered, there
are pressing questions for today. Is there a suitable holding action
for keeping digital content? For audio, would it be a good idea for
small archives to store their files on multiple CD-Rs or DVD-Rs, or
to write to data tape, as an interim solution? Ought one work in a
hybrid manner, digital and analog, in spite of the extra cost? There
are no authoritative answers to these difficult questions, and that
has impaired our ability to provide our colleagues with definitive
guidance.
*
Portions of this paper have been taken from a talk presented at the
2003 Preservation Conference at the National Archives and Records
Administration
<http://www.archives.gov/preservation/conferences/papers_2003/fleischauer.html>.
This paper represents work carried out for a federal government agency
and is not protected by copyright.