April 8, 2003, New York City
More than 200 people attended a NINCH meeting about costs
of digital conversion at the New York Public Library on April
8, 2003; it mixed some practical experiences with commercial
sales pitches and searches for funding.
Prices reported for digitizing a book ranged from $4-5 up to more
than $1000; too many of the speakers were so focussed on quality
as to make me despair that they will find funding for their
projects. Key points: (1) prices are too high, we need the
Henry Ford of digitization; (2) nobody knows values, and we're
not convincing the funders that information is important.
excellent introduction tried to balance cost and benefit.
In the non-commercial world of most libraries, people don't
understand the three numbers: cost, price and value. Cost
is what you spend to make a product; price is what you sell
it for; value is what it is worth to the customer. If cost
is above price, you're losing money; if price is above value,
you won't have any sales.
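
The two rules above can be written as a small check. This is only a sketch: the function name and the sample figures in the comments are illustrative assumptions, not numbers from the meeting.

```python
def assess(cost, price, value):
    """Apply the two rules: cost vs. price, and price vs. value."""
    problems = []
    if cost > price:
        # Cost above price: every sale loses money.
        problems.append("losing money")
    if price > value:
        # Price above value: customers will not buy.
        problems.append("no sales")
    return problems


# Hypothetical product: costs $60 to make, sold at $50, worth $80 to the reader.
print(assess(60, 50, 80))
```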
Libraries often fail to monetize or recognize some indirect costs (e.g.,
most university libraries don't pay rent for their building)
and so a library may not realize, in deciding whether or not
to buy an electronic publication in place of a paper one,
the extent of shelving and cataloging costs that are saved
by going electronic. On the other hand, few in this area measure
the benefit of information, so that it isn't usually possible
to put a numerical value on delivering information to the
desktop instead of the library reading room.
Measuring the value of information has proven very difficult; Don King
has spent his career doing this almost as a lone voice in
the wilderness. Is the distribution of obscure 19th-century
novels by the University of Virginia free E-text center an
example of the rich resources electronics can make available,
or a variant of Gresham's Law in which free books are displacing
mentioned a 19th-century example of a project to provide cheap
books for the working class that got attacked either for distributing
dangerous ideas or for distributing no ideas at all. Innovators
will be familiar with this: whenever you try something new,
some will object that it might not work and others that it
might. In digitization, our worst problem may be that we can't
tell, at least to the extent that in a world run by bean counters
we cannot put a cash value on the information we make available.
We heard actual price quotes for digitization projects in the next
session. Maria Bonn gave an excellent talk on the University
of Michigan's service center; its costs run some 20-27 cents per
page ($60/book, say), of which 13 cents is scanning and the
rest overhead, selection, and processing. The next two talks
were sales pitches from Luna Imaging, which does photographic
and other non-textual materials as a rule, running from $4
for a 35mm slide to $60 for a larger item, and from the Systems
Integration Group, whose example priced text pages at about
$2 each and maps at $12. All speakers emphasized quality
and planning. You can get quotes down to 4 cents a page or
$10/book if you're willing to disbind, willing to ship to a
lower-cost country, and not so fussy about the process and
quality. All speakers said that you should think about your
whole project and not just scan things without knowing why.
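
To put the per-page quotes on a per-book footing, here is a rough sketch. The 250-page book is my assumption (it is what 4 cents a page and $10/book imply together), and the labels and function name are made up for illustration.

```python
PAGES_PER_BOOK = 250  # assumption: implied by 4 cents/page ~ $10/book

# Per-page quotes mentioned in the session, in cents.
quotes_cents = {
    "low-cost, disbound, shipped abroad": 4,
    "Michigan (low end)": 20,
    "Michigan (high end)": 27,
    "Systems Integration Group text page": 200,
}


def book_cost_dollars(cents_per_page, pages=PAGES_PER_BOOK):
    """Convert a per-page quote in cents to a per-book cost in dollars."""
    return cents_per_page * pages / 100


for name, cents in quotes_cents.items():
    print(f"{name}: ${book_cost_dollars(cents):.2f} per book")
```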
Chapman gave a good talk about the costs of paper vs. digital
repositories, using OCLC's Digital Archive and the Harvard
Depository as examples. Harvard charges about $4/sq ft for
standard space, so that 2202 volumes would cost $689; similarly
Iron Mountain charges $6 for a box of about 1 cubic foot, which
would hold about 10 books, I think, so Harvard is somewhat
cheaper but in the same range. OCLC seemed extremely expensive
to me, with charges of $60/GB for up to 100 GB and still $15/GB
for more than a terabyte; this when you can buy a 100-GB disk
drive for under $100. There are not many competitors yet,
however, and I can't find a comparable price quote. At the
prices he sees, digital ASCII is cheaper than paper but image
is more expensive; at the prices I see, everything is cheaper
Bickner talked about the visual archives at NYPL itself; it
was very interesting material but relies on temporary soft
money, as do many digitization projects. All too often libraries
perceive digital information as important, but are reluctant
to spend any core budget on it.
Puglia gave the most numerical talk, full of the actual prices
quoted to NARA and in proposals he's reviewed. Perhaps most
interesting was that his projects broke down typically with
1/3 of the cost on digitization, 1/3 on cataloging, description,
and indexing, and 1/3 on administrative costs, quality control,
overhead and the like. In order to reduce the costs of the
projects you need not only to work on scanning process flow,
but also on the other categories of spending. To scan a book,
in his examples, ranged from perhaps $75/book at the low end,
up to $2500 at Questia. (The Questia example, computed by
observing that they had spent $125M and digitized 50,000 books,
was challenged with a claim that $90M or so of the $125M had
been spent on advertising and other such activities).
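
The Questia figure and the challenge to it are both simple divisions, sketched here. The function name is mine, and the $90M is the "or so" figure from the objection, so the adjusted number is only approximate.

```python
def cost_per_book(total_spent, books, excluded=0):
    """Dollars per book, optionally excluding spending unrelated to digitization."""
    return (total_spent - excluded) / books


# Headline figure: all $125M attributed to 50,000 books.
headline = cost_per_book(125_000_000, 50_000)

# Adjusted figure: remove the roughly $90M said to be advertising and the like.
adjusted = cost_per_book(125_000_000, 50_000, excluded=90_000_000)

print(f"headline: ${headline:,.0f} per book")
print(f"adjusted: ${adjusted:,.0f} per book")
```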
that all of his numbers were high and reflected (a) vendors
believing that with the Federal budget behind it, NARA and
LC can be charged anything they want, plus (b) excessive quality
specs on the part of the buying organizations. I was glad
that Steve gave comparisons of the online delivery of information
vs. the traditional services; at NARA far more use is made
of the website than of the reading rooms, and the same is
true of many other groups, e.g. LC.
gave an excellent talk explaining that digital information
now includes things like museum specimens, not just traditional
journals and monographs. He discussed revenue sources, but
had no numbers for them.
of the National Museum of the American Indian had a sad story
of a failed RAID drive, which held images of their 800,000
items; but in the end, they still had two sets of DVDs, and
lost only time and money. By contrast, the British Library still
has catalog entries marked "destroyed in 1940" (when a German
bomb fell on the library).
Stephenson talked about getting more revenue by selling digitization
services to others, which is mostly an internal reshuffle. Neil
Smith at the British Library once pointed out to me that if
libraries only sell things to each other, they might as well
just digitize their stuff and give it away free; there's a
need to get more money into the system from outside.
of Columbia talked about planning projects in the context
of a digital publication to be sold to readers, but didn't
have numbers we could look at. I note that the ACM digital
library is doing fairly well, with some 30,000-40,000 readers
paying an individual rate of about $100 per year.
head of Innodata, whom we thank for funding the symposium,
emphasized the need to plan, to preserve, and to expect changes
in the future. Again, though, I fear an emphasis on planning
and quality will make projects unaffordable, especially in
by quoting Voltaire, "the best is the enemy of the good,"
and urging people to go for more material at lower costs and
quality levels. I also think we urgently need help demonstrating
why we need these projects. Institutions don't quantify the
value of new information and fear that it is used by those
outside their community; we may need a new definition of community.
I'm amazed at how much is being done despite high costs and
no estimates of value. I can only hope that if we can make
progress on those issues, we could get even more done; and
at least the costs should decline as technology continues
to improve. Would that economics improved at all, let alone
at the same rate.