A vendor’s fixed unit price is determined by the following variables:
The number of pages that can be scanned per day will be a function of:
In his slides, Pence presented a hypothetical case study developing a quote for digitizing scientific volumes from the nineteenth century. The project consists of 6,300 pages, including 311 color maps. From this he derived a quote of $2 per scanned page, plus $12 per color map, for a total of $16,332.
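Since the quote is pure arithmetic, it can be checked directly. The minimal sketch below simply reproduces the figures above; the variable names are illustrative only.

```python
# Reproduce Pence's hypothetical quote: 6,300 scanned pages at $2 each,
# plus a $12 charge for each of the 311 color maps.
PAGES = 6_300
MAPS = 311
PAGE_RATE = 2.00   # dollars per scanned page
MAP_RATE = 12.00   # dollars per color map

total = PAGES * PAGE_RATE + MAPS * MAP_RATE
print(f"Quoted total: ${total:,.2f}")  # Quoted total: $16,332.00
```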
One benefit that cannot easily be quantified is that vendors take responsibility for a large number of potential project risk factors:
Pence then sated the curiosity of the audience by answering the question in his title: The 10 ways he suggested to spend $100,000 on digitization were:
In the question session, it was pointed out that even cheaper digitization can be accomplished by sending materials overseas for digitization (an approach taken by Innodata and exemplified by the Carnegie Mellon/Internet Archive Million Book Project). Costs can be reduced greatly if the project will accept disbinding, shipping overseas, and lower quality.
Another questioner suggested that the type of costing worksheets illustrated in this session might make a useful addition to subsequent editions of the NINCH Guide to Good Practice.
Peter B. Kaufman Digitizing History: University Presses and Libraries
Kaufman referenced Clifford Lynch’s analysis of the typical limits of scholarly publishing in the genres in which it is disseminated, and Lynch’s vision for a time and place where the institutional repository can serve as a complement or supplement—not a substitute—to traditional scholarly publication. Such a repository could capture and disseminate learning and teaching material, symposia, performances and related documentation of the life of universities. Many collaborators could be involved in such initiatives: libraries could join forces with local governments, historical societies, museums and archives, and members of the community, and public broadcasting might play a role. Kaufman also mentioned several new licensing and media opportunities available to libraries, universities, and museums that are likely to generate revenue.
Kaufman then presented three case studies of university projects that have used Innodata's expertise and that illustrated such shifting paradigms of scholarship and communication:
The University of Virginia requested support in transforming a rare and extensive medical history collection, the Philip S. Hench Walter Reed Yellow Fever Collection. Hench spent over fifteen years accumulating thousands of documents, photographs, miscellaneous printed materials, and artifacts to decipher the actual events involved in the U.S. Army Yellow Fever Commission work in Cuba at the turn of the 20th century. The archive consists of some 30,000 pages of manuscripts in English, Spanish, and French; technical pamphlets and books; newspapers; photographs; artifacts; and research. These were converted into digital form (XML) for the purposes of preservation and access.
The University of Illinois Press asked Innodata to convert digital files from the leading historical journals, held in ASCII, Adobe FrameMaker, PageMaker, and Quark file formats, into a kind of TEI Lite. Journals included: the American Historical Review; the Journal of American History; Labour History; Law and History Review; The History Teacher; Western Historical Quarterly; and the William and Mary Quarterly.
At the University of North Carolina, an ongoing initiative involves the UNC libraries, press, school of information, school of journalism, faculty, the UNC-TV station, and WUNC, and is developing a service center on the model of Cornell’s or Reed College’s.
Presentation slides: in PowerPoint
Chapman began by addressing the benefits of such long-term storage repositories. Benefits are related to risk management: long-term storage in a managed repository will provide an “insurance” against the following risk factors:
The costs of preservation are contextual – they will depend upon the owner’s definition of content integrity, as well as their tolerance for risk, both of which may change over time. Costs also depend upon the institutional mission of the owners: is long-term storage necessary (e.g., for an agreed period of legal retention)? The attributes of original materials will affect the cost of digital preservation: for example, their complexity and quantity.
Other cost factors will depend upon the scope of services required and the preservation obligations of the owners: will there be a need to preserve only the bits (just the file), or to preserve the file and its associated metadata (e.g., related to intellectual property rights)? Do these need to be tracked and modified over time? Is the intention to preserve use (behaviors and capability), or even to preserve faithful rendering? Deciding which of these aspects need to be preserved for the long term will affect cost.
Chapman stated that the attributes of the materials themselves should not matter in terms of cost, and referenced Kevin Ashley of the University of London Computer Centre, who has declared that digital preservation costs correlate to the range of preservation services that are on offer, not the attributes of the materials. That is, preservation costs will not be the same for identical materials held under different service level agreements at different archives. (See Resources)
Reiterating that the repository is the nucleus of preservation activity, Chapman stated that repositories will be required to ensure the longevity of digital materials. The majority of content owners will become consumers of centralized repository services, therefore repository storage costs — independent of costs for ingest or access — must be affordable or owners will withhold materials from deposit, running the risk of their loss. The price for storage should be what owners can afford to pay. He introduced his case studies: the Harvard Depository (centralized storage in a managed environment), and the OCLC Digital Archive, a new commercial service, emphasizing that an analysis of the use of existing traditional (analog) repositories is a relevant indicator of what owners will pay for managed storage of digital objects. Chapman presented a “snapshot” cost recovery billing model for each organization, which appears below:
Both use unit costs that have been priced to recover the operational costs of actually managing a repository, as a repository provides more than storage. The OAIS model for digital preservation emphasizes data management, archival storage, and administration. A great deal of infrastructure is involved in managing data, and there are many costs to recover.
Cost-recovery billing model for both: in each case, size is the metric of billing (with the OCLC Digital Archive offering “bit preservation”).
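Neither organization's actual rates are reproduced here, but the shape of a size-based cost-recovery model is easy to sketch. The per-gigabyte rate below is purely hypothetical:

```python
# Minimal sketch of size-based cost-recovery billing. Both repositories
# bill by size; the unit price is set to recover operating costs.
HYPOTHETICAL_RATE_PER_GB_YEAR = 5.00  # illustrative, not a published rate

def annual_storage_bill(gigabytes_deposited: float) -> float:
    """Yearly charge for managed storage, billed purely on size."""
    return gigabytes_deposited * HYPOTHETICAL_RATE_PER_GB_YEAR

print(annual_storage_bill(250.0))  # 1250.0
```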
As consumers of such resources, we are aware that there will be a real “price” but that the “cost” to the end user is what interests us most.
Explaining the cost gaps between analog and digital, Chapman looked at some additional factors, and illustrated them with a series of slides. The costs examined are the costs of storing high-resolution master copies. The relatively low cost of storing ASCII files suggests that digital storage may become affordable.
Other reasons for the cost gap include key institutional decisions made by each organization related to its business and pricing models. These policies will have an impact on business models, including decisions about where the materials are actually stored – for example, whether to retain materials in uncontrolled local storage environments, such as library stacks, or to deposit them in managed repositories. Production choices will also affect business models – for example, preservation microfilming produces two copies that will have to be stored. Storage of uncompressed digital images of book pages can be up to 10 times more expensive than microfilm at current HD and OCLC prices. These costs do not factor in OCLC’s volume discount, and methods do exist to close the “cost gap” through negotiation and collaboration. Chapman stressed that the most important issue is that there are many variables and contexts that will affect costs. The quality of the files, for example, will greatly affect the cost of digital storage: uncompressed, 24-bit images will be much more expensive to store. Developers should work to close all cost gaps to make repository storage affordable. The cost gap for audio and video is much higher and therefore more significant.
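The effect of quality choices on storage cost can be made concrete. The sketch below estimates the raw size of one uncompressed 24-bit page image; the 8.5 x 11 inch page at 600 dpi is an assumed parameter, not a figure from the presentation:

```python
# Estimate the raw size of one uncompressed 24-bit page scan.
# Assumed parameters: an 8.5 x 11 inch page scanned at 600 dpi.
WIDTH_IN, HEIGHT_IN, DPI = 8.5, 11.0, 600
BYTES_PER_PIXEL = 3  # 24-bit color

pixels = (WIDTH_IN * DPI) * (HEIGHT_IN * DPI)
size_mb = pixels * BYTES_PER_PIXEL / 1024**2
print(f"{size_mb:.0f} MB per page")  # ~96 MB per page, before compression
```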
There are other significant costs not included in the pricing model. Key curatorial decisions also matter, as does the issue of integrity of data – how much material can you afford not to keep? What quality is required, and what format (vis-à-vis use requirements)? A level of risk must be assessed (e.g., is compression acceptable?), as must the extent of technical and administrative metadata required.
There are still some open questions related to the issue. The decisions taken by an institution are the key to determining costs. Can institutions afford to digitize at the highest quality technology allows, then keep the digital objects that result from this strategy? Can they afford to keep all versions? Can they afford not to?
Carrie Bickner discussed the NYPL’s Digital Libraries program, and some of the pay-offs from which the institution has benefited as it has made the transition from individual digital projects to more structured digital programs. Most significantly, the infrastructure that has been developed now supports new projects and initiatives.
She described the process of building a team to support digitization initiatives, emphasizing the broad scope of expertise necessary. The Digital Libraries team has 23 staff, with a metadata team of 5 full-time people and some interns. The NYPL projects require a team well versed in all aspects of digitization and technology infrastructure, including support for systems such as Oracle databases and ColdFusion delivery mechanisms. NYPL has elected to use MrSID software to pan and zoom on high-resolution images, a system that can show a great deal of detail in the images. The digital imaging unit is housed within the NYPL, and most digitization is done in-house. The Library also uses vendors for some initiatives; for example, JJT has worked with NYPL both on- and offsite.
Bickner emphasized that technology development should actively reflect and support library standards, and reiterated that metadata specialists were of key importance to the success of such initiatives. Now that the team is in place, and fully equipped, it is using the infrastructure that has been created to do projects that were not originally within the scope of this initiative. For example, equipment and staff are supporting a number of curatorial projects.
The major project is the NYPL Visual Archive, formerly known as ImageGate. The project dealt with over 600,000 images from the four research collections of the NYPL. The collection comprises many different types of visual materials, including printed ephemera, maps, postcards, and woodprints. The project is moving from the Central Building to the Library for the Performing Arts and the Schomburg Center. Bickner noted that it is often easier to move people than the materials. In this case, the project is working with glass-plate negatives of images from the performing arts - photos of actors, actresses, set designs, etc. from the 1920s to the 1960s. As glass plates can’t be used in the reading room, this will be the first opportunity for the public to view much of this content.
With this infrastructure and team in place, the Digital Library team is starting work on other projects (see <http://digital.nypl.org/forthcoming.html>), which will include: American Shores - Maps of the Middle Atlantic Region to 1850 and The African American Migration Experience, which will include both images from the archives and specially commissioned essays on each of 13 phases of the African Diaspora – from the early slave trade to recent Haitian experiences. This project will include materials (some still in copyright) from other repositories (e.g., the Associated Press photo archive for the 1980s on the Haitian migration). The Library is still developing rights management methodologies for such materials, and a half-time staff position is dedicated to clearing and managing copyright for these materials.
Bickner raised an important issue that similar projects will have to face in the future. She showed a page of Whitman's own copy of the 1860 Leaves of Grass, with his annotations for changes for the next edition. Writers today do the same - but in Microsoft Word, deleting previous versions as they go. How will we electronically display the creative process?
In questions, Bickner clarified that most of this work is done by staff funded by “soft” money, i.e. grant funded positions. This is a concern as the team attempts to develop sustainable digital programs.
Tom Moritz Toward Sustainability - Margin and Mission in the Natural History Setting
Tom Moritz began by observing that natural history museums face unique challenges in creating digital collections. The American Museum of Natural History (AMNH) contains over 34 million natural history specimens, and these objects, in turn, may have many associated pieces of information in various formats: how do we devise optimal, strategic solutions for the development of efficient, accessible digital programs with such a mass and range of materials, especially when faced with the constraints on open access to information dictated by the market, technology, law, and norms (using Lawrence Lessig’s terms)?
Before continuing with his theme of sustainability, Moritz briefly digressed into further consideration of what James Boyle has labeled "The Second Enclosure Movement," with graphs developed by Lessig in The Future of Ideas that show increased use of the term “intellectual property” and rampant growth in the concept of “ownership” of information. The Flexplay DVD, which self-destructs 36 hours after opening, is a graphic example of technology enabling this land grab.
As one response to this threat, Moritz described an open access, “common knowledge” project: “Building the Biodiversity Commons,” about which he had written in D-Lib Magazine. For more information about this project, see <http://www.dlib.org/dlib/june02/moritz/06moritz.html>.
Returning again to Lessig’s ideas on the constraints on open access (the market, technology, law, and norms), Moritz raised the concepts of "Mission" and "Margin" as they relate to an organization like AMNH. While it is clear how all four Lessigian elements apply to commercial enterprises, how do they apply to a nonprofit, driven by a “non-commercial” mission? How does digital fit into a mission that was developed for an analog world? The mission of AMNH, as stated by the New York State Legislature in 1869, is to furnish “popular instruction.” This endorses the notion of freely available information. However, in difficult financial times there is pressure on all organizations and projects to generate revenue, and growing resistance to open and free access to digital content.
As AMNH explores ideas for mission-consistent revenue generation options, there are presently several potential and actual sources of revenue that are generating funds for different sectors of the organization:
Moritz moved towards a conclusion by asking what the core of natural history is, and which discipline-specific objectives the digital library can support to facilitate scholarship in this field. Strategic developments should be informed by close analysis of the requirements of academic research, and should provide a conceptual framework for integrated access to publications, archival records, field notes, and specimens. Content should be both widely distributed and strongly integrated. A 1998 article in Nature suggests that there are 3 billion specimens in 6,500 natural history museums around the world. This variety of content – not merely specimens and artifacts, but also field notes, images, formal publications, exhibit labels, etc. – requires careful cataloguing.
The Darwin Core (DwC) has been developed as a "discipline-based profile describing the minimum set of standards for search and retrieval of natural history collections and observation databases". But such solutions need to be efficient and parsimonious. The Semantic Web makes possible an ontologically-based solution applying formal, explicit specifications of a shared conceptualization of "natural history". Projects such as the AMNH digital collections related to the Congo begin to illustrate these ideas. See <http://diglib1.amnh.org> and <http://library.amnh.org/diglib/resources/index.html>.
A question raised a very important issue, inspired by the emphasis on the need for shared developments, natural history registries, protocols, etc. The community requires collaborative initiatives, but instead we are all in competition for private funding. The collaborative potential of our shared skills and talents needs to be addressed at the community level.
Steven Puglia Revisiting Costs
See presentation slides: as PDF
Puglia’s presentation firmly focused on actual costs, and updated some of the data presented in his 1999 article: “The Cost of Digital Imaging” <http://www.rlg.org/preserv/diginews/diginews3-5.html#feature>.
He emphasized that there are many costs involved in digital imaging projects, of which scanning is only a part. Costs will be related to: the selection and preparation of originals; cataloging, description and indexing; preservation and conservation; production of intermediates; digitization; quality control of images & data; network infrastructure; on-going maintenance of images and of data.
Puglia’s examination of overall average costs stemmed from his experience on a number of grant review panels. But to make any decent comparative study he obviously needed access to a range of material – and that has not been easy. He had access to cost information from the National Archives’ Electronic Access Project, but there were virtually no published reports on costs from other projects. Neither funders nor project managers have access to useful metrics on the cost of digitization to guide them. And when information is available, comparing it is next to impossible: most cost models are not sufficiently granular, and there are many hidden costs, especially in the interstices. In order to validate a particular cost model, each step in the conversion process must be articulated in detail. Puglia emphasized that as costs vary so much, what is most important is their range, and he featured this in his presentation.
Overall, he noted that on average, roughly one third of the costs are related to digital conversion, one third for cataloging and descriptive metadata, and one third for administration, quality control, etc. In his 1999 article he quoted an average cost, over three years of data, of $29.55 per digital image (but with a range of between $1.85 and $96.45). Within that, itemized average costs come to $6.50 for digitizing; $9.25 for cataloging; and $13.40 for administration. Adjusted for unrealistically high or low costs, the figures came to $17.65 overall (digitizing $6.15; cataloging $7; and administration $10.10). [See presentation slide 5].
The Library of Congress National Digital Library Program originally planned to digitize 5 million images over 5 years for $60M, which would be $12/image – although at one point NDL had 85 people on staff, which would increase overall costs. On examining the NDL annual report for 2001, we can conclude that the project has actually produced some 7.5M images: roughly 25% of items have 2 images or versions, 25% have 3, and 25% have 4. Thus there are only about 3M unique items in the NDL, and the cost is really $20 per unique item. This cost does not include the Ameritech collections, which account for about 20% of the site and had $1.75M over three years. Partner institutions paid the rest of the cost, so the numbers are actually low, because they do not include partner costs.
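Puglia's back-of-the-envelope revision can be reproduced directly. The sketch below follows the figures above, assuming the remaining 25% of items have a single image (implied by the totals, not stated explicitly):

```python
# Revision of the NDL per-image cost, following the figures above:
# ~3M unique items, of which 25% are assumed to have 1 image,
# 25% have 2, 25% have 3, and 25% have 4 images or versions.
UNIQUE_ITEMS = 3_000_000
BUDGET = 60_000_000  # $60M over 5 years

avg_versions = 0.25 * 1 + 0.25 * 2 + 0.25 * 3 + 0.25 * 4  # = 2.5
total_files = UNIQUE_ITEMS * avg_versions
print(f"Total image files: {total_files:,.0f}")               # 7,500,000
print(f"Cost per unique item: ${BUDGET / UNIQUE_ITEMS:.2f}")  # $20.00
```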
In figures from the NDL annual reports, Puglia showed that of the $43M grant for the NDL over 5 years, 46% went to personnel, 27% for digitizing & services, and 18% on professional and consulting services.
The Library of Congress reported, in the Report of the Task Force on the Artifact in Library Collections (CLIR, 2001), that it spent $1,600 per book, or $5.33 per page, for base-level digitization. Enhanced digitization was $2,500 per book, or $8.33 per page, but this figure was based on the costs reported in Puglia’s article.
Questia Media reportedly spent $125M to digitize 50,000 books, or $2,500 per book. Forbes Magazine (April 2, 2001) stated that it would take $80M, or $200-$1,000 to scan and proofread each book. In an article on Questia in the Chicago Tribune, the Questia CEO was quoted as saying that it took $100M and 2 years to get 40,000 books online and 20,000 in production, at a cost of $1,700-$2,500 per book (however, a commentator at the end of the session pointed out that Questia’s costs must factor in enormous marketing and advertising costs, which would bring down their actual cost per book).
A brief review of other data that Puglia had surfaced included:
Puglia asserted that ongoing costs are key and must be planned for from the beginning. Minimal maintenance of one set of master image files and access files will be 50-100% of the initial investment over the first ten years; larger repositories might be able to drop this to 10-25%. The cost to install, staff, and maintain network infrastructure and digital data for the first ten years is 5 times the initial investment. In the IT world, the full lifecycle cost is 10 times the development cost.
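These rules of thumb are easy to express as multipliers on the initial investment. A minimal sketch follows; the function and example figures are illustrative only:

```python
# Rules of thumb from Puglia's presentation, expressed as multipliers
# on the initial digitization investment (first ten years).
def maintenance_range(initial_cost: float, large_repository: bool = False):
    """Estimated 10-year cost range to maintain master and access files."""
    low, high = (0.10, 0.25) if large_repository else (0.50, 1.00)
    return initial_cost * low, initial_cost * high

initial = 100_000.0
print(maintenance_range(initial))        # (50000.0, 100000.0)
print(maintenance_range(initial, True))  # (10000.0, 25000.0)
print(initial * 5)   # infrastructure + data, first ten years
print(initial * 10)  # full IT lifecycle rule of thumb
```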
Retrospective digitization initiatives can only justify the maintenance of images that are actually used, and will need a rigorous cost-benefit analysis to assess whether this is worthwhile. This can be assessed by use: for example, NARA had 6.7M hits per month, of which 2.3M per month – one third of all hits – were on the Exhibit Hall, along with 46,000 search sessions per month at about 12 searches per session. This compares to 6,400 onsite researchers, 35,000 oral inquiries, and 31,000 written inquiries per month across more than 20 facilities nationwide.
Jane Sledge Challenges in Storing Digital Images
See Jane Sledge's paper: download in MSWord
Jane Sledge illustrated the importance of developing workflows and methodologies for generating and storing high-definition images. Digital imaging is becoming an integral part of the National Museum of the American Indian's collection management and outreach activities, and is used for about 85% of photographic activities. Staff use digital cameras in their day-to-day work to prepare condition reports, take preliminary conservation images, prepare high definition images for exhibitions and publications, and document public programming events or generate images as part of public programming activities.
She focused on a specific collections documentation project in support of a key institutional objective: the digital imaging of NMAI’s collections as part of a move of these collections from the Research Branch in the Bronx to the Cultural Resources Center (CRC) in Maryland. The project created a visual documentary record of some 800,000 objects managed by 250,000 electronic records. If an object is lost, misplaced, stolen, or broken in transit, NMAI has a documentary image to show the object’s condition at the time of packing and prove that it was in the possession of the museum. The images enable staff to plan and organize both exhibit development and interactive exhibits planned for certain areas. Because all objects are digitally photographed as part of the move process, the overall cost of the imaging project - $2.5 million - is much lower than one driven by an “on demand” process. These costs include the costs of storing the images.
For each image, two sets of TIFF files are stored on DVD-RAM. One is stored at the Research Branch in the Bronx in a fire-proof safe; the second is sent to the Photo Services Department at the Cultural Resources Center in Maryland and loaded onto a Storage Area Network (SAN). A low-resolution JPEG copy is also made. Staff send the JPEG file over the Smithsonian Institution network with the move system data, and link it to the Registration Information Tracking System (RITS) application.
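As an illustration of the derivative-generation step in such a workflow – not NMAI's actual toolchain – the sketch below produces a low-resolution JPEG access copy from a TIFF master, assuming the Pillow imaging library and hypothetical file names:

```python
# Illustrative sketch only: derive a low-resolution JPEG access copy
# from a TIFF master. Assumes the Pillow library; file names are examples.
from PIL import Image

def make_access_jpeg(master_tiff: str, access_jpeg: str, max_px: int = 1024):
    """Create a small JPEG access copy from a TIFF master file."""
    with Image.open(master_tiff) as im:
        im.thumbnail((max_px, max_px))  # shrink in place, keeping aspect ratio
        im.convert("RGB").save(access_jpeg, "JPEG", quality=85)

# make_access_jpeg("object_0001.tif", "object_0001.jpg")
```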
NMAI faced an image storage challenge not yet tackled by most museums. Technology staff estimated that the move project would generate on the order of 250,000 images, filling some 500 DVDs. A single NMAI TIFF image can range between 10 and 20 megabytes (MB) in size, and one DVD can store about 10 gigabytes (GB) of images. Technology staff estimated that the TIFF images might require about 5 terabytes of storage space in an online environment. The economics of creating the images is one thing; finding a sustainable economic framework for storing them is another matter. NMAI evaluated the options for holding the TIFF images in an online environment and linking them to electronic collections records to provide access, and on that basis acquired a relatively new technology, a Storage Area Network (SAN), to store large volumes of data.
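The storage estimate follows directly from these figures. A minimal sketch, taking the upper end of the quoted TIFF size range:

```python
# NMAI's storage estimate, reproduced from the figures above:
# ~250,000 TIFF images of 10-20 MB each, DVDs holding ~10 GB apiece.
IMAGES = 250_000
AVG_TIFF_MB = 20       # upper end of the quoted 10-20 MB range
DVD_CAPACITY_GB = 10

total_tb = IMAGES * AVG_TIFF_MB / 1024 / 1024
dvds_needed = IMAGES * AVG_TIFF_MB / 1024 / DVD_CAPACITY_GB
print(f"{total_tb:.1f} TB online")  # ~4.8 TB, matching the ~5 TB estimate
print(f"{dvds_needed:.0f} DVDs")    # ~488, matching the ~500 DVD estimate
```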
Sledge recounted how a sequence of errors caused a failure of the SAN, resulting in a serious interruption to the project's workflow. Some factors that led to the equipment failure included:
The Network Administrator had training to operate the SAN in normal situations, but had insufficient training to operate the SAN in an emergency, and had been given wrong advice on what to do in the event of a problem.
Furthermore, Sledge explained that the system failure was compounded by a project workflow that incorporated insufficient backup processes for high-definition imaging projects (e.g., re-using backup tapes). This was due to an over-reliance on the manufacturer’s claim that the system had built-in failsafes. Another lesson learned was the importance of understanding and reviewing backup plans and procedures in detail (NMAI has subsequently revised and upgraded its backup systems). Sledge noted that there is a need for pro-active risk management and planning at the outset of any digitization project, and that staff’s ability to deal effectively with problematic situations should be tested regularly.
In order to reconstruct the data, the project looked to its archival DVDs, only to discover that DVD technology had changed since NMAI first began to store images on DVD-RAMs. Ultimately, NMAI developed a “workaround” solution to recreate the lost data. Michael Lesk (The Internet Archive) commented at the end of the panel session that, unlike a fire or the willful destruction of a library or archive, NMAI was fortunate in that it had multiple copies of the images on a diversity of media and could recover from this misfortune.
Based on its growing use of electronic media, NMAI will carefully consider online and near-line technologies, and will consider tape storage for rarely used media. NMAI also recognizes the importance of migrating digital media storage across a diversity of media. Photo Services staff work closely with Technology staff to review options and select new DVD technologies to create additional sets of Move TIFF images on DVD. NMAI has incorporated digital media management into its collections management policies and has established policies for the deposit of digital media into its archives.
NMAI has since applied “Integrated Project Team” techniques to its overall Media Asset Management project, and has staffed a project team with a mix of program-area and IT personnel to recognize the complementary roles of project sponsors, managers, decision makers, end-users, IT infrastructure system engineers, and supporting organizations. In choosing to maintain and store high-resolution images, NMAI is committed to professional management, ongoing staff training, timely equipment renewal and maintenance, and strong backup procedures.
See presentation slides: in PowerPoint
Stephenson described the efforts at the University of Michigan Library to support and expand local conversion efforts by supplementing base funding with revenue generating activity. Revenue generation methods range from straightforward fee-based services to some creative multi-institutional funding models for large projects.
Digital Conversion Services (DCS) is one of four units of the Digital Library Production Service (DLPS) in the Digital Library Services Division of the Library. It provides a variety of conversion services, including bitonal scanning, OCR, continuous tone image scanning and photography, and text encoding. Staff size varies according to the volume of work, with additional staff hired if grant funds are available.
DCS’s core work is digitization of the Library’s own collections. During lulls in its internal workload, DCS services (such as the provision of a full-time photographer/digital imaging technician) are available to other University units and non-profits, on a fee for service basis. These clients can take advantage of the group’s expertise and avoid the acquisition of costly equipment, and DCS can leverage its investment in staff, training and equipment. In many cases, DCS will also host content for clients and provide access through the DLPS federated image delivery system.
DCS has continued to grow its program around the assumptions that they can utilize excess capacity during slack times, leverage investments in specialized and expensive hardware and software, and offer the services of highly skilled technicians to their own and other institutions. In addition, they have been able to respond to special opportunities by adding staff tied directly to the revenue potential of those projects.
External clients have included Early Canadiana Online, the Library of Congress, Harvard, Northwestern, the ACLS History E-Book Project, and the University of Chicago Press’s Bibliovault project. DLPS is about to embark on a ten-year project in which it will provide OCR conversion for a projected 100 million page images from the Law Library Microform Consortium, to be put online using Michigan’s digital library software, DLXS. The target throughput is over 800,000 pages a month. DLPS also provides some project support for the Early English Books Online Text Creation Partnership (EEBO-TCP), a collaboration between ProQuest, the University Libraries of Michigan and Oxford, and the partnership members.
Digital Conversion Services uses a variety of pricing models across these projects. The fee-based services are firmly grounded in the cost of doing business. For each service, DCS has an established recharge rate, based on a relatively standard formula.
Annual labor costs (salary + benefits) are added to the amortized cost of equipment and specialized hardware and software. This produces an annual cost for the digitization method. They then use an average hourly throughput figure (based on either a sample or actual data) multiplied by 1600 hours (the DCS figure for the number of working hours per year) to establish an annual throughput. By dividing the annual cost by the throughput, they arrive at a per-unit cost for each conversion method. For external customers, 30% overhead is added to the unit cost. Rates are refigured each year and submitted to the University’s Office of Financial Analysis for approval.
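The recharge formula described above can be written down directly. The sketch below implements it; the example figures are purely hypothetical:

```python
# DCS recharge-rate formula as described above (example figures hypothetical).
ANNUAL_HOURS = 1600  # DCS's figure for working hours per year

def per_unit_cost(labor: float, amortized_equipment: float,
                  units_per_hour: float, external: bool = False) -> float:
    """Per-unit recharge rate for one conversion method."""
    annual_cost = labor + amortized_equipment
    annual_throughput = units_per_hour * ANNUAL_HOURS
    rate = annual_cost / annual_throughput
    return rate * 1.30 if external else rate  # +30% overhead for external work

# Hypothetical example: $55,000 labor, $5,000/yr equipment, 40 scans/hour.
print(f"${per_unit_cost(55_000, 5_000, 40):.2f} per scan")        # internal rate
print(f"${per_unit_cost(55_000, 5_000, 40, True):.2f} per scan")  # external rate
```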
DCS is also exploring volume-sensitive pricing schemes for larger projects, and other pricing models, such as the partnership structure currently used to fund the EEBO-TCP.
Stephenson listed some of the challenges faced by Michigan in its efforts to explore new funding models for digital conversion, and as they “learn to be a vendor”:
One of the biggest drawbacks to organizing around even partial dependence on revenue is the uncertainty that comes with it, and the insecurity this can create for staff on short-term contracts.
On the other hand, providing conversion services for external customers can be rewarding, and there is a potential for real collaborative opportunities. Stephenson concluded by looking to models such as the UK’s Higher Education Digitization Service (HEDS), which has shown that the presence of a community mandate, the provision of adequate business support and the removal of at least some of the uncertainty might result in a more viable model—and certainly a “learning” model where customer and service provider might explore new methods together to achieve a better result. How such service centers might emerge in our decentralized environment and how they would be managed remains an open question.
Kate Wittenberg Sustainability Models for Online Scholarly Publishing
For presentation slides: see PowerPoint
Wittenberg focused on the issues involved in creating and sustaining a stable and effective scholarly publication.
First, she introduced some basic questions related to sustainability:
Wittenberg then listed four potential sources of revenue, and the implications of each:
1. Institutional subscriptions – this can be a good source of revenue, as long as a project is specific about the resource being charged for, and about who should be charged. No one will want to be seen charging third graders for electronic content, but it may be acceptable to charge their school, or school district. This model will require the support of marketing, billing, and accounting staff.
2. Individual sales – in scholarly publication, this is not an easy route. Individual book sales are poor and will not sustain a resource, so this should be seen as a supplementary source of revenue only.
3. Foundation support – it is getting harder and harder to attract grant support, and relying on this source leads to what Tom Moritz identified as the “hamster wheel syndrome” – never being able to step off the grant writing treadmill for long enough to do anything else. Grant writing staff will be required to support this model.
4. Institutional support from the host institution - whether universities, schools, museums, etc. Projects will be strengthened if they are supported as a core part of the organizational infrastructure. This is becoming difficult given the present financial situation, and such arrangements usually have to be made at the very start of a project. Staff will be required for negotiation, billing, and accounting. Unfortunately, many institutions are rigidly organized and there is little connection between various programs and departments. It is extremely hard to get “interdisciplinary” projects under way, especially given the complex decision-making processes and necessary buy-in at libraries and universities. The key managers who are empowered to make these decisions often do not talk to one another.
Related issues concern the timing or launch of such resources – when can they be judged to be ready for release, sale, or new funding? What work has to be done before a business model can be developed? What editorial and technical development must be completed? No part of the project should be in an experimental phase when resources are launched. Furthermore, how will any project partners be involved in matters relating to revenues, collaboration, IP protection? Again, this will have to be thought out very early on in the project life cycle.
Decisions that must be made before a product can be launched include:
There will also be several long-term questions. How will the business models suggested above affect a project’s technical or editorial development (for example, are advertisements acceptable within the resource? If so, from whom and where should they appear?) How will success be measured? How can you change the business model if required? Can the project lower the overall costs of doing business, such as by merging with another partner, or by outsourcing some aspects of the business plan to other places or people? The situation is constantly changing, and developing sustainable business models is an important and ongoing activity.
As CEO of the organization that co-sponsored the event, Jack Abuhoff offered some observations on the day’s presentations, and gave the audience his sense of some of the critically important points that had been made by earlier speakers:
1. Maria Bonn’s warning to watch out for “ramping-up” costs, which have the potential to derail budget predictions. We tend to base pricing estimates on “steady state” models, when what we need are more dynamic models that predict costs accurately. By paying attention to this issue at the start of a project (even if this delays the actual starting point), through document analysis and observation of business processes, it will be possible to keep costs – and workflows – under control and prevent ramp-up costs.
2. Nancy Harm’s acknowledgement that Luna Imaging had made mistakes, which suggests that we should want to work with Luna, or other vendors who acknowledge and learn from their mistakes.
3. Also from Harm’s presentation was the message that clearly defined project goals are critical. Project managers should not compromise or accelerate early planning processes.
4. As Steve Chapman pointed out, the community must adopt preservation strategies that enable subsequent users to work with digital resources in the same way that they would be able to continue to work with older, analog materials. This raises the question of whether we can afford to scan at a low resolution, or to make other compromises in the digitization life-cycle.
5. Another point made by Chapman was the need to guard against obsolescence – the need for “future proofing”. As technology develops and costs (for bandwidth, for example) decrease, we will see an increase in user demands on electronic information. Much of this will be driven by the emergence of new technologies like the Semantic Web, which will require changes in the structure of information. We will need our repositories to work in this new environment, and should not feel constrained by the limitations of today.
Finally, Abuhoff explored the metaphor of “home heating costs”. In doing an internet search for this term, he had come up with several “hits” – from "99% efficient vent-free gas burners” to “oil burners”. To the consumer, the only concern when shopping for home heating is cost: seventy degrees of heat, from whatever source, is good enough. Digitization is not like this. There are qualitative considerations and benefits to the end user, and digital resources are not consumed immediately, so they will have to be future-proofed. We should not approach digitization as buying fuel oil, where the cheapest is the most desirable. Vendors can help the client evaluate what they really want, and what quality they can truly afford.
Michael Lesk The Future is a Foreign Country
In addressing the question of how to pay for digital libraries, Lesk invoked Voltaire: “the best is the enemy of the good”. Doing some things really well makes them too expensive for many institutions. Lesk observed that, in discussing prices, speakers at the symposium had presented a huge variety of prices for digitization. While it may seem reasonable to spend thousands of dollars to digitize an important cultural artifact like the Beowulf manuscript, how much should we expect to pay to digitize the books used for the Making of America project – books which, Lesk pointed out, would be of no interest to many used book stores? There has been little research on what users really need from digital resources, but some work has been done: Lesk cited research by Michael Ester of Luna Imaging into the image resolution that is acceptable to users, and it is lower than one would expect (see Michael Ester, "Image Quality and Viewer Perception," Leonardo, vol. 23, no. 1, pp. 51-63 (1990)).
He then cited the work of the Carnegie Mellon/Internet Archive Million Book Project, established with the mission of digitizing a very large body of content. Scanning is outsourced to India and China, where inexpensive scanning techniques will be used to produce a very low “cost per page”. The goal of this project is quantity, not quality, and this raises the issue of what users really need from digital resources. Lesk referenced the commonly held opinion that, after investing resources in a digitization project, one shouldn’t have to scan again in 5 years. He argued, however, that if there is a demand for higher-quality scanning, the demand itself should help facilitate the necessary funding, and newer technology should make it easier and cheaper (assuming the copyright situation hasn't deteriorated in the interim).
Lesk returned to an earlier theme, introduced by Don Waters, of being able to assess the “benefit” of doing something, as well as its cost. Having built the best analog libraries in the world, how can we now develop the best digital library systems? How will it be possible to make the systems work, and work with each other? And what will be the cost to smaller libraries if large research universities are able to digitize their entire library collection and put them online? Will a smaller institution still need to have a library to become accredited? Will it be worth maintaining small libraries if large research collections are available online in their entirety? And what are the economics of this? We are now able to have services on the desktop that, until very recently, were only obtained by physically going into a library. What is the cost to the library of offering this sort of service online at no charge to the user? And what is the saving to the institution of no longer having to provide other traditional services?
Answers to questions of this nature can be found in addressing the way people work with analog resources, and the benefits of traditional libraries. Overall, we need to understand users and the patterns of use in order to gain the greatest benefits from our future electronic resources.
Ending, though, with a demonstration of the critical importance of the library, Lesk cited the story of Sir Alexander Fleming and the discovery of penicillin. Fleming (a doctor) first discovered in 1928 that some substance from the mould Penicillium killed bacteria, and wrote a paper about the substance, hoping for help from a biochemist. But little happened for over a decade. Prompted by the Second World War to look for antibacterial agents, Ernst Chain, a researcher at Oxford, found Fleming's 10-year-old paper in the British Journal of Experimental Pathology. This discovery in the stacks led Chain and Howard Florey to test and then exploit the first modern antibiotic, to the great benefit of medicine and humanity; Chain, Florey, and Fleming shared the 1945 Nobel Prize. Libraries let us accumulate wisdom for later use; this must be preserved in the digital library of the future.