Update: The report is currently undergoing revisions based on the comments received (thanks to those of you who took the time to review the report!); a final version will be made available soon.
Geneva Henry, executive director of the Center for Digital Scholarship at Rice University, has put out an open call for comments on a recently published draft report titled “Infrastructure Considerations for Large Digital Libraries: A study to support the technical infrastructure decisions for the Digital Public Library of America.” The report is part of the June 2011 planning grant from the Andrew W. Mellon Foundation to the Council on Library and Information Resources’ Digital Library Federation (CLIR-DLF).
Taking currently functional, large-scale, non-commercial digital libraries as a frame of reference, the report seeks to understand different system architectures, content types, storage and content delivery mechanisms, and metadata formats in order to help support the DPLA’s technical infrastructure decisions. The study is divided into six parts along these lines and focuses primarily, but not exclusively, on mass digitization projects and large digital libraries with “interesting resource management approaches.” The Open Content Alliance/Internet Archive, California Digital Library, HathiTrust, Europeana, the National Science Digital Library (NSDL), and Networked Infrastructure for Nineteenth-Century Electronic Scholarship (NINES) are featured prominently in the report.
The study looks at the many issues surrounding the decision of whether to host content centrally or to provide federated search access to content spread across many repositories. It evaluates storage systems such as clustered storage and the possible need for a separate content streaming infrastructure. It also presents different models of metadata representation and management, such as flat, relational, and RDF-triple models, and different approaches to metadata collection, including mapping services and crawling. After examining storage, metadata, and harvesting options, the report looks at various search options such as Lucene, Solr, and Z39.50, and at concerns related to system architecture and long-term sustainability.
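To make the difference between the flat and RDF-triple metadata models concrete, here is a minimal sketch in Python. It is not taken from the report: the record, its field names, and the helper functions `to_triples` and `from_triples` are hypothetical, and the triples use plain strings rather than the URIs a real RDF store would require.

```python
# Hypothetical record; the identifier and field names are illustrative only.
flat_record = {
    "id": "item-001",
    "title": "Sample Title",
    "creator": "Sample Author",
    "date": "1900",
}


def to_triples(record, id_field="id"):
    """Turn a flat record into (subject, predicate, object) triples."""
    subject = record[id_field]
    return [
        (subject, field, value)
        for field, value in record.items()
        if field != id_field
    ]


def from_triples(triples, id_field="id"):
    """Rebuild flat records by grouping triples on their subject.

    A flat record holds one value per field, so if several triples share
    a subject and predicate, only the last value is kept.
    """
    records = {}
    for subject, predicate, obj in triples:
        records.setdefault(subject, {id_field: subject})[predicate] = obj
    return list(records.values())


triples = to_triples(flat_record)
for triple in triples:
    print(triple)
```

In the flat form each item is a single row of fields, which is simple to store and index. In the triple form every statement stands alone, so new properties or links to other items can be added without changing a schema, at the cost of reassembling records when a whole item is needed.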
Overall, the report strongly advises the use of open standards and the development of scalable, modular architectures grounded in service-oriented architecture (SOA) principles. It stresses the need for early and ongoing evaluation of the sustainability of both the technical systems and the organization as a whole in order to create a successful digital library that users feel they can trust.
Geneva requests that all feedback and comments be submitted via the Technical Workstream listserv or directly to her (email@example.com) by February 17, 2012.