Digital Archives for Science and Engineering Resources
The American Society for Information Science and Technology (ASIS&T) hosted a summit at MIT this past weekend that brought together scientists, information technology specialists, and librarians, among others, for a series of talks on developments in the preservation of digital information.
Some of the more visible initiatives include institutional repositories such as MIT’s DSpace (http://dspace.mit.edu), the University of California’s eScholarship (http://repositories.cdlib.org/escholarship/), and Caltech’s Collection of Open Digital Archives (CODA) (http://library.caltech.edu/digital/default.htm). Some of these have attracted technical reports, working papers, and pre- and postprints of faculty and departmental publications at their institutions; others have launched their own publications (eScholarship) or aim to accept datasets and superseded courseware (DSpace). All have carefully thought-out policies, insisting that authors must hold the rights (not necessarily the copyright) to place the material on the institutional server.
Other novel ideas include systems for distributing educational resources, such as the SMETE Digital Library (Science, Math, Engineering, and Technology Education) (http://www.smete.org). It was suggested that if academics gained widespread visibility through the distribution of teaching and educational resources, that visibility might provide an alternative to the current reward system based on scholarly publication.
Meanwhile, open research archives are faring quite well. BioMedCentral (http://www.biomedcentral.com) publishes close to 100 open access journals, 35 of them started by scientists (for more info on starting a journal, see http://www.biomedcentral.com/info/authors/startajournal); BMC has published more than 2000 articles in 2003 and has had almost four million downloads from its web site. The journals are mirrored on NLM’s PubMedCentral and on servers in Germany and the Netherlands. Soon users will be able to syndicate content from BMC pages using RSS, as can already be done with The Scientist (http://www.biomedcentral.com/info/about/rss).
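For readers unfamiliar with RSS syndication, the following is a minimal sketch of how a program might read article titles and links from a feed. It assumes the feed follows the RSS 2.0 layout (a channel containing item elements); the sample feed, its titles, and its example.org links are invented for illustration and are not taken from BMC or The Scientist, whose actual feed format may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; titles and links are invented for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Open Access Journal</title>
    <link>http://example.org/journal</link>
    <description>Latest articles</description>
    <item>
      <title>First example article</title>
      <link>http://example.org/journal/1</link>
    </item>
    <item>
      <title>Second example article</title>
      <link>http://example.org/journal/2</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    items = []
    # In RSS 2.0, items sit directly under the single channel element.
    for item in root.findall("./channel/item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items


for title, link in list_items(SAMPLE_FEED):
    print(f"{title}: {link}")
```

A site that syndicates a journal's content would fetch the feed from its published URL on a schedule and render the resulting titles as links, rather than parsing a string embedded in the program as done here.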
Another subject-based archive, Harvard’s Astrophysics Data System (ADS), http://ads.harvard.edu, contains complete scanned images from astronomy and astrophysics journals, conference proceedings, and observatory publications (although most of the scans extend only as far as 1996). In addition, there’s a search engine that lets one enter a citation and find the online paper (although access to the paper is governed by subscription).
Other issues discussed at the summit included the question of linking datasets with journal content, the development of standards for encoding and retrieving archival materials, distributed and redundant content systems and software, and risk management for digital data. The keynote speaker, Clifford Lynch, of the Coalition for Networked Information (CNI) (see: http://www.cni.org/staff/clifford_index.html), provoked much thought with the idea, among other things, that there may be non-human as well as human readers of all this intellectual output: currently dumb ones, such as web-crawling spiders, but ultimately perhaps intelligent data-mining software agents. Presentations from the proceedings will eventually be available on the summit web site:
http://www.asis.org/Chapters/neasis/daser/