Prepared by the
NLM Digital Repository Evaluation and Selection Working Group
Submitted December 2, 2008
1. Executive Summary
2. Introduction and Working Guidelines
2.1. Introduction
2.2. Working Guidelines
3. Project Methodology and Initial Software Evaluation Results
3.1. Project Timeline
3.2. Project Start: Preliminary Repository List
3.3. Qualitative Evaluation of 10 Systems/Software
3.4. In-depth Testing of 3 Systems/Software
4. Final Software Evaluation Results
4.1. Summary of Hands-on Evaluation
5. Recommendations
5.1. Recommendation to use Fedora and Conduct a Phase 1 Pilot
5.2. Phase 1 Pilot Recommendations
5.3. Phase 1 Pilot Resources Needed
5.4. Pilot Collections
Appendix A - Master Evaluation Criteria Used for Qualitative Evaluation of Initial 10 Systems
Appendix B - Results of Qualitative Evaluation of Initial 10 Systems
Appendix C - DSpace Testing Results
Appendix D - DigiTool Testing Results
Appendix E - Fedora Testing Results
1. Executive Summary
The Digital Repository Evaluation and Selection Working Group recommends that NLM select Fedora as the core system for the NLM digital repository. Work should begin now on a pilot using four identified collections from NLM and the NIH Library. Most of these collections already have metadata, and the NLM collections have associated files for loading into a repository.
The Working Group evaluated many options for repository software, both open source and commercial, against the functional requirements delineated by the earlier Digital Repository Working Group. The initial list of 10 potential systems/software was narrowed to three finalists: two open source systems, DSpace and Fedora, and DigiTool, an Ex Libris product. The Working Group then installed each of these systems on a test server for extensive hands-on testing. Each system was assigned a numeric rating based on how well it met the previously defined NLM functional requirements.
While none of the systems met all of NLM's requirements, Fedora (with the addition of a front-end tool, Fez) scored the highest and has a strong technology roadmap that is aggressively advancing scalability, integration, interoperability, and semantic capabilities. The consensus opinion is that Fedora has an excellent underlying data model that gives NLM the flexibility to handle its near- and long-term goals for acquisition and management of digital material.
Fedora is a low-risk choice. Because it is open source software, there are no software license fees, and adopting it will give NLM a good opportunity to gain experience working with open source software. It is already used by leading institutions whose digital project goals are similar to NLM's, and these institutions form an active development community that can provide NLM with valuable advice and assistance. Digital assets ingested into Fedora can be easily exported if NLM decides to take a different direction in the future.
Implementing an NLM digital repository will require a significant staffing investment for the Office of Computer and Communications Systems (OCCS) and Library Operations (LO). This effort should be considered a new NLM service, and staffing levels will need to be increased in some areas to support it. Fedora will require considerable customization. The pilot project will entail workflow development and the selection of administrative and front-end software tools to be used with Fedora.
The environment for repositories and long-term digital preservation is still very volatile. All three systems investigated by NLM have new versions scheduled for release in the next 12 months. In particular, Ex Libris is developing a new commercial tool that holds some promise but will not be fully available until late 2009. The Working Group believes NLM must go forward now in implementing a repository; the practical experience gained from the recent testing and from a pilot implementation would continue to serve NLM in any later efforts. After the pilot is completed, NLM can re-evaluate both Fedora and the repository software landscape.
2. Introduction and Working Guidelines
2.1. Introduction
In order to fulfill the Library's mandate to collect, preserve and make accessible the scholarly and professional literature in the biomedical sciences, irrespective of format, the Library has deemed it essential to develop a robust infrastructure to manage a large amount of material in a variety of digital formats. A number of Library Operations program areas are in need of such a digital repository to support their existing digital collections and to expand the ability to manage a growing amount of digitized and born-digital resources.
In May 2007, the Associate Director for Library Operations approved the creation of the Digital Repository Evaluation and Selection Working Group (DRESWG) to evaluate commercial systems and open source software and select one (or a combination of systems/software) for use as an NLM digital repository. The group commenced its work on June 12, 2007 and concluded it on December 2, 2008. Working Group members were: Diane Boehr (TSD/CAT), Brooke Dine (PSD/RWS), John Doyle (TSD/OC), Laurie Duquette (HMD/OC), Jenny Heiland (PSD/RWS), Felix Kong (PSD/PCM), Kathy Kwan (NCBI), Edward Luczak (OCCS), Jennifer Marill (TSD/OC; chair), Michael North (HMD/RBEM), Deborah Ozga (NIH Library) and John Rees (HMD/IA). Doron Shalvi (OCCS) joined the group in October 2007 to assist in the setup and testing of software.
The group's work followed that of the Digital Repository Working Group, which created functional requirements and identified key policy issues for an NLM digital repository to aid in building NLM's collection in the digital environment.
The methodology and results of the software testing are detailed in Sections 3-4 of this report. Section 5 provides the Working Group's recommendations for software selection and first steps needed to begin building the NLM digital repository.