National Library of Medicine Recommendations on NLM Digital Repository Software

3.4. In-depth Testing of 3 Systems/Software

DSpace, DigiTool, and Fedora were selected as the top three systems to be tested and evaluated. Four subgroups of the Working Group (Access, Metadata and Standards, Preservation and Workflows, Technical Infrastructure) were formed to evaluate specific aspects of each system.

System testing preparation included:

  • Creating a staggered testing schedule to accommodate all three systems. 

  • Selecting simple and complex objects from the NLM collection lists.

  • Identifying additional tools that would be helpful in testing DSpace and Fedora (e.g. Manakin and Fez).

  • Developing test scenarios and plans for all four subgroups based on the functional requirements.

A Consolidated Digital Repository Test Plan was created based on the requirements enumerated in the NLM Digital Repository Policies and Functional Requirements Specification. The Test Plan contains 129 specific tests and is maintained as a spreadsheet.  Each test was allocated to one of the four subgroups, which was tasked with conducting that test on all three systems. 

DSpace 1.4.2, DigiTool 3.0, and Fedora 2.2/Fez 2 Release Candidate 1 were installed on NLM servers for extensive hands-on testing. OCCS conducted demonstrations and tutorials for DSpace and Fedora, and Ex Libris provided training on DigiTool, so that members could familiarize themselves with the functionalities of each system. The Consolidated Digital Repository Test Plan guided the testing and scoring of the three systems. Details of the testing are available in the next section.

4. Final Software Evaluation Results

The Technical Infrastructure, Access, Metadata and Standards, and Preservation and Workflows subgroups conducted the test plan elements allocated to their subgroup in the Consolidated Digital Repository Test Plan. Selecting from a capability/functionality scale of 0 to 3 (0=None, 1=Low, 2=Moderate, 3=High), the subgroups assigned scores to each element, indicating the extent to which the element was successfully demonstrated or documented. Scores were added up for each subgroup's set of test elements. A cumulative score for each system was calculated by totaling the four subgroup scores.
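The scoring arithmetic described above can be sketched as follows. This is a minimal illustration, not the Working Group's actual spreadsheet: the function names are hypothetical, and the per-element ratings in the usage lines are made up. Because some published subgroup totals are fractional (e.g., 27.5), the sketch accepts any value from 0 to 3 rather than only whole numbers.

```python
from typing import Dict, Iterable

# The four subgroups whose totals make up a system's cumulative score.
SUBGROUPS = (
    "Technical Infrastructure",
    "Access",
    "Metadata and Standards",
    "Preservation and Workflows",
)


def subgroup_score(element_scores: Iterable[float]) -> float:
    """Add up the 0-3 ratings one subgroup assigned to its test elements."""
    total = 0.0
    for score in element_scores:
        # 0=None, 1=Low, 2=Moderate, 3=High; fractional values are tolerated.
        if not 0 <= score <= 3:
            raise ValueError(f"rating {score} is outside the 0-3 scale")
        total += score
    return total


def cumulative_score(subgroup_totals: Dict[str, float]) -> float:
    """Total the four subgroup scores for one system."""
    missing = set(SUBGROUPS).difference(subgroup_totals)
    if missing:
        raise KeyError(f"missing subgroup totals: {sorted(missing)}")
    return sum(subgroup_totals[name] for name in SUBGROUPS)


if __name__ == "__main__":
    # Hypothetical ratings for four test elements.
    print(subgroup_score([3, 2, 0, 1]))  # 6.0
    dspace = {
        "Technical Infrastructure": 36,
        "Access": 40,
        "Metadata and Standards": 16,
        "Preservation and Workflows": 42,
    }
    print(cumulative_score(dspace))  # 134
```

Applied to DSpace's subgroup scores from Section 4.1.1 (36, 40, 16, and 42), `cumulative_score` returns 134.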

The Fedora platform and Fez interface were evaluated as a joint system.

4.1 Summary of Hands-on Evaluation

Subgroup                      DSpace   DigiTool   Fedora (w/Fez)
Technical Infrastructure      36       51         49.75
Access                        40       66         52.5
Metadata and Standards        16       27.5       40.75
Preservation and Workflows    42       45         56.5
Total Score                   134      189.5      199.5

4.1.1. DSpace 1.4.2 Evaluation

See Appendix C for complete testing results.

Technical Infrastructure, score=36

  • Data model well suited for academic faculty deposit of papers but does not easily accommodate other materials.

  • All bitstreams uniquely identified via handles and stored with checksums. 

  • Very limited relationships between bitstreams (html document can designate the primary bitstream, hiding the secondary files that make up a web page). 

  • Workflow limited to three steps. 

  • Dublin Core metadata required for ingest.  Other metadata can be accepted as a bitstream but would not be searchable.

  • Versioning of objects/bitstreams not supported. 

  • Some usage and inventory reporting built-in. 

  • DSpace uses the database to store content organization and metadata, as well as administrative data (user accounts, authorization, workflow status, etc.).

Access, score=40

  • User access controls are moderate, with authorization logic restricting functions to admin users or authenticated users. 

  • Although objects can have text files associated as licenses, there is no application logic to make use of license data, and no built-in way to facilitate content embargoes or selective user access. 

  • Entire collections can be hidden to anonymous users, but metadata remains viewable. 

  • Audit history written to a cumulative log which must be parsed by scripts into human-readable formats, and metadata actions are only sparsely logged. 

  • External automated access to Dublin Core metadata via OAI-PMH. 

  • Content is searchable by Dublin Core metadata and full text. 

  • Files are listed in the order they were ingested and cannot be sorted.

Metadata and Standards, score=16

  • Dublin Core metadata required for ingest. 

  • Other metadata can be accepted as a bitstream but would not be searchable. 

  • Metadata validation not possible. 

  • Objects can be exported as METS files, but METS is not currently supported as an ingest format.

Preservation and Workflows, score=42

  • Exported data can be re-ingested with a replace function.

  • Checksum checker can periodically monitor the bitstreams for integrity.

  • No normalization capability. 

  • No referential integrity checks. 

  • No tools for file migration.

  • Provenance for record updates is lacking.

System support issues

  • Platform support: DSpace runs on Solaris, Linux, other UNIX, or Windows servers.  It is a Java application, and uses Apache Tomcat, Apache Ant, and other open source Java tools.  DSpace uses a relational database that can be Oracle, PostgreSQL, or MySQL. 

  • Deployment and maintenance: OCCS personnel installed several copies of DSpace on Windows computers for initial testing and demonstration.  OCCS then installed DSpace on an NLM Solaris server using an Oracle database for full testing and evaluation. DSpace is relatively simple to install and build, and has limited but adequate documentation. DSpace includes user interfaces for public access and repository administration; however, these interfaces are very plain, and difficult to customize.  Installation and usage problems can often be solved by asking for assistance from members of the DSpace community, by posting a request on the DSpace email list server.

  • Development and user organizations: DSpace has a very active user community and open source development community, with over 400 institutional users worldwide including NLM LHC for the SPER research project. DSpace was initially developed with support from MIT and HP.  In 2007, the DSpace Foundation was formed to continue development of the open source software and support its community. 

  • Future roadmap: Future plans for DSpace are not entirely clear, but the prospects for continued development and community support are good:

    • A DSpace 2.0 architecture has been defined that will introduce major improvements to the tool, and development of these enhancements has already begun.

    • Plans are being made for significant collaboration with the Fedora Commons community, to address needs and functions that are common to these two tools.  Grant funding for planning joint activities has recently been obtained from the Andrew W. Mellon Foundation.

User Visits/Calls

  • University of Michigan (May 14, 2008)
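The checksum handling described for DSpace above (checksums recorded for every bitstream at ingest, then periodically re-verified by the checksum checker) follows a general fixity-checking pattern. The sketch below illustrates that pattern with Python's standard `hashlib`; it is not DSpace's checksum checker, and all function names are hypothetical. MD5 is used as the default algorithm only as an assumption; any algorithm name accepted by `hashlib.new` can be passed instead.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def file_checksum(path: Path, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large bitstreams are never loaded whole."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(paths: List[Path], algorithm: str = "md5") -> Dict[Path, str]:
    """At ingest time: store one checksum per bitstream."""
    return {path: file_checksum(path, algorithm) for path in paths}


def check_fixity(stored: Dict[Path, str], algorithm: str = "md5") -> List[Path]:
    """Periodic audit: list bitstreams that are missing or whose checksum changed."""
    failures = []
    for path, expected in stored.items():
        # Re-read the file from storage; comparing stored values to themselves would always pass.
        if not path.exists() or file_checksum(path, algorithm) != expected:
            failures.append(path)
    return failures


def demo() -> Tuple[List[str], List[str]]:
    """Ingest one file, audit it, alter it, then audit again."""
    with tempfile.TemporaryDirectory() as tmp:
        bitstream = Path(tmp) / "bitstream.pdf"
        bitstream.write_bytes(b"original content")
        stored = record_checksums([bitstream])
        before = [p.name for p in check_fixity(stored)]
        bitstream.write_bytes(b"altered content")
        after = [p.name for p in check_fixity(stored)]
    return before, after


if __name__ == "__main__":
    print(demo())  # ([], ['bitstream.pdf'])
```

In practice a scheduled job would call `check_fixity` against the stored map and report any failures for follow-up.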

4.1.2. DigiTool 3.0 Evaluation

See Appendix D for complete testing results.

Technical Infrastructure, score=51

  • Overall, the group was impressed with the broad range of tools and continued to discover new functionality, although the discovery was difficult at times.

  • The ingest process is one example of the difficulty the group experienced: understanding the use of the legacy Meditor and the web ingest tool and the difference between deposit and ingest. Ingest workflows seemed overly complex.

  • Certain challenges were a result of the NLM environment: the security lockdown, the Meditor installation, and ActiveX.

  • Quite a few tests were conducted. The group was particularly happy with the range of file types (DigiTool really shines in this area) and areas of metadata handling, especially in terms of METS.

  • Other positive aspects are the automatic format configurations and the support of relationships between digital entities (parent-child, for example).

  • Weak areas include lack of specific support for quality assurance and audit functionality and the overall system configuration management.

  • Standards support is good.

Access, score=66

  • The group's evaluation considered staff users as well as end user needs and functionality.

  • Access features in both areas were pretty strong, in terms of granularity of permissions, access protocols (Z39.50, OAI-PMH, etc.), and the search results display.

  • The group would like to see more flexibility in search options, such as relevance ranking, proximity searching, and "more like this." Browsing features are poor, and authority control is not leveraged. The group recognizes that many of these features are available via Primo and through some customization of Oracle.

  • Good faith effort towards Section 508 compliance is well-documented by the vendor.

  • Generally, the group felt that DigiTool is very strong in the access area.

Metadata and Standards, score=27.5

  • Ingest of multiple format types is a feature the group likes.

  • The limitation to Dublin Core mapping is a hindrance.

  • The group would like to see more information on validation (for example, validation that a MeSH heading is MeSH).

  • Updating and adding metadata fields are easy.

  • The group did not see metadata checking for batch files, only individual files.

Preservation and Workflows, score=45

  • DigiTool has many rich features, especially the use of METS extraction, JPEG 2000 thumbnail creation, and tagging master files in two ways.

  • The rollback feature is good.

  • Weak areas include the lack of confirmation for ingest and individual rather than batch ingest.

  • The group recognizes that most preservation functionality will be offered with the Ex Libris Digital Preservation System (DPS), currently in development. Many customers will continue using DigiTool and have no need for the enhanced preservation functionality that will be offered by the DPS.

System support issues

  • Platform support: DigiTool runs on either a Solaris or Linux server, with an embedded Oracle database.  The Meditor administrative client software runs on a desktop PC.

  • Deployment and maintenance: Installation was performed by Ex Libris on an NLM Solaris server; the vendor will not allow the software to be installed by the user organization.  The installation requirements presented no particular difficulties, with the exception of the Meditor client software, which required administrator privileges to install on user PCs.  Parts of the code base are very old, having been migrated from a legacy COBOL product.  Ex Libris provided detailed training on the use of the software and was responsive in answering questions.

  • Development and user organizations: The DigiTool product development team is located in Israel, and is accessible via web conference and teleconference.  A separate team at Ex Libris is also developing a new repository product, the Digital Preservation System. Contacted users reported mixed experiences with DigiTool - a few are happy (e.g., Boston College), but others were disappointed and abandoned the product (e.g., University of Maryland, University of Tennessee, and Brandeis University).  A small but active user group exists. 

  • Future road map: Ex Libris recently indicated to NLM that DigiTool will cease to be an independent product, and will be reformulated as a module that can be optionally used with the new Ex Libris Digital Preservation System. These plans have not yet been publicly announced.

  • Security: OCCS conducted a web application security scan of DigiTool using IBM's AppScan scanning tool and found 126 high-severity issues and 22 medium-severity issues.  The high-severity issues included Cross-Site Scripting vulnerabilities and Blind SQL Injection vulnerabilities.  The scan also detected an additional 229 low-severity and informational issues.  Details are provided in the DRESWG Security Scan Results.

User Visits/Calls

  • Boston College (May 2, 2008)

  • Oak Ridge National Laboratory (May 7, 2008)

  • University of Tennessee, Knoxville (email exchange on DigiTool 3 beta testing in 2005; May 28, 2008)

  • Center for Jewish History and The Jewish Theological Seminary (May 30, 2008)
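All three systems offer automated metadata access through OAI-PMH: DSpace exposes Dublin Core this way, DigiTool lists OAI-PMH among its access protocols, and Fedora includes an OAI-PMH service. A harvester for any of them would issue standard `ListRecords` requests. The sketch below builds such requests and extracts `dc:title` values and the `resumptionToken` from a response using only the Python standard library. The base URL is hypothetical, and network fetching is omitted so the example runs offline.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_titles(response_xml: str) -> Tuple[List[str], Optional[str]]:
    """Return the dc:title values and the resumptionToken (None on the last page)."""
    root = ET.fromstring(response_xml)
    titles = [el.text for el in root.iter(f"{{{DC_NS}}}title") if el.text]
    token_el = root.find(f".//{{{OAI_NS}}}resumptionToken")
    # An empty resumptionToken element marks the final page of a result set.
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


if __name__ == "__main__":
    # Hypothetical repository endpoint.
    print(list_records_url("https://repository.example.org/oai"))
```

A harvester would fetch the first URL, call `parse_titles`, and keep requesting `list_records_url(base_url, token)` until the returned token is `None`.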

4.1.3. Fedora 2.2/Fez 2 Release Candidate 1 Evaluation

See Appendix E for complete testing results.

Technical Infrastructure, score=Fedora: 40.5; Fez: 35.5; Combined Fedora/Fez maximum: 49.75

  • Fedora is very strong in the range of files that can be ingested, metadata requirements, versioning, relationships, and audit trails.

  • Fedora's web services-based interface to repository content makes it easy to integrate with external tools and custom front-ends.

  • Fedora is weak in workflow capabilities. Fez ranges from minimal to adequate in workflow capabilities.

  • Fedora provides good support for standards compliance: SOAP, OAI, Unicode, METS, PREMIS, etc.

  • One question is whether Fedora can catch transmission errors when a file is ingested from a directory, a function available in SPER. Fedora can compute a checksum and add it to the SIP, and it will verify checksums, but there appears to be a bug: the checksums always match. This problem should be fixed in version 3.0.

Access, score=Combined Fedora/Fez: 52.5

  • Fedora provides great flexibility and granularity in access controls at the user, collection, object, datastream, and disseminator levels. The downside of this flexibility is that custom policies must be written in a specialized markup, which presents a learning curve for administrative and development staff.

  • Fez also has granular security options, including Active Directory integration.  The group was not able to successfully test some of the access control logic.  A major drawback in administering the controls is the need to multi-select values using the Ctrl key, which makes it very easy to accidentally deselect values that may not even be visible to the user.

  • Fedora includes an OAI-PMH service which can provide the Dublin Core metadata associated with an object. This service could run (on Fedora) with a Fez implementation as well.

  • Fedora has a very basic default end-user interface but is extremely flexible in its ability to integrate with third-party front-ends. Fez offers a rich end-user UI, including UTF-8 character support, controlled keyword searching, and RSS output.  Neither system adequately highlights a preferred version of an object over other versions that are also visible to the end user.

  • Full text searching is available with both systems via a third-party indexing plug-in.

  • Fedora's disseminator approach offers much flexibility in content delivery, and Fez's inability to leverage dissemination is a significant downside to the Fez product.

Metadata and Standards, score=Fedora 40.75; Fez 33.75; Combined Fedora/Fez: 40.75

  • Most of the ratings assigned were 3s.

  • The most difficult aspect of Fedora is determining workflows.

  • Fedora conducts all the metadata checks that are needed.

  • Fedora is difficult to use, as is DigiTool; Fez is easier.

  • Fez uses only schemas, not DTDs.

  • Dublin Core, MODS, and so on can be used as long as they are built into the workflow.

  • MARC is ingested as a datastream.

  • Disseminator architecture and other Fedora data model features should enable NLM to implement metadata linkage or exchange between Fedora and Voyager.

Preservation and Workflows, score=Fedora: 55; Fez: 41.5; Combined Fedora/Fez maximum: 56.5

  • Fedora provides a solid core set of preservation capabilities that can be extended with companion tools (e.g. JHOVE for technical metadata extraction).

  • Fedora/Fez does not create a physical AIP package but generates a FOXML/METS file that contains metadata and links to all datastreams during ingest.

  • Fedora assigns a PID and generates a checksum for each ingested datastream.

  • Fez can generate three different .jpg derivatives for each ingested image datastream. The subgroup was unable to test Fedora's disseminator.

  • GSearch (the Fedora Generic Search Service) may be implemented with Fedora to index all metadata captured in FOXML/METS but style sheets must be written to enable GSearch functionality.

  • Fedora allows data to be exported in three different ways (archive, migrate, and public access), but Fez has a very limited data export function.

  • Fedora/Fez provides ingest confirmation on screen but no summary statistics. The subgroup was unable to test mail notification functionality because the mail server was not set up.

  • The purge function in Fez does not delete an object from the repository. In Fedora, purging deletes an object.

  • A need for workflows remains, if not within the software itself, then for external business functions.

System support issues

  • User interface: Fedora does not include a public web access user interface, so an external interface must be added. Options include open source tools designed for use with Fedora such as Fez and Muradora, or custom web pages developed in-house.  The Fez product restricts Fedora's flexibility in some key areas (access controls and content modeling) and appears to be more tightly integrated into Fedora than other front ends (which could be swapped out without touching the content or core services).  New versions of the Fez and Muradora tools are expected to be released in the next few months, and the Fedora Commons organization is now focusing attention on the Fedora community's need for a flexible user interface approach.

  • Search: Fedora includes an optional search component called GSearch that can search any metadata or text data in the repository.  Because of time limitations, only the more limited default Fedora search component was tested.  The full GSearch component should be implemented with Fedora.  Fedora also provides a Resource Index database that stores relationships among objects as semantic concepts that discovery tools can query.

  • Platform support: Fedora runs on Solaris, Linux, other Unix, or Windows servers.  It is a Java application, and uses Apache Tomcat, Apache Ant, and other open source Java tools.  Fedora uses a relational database that can be Oracle, MySQL, PostgreSQL, McKoi, or others. 

  • Deployment and maintenance: OCCS personnel installed several copies of Fedora on Windows computers for initial testing and demonstration.  OCCS then installed Fedora on an NLM Solaris server using an Oracle database for full testing and evaluation.  Fedora is easy to install and is accompanied by clear and comprehensive documentation.  An installation script is provided that guides the installation and configuration process.  Fedora 2.2.2 was the production release version of the software when the NLM evaluation began, and was the version installed for testing.  During testing, Fedora 3.0 was released, a significant upgrade with new features and simplified code base.  NLM spoke with several Fedora users, and all plan to upgrade to version 3.0.  Fedora 3.0 should be used instead of earlier versions.

  • Development and user organizations: Fedora has an active user community, with more than 100 user institutions listed in the Fedora Commons Community Registry.  The first prototype of Fedora was begun in 1997, and the project was led for several years by University of Virginia and Cornell University with grant money obtained from the Andrew W. Mellon Foundation.  In 2007, Fedora Commons was incorporated as a non-profit organization, and received nearly $5 million in grant money from the Gordon and Betty Moore Foundation to continue development of the Fedora software, and to provide the resources needed to build a strong open source community.  Fedora Commons supports the user and developer community with an active project web site, a wiki, and several email lists. All source code is managed on SourceForge.  The Moore grant funds a leadership team, chief architect, lead developer, and several software developers.  Several dozen additional developers are actively involved in the community at user institutions.  Fedora is being used by leading institutions that have digital project goals similar to NLM's.  The users NLM has contacted are enthusiastic and confident in their choice of Fedora.  They are building effective digital collections, and they can provide valuable advice and lessons learned to NLM.  Fedora is built using technologies that OCCS is prepared to support, including Java, Tomcat, XML, and web services. 

  • Future roadmap: The Fedora Commons Technology Roadmap is published on the Fedora Commons web site, and defines the Fedora vision, goals, priorities, and five major projects, with detailed development plans and schedules.  Some projects are primarily directed by Fedora Commons, and others are collaborations with other open source projects.

  • Security: OCCS conducted a web application security scan of Fedora using IBM's AppScan scanning tool and found one high-severity issue and one low-severity issue.  The high-severity issue was a Cross-Site Scripting vulnerability; the remediation for this vulnerability is to filter out hazardous characters from user input.  This issue should be addressed in consultation with the Fedora Commons community leadership.  The AppScan tool provides detailed information about the vulnerability and the coding approach needed to correct it.  Additional details of the security scan are provided in the DRESWG Security Scan Results.

User Visits/Calls

  • University of Maryland (August 7, 2007 Site Visit)

  • University of Virginia (Sept 11, 2008)

  • Indiana University (Sept 16, 2008)

  • Tufts University (Sept 17, 2008)

  • Rutgers University (Sept 18, 2008)

  • Presentation from Thornton Staples of Fedora Commons (Sept 29, 2008)

  • Yale University (Oct 3, 2008)
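The remediation recommended above for the Fedora Cross-Site Scripting finding is to neutralize hazardous characters in user input. The sketch below shows one common way to do that: HTML-escaping user-supplied text before it is written into a page, using Python's standard `html.escape`. Note that this is output encoding rather than removal of characters, and it is illustrative only: the function name and markup are hypothetical, this section does not identify which Fedora page or parameter was affected, and the actual fix should follow the AppScan guidance and be coordinated with the Fedora Commons community as noted above.

```python
import html


def render_query_echo(user_query: str) -> str:
    """Echo a user's search terms into an HTML fragment safely.

    html.escape turns &, <, > and, with quote=True, both double and single
    quotes into character references, so markup typed by a user is shown
    as text instead of being interpreted by the browser.
    """
    safe_query = html.escape(user_query, quote=True)
    return f'<p>Results for: <span class="query">{safe_query}</span></p>'


if __name__ == "__main__":
    print(render_query_echo('<script>alert("x")</script>'))
    # <p>Results for: <span class="query">&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</span></p>
```

Escaping at the point where text enters HTML keeps the stored user input unchanged while still preventing it from being executed as markup.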
