National Library of Medicine Recommendations on NLM Digital Repository Software

5.3. Phase 1 Pilot Resources Needed

The following is a summary of the resources estimated for the phase 1 pilot. Additional resource needs may be identified during the pilot and may depend on the collection(s) to be implemented.

5.3.1. LO

  • .8 FTE Project Manager and Analyst. Develops phase 1 pilot plan including scope, schedule and deliverables. Tracks changes to requirements and monitors project progress. Provides technical input and oversight of all major functional areas.

  • .5 FTE Metadata Specialist

  • 2.1 FTE Analyst

  • All the above to perform the following:

    • Analyze and develop workflows for various ingest and process models, in both single-file and batch mode.

    • Determine metadata schema(s) and element requirements for technical and descriptive metadata.

    • Define user community and access permissions. Develop specifications and requirements for interfaces with other internal systems, and assist in developing integration plans for identified tools.

    • Develop specifications for management, preservation, and statistical reports including access methods, file formats, and delivery options.

    • Define data requirements including file formats, directory structure and information package for ingest.

    • Develop QA checklists for automatic and manual processes including data integrity checks and file format identification, validation and characterization.

    • Specify automatically generated error, confirmation, and summary reports for master, derivative, and metadata files. Define derivative requirements.

    • Develop preservation plan including master file management, integrity checks, backup plan, file migration, etc.

  • .5 FTE User Interface Analyst. Takes lead in designing staff and public web interfaces, including search options and viewing capabilities. Ensures that usability testing, performance analysis, and 508 compliance are conducted according to NLM guidelines and standards. Additional guidelines may need to be developed depending on user needs for repository collections and formats.
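
As an illustration of the data integrity checks named in the QA checklist and preservation plan items above, the following is a minimal sketch rather than a recommended implementation. It assumes SHA-256 checksums and a simple in-memory manifest; the function names (sha256_of, build_manifest, verify_manifest) are hypothetical and are not part of any repository software under consideration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file path (relative to package_dir) to its checksum."""
    return {
        p.relative_to(package_dir).as_posix(): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = package_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Under these assumptions, a manifest would be built once at ingest and re-verified on a schedule; any path returned by verify_manifest would be a candidate for an automatically generated error report.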

5.3.2. OCCS

  • 1 FTE Systems Architect/Analyst/Engineering Project Manager. Responsible for working with LO on implementation specifications, advising on technical options, tracking development progress, providing status updates, coordinating implementation efforts among different OCCS groups, building development team, etc. Performs analysis of open source and commercial software tools, including discussions with users, community members, and vendors.

  • 1 FTE Software Engineer/Programmer. Responsible for installing, developing and testing programs and scripts. Provides overview and demonstrates new tools. Implements and tests integration of new and existing tools.

  • .3 FTE Web Developer/User Interface Specialist. Primary responsibility for public interface design and programming. Works with User Interface Analyst on designing usable administrative/staff interfaces.

  • Systems Engineer responsible for server preparation, network setup, system software configuration, etc.

  • Database Administrator responsible for database configuration and administration.

5.4. Pilot Collections

The Working Group recommends the following digital collections as pilots for the repository in order to gain early implementation experience with many of the key capabilities of the selected NLM digital repository software. The files and metadata needed for the proposed collections are already available or can be compiled without significant effort. The Working Group recommends a variety of collection and file types be selected.

5.4.1. Cholera Monographs

HMD/RBEM and PSD/PCM have already scanned over 400 English language monographs in the collection relating to cholera dating from 1830 to 1890. HMD has already loaded many of the files online on a web site called Cholera Online, but the site is not searchable, except as part of the general NLM web search. Many of the PDFs are too large to download easily without a high-speed connection. LO has high resolution TIFF files with high quality technical metadata and METS/ALTO packages, which the NLM digital repository should be able to use. Descriptive metadata for the materials already exists in Voyager. The Working Group would like to see a page turner installed for easy viewing of the materials in an online book-like format.

5.4.2. Digitized Motion Pictures

HMD has digitized a number of its historical audiovisuals for preservation and access purposes, and those created by the government are in the public domain. Metadata for these historical films already exists in Voyager. The Working Group proposes that, as a pilot project, LO attempt to load about ten of these historical audiovisuals into the NLM digital repository. NLM may need to obtain a waiver to post materials in the NLM digital repository that are not 508 compliant; for digitized motion pictures, compliance would require expensive closed captioning of any films put into the NLM digital repository.

5.4.3. Image Files from Historical Anatomies on the Web

HMD has selected and digitized over 500 images from important historical anatomical atlases in the collection and put them onto the web site Historical Anatomies on the Web. The images are not searchable, however, by subject, artist, or author. Metadata does not exist for these individual images, so the Working Group proposes to add to the repository about 50 of the images from two of the most famous atlases (Vesalius' _De Fabrica_ and Albinus' _Tabulae sceleti_) in order to allow the pilot team to learn how to handle image files and enter metadata into the system.

5.4.4. NIH Institute Annual Reports (jointly with NIH Library)

Each year NIH Institutes and Centers issue annual reports, documents that provide historical perspective on research activities. Annual reports consist of a list of investigators for each research project and a project summary. More detail may be provided through individual project reports, which describe research objectives, methods, major findings, and resultant publications. In the mid-1990s, digital copies of many of the reports began to appear on Institute and Center web sites. Since 1998, intramural reports also have been submitted to the NIH Intramural Database for searching and viewing by NIH staff and the public (see NIDB Resources). The NIH Library maintains a collection of older print NIH annual reports, totaling more than 700 volumes. To fill gaps in digital access, the Library plans to digitize the annual report collection, beginning with reports issued by the Clinical Center. The Clinical Center annual reports span thirty-five years, from 1958 to 1993. A pilot collection of eleven volumes has been selected for digitization and deposit in the NLM digital repository, covering fiscal years 1981 through 1993.

