Laying the Foundations for Repository Preservation Services
The PRESERV project (2005-2007) investigated long-term preservation for institutional repositories (IRs) by identifying preservation services in conjunction with specialists, such as national libraries and archives, and by building support for those services into popular repository software, in this case EPrints.
We began by producing a simple model showing how preservation services might support repositories. This model changed quite substantially as a result of the project’s findings, towards a more powerful and flexible framework that suggests a range of granular Web-based services and providers.
At the heart of most digital preservation activity is the need for accurate identification of the format of the original source objects. Formats are essentially the signatures of the applications used to create the objects, and because applications change over time to exploit the capabilities of new technology, older versions of digital objects can become unreadable. One approach to this problem is to migrate the original format to a current, readable version. By knowing the formats of all objects in a repository, preservation strategies can be planned and action taken at the appropriate time on those objects that may otherwise be at risk of becoming obsolete. PRESERV was able to work with The National Archives, which has produced PRONOM-DROID, the pre-eminent tool for file format identification. Instead of linking PRONOM to individual repositories, we linked it to the widely used Registry of Open Access Repositories (ROAR), through an OAI harvesting service. As a result, format profiles can be found for over 200 repositories listed in ROAR, through what we call the PRONOM-ROAR service (Brody et al. 2007).
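The idea of a format profile can be illustrated with a minimal sketch. It assumes that format identification has already been carried out by some external tool, and simply tallies the results so that formats considered at risk can be picked out. The function names, the format labels and the notion of a supplied "at risk" set are illustrative assumptions, not part of PRONOM-DROID or the PRONOM-ROAR service.

```python
from collections import Counter


def format_profile(identified):
    """Tally (object_id, format_label) pairs into a format profile.

    Returns a list of (format_label, count, share) tuples, most common first.
    The format labels are whatever the identification step produced; this
    sketch does not perform identification itself.
    """
    counts = Counter(label for _, label in identified)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]


def at_risk(profile, risky_formats):
    """Select the profile rows whose format is in a caller-supplied risk set."""
    return [row for row in profile if row[0] in risky_formats]


if __name__ == "__main__":
    # Hypothetical identification results for four deposited files.
    sample = [
        ("eprint/1", "PDF 1.4"),
        ("eprint/2", "PDF 1.4"),
        ("eprint/3", "HTML"),
        ("eprint/4", "PostScript"),
    ]
    profile = format_profile(sample)
    for label, count, share in profile:
        print(f"{label}: {count} ({share:.0%})")
    print("At risk:", at_risk(profile, {"PostScript"}))
```

A profile of this kind is what allows a repository to see, at a glance, which formats dominate its holdings and which small groups of objects might need attention first.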
The lubricant to ease the movement of data between the components of the services model is metadata, notably preservation metadata, which informs, describes and records a range of activities concerned with preserving specific digital objects. PRESERV identified a rich set of preservation metadata, based on the current standard in this area, PREMIS, and determined where in our model this metadata could be generated. We found that PREMIS appears to provide an excellent basis on which to assess the needs of IRs with respect to preservation metadata, and it is possible to map the PREMIS elements to an extended model incorporating preservation services (Hitchcock et al. 2007b).
The most important changes to EPrints software as a result of the project were the addition of a history module to record changes to an object and actions performed on an object, and application programs to package and disseminate data for delivery to an external service using either the Metadata Encoding and Transmission Standard (METS) or the MPEG-21 Part 2: Digital Item Declaration Language (DIDL). One change to the EPrints deposit interface is the option for authors to select a licence indicating rights for allowable use by service providers, users and others.
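The role of a history module can be sketched in a few lines. The example below is a hypothetical illustration of recording changes and actions against an object; it is not the EPrints implementation, and the class names, field names and action labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class HistoryEvent:
    """One recorded change to, or action on, a repository object."""
    object_id: str
    action: str      # hypothetical labels, e.g. "deposit", "format_migration"
    agent: str       # who or what performed the action
    timestamp: str   # ISO 8601, UTC
    detail: str = ""


@dataclass
class History:
    """Append-only log of events across all objects."""
    events: list = field(default_factory=list)

    def record(self, object_id, action, agent, detail=""):
        event = HistoryEvent(
            object_id=object_id,
            action=action,
            agent=agent,
            timestamp=datetime.now(timezone.utc).isoformat(),
            detail=detail,
        )
        self.events.append(event)
        return event

    def for_object(self, object_id):
        """Return the events for one object, in the order they were recorded."""
        return [e for e in self.events if e.object_id == object_id]


if __name__ == "__main__":
    history = History()
    history.record("eprint/1", "deposit", "author")
    history.record("eprint/1", "format_migration", "external-service",
                   detail="migrated to a current version of the format")
    for event in history.for_object("eprint/1"):
        print(event.timestamp, event.action, event.agent)
```

Keeping the log append-only means that the record of what was done to an object, and by whom, can itself be packaged and passed to an external preservation service alongside the object.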
Through the results of a survey of repository preservation policy and activity we have a better understanding of what IRs are actually doing to prepare for preservation, and this survey will help service providers to target appropriate services at repositories (Hitchcock et al. 2007c). PRONOM-ROAR changes the outlook for preservation services. By making format profiles openly available and demonstrating Web services as a way to deliver the information, it allows repositories to make more informed decisions about preservation services, and service providers to begin to interact with repositories in more flexible ways. This suggests a range of granular services and providers, starting with example services such as PRONOM-ROAR, which can be tailored to suit the needs of diverse institutions and their repositories, from the largest research university to the smallest teaching college.
PRESERV has identified a powerful and flexible framework in which a wide range of preservation services from many providers can potentially be intermediated to many repositories by other types of repository services. It is proposed to develop and test this framework in the next phase of the project.