FAIR Synthesis: Repository Management and Practice
This webpage has been archived. Its content will not be updated.
There’s more to setting up a repository than getting the technology and standards in place. At a practical level, processes are needed, and these have to be planned, managed, and fine-tuned over time in the light of experience. At a more conceptual level, there are models for how repositories work, and different models for different types of repositories. In the long term, economic models will also be needed to demonstrate how repositories can be sustained.
Many of the FAIR projects have created repositories and gained practical experience with repository practice and management. This section highlights their experience to date. Already some generic repository models are being developed and shared with the community. As the FAIR programme progresses, more models are likely to emerge.
Repository models
While most of the FAIR projects are developing repository services and exploring issues related to creating and running them, some projects are taking a broader view of repositories and how they may evolve.
Three FAIR projects (DAEDALUS, Electronic Theses, and Theses Alive!) worked in the area of electronic theses and dissertations (ETDs). Each developed its own repository and together the projects explored key issues like production, submission, workflow, metadata, dissemination, and use. Based on their experience, Electronic Theses outlined a model for handling ETDs in the UK. They recommended that:
- Where possible, individual Higher Education institutions should create their own ETD collections as part of, or in parallel with, an open access institutional repository that also contains e-prints and other research output. They also recommended the establishment of a national ETD collection maintained by the British Library, which would meet a range of needs at national level, including preservation.
- Where institutions operate on a federal basis, current arrangements regarding the deposit and storage of theses can be adapted to accommodate PhDs in electronic format. An example of this would be the liaison between The University of Wales and the National Library of Wales.
- Copies of ETDs from both individual and federal institutions should also be kept in a national collection maintained by The British Library. Smaller institutions that cannot afford the costs of establishing and maintaining their own collections should still provide the BL with copies of their PhDs in electronic format where possible.
Results from the three projects and the draft model have been shared with the community at workshops and via discussion lists. An important outcome is the EThOS project (funded by JISC), a collaboration across a number of universities and the British Library to further investigate ETDs on a national basis. The project is led by University of Glasgow (DAEDALUS), and partners include University of Edinburgh (Theses Alive!), The Robert Gordon University (Electronic Theses), University of Southampton (TARDis), and the SHERPA consortium led by University of Nottingham. See further information.
Hybrid Archives has developed a new model for the preservation of datasets. The model allows for data to be deposited at the Arts and Humanities Data Service through harvesting via OAI-PMH, but for the content to also be held by the data owner who then provides access to it. This differs from current practice where data preservation traditionally involves handing over the entire dataset to the AHDS or similar body, who then preserves it and also provides access to it. It therefore provides a bridge between the complexity (and burdens) of full ‘traditional’ deposit of institutional collections and the more simplified approach embodied in harvesting methodology.
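The harvesting step in this model relies on OAI-PMH, in which a harvester requests metadata records from the data owner's repository while the content itself stays with the owner. The following is a minimal sketch of that exchange, not the project's actual implementation: the function names are illustrative, and the choice of the `oai_dc` (unqualified Dublin Core) metadata format is an assumption, since the source does not say which format was harvested.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        oai_id = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        titles = [el.text for el in record.iter(DC_NS + "title")]
        # Assumption: dc:identifier holds the location where the data owner
        # continues to serve the content.
        locations = [el.text for el in record.iter(DC_NS + "identifier")]
        records.append({"id": oai_id, "titles": titles, "locations": locations})
    token = root.findtext(OAI_NS + "ListRecords/" + OAI_NS + "resumptionToken")
    # An absent or empty token means the list is complete.
    return records, (token or None)
```

A harvester would fetch `list_records_url(...)`, pass the response body to `parse_list_records`, and repeat with the returned token until it is `None`. The preserving body keeps the harvested metadata (and, under the hybrid model, a preservation copy), while the recorded locations point back to the owner for access.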
The model carries legal requirements: it must be clear who has the right to do what with the data. The licence produced addresses these requirements and has been tested with data owners. The model allows for materials with IPR issues to be deposited for preservation, and access then provided at a later date once the IPR issues have been resolved. The AHDS guarantees to update file formats as required as part of that preservation. The model requires a sense of trust between the data owner and the AHDS, but this can be encapsulated within the licence. The project has thus shown that legal arrangements can be put in place to facilitate different actions upon data. The model does not impose data standards, but the terms of the licence encourage their use, in the hope that data owners will adopt them.
The model has been welcomed by the research community. In the recent round of AHRB funding the AHDS was approached by bidders wishing to make use of the Hybrid Archives model. As a result, the AHDS is looking to offer the Hybrid Archives model as a regular service. As a first step, AHDS is undertaking a survey to determine whether they have the necessary metadata and harvesting capability to undertake hybrid deposit.
Submission
Projects developing repositories of e-prints and ETDs have had to plan the submission process, fine-tune it in light of experience, and explain it to depositors. The reports below document the processes for Electronic Theses and Theses Alive!. Depositor guides developed for users are listed in the section on advocacy.
At the start of the FAIR programme, many projects assumed they would follow a self-archiving model where the user deposits documents unassisted. Some projects have subsequently moved to a mediated archiving model (where users are assisted by staff) to encourage population of the repository. This has been the case for DAEDALUS, and they plan to document their mediated archiving model in a future paper. TARDis took the mediated archiving route early on in order to get content into the repository, and adapted to the needs of different departments in order to do this. Both projects feel that the mediated model needs to be carefully managed as it raises expectations about what can be provided in the long term.
Workflow
Similarly, projects developing repositories for e-prints and ETDs have had to plan workflow, for both processes and systems, and adjust this in the light of experience. The reports below document these.
Preservation
SHERPA is exploring the preservation of e-prints. The premise is that it makes sense to plan for the future and explore the long-term preservation issues now when the development of e-print repositories is at an early stage. The work is being led by AHDS and is complementary to their work on Hybrid Archives developing new preservation models for datasets. Work to date is described on the SHERPA website and key documents are listed below:
- Selection criteria for the preservation of e-prints, Gareth Knight, SHERPA Deliverable D4-4, December 2004
- Report on preservation standards, Gareth Knight, SHERPA Deliverable D4-5, December 2004
- Pinfield, S. and James, H., The Digital Preservation of e-Prints, D-Lib Magazine 2003, 9(9)
- Feasibility and Requirements Study on Preservation of E-Prints, Hamish James, Raivo Ruusalepp, Sheila Anderson, and Stephen Pinfield, October 2003.
SHERPA's work on preservation will be extended through the JISC-funded SHERPA DP project (SHERPA Digital Preservation: Creating a Persistent Preservation Environment for Institutional Repositories). SHERPA DP will create a collaborative, shared preservation environment for the SHERPA institutional repositories projects framed around the Open Archival Information System (OAIS) Reference Model. The project will bring together the SHERPA institutional repository systems with the preservation repository established by the AHDS to create an environment that fully addresses all the requirements of the different phases within the life cycle of digital information.
Hybrid Archives has developed a new model for the preservation of datasets. The model allows for data to be deposited at the AHDS through harvesting via OAI-PMH, but for the content to also be held by the data owner who then provides access to it. This differs from current practice where data preservation traditionally involves handing over the entire dataset to the AHDS or similar body, who then preserves it and also provides access to it. For further information about the hybrid model, see the section on repository models. The model will be supported by reports on preservation requirements and preservation metadata, to be posted on the project web site in summer 2005. Hybrid Archives has also produced a useful report on the preservation of dynamic collections.
Costs and Economic Models
SHERPA provided evidence to the House of Commons Science and Technology Committee for its Inquiry into Scientific Publications. This included a report providing guideline costs for establishing and maintaining an institutional repository. Their report on the preservation of e-prints briefly outlines a cost model for e-prints.
ePrints UK has developed a national service provider of e-print records by harvesting metadata from subject-based and institutional repositories and making them available through a single search interface. The report below introduces some of the business and intellectual property rights issues associated with the metadata approach taken by the project. This includes an outline of sustainable business models that could be adopted by aggregator services like ePrints UK. It builds on the work of the Open Archives Forum project (2001-2003) in which UKOLN participated.
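The aggregator idea described here (metadata harvested from many repositories, searched through one interface) can be illustrated with a very small sketch. It is not how ePrints UK was built: the record layout, the repository names, and the `build_index` and `search` functions are all hypothetical, and a real service would index far more than title words.

```python
from collections import defaultdict


def build_index(records_by_repository):
    """Map each lower-cased title word to the (repository, record id) pairs using it."""
    index = defaultdict(set)
    for repository, records in records_by_repository.items():
        for record in records:
            for word in record["title"].lower().split():
                index[word].add((repository, record["id"]))
    return index


def search(index, query):
    """Return the records whose titles contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Hypothetical harvested metadata from two repositories.
harvested = {
    "repo-a": [{"id": "a1", "title": "Digital preservation of e-prints"}],
    "repo-b": [{"id": "b7", "title": "Preservation metadata for datasets"}],
}
index = build_index(harvested)
matches = search(index, "preservation metadata")
```

Because each hit keeps its source repository alongside the record identifier, the search interface can send users back to the repository that holds the full text, which is where the rights questions raised by the report come into play.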
Managing Medical Images
The BioMed Image Archive carried out an ‘independent expert review’ of legacy images that would be carried forward from the existing Bristol BioMed Image Archive to the new BioMed Image Archive system to ensure they were appropriate for the web. The review was undertaken by specially commissioned medical imaging and legal specialists, and covered a variety of areas, primarily focusing upon the suitability of external images of human patients and external veterinary images. The review was valuable in that it resulted in a set of images whose provenance is 100% reliable. However, the most important outcome of the review was the creation of the ‘BioMed criteria’ for the independent image review. No work in assessing the suitability of existing (biomedical) image collections has been carried out before. The criteria produced by BioMed Image Archive, in conjunction with the specialists, will undoubtedly be of value to the wider education and healthcare communities, who can now adopt or adapt criteria that have been fully tested on a wide range of biomedical images.
A revised licence agreement and terms and conditions, plus disclaimer and privacy policy, were also generated by the project, and these can also be used in the setting up and maintenance of other image archives, particularly where ethical concerns are an issue. They have been used, for example, to guide policy across all image repositories at the University of Bristol. See the section on legal issues for further details.
Virtual Handling of Museum Objects
One facet of the Accessing the Virtual Museum project was to explore the concept of the ‘virtual museum’ and the extent to which digital surrogates of museum objects could form a surrogate for the museum itself. Static images of museum objects are the first step, and Accessing the Virtual Museum reasoned that the next step might be more interactive. They undertook research in ‘virtual handling’ to investigate how museum objects could be made available via video link. The objective was to develop a model for virtual handling and explore the feasibility and practicalities of offering such a service. Reports summarise the results; please contact project staff for copies and further information.
- Virtual handling evaluation report
- Guidelines on virtual handling
- Virtual handling questionnaire