Keeping research data safe (Phase 2)
This is the report for phase 2 of this work.
The first Keeping research data safe study (Beagrie, Chruszcz and Lavoie, JISC 2008) made a major contribution to the understanding of long-term preservation costs for research data by developing a cost model and identifying cost variables for preserving research data in UK universities. The aim of this follow-on project (2009/2010) was to provide a larger body of material and evidence against which existing and future data preservation cost modelling exercises could be tested and validated. This report presents the results of a survey of available cost information, validation and further development of the Keeping research data safe activity cost model, and a new taxonomy to help assess benefits alongside costs. A range of supplementary materials in support of this report is available on the project website.
Data has always been fundamental to many areas of research, but in recent years it has become central to more disciplines and inter-disciplinary projects, and has grown substantially in scale and complexity. There is increasing awareness of its strategic importance as a resource in addressing modern global challenges, and of the possibilities being unlocked by rapid technological advances and their application in research (NAS 2009).
The first Keeping research data safe study (KRDS1), funded by JISC, made a major contribution to the understanding of long-term preservation costs for research data by developing a cost model and identifying cost variables for preserving research data in UK universities (Beagrie et al., 2008). The Keeping research data safe 2 (KRDS2) project has delivered:
- A survey of cost information for digital preservation, collating and making available 13 survey responses for different cost datasets
- A review of the KRDS activity model, with enhancements to its presentation and usability
- In-depth analysis of cost information for four organisations (the Archaeology Data Service; National Digital Archive of Datasets; UK Data Archive; and University of Oxford), presented in case studies
- A benefits framework, illustrated with two benefit case studies from the National Crystallography Service at Southampton University and the UK Data Archive at the University of Essex
Examples of our key findings
Our main findings are presented in full in the Conclusions section of the full report (section 9).
Long-term costs of digital preservation for research data
The costs of archiving activities (archival storage and preservation planning and actions) are consistently a very small proportion of the overall costs and significantly lower than the costs of acquisition/ingest or access activities for all our case studies in KRDS2. This confirms and supports a preliminary finding in KRDS1.
Benefits of preserving research data
We have recognised that identifying and promoting the 'near-term benefits' is particularly important in advocacy to researchers: our benefit case studies and our costs work at Oxford show that there are significant benefits in the short term to current researchers as well as long-term benefits to future research.
Our survey and sources of information for costs
Eleven responses were received from the UK and two from mainland Europe. Unfortunately, a further two offered from the USA could not be made available within the deadline for publication of KRDS2. Cost information from respondents is available for most of the KRDS2 main activity phases (pre-archive, archive, access, support services, and estates), although the depth and breadth of information available from different collections varies considerably (see section 6 for individual responses).
Application of the KRDS activity model
- The KRDS activity model has been reviewed by partner institutions and found to be broadly robust and fit for purpose: some small changes have been made to the sub-activities as part of KRDS2 (see section 4) and guidance on its application has been extended
- We have recognised that the activity cost models should be applied at different levels of detail for different purposes: as a result KRDS2 now caters for potential dual application of the activity model with two versions presented at different levels of detail (see sections 5.2, 5.3, and 5.4 of the full report)
Our work has confirmed the strengths of the approaches underlying the original Keeping Research Data Safe report produced in 2008, but has also allowed some limitations and areas needing further development to be defined (these are discussed in section 9 of the full report).
Recommendations for future work
- Future researchers and their funders should note from our work that longitudinal studies of digital preservation costs are best developed from relatively recent cost evidence (with future prospective evidence added to it). This is more amenable to mapping into a consistent framework for analysis and often more complete than more historic cost evidence. A range of potential sources of such cost evidence is identified in our survey.
- The KRDS project team should seek future opportunities to extend the costs survey; raise awareness of KRDS internationally; and develop research partnerships on digital preservation costs.
- From KRDS2 outcomes, it is likely that the largest potential cost efficiencies will come from future tool development supporting ingest and access activities. Funders may wish to focus on investigating the potential benefits that could arise from further automation of these activities.
- Examine further development of the pre-archive phase of the KRDS2 activity model and produce versions of the model from a researcher’s perspective.
- Seek to implement KRDS2 in cost spreadsheets and continue research on implementation variables and metrics that could enhance them.
- Develop presentation of KRDS as a tool, with elements such as guidance notes updated and packaged alongside components such as the activity models and potential future elements such as cost spreadsheets.
- Elements from KRDS2 and its findings should be considered by JISC for inclusion in its Research 3.0 campaign to disseminate the results and findings to other end users.
- JISC and other funders should consider further work on identifying and quantifying the benefits of research data preservation.
In summary, in KRDS2 we have identified and analysed collections of long-lived research data, together with information on associated preservation costs and benefits, and have provided a larger body of material and evidence against which existing and future research data preservation cost modelling exercises can be tested and validated. We believe this work will be critical to developing preservation costing tools and cost-benefit analyses for justifying and sustaining major investments in repositories and data curation.