Identifying the Benefits of Curating & Sharing Research Data
It is becoming increasingly clear that effective and efficient management and reuse of research data will be a key component in the UK knowledge economy in years to come, essential for the efficient conduct of research and its dissemination and use. In recognition of this, there have been many calls for access to science data at national and international levels.
Executive Summary
JISC and other UK funding bodies have developed a number of initiatives concerned with the management and curation of research data. The report by Lyon (2007) was pivotal in delineating the issues that need to be addressed, and this project aims to take forward Recommendation 30: JISC should work in partnership with the research funding bodies and jointly commission a cost-benefit study of data curation and preservation infrastructure.
The project’s objectives were to:
- Identify the benefits of curating and sharing research data
- Identify a methodology by which to estimate the benefits to UK Higher Education and the UK more generally of curating and openly sharing research data produced by researchers in UK HE
- Use the methodology, as far as possible, to derive an estimate, expressed in financial terms where possible, for the identified benefits
- Document case studies and examples of data re‑use, where that re‑use led to tangible benefits.
Potential benefits of the open sharing and re-use of research data include: a maximised return on investment in data collection; broader access to data whose collection costs would be prohibitive for individual researchers or institutions; potential for new discoveries from existing data, especially where data are aggregated and integrated; reduced duplication of data collection costs; increased transparency of the scientific record; increased research impact and a reduced time-lag in realising that impact; and new collaborations and new knowledge-based industries.
Broader indirect benefits might include: transparency in research funding; use of data sets in education to enhance students' data awareness; enhancement of researchers' skills through access to a broader range of data; tools and standards that have the potential to increase data quality; and increased visibility and promotion of institutions and researchers.
The project used a mixed-methods approach, including a literature review and qualitative case studies, to inform the development of a model on which to build a business case for data sharing in UK Higher Education. The case studies investigated were the European Bioinformatics Institute (EBI) and Qualidata, which is part of the Economic and Social Data Service. The case studies were based on semi-structured interviews with service providers and users of the services. The interviews were supported by documentary evidence in order to identify and illustrate the benefits and costs for the different stakeholders.
Benefits may accrue in a variety of ways, including cost savings, efficiency gains, and new opportunities to create value through doing things in new ways and doing new things. These are, successively, more difficult to quantify, not least because they often emerge over time and can only be realised in the future. We present a simple example of cost-benefit analysis applicable to an individual dataset or repository, based on costs and potential cost savings. The example describes the data requirements and walks the reader through the process step by step. The approach is then extended to explore the more diffuse benefits of data curation and sharing at the institutional and disciplinary levels.
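To illustrate the kind of calculation involved, the following is a minimal sketch of a cost-benefit comparison for a single dataset. It is not the method set out in the full report: the `DatasetCase` fields, the assumption that each re-use avoids one fresh data collection, and all of the figures are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class DatasetCase:
    """Hypothetical inputs for one dataset; none of these figures come from the report."""
    curation_cost_per_year: float  # assumed annual cost of curating and hosting the dataset
    years: int                     # period over which costs and re-use are counted
    reuses_per_year: int           # assumed re-uses that each replace a fresh data collection
    recollection_cost: float       # assumed cost a re-user would otherwise pay to collect the data


def total_cost(case: DatasetCase) -> float:
    """Curation cost over the whole period."""
    return case.curation_cost_per_year * case.years


def total_savings(case: DatasetCase) -> float:
    """Collection costs avoided through re-use over the whole period."""
    return case.reuses_per_year * case.recollection_cost * case.years


def net_benefit(case: DatasetCase) -> float:
    """Savings minus costs; a positive value favours curating and sharing."""
    return total_savings(case) - total_cost(case)


def benefit_cost_ratio(case: DatasetCase) -> float:
    """Savings per unit of curation spend."""
    return total_savings(case) / total_cost(case)


example = DatasetCase(
    curation_cost_per_year=5_000.0,
    years=5,
    reuses_per_year=2,
    recollection_cost=20_000.0,
)
print(net_benefit(example))         # 175000.0
print(benefit_cost_ratio(example))  # 8.0
```

This sketch only captures cost savings. Efficiency gains and new opportunities to create value are, as noted above, harder to quantify and are deliberately left out.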
Recommendations
Recommendation 1 – Baseline reporting
A key finding of this research is that there is, as yet, no standardised and consistent system of reporting of the data necessary to make a business case. Therefore, we recommend:
- The development of guidelines for data collection and reporting through consultation with stakeholders, taking account of the need to minimise the reporting burden
- Further classificatory work to explore how costs and benefits differ according to institutional and disciplinary factors such as intellectual field, objects of research, data types, analytic techniques and approaches
Recommendation 2 – Model Questionnaire
This project has focused on identifying the data necessary to make a compelling business case for data curation and sharing. In doing so it has provided a foundation for the development of a model data collection framework that could be further developed. We recommend that this is taken forward by:
- The development of a model questionnaire building upon the questions outlined in Section 9.2 (information sources). This could then be combined with an extended version of the 'Beagrie model', which would capture repository cost data
- To reduce duplicative effort in building business cases, we recommend that JISC host a web-based survey/data gathering instrument and invite repository/data centre management staff and users (from both deposit and withdrawal sides) to use the instrument for reporting purposes
- Public dissemination of such a survey could be confidential and anonymised by aggregating repositories according to the key institutional and disciplinary factors identified as part of recommended guidelines for baseline reporting
Recommendation 3 – Developing a community resource
To achieve an empirical and scalable evidence base upon which policy makers and funders can evaluate benefits at different levels of granularity, e.g. across types of repository/centre or discipline, a system of consistent recording and reporting needs to be developed. Given the differences in practices and types of re-use across disciplines that this study has highlighted, this system would need to be implemented in a culturally sensitive way. We recommend that:
- The centralised collection and collation of data resulting from the development of guidelines for baseline reporting and participation by community members in the model questionnaire be made available as a shared community-level resource
- Such a resource should stipulate what basic core data might be collected and reported annually
- The model might include collection of the following data in a consistent way: annual acquisitions (data submitted, data accepted), annual usage (downloads, requests), citations, external funds received, annual spend (split across main budget headings)
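As an illustration of what consistent recording could look like, the sketch below models one annual return per repository using the data items listed above and writes a set of returns to CSV. The `AnnualReport` class, its field names, the `to_csv` helper and the sample figures are all hypothetical. They are not a schema proposed by the report.

```python
import csv
import io
from dataclasses import dataclass, field

# Core columns taken from the data items listed above; the names are assumptions.
CORE_FIELDS = [
    "repository", "year", "data_submitted", "data_accepted",
    "downloads", "requests", "citations", "external_funds",
]


@dataclass
class AnnualReport:
    """One hypothetical annual return for a repository or data centre."""
    repository: str
    year: int
    data_submitted: int      # annual acquisitions: datasets submitted
    data_accepted: int       # annual acquisitions: datasets accepted
    downloads: int           # annual usage: downloads
    requests: int            # annual usage: requests
    citations: int
    external_funds: float    # external funds received
    spend_by_heading: dict = field(default_factory=dict)  # annual spend per budget heading

    def total_spend(self) -> float:
        return sum(self.spend_by_heading.values())


def to_csv(reports: list) -> str:
    """Write reports with one shared set of columns so returns are comparable."""
    headings = sorted({h for r in reports for h in r.spend_by_heading})
    columns = CORE_FIELDS + [f"spend_{h}" for h in headings]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for r in reports:
        row = {name: getattr(r, name) for name in CORE_FIELDS}
        for h in headings:
            # A budget heading a repository did not report is recorded as zero spend.
            row[f"spend_{h}"] = r.spend_by_heading.get(h, 0.0)
        writer.writerow(row)
    return buffer.getvalue()


# Invented figures for a fictional repository, for illustration only.
sample = AnnualReport(
    repository="example-repository", year=2008,
    data_submitted=40, data_accepted=32,
    downloads=1200, requests=85, citations=14,
    external_funds=50_000.0,
    spend_by_heading={"staff": 120_000.0, "infrastructure": 30_000.0},
)
print(to_csv([sample]))
```

Fixing the column set in one place is the point of the sketch: returns from different repositories can then be collated and aggregated without per-repository clean-up.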