
e-Science Curation Report

Executive summary

Science is being transformed by accelerating change in information technology, with huge increases in computing power and network bandwidth, accompanied by an explosion in data volumes and information.

The term e-Science – or, more inclusively, “e-Research” – has been used recently to describe the research culture and opportunities enabled by these developments, and the collaborations of people and shared resources that are needed to resolve new research challenges, whether in the sciences, social sciences or humanities. e-Science enables a new order of collaborative, more inter-disciplinary research, based on shared research expertise, instruments and computing resources, and crucially on increasing access to collections of primary research data and information – the knowledge base of research. The term e-Science refers to these techniques as applied to the sciences. In this report we use the term e-Science throughout.

There are challenges, however: these same technology changes put the very data they create and use at risk, and raise serious and complex issues of strategy and policy regarding its creation, management, and long-term care – its curation – for which top-level responsibility urgently needs to be adopted to protect and further UK research.

Our study examined the current provision and future needs of curation of primary research data in the UK, particularly within the e-Science context. At a strategic level we found: 

  • Confirmation that the data revolution presents significant challenges and opportunities. However, our surveys show that the UK is not fully prepared to capitalize on the opportunities and urgently needs to address this
  • There is a lack of a government-level, overall strategy for data stewardship and data infrastructure to which research administrators can refer, still less to support researchers in their evolving roles and duties with regard to data curation
  • Existing data centres are usually supported by sponsors whose primary funding focus is research projects
  • The current short-term funding models for the provision of curation are antithetical to its long-term nature and needs
  • There will be an exponential increase in data volumes from e-Science over the next decade as planned new scientific instruments and experiments come on stream. However, to benefit fully from this major investment, further action is needed to support the curation of the data that will be generated
  • Not all primary research data needs to be retained or has long-term value. Its potential value for generating new research will vary, and the level of investment in the curation of datasets therefore needs to be identified and graduated accordingly.

At a policy level we found: 

  • Provision of curation is patchy, and more advanced in some disciplines than others; the basic life sciences, and “big” collaborative sciences such as particle physics and astronomy are examples where provision is most advanced
  • Where retention and curation of primary research data is a requirement set by funding bodies, the majority of researchers surveyed stated this requirement was not funded. Where guidance is provided, researchers frequently felt that it was out of date or inadequate
  • Awareness of the issues – particularly data longevity difficulties – is generally low among researchers. Consequently, the good practice needed to assure data longevity is rare, putting valuable resources at risk
  • For curation to be effective the researcher needs to be engaged in the curation of his or her own data, working in partnership with curators. But few incentives or procedures are in place to ensure that this engagement is achieved
  • Whilst practice and experience in curation is increasing rapidly, areas of curation are still in a research and proof-of-concept phase. Much research and practical, exploratory activity is being undertaken in the UK, and its quality is world-class
  • The data revolution raises many issues of trust which must be addressed before data-based research can flourish – issues of security, confidentiality, ownership, assured provenance, authenticity, and data and metadata quality
  • There is little interaction and sharing in curation experiences between science-based industry and the academic sector. Within the next decade the curation of digital content and data is likely to be critical to science- and engineering-based industries and to knowledge-based economic activity.

Based on these findings we set out our major, strategic recommendations in the list below. The body of the report details further specific recommendations, outlines proposals for the organisational structuring of curation provision, and provides a table showing which recommendations address the findings summarised above. These recommendations cross organisational boundaries and span organisational levels.

Strategic recommendations

A1 Strategic-level advocacy for data curation is needed and should be assigned to a respected and influential champion, so that strategic objectives are clearly articulated, the UK’s curation agenda is set over the medium term, and the UK’s standing, contribution and opportunities in this area are enhanced.
A2 A curation task force made up of curation experts, practising researchers and research administrators should be established to inform and guide this agenda. This task force should work closely with and inform the work of the new UK Digital Curation Centre.
A3 The mismatch of short-term funding against the long-term needs for data retention needs to be addressed by providing new specific, long-term funding stream(s) for data centres and curation, thus ensuring that there is a strategic approach to data stewardship which addresses holding information indefinitely, makes it widely available and encourages cross-disciplinary usage, including linking to other digital information.
A4 Funding bodies should consider supporting research-led exemplars of curation to demonstrate and promote the benefits of curation for new research.
A5 Our findings endorse the need for the Digital Curation Centre and we recommend its establishment as part of a national provision for data curation in the UK.
A6 Criteria need to be established to determine what data we should keep, why, and what level of curation is appropriate, together with mechanisms to monitor, validate and modify these criteria with accumulating experience.
A7 A programme of activities, both national and international, should be initiated to promote incentives which will foster a scientific culture of engagement in data care.
A8 Educational materials, guidelines and policy documents for researchers need to be developed and publicised.
A9 There should be increased investment, knowledge transfer, and cross-sector partnerships with knowledge-based and science and engineering industries to capitalize on UK expertise in data curation. This should be led by the DTI.
A10 Investment should be strengthened in those areas of curation research which will enhance data re-use; in particular we recommend focusing on those areas of research needed to establish trust in curated information.

 
  
It is our view that, as the highest priority, responsibilities should be assigned for the strategic recommendations. In light of feedback from JCSR, we recommend the following responsibilities for taking action:

Action to be taken by:

A1 e-Science Core Programme to follow up with the RCUK.
A2 JCSR should take responsibility for establishing a Curation Task Force which could inform the strategic implementation of the digital curation agenda.
A3 HEFCE and OST.
A4 The production of research-led exemplars of curation could be co-ordinated by the new Digital Curation Centre.
A5 The Digital Curation Centre is now being established, managed by JCSR.
A6 These recommendations are the responsibility of the Research Councils and should be included in the paper which will be presented to the RCUK at a future meeting.
A7 The Digital Preservation Coalition and the Digital Curation Centre.
A8 e-Science Core Programme and Research Councils.
A9 e-Science Core Programme to follow up with the DTI.
A10 As for A2 above.

  

Author: Philip Lord and Alison Macdonald
Publication Date: 1 November 2003