SageCite: Citing network models of disease and associated data

SageCite explores the foundations of a framework for citing datasets. Sage Commons offers an exemplar of a data-sharing initiative in the area of disease network models. Requirements and options for citing datasets will be studied, and a demonstrator will explore the technology issues in citing data, for example the granularity of citation and the choice of identifiers for datasets and contributors. Citation is considered necessary to underpin systems for crediting contributors, which is key to motivating data sharing. We will report on infrastructural and other issues, such as implications for the record of the scholarly process and links with publications.

Aims and objectives

Aim: to explore a Citation Framework that combines data, process and publication.

Objectives:

  • Report on the approaches, options and requirements for citing large-scale predictive network models of disease.
  • Demonstrate a citation service for network models and associated data in the Sage Commons, using a linked data approach, in a citation-enabled workflow demonstrator.
  • Report on technical and policy issues for citing data in publications in partnership with DataCite, Nature Genetics and PLoS Computational Biology.
  • Report on evaluation, stakeholder analysis and benefits mapping.
  • Engage in outreach across the bioinformatics, research, data management, and library and information communities.

Project methodology

UKOLN provides project management and reviews of standards and technology, as well as research into user requirements. The University of Manchester develops Taverna, implements standards in a demonstrator, and contributes expertise in infrastructure provision. The British Library represents DataCite. Nature and PLoS contribute the publishers' perspective. SageCite represents data contributors. All partners are engaged with information and science communities and standards-setting organizations.
The general methodology involves desk research on available standards and technologies, discussion with users to understand their data, workflows and citation requirements, a demonstrator to explore infrastructural issues, and collaboration with publishers and service providers.

Anticipated outputs and outcomes

Outputs:

  • Report on requirements and standards for citing network models.
  • Demonstrator of citation embedded in workflow.
  • Reports on social and technical issues of data citation as part of the scientific record.
  • Benefits evaluation report.

Outcomes:

  • Understanding/documentation of requirements for citation in biological network modelling, and the nature of data and workflows.
  • Promoting and supporting change in practices.
  • Bridging discussions across disciplines and communities (scientific, information management).
  • Evaluation of technical approaches and understanding of infrastructural implications for supporting citation.
  • Experience of using linked data to expose data and metadata in citation infrastructure.
  • Understanding and documentation of the views and needs of stakeholders.

Technology/Standards used

No single standard has emerged for citing datasets. Several options could be employed in a data citation infrastructure; these will be reviewed and some will be implemented in a demonstrator. Taverna will be used to manage workflows; it produces metadata that is encoded according to linked data principles and conforms to standards where these are available, e.g. the Open Provenance Model. Standards for identifying datasets include DOI. Identifiers for contributors include ORCID. We will look to reuse ontologies where possible.
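
To make the identifier choices above concrete, the following is a minimal sketch in Python of a citation record that pairs a dataset DOI with contributor ORCID iDs. It is an illustration only, not the SageCite demonstrator or any Taverna output. The class names, the function names and the DOI are hypothetical, the citation layout follows the general "Creator (Year): Title. Publisher. Identifier" pattern, and the ORCID shown is the well-known sample iD of a fictitious researcher.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contributor:
    name: str
    orcid: str  # 16 characters in four hyphenated groups


@dataclass
class DatasetCitation:
    contributors: List[Contributor]
    year: int
    title: str
    publisher: str
    doi: str  # hypothetical DOI, stored without the resolver prefix


def orcid_is_valid(orcid: str) -> bool:
    """Verify the final check character of an ORCID iD (ISO 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or any(ch not in "0123456789" for ch in chars[:15]):
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


def format_citation(c: DatasetCitation) -> str:
    """Render a human-readable citation ending in a resolvable DOI link."""
    names = "; ".join(p.name for p in c.contributors)
    return f"{names} ({c.year}): {c.title}. {c.publisher}. https://doi.org/{c.doi}"


# Illustrative values only: the model title and the DOI are made up.
model = DatasetCitation(
    contributors=[Contributor("Carberry, J.", "0000-0002-1825-0097")],
    year=2011,
    title="Example disease network model",
    publisher="Sage Commons",
    doi="10.5072/example-model",
)

if all(orcid_is_valid(p.orcid) for p in model.contributors):
    print(format_citation(model))
```

Keeping contributor identifiers separate from display names is one way to let credit be aggregated per person even when names are written differently across datasets and publications.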

Project Staff

Project Manager
  • Monica Duke, University of Bath, UKOLN, BA2 7AY. Tel. 01943 462917, 01225 386838. Email: m.duke@ukoln.ac.uk

 

Summary

Start date: 1 August 2010
End date: 31 July 2011
Funding programme: Managing Research Data (JISCMRD)
Strand: Citing, linking, integrating and publishing research data (CLIP)
Lead institutions: UKOLN
Committees: JISC Support of Research committee