SageCite: Citing network models of disease and associated data
SageCite explores the foundations of a framework for citing datasets. The Sage Commons offers an exemplar of a data-sharing initiative in the area of disease network models. Requirements and options for citing datasets will be studied, and a demonstrator will explore the technology issues in citing data, for example the granularity of citation and the choice of identifiers for datasets and contributors. Citation is considered necessary to underpin systems for crediting contributors, which is key to motivating the practice of data sharing. We will report on infrastructural and other issues, such as the implications for the record of scholarly process and the link with publications.
Aims and objectives
Aim: to explore a Citation Framework that combines data, process and publication.
Objectives:
- Report on the approaches, options and requirements for citing large-scale predictive network models of disease.
- Demonstrate a citation service for network models and associated data in the Sage Commons, using a linked data approach, in a citation-enabled workflow demonstrator.
- Report on technical and policy issues for citing data in publications in partnership with DataCite, Nature Genetics and PLoS Computational Biology.
- Report on evaluation, stakeholder analysis and benefits mapping.
- Carry out outreach and community engagement across the bioinformatics, research, data management, and library and information communities.
Project methodology
UKOLN provides project management, reviews of standards and technology, and research into user requirements. The University of Manchester develops Taverna, implements standards in a demonstrator, and contributes expertise in infrastructure provision. The British Library represents DataCite. Nature and PLoS contribute the publisher’s perspective. SageCite represents data contributors. All partners are engaged with information and science communities and standards-setting organizations.
The general methodology involves desk research into available standards and technologies, discussion with users to understand their data, workflows and citation requirements, a demonstrator to explore infrastructural issues, and collaboration with publishers and service providers.
Anticipated outputs and outcomes
Outputs:
- Report on requirements and standards for citing network models.
- Demonstrator of citation embedded in workflow.
- Reports on social and technical issues of data citation as part of the scientific record.
- Benefits evaluation report.
Outcomes:
- Understanding/documentation of requirements for citation in biological network modelling, and the nature of data and workflows.
- Promoting and supporting change in practices.
- Bridging discussions across disciplines and communities (scientific, information management).
- Evaluation of technical approaches and understanding of infrastructural implications for supporting citation.
- Experience of using linked data to expose data and metadata in citation infrastructure.
- Understanding and documentation of the views and needs of stakeholders.
Technology/Standards used
No single standard has emerged for the citation of datasets. A number of options could be employed in the infrastructure for data citation; these will be reviewed, and some will be implemented in a demonstrator. Taverna is the tool that will be used to manage workflows; it produces metadata that is encoded according to linked data principles and conforms to standards where these are available, e.g. the Open Provenance Model. Standards for identifying datasets include DOI. Identifiers for contributors include ORCID. We will look to reuse ontologies where possible.
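To make the role of the two kinds of identifier concrete, the following is a minimal illustrative sketch, not the project's demonstrator. It assumes a dataset record that carries a DOI and contributors that carry ORCID iDs, and it builds a human-readable citation string from them. The class names, field names, citation layout, DOI and ORCID values are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Contributor:
    """A data contributor, optionally identified by an ORCID iD (hypothetical model)."""
    name: str
    orcid: Optional[str] = None  # placeholder value, not a real project contributor

    def orcid_uri(self) -> Optional[str]:
        # An ORCID iD is commonly expressed as a resolvable URI.
        return f"https://orcid.org/{self.orcid}" if self.orcid else None


@dataclass
class DatasetRecord:
    """A citable dataset or network model identified by a DOI (hypothetical model)."""
    title: str
    year: int
    publisher: str
    doi: str
    contributors: List[Contributor]

    def doi_uri(self) -> str:
        # A DOI is commonly expressed as a resolvable URI.
        return f"https://doi.org/{self.doi}"

    def citation(self) -> str:
        # Assumed layout: contributors (year). title. publisher. identifier
        names = "; ".join(c.name for c in self.contributors)
        return f"{names} ({self.year}). {self.title}. {self.publisher}. {self.doi_uri()}"


# All values below are made-up placeholders.
record = DatasetRecord(
    title="Example disease network model",
    year=2011,
    publisher="Example Repository",
    doi="10.1234/example-model",
    contributors=[Contributor("Doe, J.", "0000-0002-1825-0097")],
)
print(record.citation())
```

Keeping the identifiers as structured fields, rather than only inside the formatted string, is one way a citation could later be exposed as linked data or credited back to individual contributors.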
Project Staff
Project Manager
- Monica Duke, University of Bath, UKOLN, BA2 7AY, tel. 01943 462917, 01225 386838, m.duke@ukoln.ac.uk
Project Team
- Liz Lyon, University of Bath, UKOLN, BA2 7AY 01225 386580, +44 (0) 1225 386838 e.j.lyon@ukoln.ac.uk
- Adam Farquhar, British Library, Digital Technology, 01973 546515, no fax, adam.farquhar@bl.uk
- Max Wilkinson, British Library, Digital Technology, 020 7412 7040, no fax, max.wilkinson@bl.uk
- Carole Goble, University of Manchester, School of Computer Science, carole.goble@manchester.ac.uk
- Peter Li, University of Manchester, School of Computer Science, peter.li@manchester.ac.uk