ADMIRAL: A data management infrastructure for research across the life sciences
The final report of the project is available here: 
A detailed list of all project outputs can be found here
The purpose of the ADMIRAL Project was to create a two-tier federated data management infrastructure for life science researchers, providing services (a) to meet their local data management needs for the collection, digital organisation, metadata annotation and controlled sharing of biological datasets; and (b) to provide an easy and secure route for archiving annotated datasets to an institutional repository, the Oxford University DataBank, for long-term preservation and access, complete with assigned Digital Object Identifiers and Creative Commons open access licences.
As such, it sought to promote datasets as first class information objects, and particularly aimed to serve the ‘long tail’ of small research groups that do not have the resources to create their own data management tools.
There were three particular reasons to adopt this two-tiered approach to research data management:
- First, researchers have a need for improved local data management. We have created tools and services to meet this need, for their own personal benefit, in ways that fit with their normal working practices and that impose as little as possible in terms of cognitive overhead – what we term sheer curation. The local ADMIRAL file management system benefits from multi-platform integration (Windows, Mac and Linux), daily backup, sophisticated user access control, and web access via a browser, in addition to direct mapped drive access.
- Second, by obtaining engagement with researchers in this way, we have facilitated subsequent data submission to appropriate institutional or subject-specific data repositories.
- Third, by first creating a well-organized local data management infrastructure, we have been able to simplify the process of selecting data files for repository submission, and to automate the process of submission itself, while providing a convenient web interface into which users can enter metadata describing the submission.
In so doing, we have reduced what was, for the user, a large technical barrier to a sequence of easy stages, thereby all but eliminating the previously huge disincentive to data deposition in the institutional repository and to the accompanying publication of the data with an open CCZero waiver and a DataCite DOI for citation purposes.
The objective of this work has been to change the research data lifecycle. Conventionally, research datasets are initially poorly managed at the local level, with risk of loss from poor backup and management practices. Thereafter, they become essentially neglected and abandoned, once they have been used to provide the summary data included in the journal articles arising from the research project, as the researchers’ interests move on to the next project. With ADMIRAL, the local data is better managed, and mechanisms are provided whereby those datasets worth preserving and publishing can be submitted to the institutional data repository, specifically the Oxford DataBank, for preservation and reuse.
During the ADMIRAL Project, we intentionally lowered the barriers in terms of metadata requirements for initial data submission, with the possibility of the researcher or a curator enriching the metadata at a later date (what we call curation by addition), in order to kick-start the cultural sea change required for data deposition to become routine. We have been trying to avoid the best (the requirement for perfect and complete metadata) becoming the enemy of the good (data publication by any means).
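The principle of curation by addition can be illustrated with a minimal sketch in Python. The field names, the `REQUIRED_FIELDS` set and the `enrich` helper are hypothetical and are not taken from the ADMIRAL or DataBank software. The sketch only assumes that a submission is accepted once a small core of metadata is present, and that later enrichment adds fields without overwriting what the submitter provided.

```python
# Hypothetical illustration of "curation by addition"; not ADMIRAL code.

# Assumption: only a small core of metadata is required at submission time.
REQUIRED_FIELDS = {"title", "creator"}


def missing_fields(metadata):
    """Return the required fields that are absent or empty, sorted by name."""
    return sorted(f for f in REQUIRED_FIELDS if not metadata.get(f))


def can_submit(metadata):
    """A dataset may be submitted as soon as the minimal core is present."""
    return not missing_fields(metadata)


def enrich(metadata, additions):
    """Return a new record with extra fields added.

    Existing non-empty values are kept, so enrichment by a researcher or
    a curator only ever adds information.
    """
    enriched = dict(metadata)
    for key, value in additions.items():
        if not enriched.get(key):
            enriched[key] = value
    return enriched


if __name__ == "__main__":
    record = {"title": "Example imaging dataset", "creator": "A. Researcher"}
    print(can_submit(record))
    later = enrich(record, {"description": "Added by a curator", "title": "Ignored"})
    print(later)
```

Under these assumptions a record with only a title and a creator can be deposited immediately, and a curator can attach a description or further fields at any later date.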
The ADMIRAL Project has provided a working exemplar of useful data management of real research datasets provided by world-class biological research scientists, and has formed the starting point for a follow-on project.
We are currently funded by the HEFCE University Modernization Fund, through the JISC DataFlow Project, to further develop this prototype two-stage ADMIRAL data management system into two robust services that can be rolled out for general use across UK HEIs:
(a) DataStage, the private local data management file system (developed as, and previously known as, ‘ADMIRAL’), with automated backup, Web access, and security access control, for use by individual research groups, and
(b) DataBank, a secure, embargo-competent cloud-deployable data repository for use by universities, research institutes or large research consortia.
This will involve addressing the following issues that remained outstanding at the end of the ADMIRAL Project:
- Ensuring the software is brought up to professional production quality and robustness.
- Ensuring that all the software is fully covered by unit tests, and becomes part of a continuous integration environment.
- Debian-packaging the DataStage and DataBank services, so that they can be easily installed, either locally or on the Eduserv educational cloud, as VMWare instances.
- Permitting local institutional branding and configuration of these services.
- Creating seamless integration between the access control system used for local drive mapping and that employed to control web access, and if possible, integrating this with institutional single-sign-on systems for user authentication and authorization.
- Linking these services by use of the SWORDv2 repository communication protocol to standardize data package ingest, such that SWORD-wrapped data packages from a DataStage instance can be submitted to any SWORD-compliant data repository including DataBank, and that DataBank can ingest SWORD-wrapped data packages created by any SWORDv2-compliant data source including DataStage.
- Ensuring that these services conform to the emerging cloud deployment standards, so that they can be deployed in a number of cloud environments in addition to the VMWare Eduserv cloud.
- Creating user and developer communities around these DataFlow services, to assist with their long-term maintenance and development.
Through the new DataFlow Project, we thus intend to create fit-for-purpose open source data management services that will be made freely available for installation by third parties on the Eduserv academic cloud and elsewhere, to benefit research groups, institutions, universities, commercial companies and governmental organisations both in the UK and internationally. We seek early adopters and test users of the DataStage and DataBank systems!
Please contact the DataFlow Project Manager at jisc 'dot' dataflow 'at' gmail 'dot' com
Project Staff
Project Manager
Project Team