Developments in technology can have a big impact on research data infrastructure.
Back in the mid-2000s, the open-source ‘data lake’ emerged. Structured and unstructured data could flow in, promising innovative and unlimited insights. Today data science, machine learning, and algorithmic approaches to data-intensive research have become cornerstones of research.
It is a complex space, and research is increasingly dependent on infrastructure to manage these ever-growing volumes and increasing diversity of data.
Data diversity and complexity
Data used in or created by research may be personal and covered by legislation governing their collection, management, access and use. They may be subject to disclosure control and to mechanisms that govern linkage.
Data may be commercially sensitive, contain intellectual property (IP), or have specific licence conditions for usage. They may have been obtained under a non-disclosure agreement or have conditions for usage negotiated from a commercial provider.
Data may require additional protections for platform accountability or national security. They may be streamed at a scale or volume that requires orders of magnitude more computational power. Data may be qualitative or quantitative, graphical, aural, chemical or biological.
These different characteristics all require technical and, above all, strategic approaches to assure the governance, utility, accessibility and, potentially, reproducibility of the data.
Data are governed by a wide range of legislation and international standards, for example on information security management. Research collaborations awarded UK Research and Innovation (UKRI) funding are also expected to follow good research data management practices. Each Research Council has its own data policies, and UKRI has developed a set of seven common principles, which focus on openness, standards, discoverability, ethics, embargo, recognition and cost effectiveness.
Convening the right components at the right time: research projects are diverse too
Data creators and researchers increasingly depend on well-governed data infrastructure, and on trusted research environments that place focus on compliance and assurance, but also on research ethics and integrity as key to realising impact.
There is increasing demand to extend this approach to multi-partner research projects. A focus on well-governed data infrastructure could enable greater collaboration.
Rebalancing the narrative
Jisc’s focus on supporting a national data infrastructure for research is aligned with the UK government’s Research and Development Roadmap ambition to “develop our digital research infrastructure capability – by building an internationally leading national digital research infrastructure.”
The UK already benefits from world-leading infrastructure such as the Janet Network, state-of-the-art cyber security services, and cloud infrastructure. Yet I see an opportunity to support research and innovation collaborations further, by offering a flexible set of solutions for institutions and research collaborations to implement data-focused infrastructure more rapidly.
We know that research projects and collaborations are diverse, so we are convening cost-optimised and trusted digital, software and technical components that can be scaled to need as technology develops. Our focus is on interoperability, acknowledging that different disciplines and multi-partner research consortia require different configurations.
We aim for a supportive approach to enable research consortia to move through technologies as they emerge and recede, saving research projects valuable grant time.
This approach will offer faster routes to implementation, while keeping pace with technical developments.
Jisc is uniquely placed to help coordinate and shape this national infrastructure. We can already demonstrate data transfer securely and at scale: just last year, the Janet Network enabled the transfer of a 5TB COVID-related dataset between the European Bioinformatics Institute (EBI) and Imperial College London in less than three hours at a data rate of around 500MB/s (4Gbit/s).
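As a quick sanity check on those figures, the arithmetic can be sketched in a few lines of Python. This is a back-of-envelope illustration only: it assumes decimal (SI) units, so 1TB is 10^12 bytes and 1MB is 10^6 bytes, and the function names are hypothetical rather than part of any Jisc tooling.

```python
# Back-of-envelope check of the transfer figures quoted above.
# Assumption: decimal (SI) units, i.e. 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.

def transfer_time_hours(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes at a sustained rate, in hours."""
    return size_bytes / rate_bytes_per_s / 3600


def to_gbit_per_s(rate_bytes_per_s: float) -> float:
    """Convert a rate in bytes per second to gigabits per second."""
    return rate_bytes_per_s * 8 / 1e9


DATASET_BYTES = 5e12   # 5 TB dataset
RATE = 500e6           # around 500 MB/s sustained

hours = transfer_time_hours(DATASET_BYTES, RATE)
gbits = to_gbit_per_s(RATE)
print(f"{hours:.2f} hours at {gbits:.0f} Gbit/s")
```

At a sustained 500MB/s the transfer takes roughly 2.8 hours, which is consistent with "less than three hours", and 500MB/s does correspond to 4Gbit/s.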
Our EU-compliant Open Clouds for Research Environments (OCRE) procurement framework already helps research institutions get Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and complementary Software as a Service (SaaS) up and running quickly and cost-effectively. This helps researchers implement machine learning and artificial intelligence, and embed advanced analytics platforms, which are growing in capacity and scope.
National data infrastructure for research also needs to support the coordination of research outputs, including digital preservation to keep data usable over extended periods of time.
Sector-wide preservation approaches will support compliance with funder mandates and enable streamlined data curation and automated workflows for a range of research outputs. Such approaches will also support a focus on what to keep, and for how long, creating opportunities to focus on innovation in storage over time.
Ultimately, research management teams will be able to coordinate the components of their research infrastructure, keeping pace with technical developments and avoiding the technical and environmental debt, inefficiency and risk that legacy technologies can bring.
Jisc is already working with several institutions, looking at the components they want to see configured to help them in their current research activity. We look forward to continuing to support such research collaborations with our growing portfolio of solutions.
More about our vision for research can be found in our Research and innovation sector strategy 2021-2023.