It was in the Antarctic that I came to understand the importance of open science and the infrastructure that is needed to support it.
I was working at the Polar Data Centre to ensure that scientific polar and cryospheric data was available to research teams both in the UK and in the field. It struck me that access to this rich data was limited to the teams directly involved and rarely reached beyond the biological and environmental research communities.
Breaking down barriers
Research funding is still presented in siloed ways, and we need to look at how we can break down those barriers. Some of the barriers began to be addressed by the Big Data Network. In 2013, the Economic and Social Research Council (ESRC) invested in projects to bring together data that helps to inform government policy makers. Subsequent investments such as Administrative Data Research UK (ADR UK) and Health Data Research UK (HDR UK) are transforming the way researchers access the UK’s wealth of public sector data, enabling better informed policy decisions that improve people’s lives. All researchers need this national research data infrastructure to capture, store, process and access data, and to enable collaboration across all disciplines.
However, we see a lot of money invested in some superb research assets, but they are not all joined up. Some research communities, such as the humanities, are not yet integrated into a national data infrastructure, even though rich peta-scale data sets have become the norm for most research projects.
Catching up with big legacy projects
When I was at the British Antarctic Survey ten years ago, I already had about six petabytes of data to manage, most of which was remote sensing and climate modelling data. None of that research fed into a national infrastructure, and there is still a lot of scientific data stored outside a national research infrastructure. This is not a criticism; we just need to review where we’re at and see how we can join up these rich sources of data.
The problem we’ve got is that research communities are granted big pots of money to support their area of research. The UK has some fantastic research facilities such as the Met Office and the Square Kilometre Array, but investment in these fabulous projects does not necessarily create a true national data infrastructure.
What I take away from the Big Data Network is that a lot of the assets are standing on their own. This is where we’re missing a piece. We’re trying to stand up a lot of information, but it needs a lot of computational support, and we’re getting to a point where we can no longer move the data around so easily. We need to distribute access rather than data.
If you look at the government’s National Data Strategy, or at grand research challenges such as ‘healthy ageing’ or ‘sustainable food’, we need to overlay or wrap data with a lot of requirements so that we can connect the various data sources.
Now is the time to gain a greater understanding of this national federated asset. My vision is that a small proportion of every UKRI research grant would be allocated to the creation of a national infrastructure for research data. There are already lots of great examples of federated data built on a FAIR data approach (FAIRsharing), and of computing infrastructures in the biological sciences that require ongoing support and investment, such as those I have encountered in my new role at the Norwich Bioscience Institutes Partnership (NBI Partnership).
Currently, research projects must present a research data management plan and define a place of deposit, which can be a localised silo without the long-term resources to invest in data management and curation. Many research communities still lack robust data management, and their storage plans have no central or federated repository strategy at either a national level or a broader disciplinary level.
National data infrastructure for all research
As someone who has worked within a number of UK academic institutions (HEI and UKRI), I’ve supported a lot of effort simply capturing research data. I estimate that around 70% of research data still isn’t part of a larger data infrastructure.
I propose that we look at the UKRI investment funds and recognise that all research communities now have the same requirements that CERN and the Met Office had ten or so years ago. Large computational and data needs are now the norm: any project tracking the UK Government’s strategic objectives or the grand challenges agenda works in an interdisciplinary and multi-institutional way.
We can’t ignore the pivotal interlocking piece of engineering that is not funded, and that becomes harder and harder to do as data sets grow in complexity. This is why it is essential that all research computing roadmaps clearly emphasise this challenge.
Creating a knowledge base and applying AI
As a trusted, not-for-profit organisation, Jisc could offer this as a basic service, much like the agreements you would normally enter into when working on a specific project. This base level of connectedness will promote collaborative and interdisciplinary working, creating knowledge hubs where we can apply all the new technologies, from big data to AI-driven learning.
We need to leverage this baseline of investment and build linkages between different research communities. That way we can meet grand challenges, such as coupling oceanographic and atmospheric research, and we can bring medical, social and biological research to other communities by extending the trusted environments they currently work in.
It is the art of the possible. I believe this infrastructure will allow us to lay down a more coherent capability across the research landscape, and I genuinely think it will allow us to keep competing globally rather than falling behind.
Find out more about how Jisc will support research and innovation.