Member story
EMBL-EBI South Building
©Jeff Downling via EMBL-EBI

Powering discoveries that benefit humankind

Understanding the causes of disease is crucial to identifying potential treatments and, ultimately, finding cures.

A collaboration between EMBL’s European Bioinformatics Institute (EMBL-EBI) and UK Biobank, powered by the Janet Network, is enabling approved researchers worldwide to access genetic and health data and gain new insights into human health and disease.

A world leader in bioinformatics, EMBL-EBI makes public biological data freely available to the global scientific community. This enables scientists and researchers around the world to realise the potential of big data in biology, to make discoveries that benefit humankind.

Treasure trove of data

EMBL-EBI and UK Biobank started working together in 2017, when UK Biobank shared its first release of genetic data via the European Genome-phenome Archive (EGA), a joint resource developed by EMBL-EBI and the Centre for Genomic Regulation (CRG) in Barcelona, Spain.

UK Biobank, a large-scale biomedical database and research resource, is a huge enabler of scientific discoveries that improve human health. Some 500,000 people from across the UK, aged between 40 and 69, joined the project between 2006 and 2010, undergoing extensive measurements and genotyping. They provided biological samples for future analysis, including whole genome sequencing.

Since the work began, the EGA has ingested over 8 petabytes (PB) of data from UK Biobank, with another 7 PB planned for 2021/22. One petabyte is equal to 3.4 years of continuous full HD video recording, so that’s a lot of data!

The UK Biobank database is hugely valuable, and a major contributor to the advancement of modern medicine and treatment all over the world. Researchers are highly dependent on these large-scale biological data sets to transform future research.

Mallory Freeberg

“The UK Biobank resource enables a wide variety of biomedical research areas,”

explains Dr Mallory Freeberg, project lead at the EGA.

“For example, the effects of lifestyle factors like diet and physical activity on health, how genetics contributes to responses to population-scale health concerns like the coronavirus pandemic, and development of early detection methods for common diseases among the UK population.”

Infrastructure is the backbone

Steven Newhouse

“Jisc and EMBL-EBI have a long-standing relationship,”

explains Dr Steven Newhouse, EMBL-EBI’s head of technical services.

“In addition to providing low-latency high-bandwidth (40 Gbit/s) access to our data from across the UK and a worldwide audience through their network partners, Jisc is also able to provide the internal connectivity (100 Gbit/s) between our data centres on campus near Cambridge, our leased data centre space in Harlow, and Jisc’s shared data centre space located in Slough.”

In 2019 we worked with EMBL-EBI to make substantial infrastructure changes to support its worldwide role of providing open data to the life-science community. Supported by £43M of infrastructure funding from UKRI, in addition to operating costs coming from EMBL’s member states, EMBL-EBI needed to upgrade its infrastructure to meet the increasing demand for open-access data resources. As part of this investment, its Janet Network connection was upgraded and a scalable data-transfer network between its data centres was created, built on top of Jisc’s existing national backbone.

Having robust infrastructure in place is enabling EMBL-EBI and UK Biobank to continue collaborating. In addition to providing long-term storage of UK Biobank data, EMBL-EBI is currently transferring this data on behalf of UK Biobank from the EGA to UK Biobank’s cloud-based analysis platform hosted by DNAnexus. For EMBL-EBI, it is among the first data transfers at this scale to a cloud-based service, something that will become increasingly routine, and that has contributed to a doubling in downloads over the past year.

Steven says,

“Our data transfer volume is always increasing as the size of the data that we host increases. In addition to this background load, we were able to transfer 4 petabytes of data for UK Biobank from EMBL-EBI to DNAnexus, with the remaining data set to be transferred in 2021. Jisc are critical to us completing this by providing an infrastructure that can flex with our increased load.”
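To put these figures in perspective, the time a transfer of this size might take can be estimated with some rough arithmetic. The sketch below is illustrative only. It assumes decimal petabytes, and it assumes the 40 Gbit/s access figure quoted above applies to the whole transfer path, which the story does not state. The function name `transfer_days` and its `utilisation` parameter (the share of the link actually available) are made up for this example.

```python
def transfer_days(petabytes: float, gbit_per_s: float, utilisation: float = 1.0) -> float:
    """Estimate the days needed to move `petabytes` of data over a link.

    Assumptions (not from the story): decimal units (1 PB = 10**15 bytes)
    and a constant link speed with a fixed usable share `utilisation`.
    """
    bits = petabytes * 1e15 * 8                      # petabytes -> bits
    usable_bits_per_s = gbit_per_s * 1e9 * utilisation
    seconds = bits / usable_bits_per_s
    return seconds / 86_400                          # seconds -> days


if __name__ == "__main__":
    # 4 PB over a 40 Gbit/s link, fully used and half used
    print(f"{transfer_days(4, 40):.1f} days at full utilisation")
    print(f"{transfer_days(4, 40, utilisation=0.5):.1f} days at 50% utilisation")
```

Under these assumptions, a fully used 40 Gbit/s link would move 4 PB in a little over nine days, and a half-used link would need around 18.5 days. Real transfers share the link with the background load Steven describes, so actual times would be longer.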

Supporting millions of researchers worldwide

Organisations like EMBL-EBI and population-scale data resources like UK Biobank make it possible to understand human disease.

“UK Biobank’s data is the first at this scale to be hosted by EMBL-EBI, but it won’t be the last,”

says Mallory.

“Increasingly, more large-scale initiatives around the world are underway, and through its collaboration with Jisc, EMBL-EBI is well positioned to support sharing of these data.”

The more data we can share, the more research can take place.

“The hosting of UK Biobank data still has several more years to run. There will be future UK Biobank data releases that will provide a network load in addition to our own growth,”

says Steven. Reliable and capable infrastructure is key to this, as it allows data to be shared more quickly and efficiently with the wider research community.

“In recent years we’ve seen an explosion in data and uptake in scientific activity, especially with the COVID-19 pandemic. The Janet networking capability and shared data centre space we access from Jisc gives us the flexibility to increase capacity and meet demand.”

Since increasing its networking capabilities in 2019, EMBL-EBI has seen a large increase in web requests. These requests can be anything from a query to the downloading of an entire dataset. In 2019 the institute had 62 million daily requests and in 2020 that went up to 81 million.
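The scale of that rise is easier to picture as a per-second rate and a percentage. The short sketch below works both out from the two daily figures quoted above. It assumes requests are spread evenly across the day, which real traffic is not, and the helper names `per_second` and `growth_percent` are made up for this example.

```python
SECONDS_PER_DAY = 86_400

# Daily web request counts quoted in the story
DAILY_REQUESTS = {2019: 62_000_000, 2020: 81_000_000}


def per_second(daily_requests: float) -> float:
    """Average requests per second, assuming an even spread over the day."""
    return daily_requests / SECONDS_PER_DAY


def growth_percent(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return (after - before) / before * 100


if __name__ == "__main__":
    for year, daily in DAILY_REQUESTS.items():
        print(f"{year}: about {per_second(daily):.0f} requests per second")
    increase = growth_percent(DAILY_REQUESTS[2019], DAILY_REQUESTS[2020])
    print(f"Increase: about {increase:.0f}%")
```

On these figures, average load grew from roughly 720 to roughly 940 requests per second, an increase of about 31% in one year.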

As EMBL-EBI's work supports millions of researchers worldwide, the increasing demand highlights the importance of this data being shared and a capable infrastructure being in place.

This story is featured as part of our annual review 2020-21.