Inform feature

In health informatics we trust? Thoughts on big data in healthcare research

The burgeoning use of big data in healthcare research is revolutionising the way health records are collected, used and shared. But, with controversial projects such as NHS care.data still fresh in everyone’s minds, how can researchers reassure the public that highly sensitive data is safe in their hands?

Remember the NHS care.data controversy from a couple of years ago? Of course – it was a complete public relations disaster for both the health service and health data research.

Eventually scrapped in July this year after wasting millions of pounds, it caused a public outcry over privacy and data sharing, amid accusations that the creation of a vast database of medical records was being rushed through without explaining to patients the security implications for their highly sensitive information. Leaflets posted through letterboxes failed to include any information on the risks of sharing data (and in many cases apparently didn’t arrive at all), while it also emerged that many patients who had opted out were still having their data shared.

NHS care.data is one example, albeit a calamitous one, of the rise of health informatics – a rapidly expanding field seeking to exploit the potential of big data analytics for healthcare – and of some of the red-hot issues around this new industry.

Health informatics is revolutionising the way that healthcare information is collected, managed, analysed and shared, and the research and clinical potential is unprecedented.

Healthcare revolution

"We are seeing the convergence of traditional health records, laboratory test results, imaging and genomics data, all being used to answer more and more complex questions regarding the determinants of health and effectiveness of treatments," says Jonathan Monk, director of IT at the University of Dundee. "When we combine this with advances in analytics, such as natural language processing, we will hopefully start to see changes in the way patients are treated, which should ultimately lead to better outcomes for everyone."


The implications stretch far beyond the integration of electronic health records. From allowing patients to access their own medical records online, to analysing enormous datasets to identify trends in disease and treatments, to ascertaining gaps in healthcare provision and directing more efficient allocation of resources, health informatics is a burgeoning industry.

A 2014 Burning Glass report projected that demand for health informatics workers would increase 22% by 2018 – more than twice the rate of growth for all industries, translating to more than 40,000 new jobs. Once primarily an administrative field, health informatics is now struggling to recruit sufficient numbers of qualified coders, software developers and e-infrastructure specialists.

As with all revolutions, however, it is controversial. "More and more data is becoming available for research, but the sheer scale of it creates new problems ensuring privacy and confidentiality," says Monk. NHS care.data stalled on the issue of informed consent. "Much of the rich patient data available within the NHS was not collected specifically for research purposes," he explains, "so patients have not given consent for researchers to use it."

Building trust

A governance committee review of NHS care.data’s failings stressed the need to build trust with the public. Nathan Lea, senior research associate at the UCL Institute of Health Informatics, argues that the care.data remit was "too broad to be able to get any kind of consent".

He reflects that "the overriding message is that people have a right to opt out and we should respect that". The research community needs to encourage open, informed discussion, while accepting that "trust is not something you build, trust is something you may or may not get – it’s something beyond your control".

For Lea, the issue of trust encapsulates a wider problem with public engagement and articulating the benefits of data collection in healthcare. "On one hand," he explains, "people are bleeding personal information on Twitter, Facebook, social media, collecting data and putting it through apps, often to their detriment, yet at the same time they are concerned that medical records are being used without their knowledge."

Instead of trying to persuade people, he suggests, a more productive line of inquiry would be to investigate that dichotomy – why are people apparently more trusting in one environment than another?

Security and risk

Lea works with the Farr Institute, a UK-wide research collaboration involving 21 academic institutions and health partners in England, Scotland and Wales working to develop governance frameworks to underpin the safe and trusted use of patient data, and answer the sector’s most urgent question: given the particular requirements of health informatics in terms of data management, privacy and data protection, what constitutes good governance for health research?

Farr Institute: #datasaveslives

A campaign designed to highlight the positive impact of health informatics research on public health

For Lea, while technical elements are important, security is more fundamentally about how you assess and understand risk, and how you use that understanding to develop procedures ranging from encryption to NDAs.

"Security will make it more likely for people to trust you," Lea says, "but ultimately you cannot guarantee security, there’s always some uncertainty. My position has always been that if you’re clear with people that there is always some risk – “we think it’s unlikely but always possible” – that’s a good place to start an open, honest conversation that you deal with together."

People don’t need to be technically savvy, he explains, but they do need to be able to articulate and understand risk.

Collaborative approach

It seems significant, then, that at Dundee, security and research ethics are not solely an issue for the university’s Health Informatics Centre (HIC) but require a collaborative, interdisciplinary approach.

"We are seeing a lot of interest from other disciplines that work with confidential, identifiable data, such as social sciences and computing," says Monk, which means the university needed to develop tools enabling researchers to collaborate securely both internally and externally. As a result, Dundee has clear published guidance that helps research staff choose the most appropriate method for storing and managing data, from the enterprise file sync and share service Box, through to the large IBM Spectrum Scale Research Data Stores and the highly secure environment within the HIC.

Monk explains that Dundee has several key approaches to improve secure storage of, and access to, files. "Firstly, the data is anonymised for each research question. Secondly, to mitigate the risk of data leakage, the researcher does not get given the data; rather, they are given access to a virtual desktop from which they can run analysis. We have a lot of automated pre-processing and validation that ensures consistency in data definitions and usage, and reduces the manual effort needed to get data to the researcher."

"Finally," he concludes, "we have built a detailed set of documented processes and standards, all managed through Box, that have enabled us to be certified as compliant with the ISO 27001:2013 standard and to be considered an accredited Safe Haven by NHS Scotland."

Monk’s advice to other institutions is to recognise that they will very likely need a palette of tools and technologies to meet diverse learning and research needs. That palette should combine technical security measures, such as anonymisation and encryption, with stringent governance procedures, so that researchers access data only where appropriate and with the approval of public benefit and privacy panels.

Monk maintains that, as a result, "we are seeing a sector-wide improvement in the management of data and information due to both legislative pressure and more widely available best practice."
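
To make the first of Dundee's approaches more concrete, here is a minimal Python sketch of preparing a separate extract for each research question. It is an illustration only, not the HIC's actual pipeline: the function names, field names, sample records and keys are all hypothetical. It also uses keyed pseudonymisation (HMAC-SHA256) as a stand-in, which is weaker than full anonymisation, so it would be only one step among the governance controls described above.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, project_key: bytes) -> str:
    """Return a keyed, project-specific pseudonym for an identifier.

    Assumption: each research question has its own secret key, so the
    same patient gets unrelated pseudonyms in different projects.
    """
    digest = hmac.new(project_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def prepare_extract(records, project_key, allowed_fields):
    """Keep only the approved fields and swap the identifier for a pseudonym."""
    extract = []
    for record in records:
        row = {field: record[field] for field in allowed_fields}
        row["pseudonym"] = pseudonymise(record["patient_id"], project_key)
        extract.append(row)
    return extract


# Hypothetical source records; none of this is real patient data.
records = [
    {"patient_id": "P001", "name": "A. Example", "age_band": "40-49", "hba1c": 52},
    {"patient_id": "P002", "name": "B. Example", "age_band": "60-69", "hba1c": 61},
]

# Hypothetical per-project key and approved field list.
diabetes_extract = prepare_extract(
    records, b"hypothetical-key-project-a", ["age_band", "hba1c"]
)
```

Because the key differs per project, two extracts cannot be joined on the pseudonym alone, which is the point of preparing data separately for each research question. In the arrangement Monk describes, the researcher would still only see such an extract from inside the virtual desktop.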

The safe share project

A similar plurality of approach characterises a recent secure data access pilot called the safe share project. Led by Jisc, safe share creates encrypted and assured network connections between parts of infrastructure and safe places where research can happen, with intrusion detection and security monitoring at network level, provided by an impartial not-for-profit service.

At Swansea University Medical School, safe share assists projects for the Administrative Data Research Centre as well as healthcare research. At its most basic it effectively provides "a managed service, where Jisc put a firewall here and then encrypt traffic between here and wherever the data is trying to go, where there will be another Jisc-owned firewall, which can decrypt the data there," explains Simon Thompson, systems analyst at the school.

Safe share currently connects universities at Swansea and Cardiff, and will soon connect Leeds, Manchester, Southampton and Edinburgh. It also ensures compatibility, becoming a de facto standard for interconnecting securely. Its potential resides in "making things easier on a per-project basis - this project is allowed to talk to this institution on this dataset – and we can make data requests on a more impromptu basis, because we know all the endpoints are secure," says Thompson.

While he fears that "you wouldn't be able to create such a large dataset now compared to when the repository was first created ten years ago, there's too much nervousness," he insists "you do need this size to avoid bias in the research; you can look at subgroups of people but that only makes sense in comparison to everyone else. You need everyone's data to get real value." And for that, research institutions need public trust.
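
Thompson's per-project rule ("this project is allowed to talk to this institution on this dataset") can be pictured as a simple allow-list lookup. The Python sketch below is an assumption-laden illustration, not how safe share or Jisc implement access control: the `Approval` type, the `is_permitted` function and every project, institution and dataset name are hypothetical.

```python
from typing import NamedTuple


class Approval(NamedTuple):
    """One approved combination of project, institution and dataset."""
    project: str
    institution: str
    dataset: str


# Hypothetical approvals; real ones would come from a governance process.
APPROVALS = frozenset({
    Approval("project-a", "swansea", "dataset-1"),
    Approval("project-a", "cardiff", "dataset-1"),
})


def is_permitted(project: str, institution: str, dataset: str,
                 approvals=APPROVALS) -> bool:
    """Allow a data request only if the exact combination was approved."""
    return Approval(project, institution, dataset) in approvals


# An approved combination passes; anything not listed is refused.
approved = is_permitted("project-a", "swansea", "dataset-1")
refused = is_permitted("project-a", "leeds", "dataset-1")
```

Denying by default keeps the decision tied to the governance approval rather than to the network: a secure endpoint is necessary, but a request still has to match an approved project, institution and dataset.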

Trusting dialogue

Nathan Lea concludes that there are potentially disturbing parallels between the use of data and the case of Henrietta Lacks or the Alder Hey organs scandal. Both provoked highly emotional public debates, but they were also defining moments in the history of health research, engendering crucial ethical re-evaluation and, ultimately, new codes of practice.

"What we must do," Lea says, "is show that we have learnt some lessons, especially in the research community. You have to foster a trusting dialogue between the public and the data science community, and you have to maintain that – trust is not something you get and keep, it is ongoing – a process."