Speaking at Jisc’s forthcoming Networkshop 47 conference, Kieron O’Hara warns that even anonymised data can reveal sensitive personal information. We must ensure data is both safe and useful, he says.
Anonymisation is controversial. Even if a dataset is carefully anonymised, someone who comes along with extra information may well be able to work out who is who.
For example, you may have an anonymised medical dataset, but you know that the prime minister has been in hospital over a certain period for some mysterious ailment. You just need to look for a woman of Theresa May's age and add in the common knowledge that she has diabetes. If diabetes is mentioned in the dataset, you can identify the prime minister's visits with a reasonable degree of probability.
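The attack described above can be sketched in a few lines. This is a toy illustration only: the records, field names and dates are all invented, not drawn from any real dataset.

```python
# Toy illustration of re-identification via auxiliary knowledge.
# The "anonymised" records carry no names, only quasi-identifiers.
anonymised_visits = [
    {"age": 62, "sex": "F", "condition": "diabetes", "admitted": "2018-11"},
    {"age": 62, "sex": "F", "condition": "asthma",   "admitted": "2018-11"},
    {"age": 45, "sex": "M", "condition": "diabetes", "admitted": "2018-11"},
]

# Auxiliary (public) knowledge: a woman of a known age, with a
# well-known condition, was in hospital during a known month.
candidates = [
    r for r in anonymised_visits
    if r["age"] == 62 and r["sex"] == "F"
    and r["condition"] == "diabetes"
    and r["admitted"] == "2018-11"
]

# Only one record matches, so the "anonymous" visit is re-identified.
print(len(candidates))  # 1 matching record
```

The dataset never mentions a name, yet combining a handful of publicly known facts narrows the match to a single record.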
The data environment
Anonymisation is an ongoing process, because as more data gets published - on the web, for example - it may become easier to crack a dataset. The anonymisation that was perfectly safe two years ago may not be adequate now.
The environment data sits in is crucial too. Who's got access to it? What other datasets might be relevant? What are the security and governance arrangements? Anonymisation is about manipulating these aspects, as well as taking out obvious identifiers such as addresses and names. The problem nowadays is that there is so much information around that almost anything could potentially be an identifier.
With this in mind, you've got to be very, very careful as to how you manage personal data, how you anonymise it, and what context it sits in. We call this environment-sensitive method functional anonymisation.
So I might say, for example, that you can see some data, but you have to come to my offices and analyse it on a standalone computer without access to the internet. Or I might not let you see the data but say, if you give me some queries, I'll send the answers back. Our approach at UKAN isn't about making data 100% safe, because that's just not possible. What we're trying to do is reduce the risk of anything going wrong.
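The "send me queries, I'll send the answers back" model can be sketched as a query interface that only answers over sufficiently large groups. This is a toy sketch, not UKAN's actual mechanism; the threshold and function names are invented.

```python
MIN_GROUP_SIZE = 5  # invented threshold: refuse answers over small groups

def answer_query(records, predicate):
    """Return a count for the query, but refuse to answer when the
    matching group is so small that individuals could be singled out."""
    matches = [r for r in records if predicate(r)]
    if 0 < len(matches) < MIN_GROUP_SIZE:
        return None  # refuse: answering would risk re-identification
    return len(matches)

# Synthetic records for demonstration.
records = [{"age": a, "diabetic": a % 3 == 0} for a in range(20, 80)]

# A broad query is answered...
print(answer_query(records, lambda r: r["diabetic"]))
# ...but a narrowly targeted one is refused.
print(answer_query(records, lambda r: r["age"] == 63 and r["diabetic"]))
```

The data never leaves the custodian's environment; only coarse answers do, which is the essence of functional anonymisation's focus on context rather than the dataset alone.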
Assessing the risks
The trouble with GDPR is that it tends to produce a box-ticking mentality. The focus is on compliance, not on reducing risk. The easy solution is simply to decide not to share your data with anybody. That's totally safe – or, at least, as safe as your security systems are. But then, there's a lot of value in the data that's not being exploited.
So how do we balance utility against risk? At UKAN, we look at what we want to achieve and do with the data. Then we ask questions, such as what's the minimal amount of data I need? You start to tailor your requests and think about the risks.
Suppose I share data and an intruder manages to get a copy. What could other people do with it? What would that cost them? Would it take an immense amount of processing power, or could they do it on a simple laptop? We're thinking hard about the data and the potential impact of a hack.
Protecting people, not 'data'
Ethical and responsible data stewardship is about taking the risks seriously - not only to yourself, your company and the liabilities your company might incur, but also to the people represented in the dataset. They're significant beings, not simply data points. Then there are your ethical responsibilities to wider society. If data can be used for social good, look for opportunities to share it in a way that will provide social gains.
I would like those who are worried about GDPR to think more positively about the possibilities of sharing data. Meanwhile, those people who have been taking a lot of risks with their data might want to think rather more seriously about it.
I want to communicate the sense of an ethical framework that applies not only to data, but also to the people that the data is about.
This blog is based on an interview from the Networkshop magazine 2019. Delegates can hear Kieron's presentation on data anonymisation at 15:30 on day one of Networkshop, on Tuesday 9 April, in Lecture theatre 2.