Precision Neuroscience Reimagined: Privacy by Design

In this episode of Precision Neuroscience Reimagined, Tina is joined by Simon Pillinger, Head of Information Governance, Ethics and Patient and Public Involvement at Akrivia Health. Together, they revisit the topic of Information Governance, diving deeper into what’s happening in the world right now, best practices and anonymisation. Simon offers invaluable guidance for navigating the complex terrain of data governance in healthcare.

Tina Marshall: Hello, my name’s Tina Marshall and today I’ve got Simon Pillinger with me, the head of Information Governance, Ethics and PPI at Akrivia Health. Information governance was one of our most popular episodes actually for this podcast. So we wanted to revisit and go deeper into information governance, particularly what’s happening in the world right now, best practices and anonymisation. Anonymisation is something that comes up and is critical for data organisations like Akrivia Health.

Let’s go straight into anonymisation, because there are differences, aren’t there, between anonymisation and de-identification? What patient data can be shared? Can patient data be shared? Would you be able to break that down for me, and for everybody listening, so we can understand the differences and the nuances between them?

Simon Pillinger: Last time we talked about those differences in a bit of detail, but essentially the law differentiates between personal data and non-personal data. It’s a binary thing to begin with. And personal data is essentially data that relates directly or indirectly to a living individual. Data isn’t owned by an individual. Data protection law is very agnostic. Thoughts of data ownership aren’t relevant. There are some good reasons why that’s the case, but it is all about being able to infringe on someone’s right to privacy lawfully, which is why to process data we have to ensure that it’s lawful, it’s fair, it’s transparent, and we meet all the data protection principles. So that’s the first part of processing personal data.

What are the data protection principles?

Simon Pillinger: So the data protection principles are enshrined in Article 5 of the UK GDPR and the EU GDPR. The UK GDPR is the primary bit of legislation in the UK. The first of them is that data has to be processed fairly, lawfully and transparently. So that ensures you’ve got a legal basis for processing.

What’s classed as fairly?

Simon Pillinger: That is a really good question. A lot of the GDPR is really, really straightforward. Fairness, I think, is probably one of the most difficult bits to interpret, but by and large, there is understood to be a disproportionate power relationship between an individual and an organisation. Organisations generally have more resources than individuals, and if you think about the relationship between me and Google, which probably has a lot of data relating to me, I am way, way outpowered by Google. So the fairness part is about ensuring that information is processed in a way that I would reasonably expect. Again, that reasonableness is another really hotly debated piece. But to give a good example in terms of bringing this back to patients, should patients expect their data to be used in research, for example? And my answer would probably be yes. Again, this one is probably a bit contested, but part of the NHS’s role in the UK is to support research. That’s enshrined in the NHS constitution.

It’s enshrined in law; I think it’s section 1E of the NHS Act 2006. And so although the NHS has duties to care for people, to do things around public health, and to treat patients both as a composite body (that is, the UK populace, with preventative medicine) and as individuals (you go to your GP because you’ve had a cough for six weeks), the NHS also has a duty to support research. And so we talked about reasons for processing data on a lawful basis. Their lawful basis is that they’ve got duties that are enshrined in law and they are processing that data to comply with those duties. So when you see a GP, the lawful basis they have for processing your data for your healthcare is that they’ve got that duty laid out in law and they’re just complying with that.

It’s in the public interest and in law, but they’ve also got an imperative there in terms of helping research. Now it can be easy to confuse the two, but what a reasonable person might expect doesn’t necessarily mean what the average person on the street might expect. And that’s a confusing thing.

In law, we have this principle of the Man on the Clapham Omnibus, the reasonable man of English law. This goes back to a Roman legal fiction called the bonus pater familias, the good family father: the good bloke who is righteous in his community, he’s upfront, he is a good member of society. What would they reasonably expect? That doesn’t necessarily mean the average person on the street. It means someone who has a fair amount of information, who would go and look at the law and at the NHS constitution. That isn’t necessarily reflective of society, and that’s where the contention comes in, because what the reasonable man of English law might expect is not necessarily the same as what we would hear if we were to stand on Oxford High Street, or Oxford Street in the middle of London, and ask people what they reasonably thought. Those might not be the same thing.

And that’s why, coming back to fairness, it’s a tricky, contentious issue. And what we see in case law is that this is one of the things that is quite unpredictable. Fundamentally, it comes down to practitioners like me being able to assess the pros and cons and document that. Part of that fairness is also informing people about how their data is used. So there are duties that have been in data protection law since the beginning to inform people about how their data is used if you are a controller. For example, your GP has a duty to inform you about how that data is being used: the purposes, the lawful basis and quite a lot of other information, including where they are using processors or subprocessors to process that information on their behalf. If there is an organisation collecting this data from someone else, or if you are collecting it from an individual yourself, you have a duty to inform them at that point. If a third party is giving you that data, then you are still obligated to inform them unless it would require disproportionate effort. And disproportionate effort means, for example, that you might have a load of pseudonymised data for which you haven’t got the contact details, and it would require disproportionate effort to re-identify people in order to inform them.

Tina Marshall: So I think that’s a really good explanation and it leads me to want to open up to really break those elements down.

So for me as a patient, Tina Marshall, I go to my GP, they say, we want to use your data for research. And I say, oh my goodness, I don’t want a big pharma company or I don’t want the government to know that Tina Marshall has or is doing X Y, Z. How likely is that to happen?

Does my personal identifiable information go over? And where is the difference there between the anonymised and the de-identified data?

Simon Pillinger: Let’s break that down into a couple of things. There are certain obligations laid out in law and there are certain national policies that require NHS organisations to send reports to NHS England and the Department of Health and Social Care. And those are often for quite longstanding purposes like national cancer registries and the ability to track that sort of information. If you remember, during the pandemic there were particular duties to report on cases. Often these are for completely reasonable purposes: they’re to understand public health, they’re to understand the progression of diseases. Think about diseases one hundred years ago, when there were a lot of infectious diseases, and now we’re fighting a lot of degenerative diseases. Just the fact that we are getting older as a population causes differences and is sometimes more costly. So for the government to be able to plan, for NHS England to be able to plan, they need data. Now, they will pseudonymise that wherever they can. So pseudonymised has two potential meanings. Pseudonymisation, in its more general meaning, just means to replace a direct identifier with an artificial identifier. So for example, if I were to go into a sexual health clinic, I might use the pseudonym Ricky Gervais, because I don’t want someone to call out my name in a sexual health clinic. That would be awkward for several reasons.

However, the specific definition in data protection law is that it refers to data which has been processed and has the majority, if not all, of the direct identifiers removed. Previously in UK law, we called it de-identification. In the Data Protection Act, that’s how it is referred to, but it’s made very clear in the explanatory notes that those are the same thing. It’s an offence within the Data Protection Act now to try and re-identify pseudonymised data without the permission of the controller. It’s still personal data but it’s much harder to identify.
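As a rough illustration of the general meaning of pseudonymisation described above (replacing a direct identifier with an artificial one), a minimal Python sketch might look like the following. The field names, the `P-` token format and the `pseudonymise` helper are all assumptions made for illustration; they are not taken from the episode or from any real system.

```python
import secrets


def pseudonymise(records, direct_identifiers=("name", "nhs_number")):
    """Replace direct identifiers with a random artificial identifier.

    Returns (pseudonymised_records, key_table). The key table maps each
    pseudonym back to the original identifiers; in this sketch it is assumed
    to be held separately by the controller, which is why the output is
    still personal data rather than anonymous data.
    """
    seen = {}        # original identity -> pseudonym (keeps one person's rows linked)
    key_table = {}   # pseudonym -> original identifiers
    output = []
    for record in records:
        identity = tuple(record[field] for field in direct_identifiers)
        if identity not in seen:
            pseudonym = "P-" + secrets.token_hex(8)  # random, not derived from the data
            seen[identity] = pseudonym
            key_table[pseudonym] = dict(zip(direct_identifiers, identity))
        # Copy every field except the direct identifiers.
        stripped = {k: v for k, v in record.items() if k not in direct_identifiers}
        stripped["pseudonym"] = seen[identity]
        output.append(stripped)
    return output, key_table


# Hypothetical example data.
visits = [
    {"name": "Alex Example", "nhs_number": "0000000001", "clinic": "A", "year": 2023},
    {"name": "Alex Example", "nhs_number": "0000000001", "clinic": "B", "year": 2024},
]
pseudonymised, keys = pseudonymise(visits)
```

The indirect identifiers (clinic, year and so on) are left untouched, which is one reason pseudonymised data sits on the personal-data side of the line.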

Now we talk about that kind of binary between personal data and anonymous data. And this is where it gets fun because it’s not binary, it’s not black and white. It is many shades of grey in between. It’s a spectrum of identifiability and likelihood. So if I were to say there is a male, he’s in his thirties, he lives in North Oxfordshire, he has children, he’s born in the early nineties, in the first half of the year and he’s married. It describes me, but it also describes a whole lot of other people. If I say there’s a guy, his first name is Simon, he lives in this postcode. He’s got three children, he’s got three boys, he’s heterosexual, he’s male. Then you are starting to provide far more identifiable information. And when you are looking at the kind of information we have like longitudinal records, it gets hard. So one part of it is just what information is there that you can identify.

But when it comes to anonymisation, anonymisation is not really a defined term in data protection because data protection law is only really interested in personal data.

It’s my favourite interview question: how do you work out if it’s anonymous data? Well, you work out if it’s personal data.

And so actually if you can work out whether it’s personal data, and every data protection practitioner should be really good at this, you can work out whether it’s anonymised data. Essentially, you are looking at the reasonable means likely to be used to identify that data. So there’s Recital 26 in the GDPR. You have the articles, which are the parts of the law that tell you what you have to do to meet the legal conditions for processing to be lawful. And you’ve got the recitals; the recitals are what the legislators were thinking and intending. So if you are ever unsure how to interpret the law, I think there are over a hundred recitals that say “This is what we are thinking”. Recital 26, which determines how you work out if it’s personal data or not, goes through it: have a look at what the means reasonably likely to be used are. Are you going to single out information? Are you going to combine it with other information, which is called mosaic re-identification, the idea that you can take little tesserae to create an entire image? And how much is that going to cost? What knowledge are you going to need? How much time are you going to need to spend to do that? All of that is to understand what the reasonable likelihood of identification is.
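To make the idea of combining datasets a little more concrete, here is a minimal Python sketch of a linkage check: it looks for rows in a released dataset that match exactly one person in some other dataset on a set of indirect identifiers. The `link_records` helper, the field names and the example data are all hypothetical, and a real assessment would also weigh cost, time, knowledge and motive, as described above.

```python
def link_records(released, auxiliary, quasi_identifiers):
    """Return (released_row, auxiliary_row) pairs that match one-to-one.

    A released row counts as linked only when exactly one auxiliary record
    shares all of its quasi-identifier values. This is a simplification:
    it assumes the auxiliary data covers everyone who could plausibly match.
    """
    index = {}
    for person in auxiliary:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person)

    matches = []
    for row in released:
        key = tuple(row[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # singled out: only one possible person
            matches.append((row, candidates[0]))
    return matches


# Hypothetical released data (no names) and hypothetical auxiliary data (with names).
released = [
    {"age_band": "30-39", "district": "North", "children": 3, "diagnosis": "X"},
    {"age_band": "40-49", "district": "South", "children": 2, "diagnosis": "Y"},
]
auxiliary = [
    {"name": "Person A", "age_band": "30-39", "district": "North", "children": 3},
    {"name": "Person B", "age_band": "40-49", "district": "South", "children": 2},
    {"name": "Person C", "age_band": "40-49", "district": "South", "children": 2},
]
linked = link_records(released, auxiliary, ["age_band", "district", "children"])
```

In this toy data only the first released row links to a single person; the second matches two people, so it is not singled out by these fields alone.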

Lots of people have tried to do this statistically. And my problem with that is that if you were to do a Venn diagram with one circle of data protection practitioners and people who know the law really well, and another circle of people who are statisticians and can do statistics, there’s probably very, very little overlap between those two circles. So we take a risk-based approach, because the language of the law, and the law is the stick by which we measure, is a measure of likelihood and risk. And by using a risk-based approach, we can be much more scalable in our approach. We can apply those models relatively quickly. It requires integrity to apply them fairly. And the argument against this is that it is very easy to fudge. I think if you’re doing it correctly, actually it’s not. You can say, well, statistics is much more reliable, but it’s harder to implement. And what the statistics don’t necessarily allow you to do is weigh certain factors.

So for example, take published data. If you were to Google “anonymised re-identified”, you’ll come up with a load of articles from conferences like DEF CON, from researchers whose job this is. And the characteristics of these articles in terms of re-identifying anonymised data are that the data is almost always public, and therefore you’ve got one potential barrier to re-identification removed, and that the work is done by individuals with the time and resources who are dedicated to trying to prove that this concept doesn’t exist. And what you’ll also find is that the proportion of data within that dataset that they’ve linked to a living individual is fairly small. So you get the idea that anonymisation isn’t an unreal concept. It’s not an impossible concept, and anonymisation doesn’t mean that you’ve got to eliminate the possibility of an individual being identified, but you’ve got to take it to the point where it’s so difficult that it isn’t reasonably likely. We never say never when it comes to probability.

The advantage of using anonymised data is that it falls outside the scope of data protection law, but you have to continually assess it and look at what technologies are around you, because those might actually mean that your data isn’t as anonymous as you first appraised it to be.

How can that data be protected? How am I supposed to trust an organisation that says, I’m only going to use the anonymised data but then they’re sent my personal data or are they sent my personal data?

Simon Pillinger: Personal data is also going to be a little bit relative as well. It is based on how easily a person can identify someone from that information. This gets a little bit confusing because courts tend to go around in circles on this. Recently, we had a ruling from the CJEU, the Court of Justice of the European Union. Although in the UK we’re no longer part of the EU, the laws are substantially similar, so you can compare the principles that they’ve come out with. You can look at judgments from the Court of Justice of the EU and you can say that’s just as applicable in our context. This was what happened with SRB versus EDPS (if anyone wants to look it up). And the judgement in that was really interesting. So it was a controller that shared a bit of data with a processor, and essentially they said it was identifiable to the controller but not to the processor. All it was was a couple of opinions, and they were doing some kind of categorisation work. And so there is a point that identifiability is relative to the activity and the beholder. I think the way that sort of assessment should be done, and what we do, is to assess how anonymous a piece of data or a data set is to your organisation.

And that includes looking at whatever data you hold as a controller or process as a processor. It even involves looking at who you’ve got working in your organisation. So if you’re an organisation like Akrivia Health, with loads of really clever scientists who’ve got statistics at their beck and call, then does that increase your risk? Probably. But then you’ve got to have a look at what their motives are. You’ve got to have a look at their likelihood, you’ve got to have a look at all the access controls in place. Then if you are making this available for third parties or customers, how are you controlling that access? Are you providing aggregate levels of information? This would be the best possible outcome because you can apply very solid statistical controls. So things like k-anonymity with k = 5, which is the kind of general rule and consensus as being the gold standard, or one of them. Again, this gets really contentious in this field. And then look at what the purposes are, and how you are curating that data set. You can still apply a lot of the data protection principles, so you use as little data as necessary to achieve your purpose. So although it’s not within the scope of data protection law anymore, it still behoves organisations to treat anonymised data with the same principles, because they’re good principles to use.
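For readers unfamiliar with k-anonymity, the check mentioned above can be sketched in a few lines of Python. A dataset is k-anonymous over a chosen set of indirect identifiers if every combination of those values is shared by at least k rows. The helper names, field names and the choice of which fields count as quasi-identifiers are assumptions for illustration only; the threshold of five follows the rule of thumb mentioned in the conversation.

```python
from collections import Counter


def group_sizes(rows, quasi_identifiers):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group, i.e. the k the data achieves."""
    sizes = group_sizes(rows, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def groups_below_threshold(rows, quasi_identifiers, k=5):
    """Return the value combinations shared by fewer than k rows.

    These are the groups that would need suppressing or generalising
    (for example, wider age bands) before an aggregate release.
    """
    sizes = group_sizes(rows, quasi_identifiers)
    return {combo: n for combo, n in sizes.items() if n < k}


# Hypothetical aggregate-level data: five rows in one group, two in another.
rows = (
    [{"age_band": "30-39", "region": "North"}] * 5
    + [{"age_band": "80-89", "region": "North"}] * 2
)
achieved_k = k_anonymity(rows, ["age_band", "region"])
too_small = groups_below_threshold(rows, ["age_band", "region"], k=5)
```

A check like this only covers the fields it is told about, so it complements rather than replaces the wider assessment of motives, access controls and purposes.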

Tina Marshall: Thank you for that. Going into information governance in the tech world and with AI, the buzzword AI, which I always find really interesting because I see it as another sector.

For AI, what are the challenges or the watch-outs that people should be mindful of?

Simon Pillinger: So I’m a Harry Potter fan, so I always remember the phrase from the second book, when one of the characters is enchanted by an enchanted book and her dad says, after everything’s gone down: “Never trust anything that can think for itself if you can’t see where it keeps its brain.” And I think one of the really big pitfalls of some AI uses is that we don’t necessarily know how they work. If you are doing your assessment of a product that’s using AI, it’s difficult to understand what data goes in and what data comes out. Now this comes back to the fact that there are different uses of AI. So you’ve got some things like what we do at Akrivia Health, such as natural language processing, which is essentially being able to use AI to structure information. It speeds up what would otherwise be a human task.

It’s replicating judgement to a certain extent, but it’s only on words and we can do validation on that. Where AI gets a little bit more, I’m going to say scary, is where you are looking at replicating human decisions that have a legal effect on somebody. So for example, if we start getting into health tech, do we go down the route of AI-assisted decision-making? There’s a really good book on this, which is Cathy O’Neil’s Weapons of Math Destruction. She was working in investment banking during the crash in ’08-’09 and it gives her take on some of the decision-making things that are going on.

GDPR has a right not to be subjected to automated decision-making where it has a legal effect on that person. For example, there are tools out there that will scan CVs and give people a rating, which might mean they get a job or don’t get a job.

Tina Marshall: That is worrying. Some people might just be bad at writing a CV.

Simon Pillinger: Exactly. I mean there’s a whole other question that this brings, which is actually how good are the humans at doing these tasks? And I think this is the really interesting thing is it gives us pause to question, yeah, computers might not do these tasks very well, but do we prefer humans not to do those tasks very well as well? And there are some really good studies about the UK judiciary about judges passing harsher or more lenient sentences based on the time of day and how close it is to lunchtime.

Tina Marshall: So based on their own emotions, and their own prejudices, again, I guess even though they shouldn’t take it into account, they probably have unconscious bias.

Simon Pillinger: Absolutely, and this is a profession in which they are there to judge and are probably still some of the most impartial people. Anyone who’s read enough summaries of judgments probably knows that they are far from whatever that Daily Mail headline was “The enemies of the people”, they’re far from that. But the point remains as before, I think people are generally more comfortable having a person than a machine because the person is human and has emotions.

Tina Marshall: Humans have experience.

Simon Pillinger: Absolutely. And that’s for better and for ill as well. I think we are seeing lots of progression in AI regulation, and I think that is the way to go. I think the other thing this comes back to is a data protection principle, which is to understand your purpose and make sure that you process only the data that’s necessary for that purpose. And I think what’s really tempting with AI is to think, there’s a great AI tool. And there was this horrible horror story that came out of the US where I think a policeman had taken a DNA sample, run it for analysis, got the alleles for certain phenotypic characteristics, like eye colour, generated an artist’s impression from that genetic information and put it through a facial recognition scanner. So you can see several layers of potential failure that could lead to harm to an individual and potentially a miscarriage of justice. And all of that is because you haven’t got the processes in place to understand what they are trying to achieve. Is this the way to do it? Are there potential harms to innocent people? And when we get that wrong, it damages our confidence as a society in the whole system.

Tina Marshall: To somebody new coming into the profession, or even if they’re not new, they’re moving into a new sector:

What would you say would be your top five key points to develop best practices for information governance in this changing sector with evolving technology?

Simon Pillinger: Network. There are some absolutely brilliant people pushing out commentary and content on forums like LinkedIn. Robert Bateman is one of the people that comes to mind. He’s done a lot on the development of US data protection law. Daragh O’Brien and Carey Lening from Castlebridge over in Ireland produced some really good books on digital ethics, and Daragh did a fantastic piece on ChatGPT when it came out. He had searched for books on digital ethics and was slightly annoyed that his own book hadn’t come back, but also seven out of the other 10 books didn’t exist or he couldn’t find any record of them. There are more than I can probably really name but we can probably do some shout-outs. There’s a great convening of people. There’s an event I go to twice a year called Privacy Space, which brings together data protection practitioners, both veterans of the field and also kind of newcomers. If you’re in a health tech space, there are people like Barry Moult, Andrew Harvey, Richard Newell, and many other people who are really good to know in that sector. People who’ve got a lot of experience, way more time in the field than I’ve spent in it, who can provide sector-specific advice.

Tina Marshall: You seem quite experienced, Simon, you know what you’re talking about.

Simon Pillinger: I can talk. It’s building those networks, and getting to know people and quite often data protection specialists can find themselves isolated in an organisation in terms of expertise and knowledge. It’s really important and crucial to have people that you can go and sense check things because if you are the sole IG person or, if you’re the sole specialist in any organisation, you can have yourself thinking, I know this but I just need to check. Actually, the advice you give is often gospel. That’s the way that specialists work. It is followed and you want to make sure you give the right guidance, particularly if that guidance is not going to help the business. That’s the reality of regulatory affairs.

The other key bit of guidance I would give is to read one bit of case law a week or one ICO decision a week. It will help you build up your knowledge of the application of the law. And the other thing is you don’t read law, you reread it and you reread it. You reread it until you probably do not know which article you are thinking of. Still, you probably aren’t going to remember every single clause and every single article and every single recital. But you’ll have an inkling of where you’ve got to go back to check. I don’t practise law in the kind of professional sense, but being a practitioner in a field around law means that you almost certainly will have to go back and double check the legislation because there is a lot out there. There’s case law being generated at an ever greater pace.

Tina Marshall: It is constantly evolving now. It’s not like a fixed door. It’s constantly evolving with new legislation changing all the time.

Simon Pillinger: And I’ve talked to people like Barry Moult, who’s been around in this business since the Data Protection Act 1984 came out, and I said, I’ve been in this game for the best part of 10 years. It feels like it’s moving faster than it ever has. I said, is that just me not being as long in the game? He said, no, it is. So if you think, you had the Data Protection Act 1984, then you had 1998 and you had the FOI Act, 2002 I think it is, or 2000. It came into action in 2002. You’ve had the GDPR published in 2016 and come into force in 2018. You’ve got the Data Protection Act, for which I think the Royal Assent was on the 24th of May, 2018, just before the GDPR came into force. And then we’ve had goodness knows how many amendments to the Data Protection Act following leaving the European Union.

Everything has become so much more complex. The rate of change has increased, and that’s partly, I think, a response to the rate at which technology is advancing. I feel like we are probably going through a rate of change now which is not dissimilar to the rate of change in the industrial revolution. In ways similar to how manufacturing was revolutionised at that time, we are seeing information manufacturing and use going through whole new shades of acceleration. It has never been easier to profile people and gather information about them. And I think the regulation is keeping up with that, or at least trying to. Some of the principles, though, from the Data Protection Act 1998 have actually carried over into the GDPR. Those principles in law are simple. The application of them in life, in the world, is complex. So keeping on top of those comes back to reading a bit of case law each week, reading an ICO decision. And even if you are a practitioner not working in the public sector, read Freedom of Information requests and the decisions from them, because a lot of data protection case law comes out of them.

Where do you find those, then? Freedom of information requests?

Simon Pillinger: ICO website. So the ICO website publishes their decision notices. They publish enforcement notices, which are when they’ve taken enforcement action, and decision notices, where they’ve had a particular complaint or a freedom of information request hasn’t been responded to. The reason I’d say about reading FOI is that FOI requests have created quite a bit of case law that relates to data protection, because people will ask for this information and you can’t ask for personal data under FOI. And so it generates lovely bits of case law, and because FOI requests are frequently made, it generates a high proportion. So my guidance would be that even if you’re not in the public sector, go through them because there are some gems of knowledge in there. The other person probably to follow is Tim Turner and his page DPO Daily. He is a fantastic textual analyst of law and comes out with some really good nuggets of how to apply it. He’s also just a fantastic communicator.

Tina Marshall: Learn on lots of different levels.

Do you have any last thoughts to share?

Simon Pillinger: I think if you are a data protection practitioner or a DPO and you are selling into another organisation, often an NHS organisation, governance people have got a lot coming across their desk. The best way to stand out is to make their life as easy as possible. So if you are part of health tech, developing a product, get data protection involved as soon as possible as part of privacy by design and default. Actually, it’s probably guidance not for DPOs, they know what they’re doing. It’s for founders in health tech. If you’re building a product, get the DPO involved, and get data protection involved. Privacy by design and default is going to cause you fewer problems downstream. The worst possible way to look at data protection is, well, it’s the cost of doing business. But the best possible way, if you are in health tech, the likelihood is that you want to help patients and you help patients by making it more,

Tina Marshall: Getting the right thing out to them sooner.

Simon Pillinger: Absolutely. And if you go to a DPO in an NHS organisation and you’ve not done those things, they’ll send you back. If you send them something which makes their job as easy as possible, you will stand out because the bar is not high. So get them involved as soon as possible, and take it as an opportunity to save costs in the long run. But also data protection practitioners are often really good business analysts. They have to look at the whole, so you really start to understand what your product is for and what you need to power it. So if you’re looking at a journey across a desert, you want to know how much fuel and water you’ve got to take with you. If you’re building a product, you’ve got to work out where you’ve got to get to. You’ve got to work out what data is going to fuel that in the future. Data protection by design.

Tina Marshall: Thank you very much. Data protection by design, keep it in mind. Thank you very much to Simon Pillinger and do follow Simon on LinkedIn. He didn’t give himself a shout-out, but I will. He’s got some really good content, slightly quirky and crazy content, but it always makes information governance really easy to understand for people from my perspective.

For this episode of Precision Neuroscience Reimagined, and more: https://spotifyanchor-web.app.link/e/KJr7swiQaIb OR you can find the full episode on our YouTube channel: https://youtu.be/xp_iiTR8-V0