Connecting Society: How everyday data can shape our lives

6. Your data, your rights

ADR UK (Administrative Data Research UK) Season 1 Episode 6

In a data-driven world, administrative data holds the power to tackle society’s toughest challenges - from improving healthcare and education to boosting the economy. But how do we ensure this data is used securely, ethically, and for the public good?

Featuring Nikhil Harsiani, Data Ethics Policy Advisor at the UK Statistics Authority, and Mhairi Aitken, Senior Ethics Fellow in the Public Policy Programme at The Alan Turing Institute, this final episode dives into how legal safeguards and public engagement combine to protect privacy while demonstrating trustworthiness. Discover trusted research environments, the Five Safes framework, and how involving communities makes research more transparent and impactful.

This conversation unpacks how strong protections and meaningful public dialogue are building a system where data serves as a force for public good, delivering solutions that are as fair and trustworthy as they are effective.


Wondering what administrative data is? Visit https://www.adruk.org/our-mission/administrative-data/.

If we used any terms you're not familiar with, check out ADR UK's glossary at https://www.adruk.org/learning-hub/glossary/.

For information on legal frameworks: https://www.legislation.gov.uk/ukpga/2017/30/part/5/chapter/5, https://www.adruk.org/fileadmin/uploads/adruk/Documents/The_legal_framework_for_accessing_data_April_2023.pdf, https://www.adruk.org/learning-hub/skills-and-resources-to-use-administrative-data/accessing-data-for-research/.

Learn more about ethical data use and the public good: https://www.adruk.org/our-mission/ethics-responsibility/, https://uksa.statisticsauthority.gov.uk/publication/guidelines-on-using-the-ethics-self-assessment-process/, https://uksa.statisticsauthority.gov.uk/publication/considering-public-good-in-research-and-statistics-ethics-guidance/pages/1/.

Discover approaches to public engagement: https://www.adruk.org/our-mission/working-with-the-public/, https://www.adruk.org/fileadmin/uploads/adruk/Documents/PE_reports_and_documents/ADR_UK_OSR_Public_Dialogue_final_report_October_2022.pdf, https://uksa.statisticsauthority.gov.uk/publication/considering-public-views-and-engagement-regarding-the-use-of-data-for-research-and-statistics/pages/1/.

Connecting Society is brought to you by ADR UK (Administrative Data Research UK). Find out more about ADR UK on our website, or follow us on X (formerly Twitter) and LinkedIn. This podcast builds on a pilot series known as DataPod, produced by ADR Scotland.

Shayda: Hello, and welcome to Connecting Society, a podcast about how everyday data can shape our lives. I'm Shayda Kashef, Senior Public Engagement Manager for Administrative Data Research UK, or ADR UK to me and you.
 
Mark: And I'm Mark Green, Professor of Health Geography at the University of Liverpool. We are your co-hosts and guides around the wonders of administrative data.

So Shayda, you're our resident ethics expert. Are you excited about today?

Shayda: I'm seriously nerding out about today's episode. For those who don't know, I studied philosophy and medical ethics at uni, so it might be harder than usual to shut me up today. Our episode is all about ethics, which means how we understand what is right and wrong and how people should behave. Ethics is about having principles and values that help us decide what is good, fair, and just in our actions and decisions.

Mark: So do you have a favourite ethics story? I guess bonus points if you can make it about data too.

Shayda: Well, it isn't about data, but a few years ago, I was at an event hosted by GPs on ethics and GP experiences. I remember one of the event hosts, who was a GP, started the event by saying, "I've been a GP for 30 years, and the first time I came across ethics was two years ago." He was referring to an ethics course he had taken, but I thought it was strange that he didn't see his daily interactions with patients as doing ethics.

As someone who studied ethics, to me, it was a bit shocking. I guess it just made me think how the way we understand ethics can be different in different contexts. So some people might understand ethics as what not to do to get sued or not to go to jail. I don't know how the GP understood ethics himself, but I see it more as how we interact with one another and how we interact with the world around us.

What about you, Mark?

Mark:
So I was recently invited to a series of focus groups around data ethics. So yes, listeners, I don't spend all my time going to sexy cocktail parties. I do have to go to these types of events as well.

But anyway, the thing that really struck me from attending these focus groups wasn't the amount of time and technology that goes into data ethics, which is really considerable—from the laws in place to prevent abuse, to the technical infrastructure to keep data safe, to the forms and processes to make sure researchers use data for the public good. What struck me was that these things are only good and useful if we can bring the public with us, and that without building that trust with the public, it's all rather meaningless.

Shayda: Yeah, you're totally right, and your story actually is a perfect segue to introduce our episode today. 

In a world driven by good evidence, connecting administrative data sets together—for instance, connecting data on health with data on education—can serve as a really important tool for research. When used responsibly and ethically, it can drive meaningful progress across public policy, healthcare, education, and more, ultimately, hopefully, improving people's lives. 

But as we know, there are some people out there who don’t always want to use data for the public good, which can mean ethical standards are overlooked, or public trust can be compromised.

This means that for people like us who want to use data for the public good, making sure administrative data is held securely and safely, and used responsibly, is essential if we are going to use data for research. As we know, administrative data is created when people interact with public services. Without people, there is no data. So it's crucial that those of us working in this space demonstrate trustworthiness by adhering to legal frameworks and by listening and responding to the public. Today, we want to explore these issues in a bit more detail.

Mark: And to help us do that, we have two wonderful guests with us: Nikhil Harsiani, who is Data Ethics Policy Advisor at the UK Statistics Authority, which we might refer to as the acronym UKSA sometimes today, and we have Mhairi Aitken, who is Senior Ethics Fellow at the Alan Turing Institute. Welcome to the show, both of you.

Nikhil: Hi. Nice to be here.

Mhairi: Thanks so much. Good to be here.

Shayda: Please, can you tell us a little bit about who you are, where you're from, and what you do? Nikhil, let’s start with you.

Nikhil: I’m Nikhil. I am a Data Ethics Policy Advisor, as Mark said, at the United Kingdom Statistics Authority, or the UKSA. The UKSA is the governing authority for the Office for National Statistics, which a lot of listeners will know as the body that administers the census. It also encompasses the Government Statistical Service (the GSS), which is an incredibly wide-reaching group of statisticians at work across government, and the regulator, the Office for Statistics Regulation.

Shayda: Before I joined ADR UK, I had never heard of the UK Statistics Authority or the GSS or any of these other big acronyms. How did you end up at the UK Statistics Authority? Did you ever expect to be working somewhere like this?

Nikhil: It's a really great question. I, like you, was not very familiar with the UK Statistics Authority, the UKSA. Actually, for my first job out of university, I applied to the UK Statistics Authority to be executive assistant to the country's National Statistician, Professor Sir Ian Diamond. That was just under five years ago, and since then, I've been working in different roles within the authority.

Shayda: Sounds like a very important, high-pressure job straight out of uni. I think my first job out of uni was working in a clothing store. So, very impressed by that. 

Mhairi, what about you? How did you end up working in data ethics?

Mhairi: I work at The Alan Turing Institute, which is the UK’s National Institute for AI and Data Science. I’m a Senior Ethics Fellow, so my work looks very broadly at ethical and social considerations around advances in AI technologies and advances in data science. But my background is absolutely not in data science or AI. I studied sociology as my first degree, and my PhD, which is going back quite a number of years, was around renewable energy developments, particularly onshore wind farms.

But all my work has been focused on, I guess, controversies around new technologies and the importance of involving members of the public in decision-making around new technologies. My PhD looked at public involvement in policymaking and planning processes relating to renewable energy developments. That’s really my passion—the role of the public in informing ethical decision-making around innovation and technology.

I worked for about 10 years at the University of Edinburgh, particularly focusing on data-intensive health research, and again, the role of the public in informing decision-making or governance practices around data for health research. From there, my work gradually turned to focus more on AI, artificial intelligence. I worked for a couple of years at Newcastle University on a project looking at the role of AI in banking and financial services, and then from there, I started working at The Alan Turing Institute, where I’m focusing more broadly on the many different ways that AI is used across different sectors and industries and looking at ethical practices around data and AI.

Mark: Excellent. We want to get to know you both just a little bit better. So, first question: Is it "day-ta" or "dah-ta"?

Mhairi: Always "day-ta".

Nikhil: Always "day-ta". But I like "pah-sta" instead of "pa-sta".

Shayda: Pie chart or fruit pie?

Mhairi: Pie chart. I do like a pie chart.

Mark: Oh... this is the first time it’s happened.

Nikhil: Fruit pie. You can see I’ve got lots—I’m in the kitchen, though. I need to be around food. I do love a good pie chart in all its many forms, 3D, sector bits coming out of it. It’s great.

Mark: I can’t believe you’ve just brought up 3D pie charts as being good. 

One more question: What is your favourite statistic?

Mhairi: This is a tricky one, but one of my passions is horses. So, this is going to be horse-related. It’s an example of where I think horses have been unfairly treated. We measure the power of cars in horsepower, and so you think that one horsepower is really the power of a horse. But the power of a horse at top speed is actually 24 horsepower. So I feel like horses have been misrepresented. One horsepower is probably like a slowly meandering pony. So, I feel like that’s an injustice that needs to be corrected.

Mark: Oh, good. I’d always assumed that one horsepower was one horse. This is enlightening.

Nikhil: That’s a great one. I didn’t know that either about horsepower. 

One of my favourite statistics is one that I learned recently. I was recently introduced to blackjack, and I was shuffling a deck. A lot of blackjack is about learning certain patterns in the deck, with people trying to work out which cards are coming next. I came across the statistic that the number of possible orderings of a shuffled 52-card deck is 52 factorial, which is roughly 8 times 10 to the power of 67: an almost unimaginably large number. That means every time you shuffle a deck, in all likelihood no deck in the history of time has ever been shuffled into that exact order.
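
For anyone who wants to check that figure, it is easy to verify; here is a minimal Python sketch (ours, not from the episode):

    # Verify the claim: a 52-card deck has 52! possible orderings.
    import math

    orderings = math.factorial(52)
    print(orderings)           # the exact value: a 68-digit number
    print(f"{orderings:.2e}")  # 8.07e+67, i.e. roughly 10 to the power of 68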

Mark: Excellent. I suppose we better get into the more serious questions before the producers start sending me messages about going off-topic. 

Anyway, Nikhil, I’d like to start with you. My dad always goes on about the fact that he studied at the University of Life. He’s always concerned with how his data is used. What would you say to reassure someone like my dad that his data is used safely, securely, and for the public good?

Nikhil: Your dad has a very valid concern there. The amount of information that we give as citizens of the United Kingdom, through our health records, through our tax records, through our housing records—there is a lot of information that the government has about us.

There are two things here. First, there are many routes through which researchers, government bodies, academia, and business can use that data, mostly for research.

The second thing is that there are also many safeguards around how data is used securely and safely. And when I say securely and safely, I'm also talking about ethically. A big part of how administrative data is used securely, and the way that's checked, is that researchers, the people who would like to access administrative data for research purposes, have to go through an accreditation process that is defined by the Digital Economy Act 2017, which I'll refer to as the DEA.

So the DEA has within it a research code of practice, which sets out how three things have to be accredited. Imagine a researcher who wants to access education records because they want to do research on, let's say, inequalities in schools across Wales.

There are three things that the Digital Economy Act requires through an accreditation process. One is that people are accredited: the researchers themselves have to go through an accreditation process where they are assessed on their background, their training, and the skills they have to handle that data with propriety.

The DEA also requires accreditation for the research project itself: the project is described and assessed on whether it serves the public good, whether it is in the interests of the public.

And there are accredited processes—processes such as the way the data is held and the way it might be linked—which have to be carried out in secure environments.

All three of these things have to be accredited and assessed. Our data is important. It's a valuable asset, not just to the government but obviously to us, and it has to be treated ethically and properly. So researchers, projects, and processes must all be accredited.
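
As a rough sketch of the logic Nikhil describes (access is granted only when all three accreditations hold), here is an illustrative Python example; the names and structure are our own and do not reflect any actual UKSA system:

    # Illustrative only: the DEA requires people, projects, and processes
    # to ALL be accredited before administrative data can be accessed.
    from dataclasses import dataclass

    @dataclass
    class AccessRequest:
        researcher_accredited: bool  # vetted and trained person
        project_accredited: bool     # assessed as serving the public good
        process_accredited: bool     # data held and linked in a secure environment

    def may_access_data(request: AccessRequest) -> bool:
        """Grant access only if all three accreditations are in place."""
        return (request.researcher_accredited
                and request.project_accredited
                and request.process_accredited)

    # One unaccredited element is enough to block access.
    print(may_access_data(AccessRequest(True, True, False)))  # False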

Shayda: Nikhil, would you mind explaining what it means for something to be accredited?

Nikhil: It's kind of like a seal of approval, and that's granted by the authority I work for—the UKSA. But what's really important is that under the DEA, if things change—if new technologies emerge in certain areas of modelling or analysis—the Research Accreditation Panel or the authority can say, "Okay, we need to re-review what it means to be approved." They can conduct that review.

So it's not just, "We give you the seal for five years; off you go with a bunch of education or schooling data." Approval can be reviewed at any point. If technologies, circumstances, or policies change, or if certain topics become really important or politically charged, the panel may want to put more focus on a specific process or a specific set of data being used by researchers. It's an ongoing approval. I work closely with colleagues on the accreditation panel, and they are passionate about ensuring that trusted researchers, research environments, and research projects go through due diligence and are comprehensively reviewed over time—not just at a single point.

Shayda: That's really helpful to understand the intricate, complicated process around giving people, projects, and data the seal of approval. 

Mhairi, you have a history of exploring how to involve the public in research that impacts society. What does data ethics mean to you? And what does ethical use of data look like to you? 

We've heard what the legal definition is and how to use data responsibly in a legal way, but is that different from using it ethically? Or is using it ethically broader—does it include the legal aspect but go beyond it?

Mhairi: Clearly, these are overlapping areas, but to me, I would say that the law or regulation sets out what you must and must not do in terms of data practices—data collection, processing, storage, or usage. Those are the minimum requirements. You absolutely must comply with the laws and regulations.

But while the law dictates what you must and must not do, ethics is much more about what you should and should not do, and that's far less clear-cut. It can be very ambiguous. We will all have different ideas about what is the right thing to do in different contexts or for different purposes. Ethics is about grappling with those tricky questions of what you should and should not do, and in many instances, that might mean going well beyond legal compliance.

It might mean having different kinds of standards or expectations that go beyond what is set out in law or regulation. To me, what's really important in addressing these ethical questions—what we should and should not do—is to engage with a diverse set of viewpoints and perspectives. Different people will have very different ideas depending on the context, the purpose of the data, or the type of research.

We need the widest set of viewpoints and perspectives to begin answering those questions and think about what is the right thing to do in a given context. Often, ethics is conceptualised as being about avoiding risk or addressing negative impacts, and of course, a large part of it is. But it's also really about finding ways to maximise the value and benefits of innovation or research.

To me, that's what's really exciting. And we can only do that by having wide-ranging conversations with members of the public across society to identify those opportunities to maximise the value and benefits of data.

Mark: I really like that you touched on the positive side of it, because I think we often focus a lot on the more negative side; rightly so, because we need to handle data very carefully and securely. Something that ADR UK has built its remit on is research using data to support the public good.

We talked a bit about this in episode one—the public good. Can you tell us a bit about how UKSA thinks about public good? How is it defined and assessed, and what might it look like in practice? If you could give us a few examples, that would be helpful too.

Nikhil: A key point here is something Mhairi mentioned—ethical research and ethical use of administrative data serve the public good and are in the public interest. But these things change over time, and the way we define what is in the public good is by understanding what is important to the public and what the public wants.

As an example of research that serves the public good and is topical, there's ongoing research right now into how questions in ONS surveys ask people about their ethnicity. This work is known as the Ethnicity Harmonised Standard.

The UK is an ever-changing country with diverse cultures and ethnicities, and the proportions of those change over time. One of the things the ONS does is measure these demographic characteristics. With ethnicity, not only do the proportions change, but the way people want to identify their ethnicity also changes over time.

Some of the current research focuses on categorisation in the next census or upcoming surveys to include people who may have felt excluded before—people who may have had to tick an "Other" box and write in their response. This research meets the criteria for public good.

It provides an evidence base for decisions that will significantly benefit society and improve the quality of life for people. People want to feel included in the UK, and they should be included. Better data leads to better inclusion.

Shayda: I like what you said about how the definition of public good is ever-changing and follows what society feels is the public good at the time. It reminds me of a public dialogue that a colleague from the Office for Statistics Regulation and I ran two years ago.

We spoke to members of the public across the UK about what they think public good is within the context of data and statistics. They also couldn't agree on one core definition. Interestingly, they understood public good, public interest, and public benefit to mean three different things, but that's a conversation for another time.

Something you mentioned earlier, which also came up in the public dialogue, was the need for safeguards around how data is stored, used, and accessed. It’s important to note that what we’re discussing here relates to data use for research specifically.

Can you tell us a bit about the safeguards around how data is protected?

Nikhil: That’s a really great question. All the data is de-identified. What that means, in a nutshell: let’s think of our tax records. They have our names against them, of course, because we need to pay taxes, get tax refunds, and so on. But those records, held by HMRC or DWP, are scrambled before they enter a secure environment known as a TRE—a trusted research environment.

Before that data even enters the environment, it is scrambled and assigned numerical codes instead of names. That is the most basic level of de-identification.

Once the data is de-identified, it is entered into a secure environment. Researchers can only access it through an accreditation or approval process. Additionally, the data itself is not allowed to leave the secure environment.

These controls are in place to help us feel confident that administrative data is being used responsibly by researchers.
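
To make the "scrambling" Nikhil describes concrete, here is a minimal pseudonymisation sketch in Python; the secret key, field names, and keyed-hash approach are illustrative assumptions, not how HMRC, DWP, or any real TRE is implemented:

    # Illustrative de-identification: replace names with stable numerical
    # codes before records enter a trusted research environment.
    import hashlib
    import hmac

    SECRET_KEY = b"held-by-the-data-owner"  # hypothetical; never shared with researchers

    def pseudonymise(name: str) -> str:
        """Derive a stable 10-digit code from a name using a keyed hash (HMAC)."""
        digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()
        return f"{int(digest, 16) % 10**10:010d}"  # collisions possible; real systems do more

    record = {"name": "Jane Doe", "tax_year": 2023, "income_band": "B"}
    safe_record = {**record, "name": pseudonymise(record["name"])}
    print(safe_record)  # the name is now an opaque code; other fields are intact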

Mark: That’s really detailed. Mhairi, if I could bring you in here. We have all these rigorous rules and frameworks to keep everything safe. I imagine it’s quite a lot for the public to keep track of, be aware of, or even know about.

We haven’t heard much from the public’s perspective on all this. Could you tell us a bit about what you think some of the risks are around not involving the public in this kind of research?

Mhairi: In my work, I'm really interested in the idea of a social licence for data practices. A social licence is about, again, recognising that there's a difference between what's legally permissible and what's socially acceptable. To have a social licence, you need to show that the ways data is collected and the ways it's used match up with public expectations, or society's expectations, around how that data will be handled and used.

If we don't have a social licence, there may well be opposition, there may well be concern, and the ways data is used may well not be sustainable or supported in the long term. So it's very much in the interest of everybody working in this area to get this right and ensure there is a social licence: public support for the ways that data is collected and used.

That means taking account of concerns that are raised, but also reflecting on priorities and interests of the public around how data should be used.

This is very much the case in other areas of innovation and technology I've worked in. Going back to the work I did around renewable energy and wind farms, it's very clear that if developers looking to build a wind farm don't meaningfully involve and engage local communities in those decision-making processes, they're much more likely to face resistance; the development is simply not going to be acceptable to the community.

Whereas if you work with a community in those processes, there are opportunities to change what that project looks like—to change the way it’s developed—to do it in ways that better reflect community interests.

That’s a really strong parallel with approaches to data practices. There are opportunities to do things better if we take public interests and public concerns on board.

It’s really important that this isn’t just about communicating what’s happening—saying, "We have this data; we’re going to use it for this purpose. What do you think about that?" It’s important to be transparent around what’s happening, but it’s much more important to have those conversations earlier.

Maybe talking to community groups around: what matters to you? What are the areas we could do research in that would be really beneficial? What are the problems within society or that community groups are facing where maybe we could find a way to use data to address those challenges or to improve our understanding?

That kind of early stage is really important—and then continuing throughout the process: are these the right datasets to be using? Is this the right data to answer those questions? Is this the right approach to answering those questions? What should we do with those findings? How can we use them in ways that are meaningful and beneficial and are actually going to have public benefits?

Mark: I really like this idea of the social licence, and of engaging in a responsible way by building meaningful ways of engaging with the public. Some of the public's concerns around how data is used are very honest and true, but sometimes they end up getting mixed up with myths or conspiracy theories.

My dad likes to share a lot of these over WhatsApp. It’s the way he engages with what I do as a job—at least when he’s not sending me insults about Liverpool Football Club over WhatsApp.

Anyway, could you tell us a bit more about what are the key myths around data, particularly administrative data, and how we can start to address them?

Mhairi: Yeah. I mean, I guess where there are misconceptions or myths around how data is used or how it might be accessed, that’s a clear sign that we need better transparency.

We need better engagement to ensure there’s access to accurate information about what actually happens.

Maybe some of the biggest misconceptions or worries might be around whether data is being sold—whether data is being accessed by, say, private companies for commercial interests. Obviously, as Nikhil set out, that’s absolutely not what we’re talking about here when we’re talking about administrative data being used for research purposes.

But it’s important we have open conversations around what is actually happening.

In general, research that's looked at public attitudes towards uses of administrative data in research has fairly consistently found very high levels of public support for the use of data for research purposes.

But that support is not unconditional.

It’s really important that we pay attention to the conditions that underpin that support and that will need to be met to sustain that support on an ongoing basis.

There are also concerns around: what is this data going to be used for? What decisions might be made as a result of this research? Does this mean it’s going to inform approaches to policymaking or service delivery that are treating people as generalised categories of service users, rather than taking into account individual circumstances and the nuances of individual needs? That’s a concern we hear quite a lot. What’s the aim of this? What’s going to happen as an outcome of this research?

Again, I think that's where there needs to be really clear messaging—clear communication—about: what is the purpose of this? Is it informing better service delivery? Better policymaking? Audit of service delivery, perhaps? And how does that affect individual service users—individuals who might have different needs that can't be generalised?

I think sometimes there’s a nervousness to discuss some of these things in public because there’s a worry there’s going to be opposition. But actually, opposition can often come because there’s a sense things have been covered up or there’s not openness—and then there’s worry.

That's when we get the sensational tabloid headlines about dodgy deals and data sharing. Whereas actually, if we have more open conversations from the beginning—"these are the potential ways we could use this data; this is the potential good that could come from this research; if there are different organisations or actors involved, this is what that might look like"—and involve the public in those conversations, which is absolutely key, then generally we find people are quite understanding of the realities of what research looks like.

If we can have assurances that this is going to lead to public benefits, and there are clear steps in place to maximise the potential of those benefits being realised, then generally, the public is largely supportive. But it’s important we don’t take that support for granted—and we engage meaningfully with those concerns. It’s about bringing the public with us, not doing it without them.

Shayda: Reflecting on my own public engagement work, I've seen much the same. Once you have these discussions with members of the public, they want to know how the data can benefit them, and they are broadly supportive of data being used for the public good.

How do we protect against bad outcomes—like hacking or people wanting access to data who maybe don’t have the best intentions? Nikhil, what is the role of UKSA in counteracting these issues?

Nikhil: Some of the things UKSA does include providing guidance for trusted research environments and, through the DEA, setting policies that define the levels of security they need.

There's something called the "airlock" in trusted research environments. These are the environments where people access de-identified administrative data—data from all of us—and the airlock means that anything moving into or out of the environment is controlled, so the data cannot be interfered with. That's the first thing.

The other thing is that UKSA, through the Research Accreditation Panel, ensures that when university researchers, for example, access data in the research environment, they do so from a secure location. This could be at the university, in an office, or even at the ONS itself. There's a Secure Research Service, and the computers cannot access both the data and the internet at the same time, so there's no way the data can be transferred out.

These are some of the minimum policies UKSA has set for research environments.
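
As an illustration of the kind of check an airlock can apply before results leave a secure environment, here is a minimal disclosure-control sketch; the threshold of 10 and the function are our own assumptions, not UKSA policy:

    # Illustrative statistical disclosure control: suppress any published
    # count small enough that it might identify individuals.
    MIN_CELL_COUNT = 10  # illustrative threshold; real rules vary by service

    def check_output(table: dict) -> dict:
        """Suppress cells whose counts fall below the minimum threshold."""
        return {cell: (count if count >= MIN_CELL_COUNT else "suppressed")
                for cell, count in table.items()}

    results = {"pupils_group_a": 1420, "pupils_group_b": 7}
    print(check_output(results))
    # {'pupils_group_a': 1420, 'pupils_group_b': 'suppressed'}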

Shayda: Thanks, Nikhil.

We’re coming to the end of the podcast now. To wrap up, we like to ask our guests: what’s the point of all this?

Something that’s come out very clearly in these conversations is how we need to involve the public in discussions around how data is used or how it can be used for public good.

Can you summarise how we can better communicate the role of data research to the public in a way that is transparent and promotes trust?

Mhairi: At the crux of it, it’s about relationships, open communication, and recognising that to do the best we can with data, we need to do it in a relationship with the public. 

This is the public’s data, and we need the public to be part of that process to get it right and maximise the benefits to the public. It requires dialogue and conversation—not just telling the public what we’re doing, but bringing them into the process and conversation around this area.

Shayda: And you, Nikhil?

Nikhil: I just want to echo what Mhairi said. Ethics doesn't exist in a vacuum; when things are done properly, it is all applied. This isn't some obscure theoretical moral philosophy. Data ethics is about applying what is right and wrong, for the time and the views of the people, to the way data is used.

As Mhairi said, the point of all this is for people to get involved—because it's all of our data. We are citizens; we own our data. So I encourage people to get involved: speak up to your hospitals, schools, universities, politicians, and councillors. Make your views known about how your information is being used. That's the only way to make sure it's all done ethically and in the public interest.

Shayda: That’s everything for today’s episode. Thank you again, Nikhil and Mhairi—and thank you for listening.

Mark: Until next time, stay curious about how your everyday data might shape society.
