Are PETs and Algorithmic Accountability at loggerheads?
Combining MPC and ML with the goal of obtaining something "practical" may not be as straightforward as it seems. In fact, MPC might exacerbate certain issues that arise in deployments of ML. As security practitioners, we must be aware of these potential pitfalls and guide our efforts accordingly.
This post is an extension/adaptation of a presentation I gave at HotPETs 2023 in Lausanne.
Background
Machine Learning (ML) is more popular than ever for consumers and businesses alike. ChatGPT managed to acquire 100 million users in just a few months, and a recent(-ish) survey by McKinsey states that >50% of companies employ ML in some form or another.1, 2
Many of the practical applications of ML involve some kind of sensitive information. A system used to generate text, for example, will undoubtedly receive sensitive prompts3. Another system may attempt to predict whether a patient is at risk of disease, and as such will probably involve that patient's medical records.4
These sensitive applications of ML have fuelled an interest in so-called "Privacy-Preserving Machine Learning", or PPML, which, in a nutshell, is a class of techniques for securely and privately evaluating ML models.
Privacy, however, is not the only issue in applications of ML. In fact, it is but a small one. An arguably larger issue is that of accountability. Indeed, the main appeal of ML is that it automates a human process with greater efficiency and, hopefully, greater accuracy as well. But when we replace a human process with an automated one, accountability for bad decisions becomes more opaque.
ML has been demonstrated to be biased in many cases, in particular when it comes to minority groups. In light of this, accountability is even more important. Users of an ML system are entitled to know whether the output they receive is actually good, or whether they are being unfairly treated.
In the EU, the need for accountability is currently being written into the "AI Act"; China is working on a similar law, and maybe the US will someday follow suit. Who knows. In any event, accountability appears to be just as important as privacy when it comes to ML systems.
The title, explained
As mentioned, this post is an extension of a talk I gave at HotPETs 2023 in Lausanne. The basic thesis of that talk is that PPML solutions should consider accountability as well as privacy. At least at a glance, achieving accountability seems to imply that a great deal of insight into an ML model is required. On the other hand, PPML seemingly removes this insight. That might cause problems.
Slides from the talk can be found here.
Threat modeling for PPML
At the center of the issue I'm writing about here is the fact that PPML operates with two threat models, not one. First, there is whatever threat model the privacy tech operates in. Second, there is whatever threat model the ML model operates in.
If we specify each in isolation, then we are effectively talking about two distinct adversaries that operate independently.
This may work. In fact, that's essentially the threat model of a recent cool paper by Jovanovic et al.
But such a separation is a bit hard to justify in some scenarios, in particular those where the server is involved in the computation. Let's consider an example to see what I mean:
Suppose we have a single server \(S\) and a client \(C\). The server has a model \(M\), the client has an input \(x\), and the purpose of the system is to allow the client to learn \(M(x)\). The client wishes to keep its input private, so it resorts to using a secure computation protocol with semi-honest security. That is, it trusts that the server won't deviate from the protocol.
Now, it is important to remember that guarantees about the inputs are not part of the protocol. So, technically speaking, the server is allowed to input a "bad" (e.g., biased) model. And such an action might be perfectly reasonable for a misbehaving server: perhaps the server is resigned to not being able to see the client's input, but by using a biased model it can still unfairly deny a loan, insurance, or a job to targeted clients.
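To make the setup concrete, here is a minimal plaintext mock of the ideal functionality for the example above. It involves no actual cryptography (no secret sharing, no encryption), and the toy model, feature encoding, and "sensitive attribute" at index 0 are all hypothetical; the only point is that the functionality returns \(M(x)\) without constraining what \(M\) is.

```python
import numpy as np

# A plaintext mock of the ideal functionality F(M, x) = M(x).
# There is no cryptography here; what matters is the *interface*:
# the client's input x is hidden from the server, the server's model M
# is hidden from the client, and nothing constrains what M may be.

def ideal_secure_inference(server_model, client_input):
    """Trusted-party abstraction: consumes both private inputs, returns M(x) to the client."""
    return server_model(client_input)

# The server is free to submit any model it likes, including one that keys
# on a (hypothetical) sensitive attribute stored at feature index 0.
def honest_model(x):
    w = np.array([0.0, 0.4, 0.3, 0.3])
    return float(w @ x > 0.5)

def biased_model(x):
    # Reject whenever the sensitive attribute is set, ignoring everything else.
    return 0.0 if x[0] == 1 else honest_model(x)

# The client only ever sees the output, so it cannot tell which model was used.
x = np.array([1.0, 0.9, 0.8, 0.7])
print(ideal_secure_inference(honest_model, x))  # -> 1.0
print(ideal_secure_inference(biased_model, x))  # -> 0.0
```

From the client's point of view, the two runs above are indistinguishable without further information: the protocol protects both parties' inputs, but says nothing about the quality of the model being evaluated.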
If the threat model we are using (for the secure computation) is semi-honest, then an argument can be made that the server won't use a biased model on purpose, and so, perhaps we are good.
If the threat model instead assumes a malicious adversary, then it suddenly seems hard to argue that, on the one hand, the server cannot be trusted to even send the right messages in the protocol, but, on the other hand, it can be trusted to use a non-biased model.
Solving this conundrum
Whether or not the above is an issue, it does lend itself to some interesting research questions: In particular, is it possible to provide both privacy and some reasonable notion of fairness, robustness, and/or explainability?
What makes these questions interesting is the fact that whatever ML properties we wish to satisfy must somehow be satisfiable while working in a secure computation context. And that seems hard to do, at least intuitively. Indeed, guaranteeing certain properties about an ML model seemingly implies a great(er) degree of insight into the model's parameters; but this insight is exactly what secure computation techniques aim to remove!
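As a purely illustrative example of what such a property could look like, here is a demographic-parity-style check written in the clear; the metric, the audit data, and the group encoding are all hypothetical stand-ins. The research question is essentially whether a statistic like this can be computed and verified when the predictions, and the model that produced them, only ever exist inside the secure computation.

```python
import numpy as np

# A fairness statistic computed in the clear, as a stand-in for the kind of
# property one would want to verify *inside* a secure computation. The metric
# (demographic parity gap), audit set, and group encoding are illustrative only.

def demographic_parity_gap(predictions, group):
    """|P(yhat = 1 | group = 0) - P(yhat = 1 | group = 1)| over an audit set."""
    predictions = np.asarray(predictions, dtype=float)
    group = np.asarray(group)
    return abs(predictions[group == 0].mean() - predictions[group == 1].mean())

# With full access to the model's outputs, the audit is trivial.
yhat  = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(demographic_parity_gap(yhat, group))  # -> 0.5: the two groups are treated very differently

# The hard part is computing the same statistic when yhat comes from a model
# that is only ever held as secret shares, without revealing individual
# predictions or model parameters to anyone.
```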
And while this apparent clash between properties related to accountability and those related to privacy has been noted before, it doesn't mean that the problem cannot be solved (in some way, shape, or form).
Footnotes:
For example, multiple example prompts on https://stackdiary.com/chatgpt/role-based-prompts/ involve potentially private information. Specifically, the prompts that attempt to get ChatGPT to act as a health coach or therapist.