Training machine learning models with private data on untrusted hardware


Can we trust the Cloud with our data? How do we know it is not malicious? How do we verify that the computation done by the Cloud is actually correct? In his first talk as Smt Rukmini–Shri Gopalakrishnachar Visiting Chair Professor at the Indian Institute of Science, Murali Annavaram addressed two challenges in cloud computing, namely privacy and computation integrity. A recording of the lecture is available here. What follows is a summary of the lecture.

Deep learning has many applications in our daily lives, from manufacturing and health care to transportation. Machine learning (ML) as a service is set to become a popular offering. The problem, however, is that the vast majority of people have no clue what ML really is: they have all heard the term, and they all want to use it, but they are not experts in the machine learning domain. This means that companies have to make it very simple for people to deploy machine learning algorithms in their workflows.

A Cloud service provider, say Amazon or Google or Microsoft, provides the clients with a model and a set of application programming interfaces (APIs). All that the clients need to do is code against these APIs, and their models are then trained on their own data in the Cloud. So, what is the problem? Can we ensure data privacy and computation integrity?
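As a rough picture of that workflow, a client-side script might look like the hypothetical sketch below. The CloudMLClient class and its methods are invented purely for illustration and do not correspond to any real provider's API.

```python
# Hypothetical sketch of an ML-as-a-service workflow; the CloudMLClient class
# and its methods are invented for illustration and do not correspond to any
# real provider's API.

class CloudMLClient:
    """Stand-in for a provider SDK: the client never sees the training loop."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def upload_dataset(self, path: str) -> str:
        # In a real service this would transfer the client's (private) data
        # to provider-managed storage and return a handle.
        return f"dataset-handle-for-{path}"

    def train(self, model_name: str, dataset_handle: str) -> str:
        # The provider runs training on its own hardware; the client only
        # receives an identifier for the trained model.
        return f"trained-{model_name}-on-{dataset_handle}"


client = CloudMLClient("https://ml.example-cloud.com")
handle = client.upload_dataset("hospital_records.csv")
model_id = client.train("image-classifier", handle)
print(model_id)
# The open questions from the talk: is the uploaded data kept private, and
# did the provider actually run the computation it claims to have run?
```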

One way to address this problem is to use a trusted execution environment (TEE), or secure enclave, which is a hardware-level isolation mechanism provided by certain central processing units (CPUs). In this protected environment, data is safe from hackers and even from the root user on that machine. The enclave is completely isolated from the operating system (OS): the data stored in its memory is encrypted, and that memory is not accessible to any code outside the enclave, including the OS. The enclave itself can therefore be assumed to be trustworthy.
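As a toy illustration of the trust boundary (a few lines of Python obviously cannot provide hardware isolation), the contract can be pictured like this: the key and the plaintext exist only inside the enclave object, and everything outside it sees only ciphertext.

```python
import os

class Enclave:
    """Toy model of a TEE: secrets live only inside this object, and the
    'outside world' (the rest of the program, standing in for the OS) only
    ever sees ciphertext. Real enclaves enforce this in hardware."""

    def __init__(self):
        self._key = os.urandom(32)          # never exposed outside the enclave

    def _xor(self, data: bytes) -> bytes:   # placeholder cipher, not secure
        return bytes(b ^ self._key[i % len(self._key)] for i, b in enumerate(data))

    def seal(self, plaintext: bytes) -> bytes:
        return self._xor(plaintext)         # encrypt before data leaves the enclave

    def compute_inside(self, ciphertext: bytes) -> int:
        plaintext = self._xor(ciphertext)   # decrypt only inside the boundary
        return sum(plaintext)               # some computation on private data


enclave = Enclave()
sealed = enclave.seal(b"private medical record")
# The OS / root user only ever handles `sealed`, never the plaintext or key.
print(enclave.compute_inside(sealed))
```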

The challenge in using a TEE is that, because enclaves are hardware protected, they become very slow for any application using more than 128 megabytes of data. It is possible to store part of the data in the enclave (private) and part of it on a non-secure CPU/GPU (graphics processing unit), but most users would prefer that all their data be safe. Hence, the aim is to re-purpose the TEE to (i) provide data privacy; (ii) perform a minimum of computation; (iii) obfuscate the data; (iv) have the computation on the obfuscated data performed on GPUs; and (v) return the results to the TEE for decoding.
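The general pattern (obfuscate inside the TEE, offload the heavy linear work to untrusted hardware, decode on return) can be pictured with the NumPy sketch below. The additive blinding used here is purely illustrative, and the decoding term is computed inside the "enclave" for brevity; it is not the encoding scheme discussed later in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def untrusted_linear(W, v):
    """Stands in for the GPU: it only ever sees obfuscated vectors."""
    return W @ v

# --- inside the enclave (trusted) ---
W = rng.standard_normal((4, 8))     # layer weights (treated as public here)
x = rng.standard_normal(8)          # private input; must not leave the TEE in the clear

r = rng.standard_normal(8)          # random blinding vector known only to the TEE
Wr = W @ r                          # decoding term; a real scheme keeps this cheap or amortised

# --- offload the heavy linear work to the untrusted GPU ---
y_blinded = untrusted_linear(W, x + r)

# --- back inside the enclave: decode ---
y = y_blinded - Wr                  # W(x + r) - W r = W x
assert np.allclose(y, W @ x)
```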

The more data that can be supported in the enclave, the more data that can be obfuscated. How does one split the deep neural network (DNN) between the enclave and the non-secure CPU/GPU? As many layers as possible are placed in the enclave, and the rest go to the non-secure CPU/GPU. It is also possible to use a distributed set of enclaves, but this requires the communication channels between the systems to be encrypted, and it has to be proven that there is no information leakage. To prevent two colluding GPUs from reconstructing the original input, additional noise can be added to the system. The Hopper architecture (NVIDIA H100 Tensor Core GPU), released a few months ago, allows the enclave to be inside the GPU itself.
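What "splitting" the network means can be pictured with the toy sketch below; the layer count, sizes, and split point are arbitrary, and both sides are ordinary NumPy standing in for the enclave and the GPU.

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(v):
    return np.maximum(v, 0.0)

# A toy 5-layer network: each layer is just a weight matrix followed by ReLU.
layers = [rng.standard_normal((16, 16)) for _ in range(5)]

def run_split(x, split_at):
    """Run layers [0, split_at) 'inside the enclave' and the rest on the
    'non-secure GPU'. Both sides are plain NumPy here; only the idea of a
    split point is being illustrated."""
    h = x
    for W in layers[:split_at]:        # trusted, slow, private
        h = relu(W @ h)
    for W in layers[split_at:]:        # untrusted, fast; would see h (or an
        h = relu(W @ h)                # obfuscated version of it)
    return h

x = rng.standard_normal(16)
# The deeper the split, the more of the computation stays private,
# and the slower the whole run becomes.
out = run_split(x, split_at=2)
print(out.shape)
```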

How does the need for privacy affect performance? If all the data is put in the enclave, there is a 112.13x slowdown; if the network is split at the sixth layer, the slowdown is 43.21x; a CPU-only execution is the 1x baseline.

DarKnight, designed by Annavaram's group, provides theoretically robust privacy with a TEE; in this system there is cryptographically zero leakage. The TEE and the GPU collaborate: computation-intensive operations are offloaded to the GPU, while the TEE is used for non-linear operations. The following steps are involved. (i) The image set is encrypted by the client using keys mutually agreed with the software guard extensions (SGX) enclave. (ii) Inside SGX, a decryption mechanism recovers the images. (iii) The inputs are linearly encoded within SGX using the encoding scheme. (iv) The encoded data is sent to the GPUs. (v) The linear operations are computed using the layer weights. (vi) The results are returned to SGX. (vii) The results are decoded within SGX. (viii) The non-linear operations are performed within SGX. The integrity of the GPU computation is verified using redundancy.
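The steps above can be pictured with the following NumPy sketch. The invertible mixing matrix used here is an illustrative stand-in for DarKnight's actual encoding (which also blends in explicit noise), and the redundancy-based integrity check is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

def gpu_linear(W, V):
    """Untrusted GPU: computes the heavy matrix product on encoded inputs."""
    return W @ V

# --- inside SGX (trusted) ---
W = rng.standard_normal((32, 64))            # layer weights
X = rng.standard_normal((64, 4))             # batch of 4 private inputs (columns)

# Step (iii): linear encoding. A secret invertible matrix mixes the inputs,
# so the GPU only sees combinations of them. (Illustrative choice; the real
# encoding also adds noise so single inputs cannot be recovered.)
A = rng.standard_normal((4, 4))
encoded = X @ A                              # step (iv): what the GPU receives

# --- step (v): linear operation on the untrusted GPU ---
Y_encoded = gpu_linear(W, encoded)           # = W X A, because the layer is linear

# --- steps (vi)-(vii): decode inside SGX ---
Y = Y_encoded @ np.linalg.inv(A)             # (W X A) A^{-1} = W X
assert np.allclose(Y, W @ X)

# Step (viii): non-linear operations (e.g. ReLU) stay inside SGX.
activations = np.maximum(Y, 0.0)
print(activations.shape)
```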

Another interesting use of trusted execution environments is FedVault, also designed by Annavaram's group. This is an efficient gradient outlier detection mechanism for Byzantine-resilient and privacy-preserving federated machine learning. The process here is the opposite of that used by DarKnight. Here, there are multiple users who are unwilling to share their data with the Cloud in order to train a model. So, instead of sending the data to the Cloud to be trained on (the DarKnight approach), the model is sent to the clients, and the clients train it on their own independent subsets of data. Once they finish training, they generate gradients that they send to a central gradient accumulation engine, called the server. The server aggregates all these gradients and generates a new model. This new model is sent back to the clients, and the cycle repeats.
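The federated cycle described above can be sketched as follows; the toy "gradient", the averaging rule, and all sizes are illustrative stand-ins rather than the actual training procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

def local_training(model, data):
    """Toy client-side step: returns a gradient-like update. In the scheme
    described in the talk, only this update (not `data`) leaves the client."""
    # Pretend the 'gradient' nudges the model toward the client's data mean.
    return data.mean(axis=0) - model

# Each client holds its own private data subset.
client_data = [rng.standard_normal((50, 10)) + i for i in range(5)]
model = np.zeros(10)                       # global model held by the server

for round_ in range(3):
    # Server ships the current model to the clients; clients train locally.
    gradients = [local_training(model, data) for data in client_data]
    # The gradient accumulation engine (the server) aggregates the updates...
    model = model + 0.5 * np.mean(gradients, axis=0)
    # ...and the updated model goes back to the clients for the next round.

print(model.round(2))
```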

The problems that can crop up with this mechanism are that (i) local updates can leak information about the local data; and (ii) Byzantine nodes (which can lie or intentionally mislead other nodes of the network) can send arbitrary updates to the central server. What can be done? The TEE can be used to aggregate the local updates privately, and outliers can be detected for Byzantine robustness. Krum, a well-known approach for detecting outliers, is used here. Krum requires pair-wise distance computations between all the gradients sent by the clients, and this work has to be done by the GPUs. However, the GPUs are not trusted; hence, encoded gradients are sent to them.
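For illustration, a plain (unencoded) version of Krum might look like the sketch below. FedVault's contribution is to run the pairwise-distance step on encoded gradients so that the untrusted GPUs never see the real ones; this toy does not attempt that.

```python
import numpy as np

rng = np.random.default_rng(4)

def krum_select(gradients, f):
    """Plain Krum, for illustration: score each gradient by the sum of squared
    distances to its n - f - 2 nearest neighbours and pick the gradient with
    the smallest score (f is the assumed number of Byzantine clients)."""
    n = len(gradients)
    G = np.stack(gradients)
    # Pairwise squared Euclidean distances between all gradients.
    d2 = ((G[:, None, :] - G[None, :, :]) ** 2).sum(axis=-1)
    k = n - f - 2                          # number of neighbours to consider
    scores = []
    for i in range(n):
        nearest = np.sort(np.delete(d2[i], i))[:k]
        scores.append(nearest.sum())
    return int(np.argmin(scores))

# Nine honest gradients near the true update, plus one Byzantine outlier.
honest = [rng.standard_normal(10) * 0.1 + 1.0 for _ in range(9)]
byzantine = [np.full(10, 100.0)]
chosen = krum_select(honest + byzantine, f=1)
print("selected gradient index:", chosen)   # expected: one of the honest indices
```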

In conclusion, Annavaram pointed out that, going forward, performance can no longer be the only driving factor; privacy will be a greater challenge, and theoretical solutions will have to work together with hardware for broader adoption.
