The Different Types of Privacy-Preserving Schemes

Machine learning (ML) is a subset of artificial intelligence (AI) that gives systems the ability to learn and improve from experience without being explicitly programmed. ML has driven important advances in fields such as robotics, healthcare, and natural language processing. With ever-growing concerns over data privacy, interest in privacy-preserving ML has increased accordingly. To protect the privacy of data while still allowing it to be used for ML, various privacy-preserving schemes have been proposed. Here are some of the main ones:

Secure multiparty computation (SMC) is a privacy-preserving scheme that allows multiple parties to jointly compute a function over their combined inputs while keeping each party's input private. A common approach is secret sharing: each input is split into shares that are distributed among the parties, each party computes on the shares it holds, and the partial results are combined to reconstruct the final output without ever revealing the individual inputs.
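To make this concrete, below is a minimal sketch of additive secret sharing, a common building block of SMC protocols. The field size, the two-party setup, and the variable names are illustrative assumptions rather than a description of any specific protocol; the point is that each party only ever sees uniformly random shares, yet the partial results combine to the correct sum.

```python
import random

PRIME = 2**61 - 1  # illustrative field size, not from any particular protocol


def share(secret, n_parties):
    """Split a secret into n additive shares that sum to the secret mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares to recover the secret."""
    return sum(shares) % PRIME


# Two parties each hold a private value; together they compute the sum
# without either party revealing its value to the other.
alice_value, bob_value = 42, 100
alice_shares = share(alice_value, 2)
bob_shares = share(bob_value, 2)

# Each party locally adds the shares it holds (one share of each input).
party_0_result = (alice_shares[0] + bob_shares[0]) % PRIME
party_1_result = (alice_shares[1] + bob_shares[1]) % PRIME

# Combining the partial results reveals only the sum, not the inputs.
print(reconstruct([party_0_result, party_1_result]))  # 142
```

Real SMC protocols extend this linear trick to multiplications and comparisons with additional machinery, but the secret-sharing idea above is the core of how computation proceeds without exposing the raw data.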

Homomorphic encryption (HE) is a form of encryption that allows computations to be performed directly on encrypted data. The encryption preserves the algebraic structure of the plaintext, so operating on ciphertexts and then decrypting gives the same result as performing the computation on the unencrypted data. HE can therefore be used to protect the privacy of data while still allowing computations to be performed on it.
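As an illustration, here is a toy sketch of a Paillier-style additively homomorphic scheme. The hard-coded primes are an assumption made purely for readability and offer no security; real libraries use keys of thousands of bits.

```python
import math
import random

# Toy Paillier-style additively homomorphic encryption (insecure parameters).
p, q = 1009, 1013
n = p * q
n_sq = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                               # valid because g = n + 1


def encrypt(m):
    """Encrypt an integer m < n; the random r makes encryption probabilistic."""
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(c):
    """Recover the plaintext from a ciphertext."""
    l = (pow(c, lam, n_sq) - 1) // n
    return (l * mu) % n


# The homomorphic property: multiplying ciphertexts adds the plaintexts.
c1, c2 = encrypt(17), encrypt(25)
assert decrypt((c1 * c2) % n_sq) == 17 + 25
print(decrypt((c1 * c2) % n_sq))  # 42
```

The scheme above only supports addition on ciphertexts; fully homomorphic schemes, which also support multiplication, are considerably more complex and computationally expensive.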

Differential privacy (DP) is a privacy-preservation technique that adds carefully calibrated noise to query results (or to the data itself) in order to mask any individual's information. The noise is scaled so that aggregate statistics remain useful while the presence or absence of any single record becomes hard to detect. The noise can be added in several ways, the most common being the Laplace mechanism, which draws noise with a scale equal to the query's sensitivity divided by the privacy budget ε. DP is useful for preserving privacy because it makes it difficult to determine any individual's information from the dataset.
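Below is a minimal sketch of the Laplace mechanism applied to a simple count query. The query, the true count, and the choice of ε = 0.5 are illustrative assumptions; the key point is that the noise scale is the sensitivity divided by ε.

```python
import numpy as np


def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Release a count query under epsilon-differential privacy.

    A single individual can change a count by at most 1, so the sensitivity
    is 1 and the Laplace noise scale is sensitivity / epsilon.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise


# Hypothetical query: how many records in the dataset match some condition?
true_count = 87
print(laplace_count(true_count, epsilon=0.5))  # e.g. 89.3; smaller epsilon means more noise
```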

Gradient masking is a technique used to prevent sensitive information from leaking through the gradients of an ML model (the gradients are the partial derivatives of the loss function with respect to the model parameters). Noise is added to the gradients, typically after clipping them, to make them harder to interpret. This is useful for privacy preservation because it makes it more difficult to reconstruct the underlying training data from the gradients.
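The sketch below shows the core of this idea in the style of DP-SGD: per-example gradients are clipped and Gaussian noise is added before averaging. The clipping norm, noise multiplier, and toy gradients are illustrative assumptions, not values from any particular system.

```python
import numpy as np


def privatize_gradients(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each example's gradient and add Gaussian noise before averaging."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale the gradient down if its norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    summed = np.sum(clipped, axis=0)
    # Noise standard deviation is proportional to the clipping norm.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)


# Example: gradients from a batch of 4 training examples for a 3-parameter model.
batch_grads = [np.random.randn(3) for _ in range(4)]
print(privatize_gradients(batch_grads))
```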

Secure enclaves (SE) are hardware or software environments designed to be resistant to tampering and interference. They are often used to store or process sensitive data, such as cryptographic keys, in isolation from the rest of the system.

There are many ways to preserve privacy when working with ML models, each with its own trade-offs. In this article, we summarised five of these methods. All of them have strengths and weaknesses, so it is important to choose the right one for the specific application.

Human-centered AI Course

In the fall of 2019, I enrolled in the PhD course titled “Introduction to Human-centered AI.” The course is delivered and managed by Cecilia Ovesdotter Alm from the Rochester Institute of Technology (RIT).

Human-centered AI is essentially a perspective on AI and ML which holds that algorithms must be designed with the awareness that they are part of a larger system consisting of human stakeholders. According to Mark O. Riedl, the main requirements of human-centered AI can be broken into two aspects: (a) AI systems that have an understanding of human sociocultural norms as part of a theory of mind about people, and (b) AI systems that are capable of producing explanations that non-experts in AI or computer science can understand.

Course introduction lecture on human-centered AI, held at Malmö University (2019).

One of the course learning outcomes is to be able to demonstrate critical thinking concerning bias and fairness in data analysis, including but not limited to gender aspects. To that end, I put together a 10-minute presentation of the article “50 Years of Test (Un)fairness: Lessons for Machine Learning” written by Ben Hutchinson and Margaret Mitchell.