-
TETRIS: Composing FHE Techniques for Private Functional Exploration Over Large Datasets
Authors:
Malika Izabachène,
Jean-Philippe Bossuat
Abstract:
To derive valuable insights from statistics, machine learning applications frequently analyze substantial amounts of data. In this work, we address the problem of designing efficient secure techniques to probe large datasets which allow a scientist to conduct large-scale medical studies over specific attributes of patients' records, while maintaining the privacy of his model. We introduce a set of…
▽ More
To derive valuable insights from statistics, machine learning applications frequently analyze substantial amounts of data. In this work, we address the problem of designing efficient secure techniques to probe large datasets which allow a scientist to conduct large-scale medical studies over specific attributes of patients' records, while maintaining the privacy of his model. We introduce a set of composable homomorphic operations and show how to combine private functions evaluation with private thresholds via approximate fully homomorphic encryption. This allows us to design a new system named TETRIS, which solves the real-world use case of private functional exploration of large databases, where the statistical criteria remain private to the server owning the patients' records. Our experiments show that TETRIS achieves practical performance over a large dataset of patients even for the evaluation of elaborate statements composed of linear and nonlinear functions. It is possible to extract private insights from a database of hundreds of thousands of patient records within only a few minutes on a single thread, with an amortized time per database entry smaller than 2ms.
△ Less
Submitted 17 December, 2024;
originally announced December 2024.
-
slytHErin: An Agile Framework for Encrypted Deep Neural Network Inference
Authors:
Francesco Intoci,
Sinem Sav,
Apostolos Pyrgelis,
Jean-Philippe Bossuat,
Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
Abstract:
Homomorphic encryption (HE), which allows computations on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, developing HE-based solutions for encrypted PaaS is a tedious task which requires a careful design that predominantl…
▽ More
Homomorphic encryption (HE), which allows computations on encrypted data, is an enabling technology for confidential cloud computing. One notable example is privacy-preserving Prediction-as-a-Service (PaaS), where machine-learning predictions are computed on encrypted data. However, developing HE-based solutions for encrypted PaaS is a tedious task which requires a careful design that predominantly depends on the deployment scenario and on leveraging the characteristics of modern HE schemes. Prior works on privacy-preserving PaaS focus solely on protecting the confidentiality of the client data uploaded to a remote model provider, e.g., a cloud offering a prediction API, and assume (or take advantage of the fact) that the model is held in plaintext. Furthermore, their aim is to either minimize the latency of the service by processing one sample at a time, or to maximize the number of samples processed per second, while processing a fixed (large) number of samples. In this work, we present slytHErin, an agile framework that enables privacy-preserving PaaS beyond the application scenarios considered in prior works. Thanks to its hybrid design leveraging HE and its multiparty variant (MHE), slytHErin enables novel PaaS scenarios by encrypting the data, the model or both. Moreover, slytHErin features a flexible input data packing approach that allows processing a batch of an arbitrary number of samples, and several computation optimizations that are model-and-setting-agnostic. slytHErin is implemented in Go and it allows end-users to perform encrypted PaaS on custom deep learning models comprising fully-connected, convolutional, and pooling layers, in a few lines of code and without having to worry about the cumbersome implementation and optimization concerns inherent to HE.
△ Less
Submitted 1 May, 2023;
originally announced May 2023.
-
Scalable and Privacy-Preserving Federated Principal Component Analysis
Authors:
David Froelicher,
Hyunghoon Cho,
Manaswitha Edupalli,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
Bonnie Berger,
Jean-Pierre Hubaux
Abstract:
Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermedia…
▽ More
Principal component analysis (PCA) is an essential algorithm for dimensionality reduction in many data science domains. We address the problem of performing a federated PCA on private data distributed among multiple data providers while ensuring data confidentiality. Our solution, SF-PCA, is an end-to-end secure system that preserves the confidentiality of both the original data and all intermediate results in a passive-adversary model with up to all-but-one colluding parties. SF-PCA jointly leverages multiparty homomorphic encryption, interactive protocols, and edge computing to efficiently interleave computations on local cleartext data with operations on collectively encrypted data. SF-PCA obtains results as accurate as non-secure centralized solutions, independently of the data distribution among the parties. It scales linearly or better with the dataset dimensions and with the number of data providers. SF-PCA is more precise than existing approaches that approximate the solution by combining local analysis results, and between 3x and 250x faster than privacy-preserving alternatives based solely on secure multiparty computation or homomorphic encryption. Our work demonstrates the practical applicability of secure and federated PCA on private distributed datasets.
△ Less
Submitted 31 March, 2023;
originally announced April 2023.
-
Orchestrating Collaborative Cybersecurity: A Secure Framework for Distributed Privacy-Preserving Threat Intelligence Sharing
Authors:
Juan R. Trocoso-Pastoriza,
Alain Mermoud,
Romain Bouyé,
Francesco Marino,
Jean-Philippe Bossuat,
Vincent Lenders,
Jean-Pierre Hubaux
Abstract:
Cyber Threat Intelligence (CTI) sharing is an important activity to reduce information asymmetries between attackers and defenders. However, this activity presents challenges due to the tension between data sharing and confidentiality, that result in information retention often leading to a free-rider problem. Therefore, the information that is shared represents only the tip of the iceberg. Curren…
▽ More
Cyber Threat Intelligence (CTI) sharing is an important activity to reduce information asymmetries between attackers and defenders. However, this activity presents challenges due to the tension between data sharing and confidentiality, that result in information retention often leading to a free-rider problem. Therefore, the information that is shared represents only the tip of the iceberg. Current literature assumes access to centralized databases containing all the information, but this is not always feasible, due to the aforementioned tension. This results in unbalanced or incomplete datasets, requiring the use of techniques to expand them; we show how these techniques lead to biased results and misleading performance expectations. We propose a novel framework for extracting CTI from distributed data on incidents, vulnerabilities and indicators of compromise, and demonstrate its use in several practical scenarios, in conjunction with the Malware Information Sharing Platforms (MISP). Policy implications for CTI sharing are presented and discussed. The proposed system relies on an efficient combination of privacy enhancing technologies and federated processing. This lets organizations stay in control of their CTI and minimize the risks of exposure or leakage, while enabling the benefits of sharing, more accurate and representative results, and more effective predictive and preventive defenses.
△ Less
Submitted 6 September, 2022;
originally announced September 2022.
-
Privacy-Preserving Federated Recurrent Neural Networks
Authors:
Sinem Sav,
Abdulrahman Diaa,
Apostolos Pyrgelis,
Jean-Philippe Bossuat,
Jean-Pierre Hubaux
Abstract:
We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a cross-silo federated learning setting by relying on multiparty homomorphic encryption. RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates federated learning attacks that target the gradients under a passive-…
▽ More
We present RHODE, a novel system that enables privacy-preserving training of and prediction on Recurrent Neural Networks (RNNs) in a cross-silo federated learning setting by relying on multiparty homomorphic encryption. RHODE preserves the confidentiality of the training data, the model, and the prediction data; and it mitigates federated learning attacks that target the gradients under a passive-adversary threat model. We propose a packing scheme, multi-dimensional packing, for a better utilization of Single Instruction, Multiple Data (SIMD) operations under encryption. With multi-dimensional packing, RHODE enables the efficient processing, in parallel, of a batch of samples. To avoid the exploding gradients problem, RHODE provides several clipping approximations for performing gradient clipping under encryption. We experimentally show that the model performance with RHODE remains similar to non-secure solutions both for homogeneous and heterogeneous data distribution among the data holders. Our experimental evaluation shows that RHODE scales linearly with the number of data holders and the number of timesteps, sub-linearly and sub-quadratically with the number of features and the number of hidden units of RNNs, respectively. To the best of our knowledge, RHODE is the first system that provides the building blocks for the training of RNNs and its variants, under encryption in a federated learning setting.
△ Less
Submitted 3 May, 2023; v1 submitted 28 July, 2022;
originally announced July 2022.
-
POSEIDON: Privacy-Preserving Federated Neural Network Learning
Authors:
Sinem Sav,
Apostolos Pyrgelis,
Juan R. Troncoso-Pastoriza,
David Froelicher,
Jean-Philippe Bossuat,
Joao Sa Sousa,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation…
▽ More
In this paper, we address the problem of privacy-preserving training and evaluation of neural networks in an $N$-party, federated learning setting. We propose a novel system, POSEIDON, the first of its kind in the regime of privacy-preserving neural network training. It employs multiparty lattice-based cryptography to preserve the confidentiality of the training data, the model, and the evaluation data, under a passive-adversary model and collusions between up to $N-1$ parties. To efficiently execute the secure backpropagation algorithm for training neural networks, we provide a generic packing approach that enables Single Instruction, Multiple Data (SIMD) operations on encrypted data. We also introduce arbitrary linear transformations within the cryptographic bootstrapping operation, optimizing the costly cryptographic computations over the parties, and we define a constrained optimization problem for choosing the cryptographic parameters. Our experimental results show that POSEIDON achieves accuracy similar to centralized or decentralized non-private approaches and that its computation and communication overhead scales linearly with the number of parties. POSEIDON trains a 3-layer neural network on the MNIST dataset with 784 features and 60K samples distributed among 10 parties in less than 2 hours.
△ Less
Submitted 8 January, 2021; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Scalable Privacy-Preserving Distributed Learning
Authors:
David Froelicher,
Juan R. Troncoso-Pastoriza,
Apostolos Pyrgelis,
Sinem Sav,
Joao Sa Sousa,
Jean-Philippe Bossuat,
Jean-Pierre Hubaux
Abstract:
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the e…
▽ More
In this paper, we address the problem of privacy-preserving distributed learning and the evaluation of machine-learning models by analyzing it in the widespread MapReduce abstraction that we extend with privacy constraints. We design SPINDLE (Scalable Privacy-preservINg Distributed LEarning), the first distributed and privacy-preserving system that covers the complete ML workflow by enabling the execution of a cooperative gradient-descent and the evaluation of the obtained model and by preserving data and model confidentiality in a passive-adversary model with up to N-1 colluding parties. SPINDLE uses multiparty homomorphic encryption to execute parallel high-depth computations on encrypted data without significant overhead. We instantiate SPINDLE for the training and evaluation of generalized linear models on distributed datasets and show that it is able to accurately (on par with non-secure centrally-trained models) and efficiently (due to a multi-level parallelization of the computations) train models that require a high number of iterations on large input data with thousands of features, distributed among hundreds of data providers. For instance, it trains a logistic-regression model on a dataset of one million samples with 32 features distributed among 160 data providers in less than three minutes.
△ Less
Submitted 14 July, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.