-
PolyDNN: Polynomial Representation of NN for Communication-less SMPC Inference
Authors:
Philip Derbeko,
Shlomi Dolev
Abstract:
The structure and weights of Deep Neural Networks (DNN) typically encode and contain very valuable information about the dataset that was used to train the network.
One way to protect this information when a DNN is published is to perform inference on the network using secure multi-party computation (MPC).
In this paper, we suggest a translation of deep neural networks to polynomials, which are easier to calculate efficiently with MPC techniques.
We show a way to translate complete networks into a single polynomial and how to calculate the polynomial with an efficient and information-secure MPC algorithm.
The calculation is done without intermediate communication between the participating parties, which is beneficial in several cases, as explained in the paper.
Submitted 27 April, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
Efficient and Private Approximations of Distributed Databases Calculations
Authors:
Philip Derbeko,
Shlomi Dolev,
Ehud Gudes,
Jeffrey D. Ullman
Abstract:
In recent years, an increasing amount of data has been collected in different and often non-cooperating databases. The problem of privacy-preserving distributed calculations over separate databases and the related issue of private data release have been intensively investigated. However, despite considerable progress, computational complexity, driven by the increasing size of data, remains a limiting factor in real-world deployments, especially for privacy-preserving computations.
In this paper, we present a general method for trading off performance against accuracy in distributed calculations by sampling the data. Sampling has been a topic of extensive research and has recently received renewed interest. We provide a sampling method targeted at separate, non-collaborating, vertically partitioned datasets. The method is exemplified and tested on approximating the intersection set, both with and without a privacy-preserving mechanism. We analyze the bound on the error as a function of the sample size and suggest a heuristic algorithm to further improve performance. The algorithms were implemented, and experimental results confirm the validity of the approach.
Submitted 19 May, 2016;
originally announced May 2016.
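As a rough illustration of how sampling trades accuracy for work when approximating an intersection, the sketch below estimates the size of the intersection of two key sets from samples. It is not the method of the paper above: it assumes a standard hash-coordinated sampling scheme, has no privacy-preserving mechanism, and the names `keep`, `estimate_intersection`, and the `salt` parameter are hypothetical.

```python
import hashlib


def keep(key, rate, salt="demo"):
    """Decide deterministically whether a key belongs to the sample.

    Both parties apply the same salted hash, so a shared key is either
    kept by both or dropped by both, without any coordination at run time.
    (Assumption for illustration; not taken from the paper.)
    """
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    # Map the first 8 bytes to a number in [0, 1) and compare to the rate.
    return int.from_bytes(digest[:8], "big") / 2**64 < rate


def estimate_intersection(a, b, rate):
    """Estimate |a ∩ b| from samples taken at the given rate (0 < rate <= 1)."""
    sample_a = {k for k in a if keep(k, rate)}
    sample_b = {k for k in b if keep(k, rate)}
    # Each shared key survives with probability `rate`, so scale back up.
    return len(sample_a & sample_b) / rate


if __name__ == "__main__":
    left = range(0, 10_000)
    right = range(5_000, 15_000)  # true intersection size: 5000
    for rate in (1.0, 0.5, 0.1):
        print(rate, estimate_intersection(left, right, rate))
```

With `rate = 1.0` the estimate is exact; smaller rates touch fewer keys per party and return a noisier estimate, which is the performance-versus-accuracy trade-off the abstract refers to.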
-
Security and Privacy Aspects in MapReduce on Clouds: A Survey
Authors:
Philip Derbeko,
Shlomi Dolev,
Ehud Gudes,
Shantanu Sharma
Abstract:
MapReduce is a programming system for distributed processing of large-scale data in an efficient and fault-tolerant manner on a private, public, or hybrid cloud. MapReduce is extensively used daily around the world as an efficient distributed computation tool for a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and analysis of social networks. Security and privacy of data and MapReduce computations are essential concerns when a MapReduce computation is executed in public or hybrid clouds. In order to execute a MapReduce job in public and hybrid clouds, authentication of mappers-reducers, confidentiality of data-computations, integrity of data-computations, and correctness-freshness of the outputs are required. Satisfying these requirements shields the operation from several types of attacks on data and MapReduce computations. In this paper, we investigate and discuss security and privacy challenges and requirements, considering a variety of adversarial capabilities and characteristics in the scope of MapReduce. We also provide a review of existing security and privacy protocols for MapReduce and discuss their overhead issues.
Submitted 2 May, 2016;
originally announced May 2016.
-
Explicit Learning Curves for Transduction and Application to Clustering and Compression Algorithms
Authors:
P. Derbeko,
R. El-Yaniv,
R. Meir
Abstract:
Inductive learning is based on inferring a general rule from a finite data set and using it to label new data. In transduction one attempts to solve the problem of using a labeled training set to label a set of unlabeled points, which are given to the learner prior to learning. Although transduction seems at the outset to be an easier task than induction, there have not been many provably useful algorithms for transduction. Moreover, the precise relation between induction and transduction has not yet been determined. The main theoretical developments related to transduction were presented by Vapnik more than twenty years ago. One of Vapnik's basic results is a rather tight error bound for transductive classification based on an exact computation of the hypergeometric tail. While tight, this bound is given implicitly via a computational routine. Our first contribution is a somewhat looser but explicit characterization of a slightly extended PAC-Bayesian version of Vapnik's transductive bound. This characterization is obtained using concentration inequalities for the tail of sums of random variables obtained by sampling without replacement. We then derive error bounds for compression schemes such as (transductive) support vector machines and for transduction algorithms based on clustering. The main observation used for deriving these new error bounds and algorithms is that the unlabeled test points, which in the transductive setting are known in advance, can be used in order to construct useful data dependent prior distributions over the hypothesis space.
Submitted 30 June, 2011;
originally announced July 2011.
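The hypergeometric tail mentioned in the abstract above is the probability of seeing at least a given number of errors in a subset drawn without replacement from a finite point set. The sketch below computes that tail exactly as a plain sum; it only illustrates the quantity and is not Vapnik's bound or the explicit bound derived in the paper. The function name and parameter names are hypothetical.

```python
from math import comb


def hypergeom_tail(population, errors, sample, k):
    """Return P[X >= k] for a hypergeometric random variable X.

    population: total number of points (labeled plus unlabeled)
    errors:     number of points the hypothesis gets wrong overall
    sample:     size of the subset drawn without replacement
    k:          threshold on the number of errors falling in the subset
    """
    total = comb(population, sample)
    upper = min(errors, sample)
    # Sum the exact probability mass for every count from k up to the maximum.
    favourable = sum(
        comb(errors, i) * comb(population - errors, sample - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # 100 points, 10 of them misclassified, 50 drawn as the training subset.
    print(hypergeom_tail(100, 10, 50, 8))
```

Evaluating such a sum is what makes a bound "implicit via a computational routine"; an explicit bound replaces it with a closed-form expression obtained from a concentration inequality for sampling without replacement.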