-
Learning-Augmented Algorithms for Boolean Satisfiability
Authors:
Idan Attias,
Xing Gao,
Lev Reyzin
Abstract:
Learning-augmented algorithms are a prominent recent development in beyond worst-case analysis. In this framework, a problem instance is provided with a prediction (``advice'') from a machine-learning oracle, which provides partial information about an optimal solution, and the goal is to design algorithms that leverage this advice to improve worst-case performance. We study the classic Boolean satisfiability (SAT) decision and optimization problems within this framework using two forms of advice. ``Subset advice'' provides a random $ε$ fraction of the variables from an optimal assignment, whereas ``label advice'' provides noisy predictions for all variables in an optimal assignment.
For the decision problem $k$-SAT, by using the subset advice we accelerate the exponential running time of the PPSZ family of algorithms due to Paturi, Pudlak, Saks and Zane, which currently represent the state of the art in the worst case. We accelerate the running time by a multiplicative factor of $2^{-c}$ in the base of the exponent, where $c$ is a function of $ε$ and $k$. For the optimization problem, we show how to incorporate subset advice in a black-box fashion with any $α$-approximation algorithm, improving the approximation ratio to $α+ (1 - α)ε$. Specifically, we achieve approximations of $0.94 + Ω(ε)$ for MAX-$2$-SAT, $7/8 + Ω(ε)$ for MAX-$3$-SAT, and $0.79 + Ω(ε)$ for MAX-SAT. Moreover, for label advice, we obtain near-optimal approximation for instances with large average degree, thereby generalizing recent results on MAX-CUT and MAX-$2$-LIN.
Submitted 30 May, 2025; v1 submitted 9 May, 2025;
originally announced May 2025.
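The black-box guarantee stated in the abstract above, $α + (1 - α)ε$, is easy to evaluate numerically. The following is a minimal sketch, not the authors' code: the function name `improved_ratio` is hypothetical, and treating 0.94, 7/8, and 0.79 as the baseline values of $α$ is an assumption read off the abstract's stated constants.

```python
from fractions import Fraction


def improved_ratio(alpha, eps):
    """Approximation ratio alpha + (1 - alpha) * eps from the abstract.

    alpha: ratio of the underlying approximation algorithm (0 <= alpha <= 1).
    eps:   fraction of variables revealed by subset advice (0 <= eps <= 1).
    """
    return alpha + (1 - alpha) * eps


# Assumed baselines, taken from the constants quoted in the abstract.
baselines = {
    "MAX-2-SAT": Fraction(94, 100),
    "MAX-3-SAT": Fraction(7, 8),
    "MAX-SAT": Fraction(79, 100),
}

if __name__ == "__main__":
    eps = Fraction(1, 10)  # illustrative advice fraction
    for name, alpha in baselines.items():
        print(name, float(improved_ratio(alpha, eps)))
```

With no advice ($ε = 0$) the ratio stays at $α$, and with full advice ($ε = 1$) it reaches 1, which matches the interpolation the formula describes.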
-
Non-adaptive Learning of Random Hypergraphs with Queries
Authors:
Bethany Austhof,
Lev Reyzin,
Erasmo Tani
Abstract:
We study the problem of learning a hidden hypergraph $G=(V,E)$ by making a single batch of queries (non-adaptively). We consider the hyperedge detection model, in which every query must be of the form:
``Does this set $S\subseteq V$ contain at least one full hyperedge?''
In this model, it is known that no algorithm can non-adaptively learn arbitrary hypergraphs using fewer than $Ω(\min\{m^2\log n, n^2\})$ queries, even when the hypergraph is constrained to be $2$-uniform (i.e., the hypergraph is simply a graph). Recently, Li et al. overcame this lower bound in the setting in which $G$ is a graph by assuming that the graph to be learned is sampled from an Erdős-Rényi model. We generalize the result of Li et al. to the setting of random $k$-uniform hypergraphs. To achieve this result, we leverage a novel equivalence between the problem of learning a single hyperedge and the standard group testing problem. This latter result may also be of independent interest.
Submitted 22 January, 2025;
originally announced January 2025.
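The hyperedge detection query described in the abstract above can be simulated directly. This is a minimal sketch under stated assumptions: the names `contains_hyperedge` and `answer_batch` are hypothetical, and the hidden hypergraph is represented simply as a list of vertex sets. It illustrates the query model only, not the learning algorithm of the paper.

```python
def contains_hyperedge(query_set, hyperedges):
    """Answer 'does this set S contain at least one full hyperedge?'."""
    s = set(query_set)
    return any(set(edge) <= s for edge in hyperedges)


def answer_batch(queries, hyperedges):
    """Non-adaptive setting: all queries are fixed up front and answered together."""
    return [contains_hyperedge(q, hyperedges) for q in queries]


if __name__ == "__main__":
    hidden = [{1, 2, 3}, {3, 4, 5}]  # a hidden 3-uniform hypergraph
    batch = [{1, 2}, {1, 2, 3, 6}, {3, 4, 5}]
    print(answer_batch(batch, hidden))
```

Because the batch is chosen before any answer is seen, the learner must reconstruct the edge set from the vector of yes/no answers alone.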
-
Applications of Littlestone dimension to query learning and to compression
Authors:
Hunter Chase,
James Freitag,
Lev Reyzin
Abstract:
In this paper we give several applications of Littlestone dimension. The first is to the model of Angluin and Dohrn (2017), where we extend their results for learning by equivalence queries with random counterexamples. Second, we extend that model to infinite concept classes with an additional source of randomness. Third, we give improved results on the relationship of Littlestone dimension to classes with extended $d$-compression schemes, proving a strong version of a conjecture of Floyd and Warmuth (1995) for Littlestone dimension.
Submitted 7 October, 2023;
originally announced October 2023.
-
Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
Authors:
Ian A. Kash,
Lev Reyzin,
Zishun Yu
Abstract:
Reinforcement learning generalizes multi-armed bandit problems with additional difficulties of a longer planning horizon and unknown transition kernel. We explore a black-box reduction from discounted infinite-horizon tabular reinforcement learning to multi-armed bandits, where, specifically, an independent bandit learner is placed in each state. We show that, under ergodicity and fast mixing assumptions, any slowly changing adversarial bandit algorithm achieving optimal regret in the adversarial bandit setting can also attain optimal expected regret in infinite-horizon discounted Markov decision processes, with respect to the number of rounds $T$. Furthermore, we examine our reduction using a specific instance of the exponential-weight algorithm.
Submitted 9 March, 2024; v1 submitted 18 May, 2022;
originally announced May 2022.
-
A Unified Analysis of Dynamic Interactive Learning
Authors:
Xing Gao,
Thomas Maranzatto,
Lev Reyzin
Abstract:
In this paper we investigate the problem of learning evolving concepts over a combinatorial structure. Previous work by Emamjomeh-Zadeh et al. [2020] introduced dynamics into interactive learning as a way to model non-static user preferences in clustering problems or recommender systems. We make several contributions to this problem. First, we give a framework that captures both of the models analyzed by [Emamjomeh-Zadeh et al., 2020], which allows us to study any type of concept evolution and matches the query complexity bounds and running time guarantees of the previous models. Using this general model we solve the open problem of closing the gap between the upper and lower bounds on query complexity. Finally, we study an efficient algorithm where the learner simply follows the feedback at each round, and we provide mistake bounds for low diameter graphs such as cliques, stars, and general o(log n) diameter graphs by using a Markov Chain model.
Submitted 14 April, 2022;
originally announced April 2022.
-
On the Geometry of Stable Steiner Tree Instances
Authors:
James Freitag,
Neshat Mohammadi,
Aditya Potukuchi,
Lev Reyzin
Abstract:
In this note we consider the Steiner tree problem under Bilu-Linial stability. We give strong geometric structural properties that need to be satisfied by stable instances. We then make use of, and strengthen, these geometric properties to show that $1.562$-stable instances of Euclidean Steiner trees are polynomial-time solvable. We also provide a connection between certain approximation algorithms and Bilu-Linial stability for Steiner trees.
Submitted 27 September, 2021;
originally announced September 2021.
-
Communication-Aware Collaborative Learning
Authors:
Avrim Blum,
Shelby Heinecke,
Lev Reyzin
Abstract:
Algorithms for noiseless collaborative PAC learning have been analyzed and optimized in recent years with respect to sample complexity. In this paper, we study collaborative PAC learning with the goal of reducing communication cost at essentially no penalty to the sample complexity. We develop communication efficient collaborative PAC learning algorithms using distributed boosting. We then consider the communication cost of collaborative learning in the presence of classification noise. As an intermediate step, we show how collaborative PAC learning algorithms can be adapted to handle classification noise. With this insight, we develop communication efficient algorithms for collaborative PAC learning robust to classification noise.
Submitted 18 December, 2020;
originally announced December 2020.
-
On the Complexity of Learning from Label Proportions
Authors:
Benjamin Fish,
Lev Reyzin
Abstract:
In the problem of learning with label proportions, which we call LLP learning, the training data is unlabeled, and only the proportions of examples receiving each label are given. The goal is to learn a hypothesis that predicts the proportions of labels on the distribution underlying the sample. This model of learning is applicable to a wide variety of settings, including predicting the number of votes for candidates in political elections from polls.
In this paper, we formally define this class and resolve foundational questions regarding the computational complexity of LLP and characterize its relationship to PAC learning. Among our results, we show, perhaps surprisingly, that for finite VC classes what can be efficiently LLP learned is a strict subset of what can be learned efficiently in PAC, under standard complexity assumptions. We also show that there exist classes of functions whose learnability in LLP is independent of ZFC, the standard set theoretic axioms. This implies that LLP learning cannot be easily characterized (like PAC by VC dimension).
Submitted 7 April, 2020;
originally announced April 2020.
-
Statistical Queries and Statistical Algorithms: Foundations and Applications
Authors:
Lev Reyzin
Abstract:
We give a survey of the foundations of statistical queries and their many applications to other areas. We introduce the model, give the main definitions, and explore the fundamental theory of statistical queries and how it connects to various notions of learnability. We also give a detailed summary of some of the applications of statistical queries to other areas, including to optimization, to evolvability, and to differential privacy.
Submitted 14 May, 2020; v1 submitted 1 April, 2020;
originally announced April 2020.
-
On Biased Random Walks, Corrupted Intervals, and Learning Under Adversarial Design
Authors:
Daniel Berend,
Aryeh Kontorovich,
Lev Reyzin,
Thomas Robinson
Abstract:
We tackle some fundamental problems in probability theory on corrupted random processes on the integer line. We analyze when a biased random walk is expected to reach its bottommost point and when intervals of integer points can be detected under a natural model of noise. We apply these results to problems in learning thresholds and intervals under a new model for learning under adversarial design.
Submitted 30 March, 2020;
originally announced March 2020.
-
On Learning a Hidden Directed Graph with Path Queries
Authors:
Mano Vikash Janardhanan,
Lev Reyzin
Abstract:
In this paper, we consider the problem of reconstructing a directed graph using path queries. In this query model of learning, a graph is hidden from the learner, and the learner can access information about it with path queries. For a source and destination node, a path query returns whether there is a directed path from the source to the destination node in the hidden graph. In this paper we first give bounds for learning graphs on $n$ vertices and $k$ strongly connected components. We then study the case of bounded degree directed trees and give new algorithms for learning "almost-trees" -- directed trees to which extra edges have been added. We also give some lower bound constructions justifying our approach.
Submitted 16 March, 2021; v1 submitted 26 February, 2020;
originally announced February 2020.
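A path query as described in the abstract above can be simulated with a plain reachability search. This is a minimal sketch, not the paper's learning algorithm: the name `path_query` and the adjacency-dictionary representation are hypothetical, and treating a node as reachable from itself is an assumption the abstract does not settle.

```python
def path_query(adj, source, target):
    """Return True if the hidden directed graph has a path from source to target.

    adj maps each node to an iterable of its out-neighbors.
    Assumption: a node is considered reachable from itself.
    """
    stack = [source]
    seen = {source}
    while stack:
        u = stack.pop()
        if u == target:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


if __name__ == "__main__":
    hidden = {"a": ["b"], "b": ["c"], "c": []}  # directed path a -> b -> c
    print(path_query(hidden, "a", "c"), path_query(hidden, "c", "a"))
```

The learner only ever sees these Boolean answers, so edges that are implied by transitivity are indistinguishable from direct edges by a single query.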
-
Crowdsourced PAC Learning under Classification Noise
Authors:
Shelby Heinecke,
Lev Reyzin
Abstract:
In this paper, we analyze PAC learnability from labels produced by crowdsourcing. In our setting, unlabeled examples are drawn from a distribution and labels are crowdsourced from workers who operate under classification noise, each with their own noise parameter. We develop an end-to-end crowdsourced PAC learning algorithm that takes unlabeled data points as input and outputs a trained classifier. Our three-step algorithm incorporates majority voting, pure-exploration bandits, and noisy-PAC learning. We prove several guarantees on the number of tasks labeled by workers for PAC learning in this setting and show that our algorithm improves upon the baseline by reducing the total number of tasks given to workers. We demonstrate the robustness of our algorithm by exploring its application to additional realistic crowdsourcing settings.
Submitted 12 February, 2019;
originally announced February 2019.
-
Sampling Without Compromising Accuracy in Adaptive Data Analysis
Authors:
Benjamin Fish,
Lev Reyzin,
Benjamin I. P. Rubinstein
Abstract:
In this work, we study how to use sampling to speed up mechanisms for answering adaptive queries into datasets without reducing the accuracy of those mechanisms. This is important to do when both the datasets and the number of queries asked are very large. In particular, we describe a mechanism that provides a polynomial speed-up per query over previous mechanisms, without needing to increase the total amount of data required to maintain the same generalization error as before. We prove that this speed-up holds for arbitrary statistical queries. We also provide an even faster method for achieving statistically-meaningful responses wherein the mechanism is only allowed to see a constant number of samples from the data per query. Finally, we show that our general results yield a simple, fast, and unified approach for adaptively optimizing convex and strongly convex functions over a dataset.
Submitted 1 January, 2020; v1 submitted 27 September, 2017;
originally announced September 2017.
-
Network Construction with Ordered Constraints
Authors:
Yi Huang,
Mano Vikash Janardhanan,
Lev Reyzin
Abstract:
In this paper, we study the problem of constructing a network by observing ordered connectivity constraints, which we define herein. These ordered constraints are made to capture realistic properties of real-world problems that are not reflected in previous, more general models. We give hardness of approximation results and nearly-matching upper bounds for the offline problem, and we study the online problem in both general graphs and restricted sub-classes. In the online problem, for general graphs, we give exponentially better upper bounds than exist for algorithms for general connectivity problems. For the restricted classes of stars and paths we are able to find algorithms with optimal competitive ratios, the latter of which involve analysis using a potential function defined over pq-trees.
Submitted 23 February, 2017;
originally announced February 2017.
-
A Simple Spectral Algorithm for Recovering Planted Partitions
Authors:
Sam Cole,
Shmuel Friedland,
Lev Reyzin
Abstract:
In this paper, we consider the planted partition model, in which $n = ks$ vertices of a random graph are partitioned into $k$ "clusters," each of size $s$. Edges between vertices in the same cluster and different clusters are included with constant probability $p$ and $q$, respectively (where $0 \le q < p \le 1$). We give an efficient algorithm that, with high probability, recovers the clusters as long as the cluster sizes are at least $Ω(\sqrt{n})$. Informally, our algorithm constructs the projection operator onto the dominant $k$-dimensional eigenspace of the graph's adjacency matrix and uses it to recover one cluster at a time. To our knowledge, our algorithm is the first purely spectral algorithm which runs in polynomial time and works even when $s = Θ(\sqrt n)$, though there have been several non-spectral algorithms which accomplish this. Our algorithm is also among the simplest of these spectral algorithms, and its proof of correctness illustrates the usefulness of the Cauchy integral formula in this domain.
Submitted 25 August, 2017; v1 submitted 2 March, 2015;
originally announced March 2015.
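The planted partition model defined in the abstract above can be sampled in a few lines. This is a minimal sketch of the generative model only, not of the spectral recovery algorithm: the name `sample_planted_partition`, the choice of contiguous vertex blocks as clusters, and the dense list-of-lists adjacency matrix are all assumptions made for illustration.

```python
import random


def sample_planted_partition(k, s, p, q, seed=None):
    """Sample a graph on n = k * s vertices with k planted clusters of size s.

    Same-cluster pairs are joined with probability p, cross-cluster pairs
    with probability q. Returns (adjacency matrix, cluster label per vertex).
    """
    rng = random.Random(seed)
    n = k * s
    cluster = [v // s for v in range(n)]  # assumption: clusters are contiguous blocks
    adj = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if cluster[u] == cluster[v] else q
            if rng.random() < prob:
                adj[u][v] = adj[v][u] = 1  # undirected, no self-loops
    return adj, cluster


if __name__ == "__main__":
    adj, cluster = sample_planted_partition(k=3, s=4, p=0.9, q=0.1, seed=0)
    for row in adj:
        print("".join(str(x) for x in row))
```

When $p$ is well above $q$, the printed matrix shows dense diagonal blocks, which is the structure the dominant $k$-dimensional eigenspace of the adjacency matrix reflects.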
-
Network installation and recovery: approximation lower bounds and faster exact formulations
Authors:
Alexander Gutfraind,
Jeremy Kun,
Ádám D. Lelkes,
Lev Reyzin
Abstract:
We study the Neighbor Aided Network Installation Problem (NANIP), introduced previously, which asks for a minimal cost ordering of the vertices of a graph, where the cost of visiting a node is a function of the number of neighbors that have already been visited. This problem has applications in resource management and disaster recovery. In this paper we analyze the computational hardness of NANIP. In particular, we show that this problem is NP-hard even when restricted to convex decreasing cost functions, give a linear approximation lower bound for the greedy algorithm, and prove a general sub-constant approximation lower bound. Then we give a new integer programming formulation of NANIP and empirically observe its speedup over the original integer program.
Submitted 13 November, 2014;
originally announced November 2014.
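The NANIP objective from the abstract above, where each node's cost depends on how many of its neighbors were already visited, can be computed as follows. This is a minimal sketch under stated assumptions: the names `ordering_cost` and `brute_force_optimum` are hypothetical, the example cost table is invented for illustration, and the exhaustive search is only a baseline for tiny graphs, not the integer program from the paper.

```python
from itertools import permutations


def ordering_cost(adj, order, cost):
    """Total cost of visiting nodes in the given order.

    cost(j) is the price of a node with j already-visited neighbors.
    """
    visited = set()
    total = 0
    for v in order:
        already = sum(1 for u in adj[v] if u in visited)
        total += cost(already)
        visited.add(v)
    return total


def brute_force_optimum(adj, cost):
    """Exhaustive minimum over all orderings (exponential; tiny graphs only)."""
    return min(ordering_cost(adj, order, cost) for order in permutations(adj))


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # path a - b - c
    table = [3, 1, 0]  # illustrative convex decreasing cost function
    cost = lambda j: table[j]
    print(ordering_cost(path, ["a", "b", "c"], cost))
    print(ordering_cost(path, ["a", "c", "b"], cost))
    print(brute_force_optimum(path, cost))
```

Visiting the two endpoints before the middle node is more expensive here because neither endpoint benefits from an installed neighbor.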
-
On the Computational Complexity of MapReduce
Authors:
Benjamin Fish,
Jeremy Kun,
Ádám Dániel Lelkes,
Lev Reyzin,
György Turán
Abstract:
In this paper we study MapReduce computations from a complexity-theoretic perspective. First, we formulate a uniform version of the MRC model of Karloff et al. (2010). We then show that the class of regular languages, and moreover all of sublogarithmic space, lies in constant round MRC. This result also applies to the MPC model of Andoni et al. (2014). In addition, we prove that, conditioned on a variant of the Exponential Time Hypothesis, there are strict hierarchies within MRC so that increasing the number of rounds or the amount of time per processor increases the power of MRC. To the best of our knowledge we are the first to approach the MapReduce model with complexity-theoretic techniques, and our work lays the foundation for further analysis relating MapReduce to established complexity classes.
Submitted 6 October, 2015; v1 submitted 1 October, 2014;
originally announced October 2014.
-
On Coloring Resilient Graphs
Authors:
Jeremy Kun,
Lev Reyzin
Abstract:
We introduce a new notion of resilience for constraint satisfaction problems, with the goal of more precisely determining the boundary between NP-hardness and the existence of efficient algorithms for resilient instances. In particular, we study $r$-resiliently $k$-colorable graphs, which are those $k$-colorable graphs that remain $k$-colorable even after the addition of any $r$ new edges. We prove lower bounds on the NP-hardness of coloring resiliently colorable graphs, and provide an algorithm that colors sufficiently resilient graphs. We also analyze the corresponding notion of resilience for $k$-SAT. This notion of resilience suggests an array of open questions for graph coloring and other combinatorial problems.
Submitted 11 June, 2014; v1 submitted 18 February, 2014;
originally announced February 2014.
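The definition of an $r$-resiliently $k$-colorable graph given in the abstract above can be checked by brute force on very small graphs. This is a minimal sketch, assuming vertices are labeled 0 to n-1 and edges are pairs of labels; the function names are hypothetical, and the exhaustive search is exponential and is not the algorithm from the paper. If fewer than $r$ non-edges exist, the check is treated as vacuously true, which is an interpretive assumption.

```python
from itertools import combinations, product


def is_k_colorable(n, edges, k):
    """Brute-force test: try every assignment of k colors to n vertices."""
    return any(
        all(coloring[u] != coloring[v] for u, v in edges)
        for coloring in product(range(k), repeat=n)
    )


def is_resiliently_colorable(n, edges, k, r):
    """True if the graph is k-colorable and stays so after adding any r new edges."""
    if not is_k_colorable(n, edges, k):
        return False
    present = {frozenset(e) for e in edges}
    missing = [pair for pair in combinations(range(n), 2)
               if frozenset(pair) not in present]
    base = list(edges)
    return all(
        is_k_colorable(n, base + list(extra), k)
        for extra in combinations(missing, r)
    )


if __name__ == "__main__":
    # Empty graph on 3 vertices, 2 colors: any 2 added edges form a path,
    # but 3 added edges form a triangle.
    for r in (1, 2, 3):
        print(r, is_resiliently_colorable(3, [], 2, r))
```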
-
Anti-Coordination Games and Stable Graph Colorings
Authors:
Jeremy Kun,
Brian Powers,
Lev Reyzin
Abstract:
Motivated by understanding non-strict and strict pure strategy equilibria in network anti-coordination games, we define notions of stable and, respectively, strictly stable colorings in graphs. We characterize the cases when such colorings exist and when the decision problem is NP-hard. These correspond to finding pure strategy equilibria in the anti-coordination games, whose price of anarchy we also analyze. We further consider the directed case, a generalization that captures both coordination and anti-coordination. We prove the decision problem for non-strict equilibria in directed graphs is NP-hard. Our notions also have multiple connections to other combinatorial questions, and our results resolve some open problems in these areas, most notably the complexity of the strictly unfriendly partition problem.
Submitted 14 August, 2013;
originally announced August 2013.
-
On the Resilience of Bipartite Networks
Authors:
Shelby Heinecke,
Will Perkins,
Lev Reyzin
Abstract:
Motivated by problems modeling the spread of infections in networks, in this paper we explore which bipartite graphs are most resilient to widespread infections under various parameter settings. Namely, we study bipartite networks with a requirement of a minimum degree $d$ on one side under an independent infection, independent transmission model. We completely characterize the optimal graphs in the case $d=1$, which already produces non-trivial behavior, and we give extremal results for the more general cases. We show that in the case $d=2$, surprisingly, the optimally resilient set of graphs includes a graph that is not one of the two "extremes" found in the case $d=1$.
Then, we briefly examine the case where we force a connectivity requirement instead of a one-sided degree requirement and again, we find that the set of the most resilient graphs contains more than the two "extremes." We also show that determining the subgraph of an arbitrary bipartite graph most resilient to infection is NP-hard for any one-sided minimal degree $d \ge 1$.
Submitted 8 January, 2018; v1 submitted 24 June, 2013;
originally announced June 2013.
-
Statistical Algorithms and a Lower Bound for Detecting Planted Clique
Authors:
Vitaly Feldman,
Elena Grigorescu,
Lev Reyzin,
Santosh Vempala,
Ying Xiao
Abstract:
We introduce a framework for proving lower bounds on computational problems over distributions against algorithms that can be implemented using access to a statistical query oracle. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC and simulated annealing can be implemented in this framework. Our framework is based on, and generalizes, the statistical query model in learning theory (Kearns, 1998).
Our main application is a nearly optimal lower bound on the complexity of any statistical query algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size $O(n^{1/2-δ})$ for any constant $δ> 0$. The assumed hardness of variants of these problems has been used to prove hardness of several other problems and as a guarantee for security in cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.
Submitted 14 August, 2016; v1 submitted 5 January, 2012;
originally announced January 2012.
-
Data Stability in Clustering: A Closer Look
Authors:
Shalev Ben-David,
Lev Reyzin
Abstract:
We consider the model introduced by Bilu and Linial (2010), who study problems for which the optimal clustering does not change when distances are perturbed. They show that even when a problem is NP-hard, it is sometimes possible to obtain efficient algorithms for instances resilient to certain multiplicative perturbations, e.g. on the order of $O(\sqrt{n})$ for max-cut clustering. Awasthi et al. (2010) consider center-based objectives, and Balcan and Liang (2011) analyze the $k$-median and min-sum objectives, giving efficient algorithms for instances resilient to certain constant multiplicative perturbations.
Here, we are motivated by the question of to what extent these assumptions can be relaxed while allowing for efficient algorithms. We show there is little room to improve these results by giving NP-hardness lower bounds for both the $k$-median and min-sum objectives. On the other hand, we show that constant multiplicative resilience parameters can be so strong as to make the clustering problem trivial, leaving only a narrow range of resilience parameters for which clustering is interesting. We also consider a model of additive perturbations and give a correspondence between additive and multiplicative notions of stability. Our results provide a close examination of the consequences of assuming stability in data.
Submitted 29 August, 2014; v1 submitted 12 July, 2011;
originally announced July 2011.
-
Efficient Optimal Learning for Contextual Bandits
Authors:
Miroslav Dudik,
Daniel Hsu,
Satyen Kale,
Nikos Karampatziakis,
John Langford,
Lev Reyzin,
Tong Zhang
Abstract:
We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with optimal regret. Our algorithm uses a cost-sensitive classification learner as an oracle and has running time $\mathrm{polylog}(N)$, where $N$ is the number of classification rules among which the oracle might choose. This is exponentially faster than all previous algorithms that achieve optimal regret in this setting. Our formulation also enables us to create an algorithm with regret that is additive in the feedback delay, rather than multiplicative as in all previous work.
Submitted 12 June, 2011;
originally announced June 2011.
-
Contextual Bandit Algorithms with Supervised Learning Guarantees
Authors:
Alina Beygelzimer,
John Langford,
Lihong Li,
Lev Reyzin,
Robert E. Schapire
Abstract:
We address the problem of learning in an online, bandit setting where the learner must repeatedly select among $K$ actions, but only receives partial feedback based on its choices. We establish two new facts: First, using a new algorithm called Exp4.P, we show that it is possible to compete with the best in a set of $N$ experts with probability $1-δ$ while incurring regret at most $O(\sqrt{KT\ln(N/δ)})$ over $T$ time steps. The new algorithm is tested empirically on a large-scale, real-world dataset. Second, we give a new algorithm called VE that competes with a possibly infinite set of policies of VC-dimension $d$ while incurring regret at most $O(\sqrt{T(d\ln(T) + \ln (1/δ))})$ with probability $1-δ$. These guarantees improve on those of all previous algorithms, whether in a stochastic or adversarial environment, and bring us closer to providing supervised-learning-type guarantees for the contextual bandit setting.
Submitted 27 October, 2011; v1 submitted 22 February, 2010;
originally announced February 2010.
-
An Improved Robust Fuzzy Extractor
Authors:
Bhavana Kanukurthi,
Leonid Reyzin
Abstract:
We consider the problem of building robust fuzzy extractors, which allow two parties holding similar random variables W, W' to agree on a secret key R in the presence of an active adversary. Robust fuzzy extractors were defined by Dodis et al. in Crypto 2006 to be noninteractive, i.e., only one message P, which can be modified by an unbounded adversary, can pass from one party to the other. This allows them to be used by a single party at different points in time (e.g., for key recovery or biometric authentication), but also presents an additional challenge: what if R is used, and thus possibly observed by the adversary, before the adversary has a chance to modify P? Fuzzy extractors secure against such a strong attack are called post-application robust.
We construct a fuzzy extractor with post-application robustness that extracts a shared secret key of up to (2m-n)/2 bits (depending on error-tolerance and security parameters), where n is the bit-length and m is the entropy of W. The previously best known result, also of Dodis et al., extracted up to (2m-n)/3 bits (depending on the same parameters).
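To make the two key-length ceilings concrete, the following minimal sketch evaluates (2m-n)/2 and (2m-n)/3 for given n and m. The function name and the sample values are hypothetical, chosen only for illustration; the abstract states these as upper limits that further depend on error-tolerance and security parameters, which the sketch ignores.

```python
def key_length_ceilings(m: int, n: int) -> tuple[int, int]:
    """Return the (2m-n)/2 and (2m-n)/3 ceilings, in whole bits.

    m: entropy of W in bits, n: bit-length of W.
    Illustrative only: the achievable length also depends on
    error-tolerance and security parameters not modeled here.
    """
    if not 0 <= m <= n:
        raise ValueError("expected 0 <= m <= n")
    slack = 2 * m - n
    if slack <= 0:
        # No positive key length is available under either ceiling.
        return (0, 0)
    return (slack // 2, slack // 3)


# Hypothetical parameters: a 512-bit input with 400 bits of entropy.
print(key_length_ceilings(400, 512))  # (144, 96)
```

With these sample values the ceiling rises from 96 to 144 bits, a factor of 3/2, which holds for any m and n with 2m > n (up to rounding).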
Submitted 8 August, 2008; v1 submitted 4 July, 2008;
originally announced July 2008.
-
Upper and Lower Bounds on Black-Box Steganography
Authors:
Nenad Dedić,
Gene Itkis,
Leonid Reyzin,
Scott Russell
Abstract:
We study the limitations of steganography when the sender is not using any properties of the underlying channel beyond its entropy and the ability to sample from it. On the negative side, we show that the number of samples the sender must obtain from the channel is exponential in the rate of the stegosystem. On the positive side, we present the first secret-key stegosystem that essentially matches this lower bound regardless of the entropy of the underlying channel. Furthermore, for high-entropy channels, we present the first secret-key stegosystem that matches this lower bound statelessly (i.e., without requiring synchronized state between sender and receiver).
Submitted 4 June, 2008;
originally announced June 2008.
-
Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data
Authors:
Yevgeniy Dodis,
Rafail Ostrovsky,
Leonid Reyzin,
Adam Smith
Abstract:
We provide formal definitions and efficient secure techniques for
- turning noisy information into keys usable for any cryptographic application, and, in particular,
- reliably and securely authenticating biometric data.
Our techniques apply not just to biometric information, but to any keying material that, unlike traditional cryptographic keys, is (1) not reproducible precisely and (2) not distributed uniformly. We propose two primitives: a "fuzzy extractor" reliably extracts nearly uniform randomness R from its input; the extraction is error-tolerant in the sense that R will be the same even if the input changes, as long as it remains reasonably close to the original. Thus, R can be used as a key in a cryptographic application. A "secure sketch" produces public information about its input w that does not reveal w, and yet allows exact recovery of w given another value that is close to w. Thus, it can be used to reliably reproduce error-prone biometric inputs without incurring the security risk inherent in storing them.
We define the primitives to be both formally secure and versatile, generalizing much prior work. In addition, we provide nearly optimal constructions of both primitives for various measures of "closeness" of input data, such as Hamming distance, edit distance, and set difference.
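As a rough illustration of the secure-sketch idea for Hamming distance, the toy example below masks the input w with a random codeword of an error-correcting code and later uses a nearby reading w' to decode back to that codeword. This is a sketch under stated assumptions, not the paper's construction as specified: the 3-fold repetition code, the function names, and the bit-list representation are all chosen here for brevity, and a repetition code is far from the nearly optimal parameters the abstract refers to.

```python
import secrets

REP = 3  # assumption: 3-fold repetition code, corrects 1 flipped bit per block


def encode(msg: list[int]) -> list[int]:
    """Repeat every message bit REP times."""
    return [b for b in msg for _ in range(REP)]


def decode(bits: list[int]) -> list[int]:
    """Majority-vote each block of REP bits back to one message bit."""
    return [int(2 * sum(bits[i:i + REP]) > REP) for i in range(0, len(bits), REP)]


def xor(a: list[int], b: list[int]) -> list[int]:
    return [x ^ y for x, y in zip(a, b)]


def sketch(w: list[int]) -> list[int]:
    """Public sketch s = w XOR c for a freshly sampled random codeword c."""
    if len(w) % REP:
        raise ValueError("input length must be a multiple of REP")
    msg = [secrets.randbits(1) for _ in range(len(w) // REP)]
    return xor(w, encode(msg))


def recover(w_noisy: list[int], s: list[int]) -> list[int]:
    """Recover w from a close reading w_noisy and the public sketch s."""
    shifted = xor(w_noisy, s)            # equals c XOR (error pattern)
    codeword = encode(decode(shifted))   # error correction restores c
    return xor(s, codeword)              # s XOR c = w


w = [1, 0, 1, 1, 0, 0, 1, 1, 0]
s = sketch(w)
w_noisy = w.copy()
w_noisy[4] ^= 1  # one bit of measurement noise
assert recover(w_noisy, s) == w
```

Recovery is exact as long as each 3-bit block of the noisy reading differs from the original in at most one position. The sketch does leak information about w, and the sketch here makes no attempt to quantify that loss.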
Submitted 1 April, 2008; v1 submitted 4 February, 2006;
originally announced February 2006.