-
Euclidean distance compression via deep random features
Authors:
Brett Leroux,
Luis Rademacher
Abstract:
Motivated by the problem of compressing point sets into as few bits as possible while maintaining information about approximate distances between points, we construct random nonlinear maps $\varphi_\ell$ that compress point sets in the following way. For a point set $S$, the map $\varphi_\ell:\mathbb{R}^d \to N^{-1/2}\{-1,1\}^N$ has the property that storing $\varphi_\ell(S)$ (a \emph{sketch} of $S$) allows one to report pairwise squared distances between points in $S$ up to some multiplicative $(1\pm ε)$ error with high probability as long as the minimum distance is not too small compared to $ε$. The maps $\varphi_\ell$ are the $\ell$-fold composition of a certain type of random feature mapping. Moreover, we determine how large $N$ needs to be as a function of $ε$ and other parameters of the point set.
Compared to existing techniques, our maps offer several advantages. The standard method for compressing point sets by random mappings relies on the Johnson-Lindenstrauss lemma, which implies that if a set of $n$ points is mapped by a Gaussian random matrix to $\mathbb{R}^k$ with $k =Θ(ε^{-2}\log n)$, then pairwise distances between points are preserved up to a multiplicative $(1\pm ε)$ error with high probability. The main advantage of our maps $\varphi_\ell$ over random linear maps is that ours map point sets directly into the discrete cube $N^{-1/2}\{-1,1\}^N$, so no additional step is needed to convert the sketch to bits. For some range of parameters, our maps $\varphi_\ell$ produce sketches which require fewer bits of storage space.
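For comparison, the Johnson-Lindenstrauss baseline described above can be sketched in a few lines of Python. This is a minimal sketch of the random *linear* map only, not of the nonlinear map $\varphi_\ell$ (whose construction is not given here); the function names, the $1/\sqrt{k}$ scaling convention, and the sizes $n=10$, $d=50$, $k=300$ are illustrative assumptions.

```python
import math
import random


def gaussian_projection(points, k, seed=0):
    """Map each x in R^d to G x / sqrt(k), where G is k-by-d with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    d = len(points[0])
    G = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(g * x for g, x in zip(row, p)) for row in G] for p in points]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def max_relative_error(points, projected):
    """Largest |projected squared distance / original squared distance - 1| over all pairs."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = sq_dist(projected[i], projected[j]) / sq_dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst


rng = random.Random(1)
pts = [[rng.uniform(-1.0, 1.0) for _ in range(50)] for _ in range(10)]
proj = gaussian_projection(pts, k=300)
print(max_relative_error(pts, proj))
```

Storing `proj` still means quantizing $k$ real numbers per point, which is the extra conversion step that a map into $N^{-1/2}\{-1,1\}^N$ avoids.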
Submitted 2 March, 2024;
originally announced March 2024.
-
On the Nystrom Approximation for Preconditioning in Kernel Machines
Authors:
Amirhesam Abedsoltan,
Parthe Pandit,
Luis Rademacher,
Mikhail Belkin
Abstract:
Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed up the convergence of such iterative algorithms for training kernel models. However, computing and storing a spectral preconditioner can be expensive, leading to large computational and storage overheads that preclude the application of kernel methods to problems with large datasets. A Nystrom approximation of the spectral preconditioner is often cheaper to compute and store, and has demonstrated success in practical applications. In this paper we analyze the trade-offs of using such an approximated preconditioner. Specifically, we show that a sample of logarithmic size (as a function of the size of the dataset) enables the Nystrom-based approximated preconditioner to accelerate gradient descent nearly as well as the exact preconditioner, while also reducing the computational and storage overheads.
Submitted 24 January, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
The smoothed complexity of Frank-Wolfe methods via conditioning of random matrices and polytopes
Authors:
Luis Rademacher,
Chang Shu
Abstract:
Frank-Wolfe methods are popular for optimization over a polytope. One reason is that they do not need projection onto the polytope but only linear optimization over it. To understand their complexity, Lacoste-Julien and Jaggi introduced a condition number for polytopes and showed linear convergence for several variations of the method. The actual running time can still be exponential in the worst case (when the condition number is exponential). We study the smoothed complexity of the condition number, namely the condition number of small random perturbations of the input polytope, and show that it is polynomial for any simplex and exponential for general polytopes. Our results also apply to other condition measures of polytopes that have been proposed for the analysis of Frank-Wolfe methods: vertex-facet distance (Beck and Shtern) and facial distance (Peña and Rodríguez).
Our argument for polytopes is a refinement of an argument that we develop to study the conditioning of random matrices. The basic argument shows that for $c>1$ a $d$-by-$n$ random Gaussian matrix with $n \geq cd$ has a $d$-by-$d$ submatrix with minimum singular value that is exponentially small with high probability. This has consequences on results about the robust uniqueness of tensor decompositions.
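The point that Frank-Wolfe methods need only linear optimization over the feasible polytope is easy to see on the probability simplex, where the linear minimization oracle just picks the vertex $e_i$ with the smallest gradient coordinate. The following is a minimal sketch of the classical method with step size $2/(t+2)$, assuming a squared-distance objective chosen purely for illustration; it is not one of the specific variants analyzed by Lacoste-Julien and Jaggi, and the helper names are hypothetical.

```python
def frank_wolfe_simplex(y, iters=2000):
    """Minimize f(x) = ||x - y||^2 over the probability simplex with classical Frank-Wolfe."""
    n = len(y)
    x = [1.0] + [0.0] * (n - 1)  # start at the vertex e_0
    for t in range(iters):
        grad = [2.0 * (xi - yi) for xi, yi in zip(x, y)]
        # Linear minimization oracle: over the simplex, <grad, s> is minimized at the
        # vertex e_i with the smallest gradient coordinate. No projection is needed.
        i = min(range(n), key=lambda j: grad[j])
        gamma = 2.0 / (t + 2.0)
        # Convex combination of x and e_i, so the iterate stays in the simplex.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


def objective(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


x_inside = frank_wolfe_simplex([0.2, 0.5, 0.3])    # target already lies in the simplex
x_outside = frank_wolfe_simplex([1.0, 1.0, -1.0])  # nearest simplex point is (0.5, 0.5, 0)
print(x_inside, x_outside)
```

The sublinear $O(1/t)$ rate of this basic version is what the condition-number analyses mentioned above improve to linear convergence for certain variants.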
Submitted 24 November, 2020; v1 submitted 26 September, 2020;
originally announced September 2020.
-
Overcomplete order-3 tensor decomposition, blind deconvolution and Gaussian mixture models
Authors:
Haolin Chen,
Luis Rademacher
Abstract:
We propose a new algorithm for tensor decomposition, based on Jennrich's algorithm, and apply our new algorithmic ideas to blind deconvolution and Gaussian mixture models. Our first contribution is a simple and efficient algorithm to decompose certain symmetric overcomplete order-3 tensors, that is, three-dimensional arrays of the form $T = \sum_{i=1}^n a_i \otimes a_i \otimes a_i$ where the $a_i$s are not linearly independent. Our algorithm comes with a detailed robustness analysis. Our second contribution builds on top of our tensor decomposition algorithm to expand the family of Gaussian mixture models whose parameters can be estimated efficiently. These ideas are also presented in a more general framework of blind deconvolution that makes them applicable to mixture models of identical but very general distributions, including all centrally symmetric distributions with finite 6th moment.
Submitted 19 February, 2021; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Algebraic $k$-sets and generally neighborly embeddings
Authors:
Brett Leroux,
Luis Rademacher
Abstract:
Given a set $S$ of $n$ points in $\mathbb{R}^d$, a $k$-set is a subset of $k$ points of $S$ that can be strictly separated by a hyperplane from the remaining $n-k$ points. Similarly, one may consider $k$-facets, which are hyperplanes that pass through $d$ points of $S$ and have $k$ points on one side. A notorious open problem is to determine the asymptotics of the maximum number of $k$-sets. In this paper we study a variation on the $k$-set/$k$-facet problem with hyperplanes replaced by algebraic surfaces. In stark contrast to the original $k$-set/$k$-facet problem, there are some natural families of algebraic curves for which the number of $k$-facets can be counted exactly. For example, we show that the number of halving conic sections for any set of $2n+5$ points in general position in the plane is $2\binom{n+2}{2}^2$. To understand the limits of our argument we study a class of maps we call \emph{generally neighborly embeddings}, which map generic point sets into neighborly position. Additionally, we give a simple argument which improves the best known bound on the number of $k$-sets/$k$-facets for point sets in convex position.
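To make the $k$-facet definition concrete in the plane ($d=2$), the brute-force count below treats each ordered pair of points as a directed line and counts those with exactly $k$ points strictly on the left. The oriented convention and the helper names are assumptions made for illustration, and this $O(n^3)$ enumeration is unrelated to the counting arguments in the paper.

```python
import math
from itertools import permutations


def cross(o, a, b):
    """Positive iff b lies strictly to the left of the directed line from o to a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def count_k_facets(points, k):
    """Count ordered pairs (p, q) whose directed line has exactly k points strictly on its left."""
    count = 0
    for p, q in permutations(range(len(points)), 2):
        left = sum(
            1
            for r in range(len(points))
            if r != p and r != q and cross(points[p], points[q], points[r]) > 0
        )
        if left == k:
            count += 1
    return count


pentagon = [(math.cos(2 * math.pi * i / 5), math.sin(2 * math.pi * i / 5)) for i in range(5)]
print([count_k_facets(pentagon, k) for k in range(4)])  # [5, 5, 5, 5]
```

For $n$ points in convex position, fixing $p$ and moving $q$ around the polygon gives each left-count $0, \dots, n-2$ exactly once, so every $k$ yields $n$; summing over all $k$ always gives $n(n-1)$.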
Submitted 14 August, 2021; v1 submitted 9 December, 2019;
originally announced December 2019.
-
The Minimum Euclidean-Norm Point on a Convex Polytope: Wolfe's Combinatorial Algorithm is Exponential
Authors:
Jesus De Loera,
Jamie Haddock,
Luis Rademacher
Abstract:
The complexity of Philip Wolfe's method for the minimum Euclidean-norm point problem over a convex polytope has remained unknown since he proposed the method in 1974. The method is important because it is used as a subroutine for one of the most practical algorithms for submodular function minimization. We present the first example on which Wolfe's method takes exponential time. Additionally, we improve previous results to show that linear programming reduces in strongly-polynomial time to the minimum norm point problem over a simplex.
Submitted 3 November, 2017; v1 submitted 6 October, 2017;
originally announced October 2017.
-
Heavy-Tailed Analogues of the Covariance Matrix for ICA
Authors:
Joseph Anderson,
Navin Goyal,
Anupama Nandi,
Luis Rademacher
Abstract:
Independent Component Analysis (ICA) is the problem of learning a square matrix $A$, given samples of $X=AS$, where $S$ is a random vector with independent coordinates. Most existing algorithms are provably efficient only when each $S_i$ has finite and moderately valued fourth moment. However, there are practical applications where this assumption need not be true, such as speech and finance. Algorithms have been proposed for heavy-tailed ICA, but they are not practical, using random walks and the full power of the ellipsoid algorithm multiple times. The main contributions of this paper are:
(1) A practical algorithm for heavy-tailed ICA that we call HTICA. We provide theoretical guarantees and show that it outperforms other algorithms in some heavy-tailed regimes, both on real and synthetic data. Like the current state-of-the-art, the new algorithm is based on the centroid body (a first moment analogue of the covariance matrix). Unlike the state-of-the-art, our algorithm is practically efficient. To achieve this, we use explicit analytic representations of the centroid body, which bypasses the use of the ellipsoid method and random walks.
(2) We study how heavy tails affect different ICA algorithms, including HTICA. Somewhat surprisingly, we show that some algorithms that use the covariance matrix or higher moments can successfully solve a range of ICA instances with infinite second moment. We study this theoretically and experimentally, with both synthetic and real-world heavy-tailed data.
Submitted 22 February, 2017;
originally announced February 2017.
-
Heavy-tailed Independent Component Analysis
Authors:
Joseph Anderson,
Navin Goyal,
Anupama Nandi,
Luis Rademacher
Abstract:
Independent component analysis (ICA) is the problem of efficiently recovering a matrix $A \in \mathbb{R}^{n\times n}$ from i.i.d. observations of $X=AS$ where $S \in \mathbb{R}^n$ is a random vector with mutually independent coordinates. This problem has been intensively studied, but all existing efficient algorithms with provable guarantees require that the coordinates $S_i$ have finite fourth moments. We consider the heavy-tailed ICA problem, where we make no such assumption and do not even assume finite second moments. This problem has also received considerable attention in the applied literature. In the present work, we first give a provably efficient algorithm that works under the assumption that for constant $γ> 0$, each $S_i$ has finite $(1+γ)$-moment, thus substantially weakening the moment requirement condition for the ICA problem to be solvable. We then give an algorithm that works under the assumption that matrix $A$ has orthogonal columns but requires no moment assumptions. Our techniques draw ideas from convex geometry and exploit standard properties of the multivariate spherical Gaussian distribution in a novel way.
Submitted 2 September, 2015;
originally announced September 2015.
-
A Pseudo-Euclidean Iteration for Optimal Recovery in Noisy ICA
Authors:
James Voss,
Mikhail Belkin,
Luis Rademacher
Abstract:
Independent Component Analysis (ICA) is a popular model for blind signal separation. The ICA model assumes that a number of independent source signals are linearly mixed to form the observed signals. We propose a new algorithm, PEGI (for pseudo-Euclidean Gradient Iteration), for provable model recovery for ICA with Gaussian noise. The main technical innovation of the algorithm is to use a fixed point iteration in a pseudo-Euclidean (indefinite "inner product") space. The use of this indefinite "inner product" resolves technical issues common to several existing algorithms for noisy ICA. This leads to an algorithm which is conceptually simple, efficient and accurate in testing.
Our second contribution is combining PEGI with the analysis of objectives for optimal recovery in the noisy ICA model. It has been observed that the direct approach of demixing with the inverse of the mixing matrix is suboptimal for signal recovery in terms of the natural Signal to Interference plus Noise Ratio (SINR) criterion. There have been several partial solutions proposed in the ICA literature. It turns out that any solution to the mixing matrix reconstruction problem can be used to construct an SINR-optimal ICA demixing, despite the fact that SINR itself cannot be computed from data. That allows us to obtain a practical and provably SINR-optimal recovery method for ICA with arbitrary Gaussian noise.
Submitted 1 October, 2015; v1 submitted 13 February, 2015;
originally announced February 2015.
-
Query complexity of sampling and small geometric partitions
Authors:
Navin Goyal,
Luis Rademacher,
Santosh Vempala
Abstract:
In this paper we study the following problem:
Discrete partitioning problem (DPP): Let $\mathbb{F}_q P^n$ denote the $n$-dimensional finite projective space over $\mathbb{F}_q$. For positive integer $k \leq n$, let $\{ A^i\}_{i=1}^N$ be a partition of $(\mathbb{F}_q P^n)^k$ such that
(1) for all $i \leq N$, $A^i = \prod_{j=1}^k A^i_j$ (partition into product sets),
(2) for all $i \leq N$, there is a $(k-1)$-dimensional subspace $L^i \subseteq \mathbb{F}_q P^n$ such that $A^i \subseteq (L^i)^k$.
What is the minimum value of $N$ as a function of $q,n,k$? We will be mainly interested in the case $k=n$.
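As a small aid for the definitions above, the sketch below enumerates the points of $\mathbb{F}_q P^n$ for a prime $q$ (so that $\mathbb{F}_q = \mathbb{Z}/q\mathbb{Z}$; prime powers would need a separate field implementation) and tests whether a collection of points lies in a common $(k-1)$-dimensional projective subspace, which holds exactly when their representative vectors span a linear subspace of $\mathbb{F}_q^{n+1}$ of dimension at most $k$. The helper names are hypothetical; this only illustrates condition (2), not a solution to the DPP.

```python
from itertools import product


def projective_points(q, n):
    """Points of F_q P^n for prime q: one representative per line through the origin in
    F_q^(n+1), namely the scalar multiple whose first nonzero coordinate equals 1."""
    return [v for v in product(range(q), repeat=n + 1) if next((c for c in v if c), 0) == 1]


def rank_mod_p(vectors, p):
    """Rank over F_p (p prime) by Gauss-Jordan elimination."""
    rows = [[x % p for x in v] for v in vectors]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def in_common_subspace(points, k, p):
    """True iff the points lie in some (k-1)-dimensional projective subspace."""
    return rank_mod_p(points, p) <= k


print(len(projective_points(2, 2)))  # 7, the points of the Fano plane
```

In general $|\mathbb{F}_q P^n| = (q^{n+1}-1)/(q-1)$, so the ground set $(\mathbb{F}_q P^n)^k$ being partitioned has $\left((q^{n+1}-1)/(q-1)\right)^k$ elements.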
Submitted 14 November, 2014;
originally announced November 2014.
-
Eigenvectors of Orthogonally Decomposable Functions
Authors:
Mikhail Belkin,
Luis Rademacher,
James Voss
Abstract:
The eigendecomposition of quadratic forms (symmetric matrices) guaranteed by the spectral theorem is a foundational result in applied mathematics. Motivated by a shared structure found in inferential problems of recent interest (namely orthogonal tensor decompositions, Independent Component Analysis (ICA), topic models, spectral clustering, and Gaussian mixture learning), we generalize the eigendecomposition from quadratic forms to a broad class of "orthogonally decomposable" functions. We identify a key role of convexity in our extension, and we generalize two traditional characterizations of eigenvectors: First, the eigenvectors of a quadratic form arise from the structure of its optima on the sphere. Second, the eigenvectors are the fixed points of the power iteration.
In our setting, we consider a simple first-order generalization of the power method, which we call gradient iteration. It leads to efficient and easily implementable methods for basis recovery. It includes influential machine learning methods such as cumulant-based FastICA and the tensor power iteration for orthogonally decomposable tensors as special cases.
We provide a complete theoretical analysis of gradient iteration using the structure theory of discrete dynamical systems to show almost sure convergence and fast (super-linear) convergence rates. The analysis also extends to the case when the observed function is only approximately orthogonally decomposable, with bounds that are polynomial in dimension and other relevant parameters, such as perturbation size. Our perturbation results can be considered as a non-linear version of the classical Davis-Kahan theorem for perturbations of eigenvectors of symmetric matrices.
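A minimal sketch of gradient iteration, $x \leftarrow \nabla f(x)/\|\nabla f(x)\|$, for one orthogonally decomposable example: $f(u) = \sum_i w_i \langle u, z_i\rangle^3$ over a hidden orthonormal basis $\{z_i\}$ of $\mathbb{R}^2$, which corresponds to the tensor power iteration special case mentioned above. The basis (a rotation by $t = 0.3$), the weights, and the iteration count are illustrative assumptions. For a quadratic form $f(x) = x^\top A x$ the same loop reduces to the power method, since $\nabla f(x) = 2Ax$.

```python
import math


def gradient_iteration(grad, x0, iters=50):
    """Repeat x <- grad(x) / ||grad(x)||, the normalized-gradient fixed-point iteration."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        norm = math.sqrt(sum(v * v for v in g))
        x = [v / norm for v in g]
    return x


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative hidden orthonormal basis of R^2 (a rotation by t) and positive weights.
t = 0.3
Z = [(math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))]
W = [1.0, 2.0]


def grad_f(u):
    # f(u) = sum_i w_i <u, z_i>^3, hence grad f(u) = 3 * sum_i w_i <u, z_i>^2 z_i.
    g = [0.0, 0.0]
    for w, z in zip(W, Z):
        c = 3.0 * w * dot(u, z) ** 2
        g[0] += c * z[0]
        g[1] += c * z[1]
    return g


x = gradient_iteration(grad_f, [1.0, 0.0])  # recovers z_1
y = gradient_iteration(grad_f, [0.0, 1.0])  # recovers z_2
print(dot(x, Z[0]), dot(y, Z[1]))
```

Writing $s = w_2|\langle u, z_2\rangle| / (w_1|\langle u, z_1\rangle|)$, one step maps $s$ to $s^2$, so the iteration converges to $z_1$ when $s<1$ and to $z_2$ when $s>1$, at a quadratic rate in this special case.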
Submitted 22 February, 2018; v1 submitted 5 November, 2014;
originally announced November 2014.
-
The Hidden Convexity of Spectral Clustering
Authors:
James Voss,
Mikhail Belkin,
Luis Rademacher
Abstract:
In recent years, spectral clustering has become a standard method for data analysis used in a broad range of applications. In this paper we propose a new class of algorithms for multiway spectral clustering based on optimization of a certain "contrast function" over the unit sphere. These algorithms, partly inspired by certain Independent Component Analysis techniques, are simple, easy to implement and efficient.
Geometrically, the proposed algorithms can be interpreted as hidden basis recovery by means of function optimization. We give a complete characterization of the contrast functions admissible for provable basis recovery. We show how these conditions can be interpreted as a "hidden convexity" of our optimization problem on the sphere; interestingly, we use efficient convex maximization rather than the more common convex minimization. We also show encouraging experimental results on real and simulated data.
Submitted 4 May, 2016; v1 submitted 3 March, 2014;
originally announced March 2014.
-
The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures
Authors:
Joseph Anderson,
Mikhail Belkin,
Navin Goyal,
Luis Rademacher,
James Voss
Abstract:
In this paper we show that very large mixtures of Gaussians are efficiently learnable in high dimension. More precisely, we prove that a mixture with known identical covariance matrices whose number of components is a polynomial of any fixed degree in the dimension n is polynomially learnable as long as a certain non-degeneracy condition on the means is satisfied. It turns out that this condition is generic in the sense of smoothed complexity, as soon as the dimensionality of the space is high enough. Moreover, we prove that no such condition can possibly exist in low dimension and the problem of learning the parameters is generically hard. In contrast, much of the existing work on Gaussian Mixtures relies on low-dimensional projections and thus hits an artificial barrier. Our main result on mixture recovery relies on a new "Poissonization"-based technique, which transforms a mixture of Gaussians to a linear map of a product distribution. The problem of learning this map can be efficiently solved using some recent results on tensor decompositions and Independent Component Analysis (ICA), thus giving an algorithm for recovering the mixture. In addition, we combine our low-dimensional hardness results for Gaussian mixtures with Poissonization to show how to embed difficult instances of low-dimensional Gaussian mixtures into the ICA setting, thus establishing exponential information-theoretic lower bounds for underdetermined ICA in low dimension. To the best of our knowledge, this is the first such result in the literature. In addition to contributing to the problem of Gaussian mixture learning, we believe that this work is among the first steps toward better understanding the rare phenomenon of the "blessing of dimensionality" in the computational aspects of statistical inference.
Submitted 17 February, 2014; v1 submitted 12 November, 2013;
originally announced November 2013.
-
Efficient learning of simplices
Authors:
Joseph Anderson,
Navin Goyal,
Luis Rademacher
Abstract:
We show an efficient algorithm for the following problem: Given uniformly random points from an arbitrary n-dimensional simplex, estimate the simplex. The size of the sample and the number of arithmetic operations of our algorithm are polynomial in n. This answers a question of Frieze, Jerrum and Kannan [FJK]. Our result can also be interpreted as efficiently learning the intersection of n+1 half-spaces in R^n in the model where the intersection is bounded and we are given polynomially many uniform samples from it. Our proof uses the local search technique from Independent Component Analysis (ICA), also used by [FJK]. Unlike these previous algorithms, which were based on analyzing the fourth moment, ours is based on the third moment.
We also show a direct connection between the problem of learning a simplex and ICA: a simple randomized reduction to ICA from the problem of learning a simplex. The connection is based on a known representation of the uniform measure on a simplex. Similar representations lead to a reduction from the problem of learning an affine transformation of an n-dimensional l_p ball to ICA.
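One well-known representation of the uniform measure on a simplex is that if $E_0, \dots, E_n$ are i.i.d. standard exponential random variables, then $(E_0, \dots, E_n)/\sum_j E_j$ is uniformly distributed on the standard simplex; mapping these barycentric coordinates through vertices $v_0, \dots, v_n$ gives a uniform point of an arbitrary simplex. The sketch below uses this fact to generate the kind of input the learning problem assumes. We do not claim it reproduces the exact construction or reduction used in the paper, and the function name is hypothetical.

```python
import random


def sample_simplex(vertices, m, seed=0):
    """Draw m uniform points from the simplex with the given vertices,
    using normalized i.i.d. exponentials as barycentric coordinates."""
    rng = random.Random(seed)
    dim = len(vertices[0])
    samples = []
    for _ in range(m):
        e = [rng.expovariate(1.0) for _ in vertices]
        total = sum(e)
        lam = [x / total for x in e]  # uniform on the standard simplex
        samples.append([sum(l * v[j] for l, v in zip(lam, vertices)) for j in range(dim)])
    return samples


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
pts = sample_simplex(triangle, 20000)
mean = [sum(p[j] for p in pts) / len(pts) for j in range(2)]
print(mean)  # close to the centroid (4/3, 2/3)
```

The coordinates $E_i$ before normalization are independent, which is what makes a connection to ICA plausible; the learner, however, only sees the normalized and mapped points.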
Submitted 5 June, 2013; v1 submitted 9 November, 2012;
originally announced November 2012.
-
Blind Signal Separation in the Presence of Gaussian Noise
Authors:
Mikhail Belkin,
Luis Rademacher,
James Voss
Abstract:
A prototypical blind signal separation problem is the so-called cocktail party problem, with n people talking simultaneously and n different microphones within a room. The goal is to recover each speech signal from the microphone inputs. Mathematically this can be modeled by assuming that we are given samples from an n-dimensional random variable X=AS, where S is a vector whose coordinates are independent random variables corresponding to each speaker. The objective is to recover the matrix A^{-1} given random samples from X. A range of techniques collectively known as Independent Component Analysis (ICA) have been proposed to address this problem in the signal processing and machine learning literature. Many of these techniques are based on using the kurtosis or other cumulants to recover the components.
In this paper we propose a new algorithm for solving the blind signal separation problem in the presence of additive Gaussian noise, when we are given samples from X=AS+η, where η is drawn from an unknown, not necessarily spherical n-dimensional Gaussian distribution. Our approach is based on a method for decorrelating a sample with additive Gaussian noise under the assumption that the underlying distribution is a linear transformation of a distribution with independent components. Our decorrelation routine is based on the properties of cumulant tensors and can be combined with any standard cumulant-based method for ICA to get an algorithm that is provably robust in the presence of Gaussian noise. We derive polynomial bounds for the sample complexity and error propagation of our method.
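The reason cumulant-based methods interact well with additive Gaussian noise can be checked numerically: cumulants of independent random variables add, and every cumulant of order three or higher of a Gaussian is zero. The sketch below estimates the fourth cumulant $\kappa_4 = \mathbb{E}[(x-\mu)^4] - 3(\mathbb{E}[(x-\mu)^2])^2$ with a plain plug-in estimator (not the decorrelation routine of the paper) for a uniform source on $[-1,1]$, whose $\kappa_4 = 1/5 - 3 \cdot (1/3)^2 = -2/15$, for Gaussian noise, and for their sum. Sample sizes and seeds are arbitrary choices.

```python
import random


def fourth_cumulant(xs):
    """Plug-in estimate of kappa_4 = E[(x - mu)^4] - 3 (E[(x - mu)^2])^2."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m4 - 3.0 * m2 * m2


rng = random.Random(0)
N = 200_000
source = [rng.uniform(-1.0, 1.0) for _ in range(N)]  # non-Gaussian, kappa_4 = -2/15
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]       # Gaussian, kappa_4 = 0
observed = [s + g for s, g in zip(source, noise)]     # cumulants add: -2/15 + 0

for name, xs in [("source", source), ("noise", noise), ("observed", observed)]:
    print(name, round(fourth_cumulant(xs), 3))
```

The second-order statistics of `observed` are distorted by the noise, while its fourth cumulant matches the source up to sampling error, which is why fourth-order information survives Gaussian contamination.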
Submitted 9 June, 2013; v1 submitted 7 November, 2012;
originally announced November 2012.
-
Lower Bounds for the Average and Smoothed Number of Pareto Optima
Authors:
Navin Goyal,
Luis Rademacher
Abstract:
Smoothed analysis of multiobjective 0-1 linear optimization has drawn considerable attention recently. The number of Pareto-optimal solutions (i.e., solutions with the property that no other solution is at least as good in all the coordinates and better in at least one) for multiobjective optimization problems is the central object of study. In this paper, we prove several lower bounds for the expected number of Pareto optima. Our basic result is a lower bound of $Ω_d(n^{d-1})$ for optimization problems with $d$ objectives and $n$ variables under fairly general conditions on the distributions of the linear objectives. Our proof relates the problem of lower bounding the number of Pareto optima to results in geometry connected to arrangements of hyperplanes. We use our basic result to derive: (1) to our knowledge, the first lower bound for natural multiobjective optimization problems. We illustrate this for the maximum spanning tree problem with randomly chosen edge weights. Our technique is sufficiently flexible to yield such lower bounds for other standard objective functions studied in this setting (such as multiobjective shortest path, TSP tour, and matching). (2) A smoothed lower bound of $\min\{Ω_d(n^{d-1.5} φ^{(d-\log d)(1-Θ(1/φ))}), 2^{Θ(n)}\}$ for the 0-1 knapsack problem with $d$ profits for $φ$-semirandom distributions for a version of the knapsack problem. This improves the recent lower bound of Brunsch and Roeglin.
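The definition of a Pareto optimum above can be made concrete by brute force for a tiny 0-1 instance with $d$ linear objectives, all to be maximized. The enumeration below is exponential in $n$ and only illustrates the object being counted; encoding a cost (such as a knapsack weight) as a negated objective is an assumption of this sketch, and the helper names are hypothetical.

```python
from itertools import product


def dominates(a, b):
    """a dominates b: at least as good in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optima(objectives):
    """Brute-force Pareto-optimal x in {0,1}^n for d linear objectives (maximization).
    objectives[j][i] is the coefficient of x_i in objective j."""
    n = len(objectives[0])
    values = {
        x: tuple(sum(c * xi for c, xi in zip(obj, x)) for obj in objectives)
        for x in product((0, 1), repeat=n)
    }
    return [x for x, v in values.items() if not any(dominates(w, v) for w in values.values())]


# Two objectives: maximize profit 3*x1 + x2 and maximize -(x1 + 2*x2), i.e. minimize weight.
print(pareto_optima([[3, 1], [-1, -2]]))  # [(0, 0), (1, 0), (1, 1)]
```

Here $(0,1)$ is dominated by $(1,0)$, which has both higher profit and lower weight; the remaining three solutions trade one objective against the other.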
Submitted 19 July, 2011;
originally announced July 2011.
-
Efficient volume sampling for row/column subset selection
Authors:
Amit Deshpande,
Luis Rademacher
Abstract:
We give efficient algorithms for volume sampling, i.e., for picking $k$-subsets of the rows of any given matrix with probabilities proportional to the squared volumes of the simplices defined by them and the origin (or the squared volumes of the parallelepipeds defined by these subsets of rows). This solves an open problem from the monograph on spectral algorithms by Kannan and Vempala. Our first algorithm for volume sampling $k$-subsets of rows from an $m$-by-$n$ matrix runs in $O(kmn^ω \log n)$ arithmetic operations and a second variant of it for $(1+ε)$-approximate volume sampling runs in $O(mn \log m \cdot k^{2}/ε^{2} + m \log^ω m \cdot k^{2ω+1}/ε^{2ω} \cdot \log(k ε^{-1} \log m))$ arithmetic operations, which is almost linear in the size of the input (i.e., the number of entries) for small $k$. Our efficient volume sampling algorithms imply several interesting results for low-rank matrix approximation.
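For intuition, the distribution that volume sampling targets can be written down by brute force on tiny inputs: the probability of a $k$-subset $S$ of rows is proportional to $\det(A_S A_S^\top)$, the squared $k$-dimensional volume of the parallelepiped spanned by those rows. The sketch below enumerates all $\binom{m}{k}$ subsets, so it is far slower than the algorithms described above and is not the paper's method; the helper names are hypothetical, and it assumes $\operatorname{rank}(A) \ge k$ so that the normalizer is positive.

```python
import random
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    result = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0  # singular
        if p != c:
            A[c], A[p] = A[p], A[c]
            result = -result
        result *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n):
                A[r][j] -= f * A[c][j]
    return result


def volume_sampling_distribution(rows, k):
    """P(S) proportional to det(A_S A_S^T) for every k-subset S of row indices."""
    weights = {}
    for S in combinations(range(len(rows)), k):
        gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in S] for i in S]
        weights[S] = max(det(gram), 0.0)  # clip tiny negative round-off
    total = sum(weights.values())
    return {S: w / total for S, w in weights.items()}


def volume_sample(rows, k, seed=0):
    dist = volume_sampling_distribution(rows, k)
    return random.Random(seed).choices(list(dist), weights=list(dist.values()))[0]


A = [[1, 0], [2, 0], [0, 1]]
print(volume_sampling_distribution(A, 2))  # {(0, 1): 0.0, (0, 2): 0.2, (1, 2): 0.8}
```

Parallel rows span zero area and are never chosen together, while longer rows are favored in proportion to their squared volume.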
Submitted 23 April, 2010;
originally announced April 2010.
-
Learning convex bodies is hard
Authors:
Navin Goyal,
Luis Rademacher
Abstract:
We show that learning a convex body in $\mathbb{R}^d$, given random samples from the body, requires $2^{Ω(\sqrt{d/ε})}$ samples. By learning a convex body we mean finding a set having at most $ε$ relative symmetric difference with the input body. To prove the lower bound we construct a hard-to-learn family of convex bodies. Our construction of this family is very simple and based on error-correcting codes.
Submitted 7 April, 2009;
originally announced April 2009.
-
Expanders via Random Spanning Trees
Authors:
Navin Goyal,
Luis Rademacher,
Santosh Vempala
Abstract:
Motivated by the problem of routing reliably and scalably in a graph, we introduce the notion of a splicer, the union of spanning trees of a graph. We prove that for any bounded-degree n-vertex graph, the union of two random spanning trees approximates the expansion of every cut of the graph to within a factor of O(log n). For the random graph G_{n,p}, for p > c log(n)/n, two spanning trees give an expander. This is suggested by the case of the complete graph, where we prove that two random spanning trees give an expander. The construction of the splicer is elementary: each spanning tree can be produced independently using an algorithm by Aldous and Broder, namely a random walk in the graph with edges leading to previously unvisited vertices included in the tree.
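The tree construction described above (a random walk that keeps each edge leading to a previously unvisited vertex, due to Aldous and Broder) is short to write down. Below is a minimal sketch, assuming a connected graph given as an adjacency-list dictionary; the `splicer` helper simply takes the union of independently generated trees, and its name and seeding scheme are illustrative.

```python
import random


def aldous_broder(adj, seed=None):
    """Uniform random spanning tree of a connected graph via the Aldous-Broder random walk.
    adj maps each vertex to a list of its neighbours."""
    rng = random.Random(seed)
    vertices = list(adj)
    current = rng.choice(vertices)
    visited = {current}
    tree = set()
    while len(visited) < len(vertices):
        nxt = rng.choice(adj[current])
        if nxt not in visited:
            # The walk enters a new vertex: keep the edge it just used.
            visited.add(nxt)
            tree.add(frozenset((current, nxt)))
        current = nxt
    return tree


def splicer(adj, num_trees=2, seed=0):
    """Union of independent random spanning trees (hypothetical helper name)."""
    rng = random.Random(seed)
    edges = set()
    for _ in range(num_trees):
        edges |= aldous_broder(adj, seed=rng.randrange(2**32))
    return edges


complete6 = {v: [u for u in range(6) if u != v] for v in range(6)}
tree = aldous_broder(complete6, seed=1)
print(len(tree), len(splicer(complete6)))  # 5, then a number between 5 and 10
```

Each tree has exactly $n-1$ edges, so a union of two trees has at most $2(n-1)$ edges, which is the linear size that the sparsification discussion relies on.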
A second important application of splicers is to graph sparsification, where the goal is to approximate every cut (and more generally the quadratic form of the Laplacian) using only a small subgraph of the original graph. Benczur-Karger as well as Spielman-Srivastava have shown sparsifiers with O(n log n/eps^2) edges that achieve approximation within factors 1+eps and 1-eps. Their methods, based on independent sampling of edges, need Omega(n log n) edges to get any approximation (else the subgraph could be disconnected) and leave open the question of linear-size sparsifiers. Splicers address this question for random graphs by providing sparsifiers of size O(n) that approximate every cut to within a factor of O(log n).
Submitted 9 July, 2008;
originally announced July 2008.
-
Dispersion of Mass and the Complexity of Randomized Geometric Algorithms
Authors:
Luis Rademacher,
Santosh Vempala
Abstract:
How much can randomness help computation? Motivated by this general question and by volume computation, one of the few instances where randomness provably helps, we analyze a notion of dispersion and connect it to asymptotic convex geometry. We obtain a nearly quadratic lower bound on the complexity of randomized volume algorithms for convex bodies in R^n (the current best algorithm has complexity roughly n^4, conjectured to be n^3). Our main tools, dispersion of random determinants and dispersion of the length of a random point from a convex body, are of independent interest and applicable more generally; in particular, the latter is closely related to the variance hypothesis from convex geometry. This geometric dispersion also leads to lower bounds for matrix problems and property testing.
Submitted 17 June, 2008; v1 submitted 12 August, 2006;
originally announced August 2006.