-
Towards deployment-centric multimodal AI beyond vision and language
Authors:
Xianyuan Liu,
Jiayang Zhang,
Shuo Zhou,
Thijs L. van der Plas,
Avish Vijayaraghavan,
Anastasiia Grishina,
Mengdie Zhuang,
Daniel Schofield,
Christopher Tomlinson,
Yuhan Wang,
Ruizhe Li,
Louisa van Zeeland,
Sina Tabakhi,
Cyndie Demeocq,
Xiang Li,
Arunav Das,
Orlando Timmerman,
Thomas Baldwin-McDonald,
Jinge Wu,
Peizhen Bai,
Zahraa Al Sahili,
Omnia Alwazzan,
Thao N. Do,
Mohammod N. I. Suvon,
Angeline Wang, et al. (23 additional authors not shown)
Abstract:
Multimodal artificial intelligence (AI) integrates diverse types of data via machine learning to improve understanding, prediction, and decision-making across disciplines such as healthcare, science, and engineering. However, most multimodal AI advances focus on models for vision and language data, while their deployability remains a key challenge. We advocate a deployment-centric workflow that incorporates deployment constraints early to reduce the likelihood of undeployable solutions, complementing data-centric and model-centric approaches. We also emphasise deeper integration across multiple levels of multimodality and multidisciplinary collaboration to significantly broaden the research scope beyond vision and language. To facilitate this approach, we identify common multimodal-AI-specific challenges shared across disciplines and examine three real-world use cases: pandemic response, self-driving car design, and climate change adaptation, drawing expertise from healthcare, social science, engineering, science, sustainability, and finance. By fostering multidisciplinary dialogue and open research practices, our community can accelerate deployment-centric development for broad societal impact.
Submitted 4 April, 2025;
originally announced April 2025.
-
Computing High-dimensional Confidence Sets for Arbitrary Distributions
Authors:
Chao Gao,
Liren Shan,
Vaidehi Srinivas,
Aravindan Vijayaraghavan
Abstract:
We study the problem of learning a high-density region of an arbitrary distribution over $\mathbb{R}^d$. Given a target coverage parameter $δ$, and sample access to an arbitrary distribution $D$, we want to output a confidence set $S \subset \mathbb{R}^d$ such that $S$ achieves $δ$ coverage of $D$, i.e., $\mathbb{P}_{y \sim D} \left[ y \in S \right] \ge δ$, and the volume of $S$ is as small as possible. This is a central problem in high-dimensional statistics with applications in finding confidence sets, uncertainty quantification, and support estimation.
In the most general setting, this problem is statistically intractable, so we restrict our attention to competing with sets from a concept class $C$ with bounded VC-dimension. An algorithm is competitive with class $C$ if, given samples from an arbitrary distribution $D$, it outputs in polynomial time a set that achieves $δ$ coverage of $D$, and whose volume is competitive with the smallest set in $C$ with the required coverage $δ$. This problem is computationally challenging even in the basic setting when $C$ is the set of all Euclidean balls. Existing algorithms based on coresets find in polynomial time a ball whose volume is $\exp(\tilde{O}( d/ \log d))$-factor competitive with the volume of the best ball.
Our main result is an algorithm that finds a confidence set whose volume is $\exp(\tilde{O}(d^{1/2}))$-factor competitive with the optimal ball having the desired coverage. The algorithm is improper (it outputs an ellipsoid). Combined with our computational intractability result for properly learning balls within an $\exp(\tilde{O}(d^{1-o(1)}))$ approximation factor in volume, our results provide an interesting separation between proper and improper learning of confidence sets.
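To make the two quantities in this objective concrete (coverage $\mathbb{P}_{y \sim D}[y \in S] \ge δ$ and the volume of $S$), the following Python sketch measures them empirically for one simple candidate: a Euclidean ball around a fixed, user-chosen center. The fixed center and the helper names are hypothetical simplifications for illustration only; choosing the set well is the hard part the paper addresses, and its algorithm outputs an ellipsoid rather than the ball shown here.

```python
import math


def ball_volume(radius, d):
    """Volume of a Euclidean ball of the given radius in R^d."""
    return math.pi ** (d / 2) * radius ** d / math.gamma(d / 2 + 1)


def empirical_coverage(points, center, radius):
    """Fraction of sample points lying inside the closed ball B(center, radius)."""
    inside = sum(1 for p in points if math.dist(p, center) <= radius)
    return inside / len(points)


def smallest_radius_at_center(points, center, delta):
    """Smallest radius whose ball around a FIXED center covers a delta fraction of the samples."""
    if not 0 < delta <= 1:
        raise ValueError("delta must lie in (0, 1]")
    dists = sorted(math.dist(p, center) for p in points)
    k = math.ceil(delta * len(points))  # number of samples that must be covered
    return dists[k - 1]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 0.0)]
r = smallest_radius_at_center(points, (0.0, 0.0), 0.5)
print(r, empirical_coverage(points, (0.0, 0.0), r), ball_volume(r, 2))
```

Because the volume grows like $r^d$, even a modest overestimate of the radius inflates the volume by a factor exponential in $d$, which is why approximation factors in this problem are stated in the form $\exp(\cdot)$.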
Submitted 12 May, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
-
Volume Optimality in Conformal Prediction with Structured Prediction Sets
Authors:
Chao Gao,
Liren Shan,
Vaidehi Srinivas,
Aravindan Vijayaraghavan
Abstract:
Conformal Prediction is a widely studied technique to construct prediction sets of future observations. Most conformal prediction methods focus on achieving the necessary coverage guarantees, but do not provide formal guarantees on the size (volume) of the prediction sets. We first prove an impossibility of volume optimality where any distribution-free method can only find a trivial solution. We then introduce a new notion of volume optimality by restricting the prediction sets to belong to a set family (of finite VC-dimension), specifically a union of $k$-intervals. Our main contribution is an efficient distribution-free algorithm based on dynamic programming (DP) to find a union of $k$-intervals that is guaranteed, for any distribution, to have near-optimal volume among all unions of $k$-intervals satisfying the desired coverage property. By adopting the framework of distributional conformal prediction (Chernozhukov et al., 2021), the new DP-based conformity score can also be applied to achieve approximate conditional coverage and conditional restricted volume optimality, as long as a reasonable estimator of the conditional CDF is available. While the theoretical results already establish volume-optimality guarantees, they are complemented by experiments that demonstrate that our method can significantly outperform existing methods in many settings.
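The kind of dynamic program involved can be illustrated on a simpler, purely empirical version of the problem: given 1-D samples, find at most $k$ intervals with endpoints at sample points that together cover at least $m$ samples with minimum total length. This Python sketch is an assumption-laden illustration, not the paper's algorithm or its conformity score; it relies on the observation that, on sorted data, each optimal interval covers a contiguous run of samples.

```python
import math


def min_length_k_intervals(samples, k, m):
    """Minimum total length of <= k intervals covering >= m of the samples (exact DP)."""
    xs = sorted(samples)
    n = len(xs)
    if m > n:
        raise ValueError("cannot cover more points than there are samples")
    INF = math.inf
    # best[i][j][c]: minimal total length using the first i sorted points,
    # exactly j intervals, covering exactly c points
    best = [[[INF] * (n + 1) for _ in range(k + 1)] for _ in range(n + 1)]
    best[0][0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(k + 1):
            for c in range(n + 1):
                # option 1: point i-1 is left uncovered
                val = best[i - 1][j][c]
                # option 2: an interval [xs[a], xs[i-1]] covers the run a..i-1
                if j >= 1:
                    for a in range(i):
                        run = i - a
                        if run <= c and best[a][j - 1][c - run] < INF:
                            val = min(val, best[a][j - 1][c - run] + xs[i - 1] - xs[a])
                best[i][j][c] = val
    return min(best[n][j][c] for j in range(k + 1) for c in range(m, n + 1))


samples = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
print(min_length_k_intervals(samples, k=2, m=5))
```

With two intervals and a target of five points, the DP drops the outlier at 10 and covers the two tight clusters, which is the behaviour a union of $k$-intervals is meant to capture for multimodal data.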
Submitted 23 February, 2025;
originally announced February 2025.
-
Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals
Authors:
Anxin Guo,
Aravindan Vijayaraghavan
Abstract:
We consider the problem of learning an arbitrarily-biased ReLU activation (or neuron) over Gaussian marginals with the squared loss objective. Despite the ReLU neuron being the basic building block of modern neural networks, we still do not understand the basic algorithmic question of whether one arbitrary ReLU neuron is learnable in the non-realizable setting. In particular, all existing polynomial time algorithms only provide approximation guarantees for the better-behaved unbiased setting or restricted bias setting.
Our main result is a polynomial time statistical query (SQ) algorithm that gives the first constant factor approximation for arbitrary bias. It outputs a ReLU activation that achieves a loss of $O(\mathrm{OPT}) + \varepsilon$ in time $\mathrm{poly}(d,1/\varepsilon)$, where $\mathrm{OPT}$ is the loss obtained by the optimal ReLU activation. Our algorithm presents an interesting departure from existing algorithms, which are all based on gradient descent and thus fall within the class of correlational statistical query (CSQ) algorithms. We complement our algorithmic result by showing that no polynomial time CSQ algorithm can achieve a constant factor approximation. Together, these results shed light on the intrinsic limitation of gradient descent, while identifying arguably the simplest setting (a single neuron) where there is a separation between SQ and CSQ algorithms.
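The objective in this setting is the squared loss of a single biased ReLU neuron $x \mapsto \max(\langle w, x\rangle + b, 0)$ on Gaussian inputs with arbitrary labels, and $\mathrm{OPT}$ is the smallest such loss. The following Python sketch only evaluates that objective on samples (helper names are hypothetical); it is not the SQ algorithm from the abstract.

```python
import random


def relu(t):
    return max(t, 0.0)


def neuron(w, b, x):
    """Output of a single ReLU neuron x -> max(<w, x> + b, 0)."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)) + b)


def empirical_sq_loss(w, b, xs, ys):
    """Average squared loss of the neuron (w, b) on labelled samples."""
    return sum((neuron(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gaussian_samples(n, d, seed=0):
    """n points drawn from the standard Gaussian N(0, I_d)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


xs = gaussian_samples(200, 3)
w_star, b_star = [1.0, -2.0, 0.5], -1.5  # a strongly biased neuron
ys = [neuron(w_star, b_star, x) for x in xs]  # realizable labels, so OPT = 0 here
print(empirical_sq_loss(w_star, b_star, xs, ys), empirical_sq_loss([0.0] * 3, 0.0, xs, ys))
```

In the agnostic (non-realizable) case the labels `ys` may be arbitrary, so $\mathrm{OPT} > 0$ in general and the guarantee is a loss of $O(\mathrm{OPT}) + \varepsilon$ rather than zero.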
Submitted 22 November, 2024; v1 submitted 21 November, 2024;
originally announced November 2024.
-
Theoretical Analysis of Weak-to-Strong Generalization
Authors:
Hunter Lang,
David Sontag,
Aravindan Vijayaraghavan
Abstract:
Strong student models can learn from weaker teachers: when trained on the predictions of a weaker model, a strong pretrained student can learn to correct the weak model's errors and generalize to examples where the teacher is not confident, even when these examples are excluded from training. This enables learning from cheap, incomplete, and possibly incorrect label information, such as coarse logical rules or the generations of a language model. We show that existing weak supervision theory fails to account for both of these effects, which we call pseudolabel correction and coverage expansion, respectively. We give a new bound based on expansion properties of the data distribution and student hypothesis class that directly accounts for pseudolabel correction and coverage expansion. Our bounds capture the intuition that weak-to-strong generalization occurs when the strong model is unable to fit the mistakes of the weak teacher without incurring additional error. We show that these expansion properties can be checked from finite data and give empirical evidence that they hold in practice.
Submitted 24 May, 2024;
originally announced May 2024.
-
Efficient Certificates of Anti-Concentration Beyond Gaussians
Authors:
Ainesh Bakshi,
Pravesh Kothari,
Goutham Rajendran,
Madhur Tulsiani,
Aravindan Vijayaraghavan
Abstract:
A set of high-dimensional points $X=\{x_1, x_2,\ldots, x_n\} \subset \mathbb{R}^d$ in isotropic position is said to be $δ$-anti-concentrated if for every direction $v$, the fraction of points in $X$ satisfying $|\langle x_i,v \rangle |\leq δ$ is at most $O(δ)$. Motivated by applications to list-decodable learning and clustering, recent works have considered the problem of constructing efficient certificates of anti-concentration in the average case, when the set of points $X$ corresponds to samples from a Gaussian distribution. Their certificates played a crucial role in several subsequent works in algorithmic robust statistics on list-decodable learning and settling the robust learnability of arbitrary Gaussian mixtures, yet remain limited to rotationally invariant distributions.
This work presents a new (and arguably the most natural) formulation for anti-concentration. Using this formulation, we give quasi-polynomial time verifiable sum-of-squares certificates of anti-concentration that hold for a wide class of non-Gaussian distributions including anti-concentrated bounded product distributions and uniform distributions over $L_p$ balls (and their affine transformations). Consequently, our method upgrades and extends results in algorithmic robust statistics, e.g., list-decodable learning and clustering, to such distributions. Our approach constructs a canonical integer program for anti-concentration and analyzes a sum-of-squares relaxation of it, independent of the intended application. We rely on duality and analyze a pseudo-expectation on large subsets of the input points that take a small value in some direction. Our analysis uses the method of polynomial reweightings to reduce the problem to analyzing only analytically dense or sparse directions.
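The definition of $δ$-anti-concentration can be tested directly for any finite list of directions, which shows what a certificate must rule out. This Python sketch is only an illustration: the constant `C` standing in for the hidden constant in $O(δ)$ is a hypothetical parameter, and checking finitely many directions can only refute anti-concentration, never certify it over all directions $v$ as the sum-of-squares certificates do.

```python
import math


def fraction_near_hyperplane(points, v, delta):
    """Fraction of points x with |<x, u>| <= delta, where u is v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    u = [c / norm for c in v]
    close = sum(1 for x in points if abs(sum(a * b for a, b in zip(x, u))) <= delta)
    return close / len(points)


def violating_direction(points, directions, delta, C=1.0):
    """Return the first tested direction whose fraction exceeds C * delta, or None."""
    for v in directions:
        if fraction_near_hyperplane(points, v, delta) > C * delta:
            return v
    return None


points = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.05, 2.0)]
print(fraction_near_hyperplane(points, (1.0, 0.0), 0.1))
print(violating_direction(points, [(1.0, 1.0), (1.0, 0.0)], 0.1))
```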
Submitted 28 October, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
New Tools for Smoothed Analysis: Least Singular Value Bounds for Random Matrices with Dependent Entries
Authors:
Aditya Bhaskara,
Eric Evert,
Vaidehi Srinivas,
Aravindan Vijayaraghavan
Abstract:
We develop new techniques for proving lower bounds on the least singular value of random matrices with limited randomness. The matrices we consider have entries that are given by polynomials of a few underlying base random variables. This setting captures a core technical challenge for obtaining smoothed analysis guarantees in many algorithmic settings. Least singular value bounds often involve showing strong anti-concentration inequalities that are intricate and much less understood compared to concentration (or large deviation) bounds.
First, we introduce a general technique involving hierarchical $ε$-nets to prove least singular value bounds. Our second tool is a new statement about least singular values for reasoning about higher-order lifts of smoothed matrices and the action of linear operators on them.
Apart from giving simpler proofs of existing smoothed analysis results, we now use these tools to handle more general families of random matrices. This allows us to produce smoothed analysis guarantees in several previously open settings. These include new smoothed analysis guarantees for power sum decompositions, subspace clustering and certifying robust entanglement of subspaces, where prior work could only establish least singular value bounds for fully random instances or only show non-robust genericity guarantees.
Submitted 2 May, 2024;
originally announced May 2024.
-
Error-Tolerant E-Discovery Protocols
Authors:
Jinshuo Dong,
Jason D. Hartline,
Liren Shan,
Aravindan Vijayaraghavan
Abstract:
We consider the multi-party classification problem introduced by Dong, Hartline, and Vijayaraghavan (2022) in the context of electronic discovery (e-discovery). Based on a request for production from the requesting party, the responding party is required to provide documents that are responsive to the request except for those that are legally privileged. Our goal is to find a protocol that verifies that the responding party sends almost all responsive documents while minimizing the disclosure of non-responsive documents. We provide protocols in the challenging non-realizable setting, where the instance may not be perfectly separated by a linear classifier. We demonstrate empirically that our protocol successfully manages to find almost all relevant documents, while incurring only a small disclosure of non-responsive documents. We complement this with a theoretical analysis of our protocol in the single-dimensional setting, and other experiments on simulated data which suggest that the non-responsive disclosure incurred by our protocol may be unavoidable.
Submitted 31 January, 2024;
originally announced January 2024.
-
Higher-Order Cheeger Inequality for Partitioning with Buffers
Authors:
Konstantin Makarychev,
Yury Makarychev,
Liren Shan,
Aravindan Vijayaraghavan
Abstract:
We prove a new generalization of the higher-order Cheeger inequality for partitioning with buffers. Consider a graph $G=(V,E)$. The buffered expansion of a set $S \subseteq V$ with a buffer $B \subseteq V \setminus S$ is the edge expansion of $S$ after removing all the edges from set $S$ to its buffer $B$. An $\varepsilon$-buffered $k$-partitioning is a partitioning of a graph into disjoint components $P_i$ and buffers $B_i$, in which the size of buffer $B_i$ for $P_i$ is small relative to the size of $P_i$: $|B_i| \le \varepsilon |P_i|$. The buffered expansion of a buffered partition is the maximum of buffered expansions of the $k$ sets $P_i$ with buffers $B_i$. Let $h^{k,\varepsilon}_G$ be the buffered expansion of the optimal $\varepsilon$-buffered $k$-partitioning; then for every $δ>0$, $$h_G^{k,\varepsilon} \le O_δ(1) \cdot \Big( \frac{\log k}{ \varepsilon}\Big) \cdot λ_{\lfloor (1+δ) k\rfloor},$$ where $λ_{\lfloor (1+δ)k\rfloor}$ is the $\lfloor (1+δ)k\rfloor$-th smallest eigenvalue of the normalized Laplacian of $G$.
Our inequality is constructive and avoids the "square-root loss" that is present in the standard Cheeger inequalities (even for $k=2$). We also provide a complementary lower bound, and a novel generalization to the setting with arbitrary vertex weights and edge costs. Moreover, our result implies and generalizes the standard higher-order Cheeger inequalities and another recent Cheeger-type inequality by Kwok, Lau, and Lee (2017) involving robust vertex expansion.
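The buffered expansion defined above can be computed directly for a small unweighted graph. In this Python sketch, edge expansion is assumed to mean the number of edges leaving $S$ divided by $\mathrm{vol}(S)$, the sum of the degrees of $S$ in the original graph $G$; that normalization is an assumption chosen to match the normalized Laplacian, and the paper's weighted generalization is not covered. Function names are hypothetical.

```python
from collections import defaultdict


def buffered_expansion(edges, S, B):
    """Edges leaving S, ignoring those into the buffer B, divided by vol(S) in G."""
    S, B = set(S), set(B)
    if S & B:
        raise ValueError("the buffer must be disjoint from S")
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    vol = sum(degree[u] for u in S)
    cut = 0
    for u, v in edges:
        if (u in S) != (v in S):
            outside = v if u in S else u
            if outside not in B:  # edges from S into its own buffer are removed
                cut += 1
    return cut / vol


def partition_buffered_expansion(edges, parts, eps):
    """Max buffered expansion over (P_i, B_i) pairs, enforcing |B_i| <= eps * |P_i|."""
    for P, B in parts:
        if len(B) > eps * len(P):
            raise ValueError("buffer too large for its component")
    return max(buffered_expansion(edges, P, B) for P, B in parts)


path = [(0, 1), (1, 2), (2, 3)]  # the path 0 - 1 - 2 - 3
print(buffered_expansion(path, {0, 1}, set()), buffered_expansion(path, {0, 1}, {2}))
```

On the path, placing vertex 2 in the buffer of $\{0, 1\}$ removes its only boundary edge, so its buffered expansion drops from $1/3$ to $0$.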
Submitted 20 August, 2023;
originally announced August 2023.
-
Minimum Levels of Interpretability for Artificial Moral Agents
Authors:
Avish Vijayaraghavan,
Cosmin Badea
Abstract:
As artificial intelligence (AI) models continue to scale up, they are becoming more capable and integrated into various forms of decision-making systems. For models involved in moral decision-making, also known as artificial moral agents (AMA), interpretability provides a way to trust and understand the agent's internal reasoning mechanisms for effective use and error correction. In this paper, we provide an overview of this rapidly-evolving sub-field of AI interpretability, introduce the concept of the Minimum Level of Interpretability (MLI) and recommend an MLI for various types of agents, to aid their safe deployment in real-world settings.
Submitted 2 July, 2023;
originally announced July 2023.
-
Computing linear sections of varieties: quantum entanglement, tensor decompositions and beyond
Authors:
Nathaniel Johnston,
Benjamin Lovitz,
Aravindan Vijayaraghavan
Abstract:
We study the problem of finding elements in the intersection of an arbitrary conic variety in $\mathbb{F}^n$ with a given linear subspace (where $\mathbb{F}$ can be the real or complex field). This problem captures a rich family of algorithmic problems under different choices of the variety. The special case of the variety consisting of rank-1 matrices already has strong connections to central problems in different areas like quantum information theory and tensor decompositions. This problem is known to be NP-hard in the worst case, even for the variety of rank-1 matrices.
Surprisingly, despite these hardness results we develop an algorithm that solves this problem efficiently for "typical" subspaces. Here, the subspace $U \subseteq \mathbb{F}^n$ is chosen generically of a certain dimension, potentially with some generic elements of the variety contained in it. Our main result is a guarantee that our algorithm recovers all the elements of $U$ that lie in the variety, under some mild non-degeneracy assumptions on the variety. As corollaries, we obtain the following new results:
$\bullet$ Polynomial time algorithms for several entangled-subspace problems in quantum entanglement, including determining $r$-entanglement, complete entanglement, and genuine entanglement of a subspace. While all of these problems are NP-hard in the worst case, our algorithm solves them in polynomial time for generic subspaces of dimension up to a constant multiple of the maximum possible.
$\bullet$ Uniqueness results and polynomial time algorithmic guarantees for generic instances of a broad class of low-rank decomposition problems that go beyond tensor decompositions. Here, we recover a decomposition of the form $\sum_{i=1}^R v_i \otimes w_i$, where the $v_i$ are elements of the variety $X$. This implies new uniqueness results and genericity guarantees even in the special case of tensor decompositions.
Submitted 7 May, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
The Burer-Monteiro SDP method can fail even above the Barvinok-Pataki bound
Authors:
Liam O'Carroll,
Vaidehi Srinivas,
Aravindan Vijayaraghavan
Abstract:
The most widely used technique for solving large-scale semidefinite programs (SDPs) in practice is the non-convex Burer-Monteiro method, which explicitly maintains a low-rank SDP solution for memory efficiency. There has been much recent interest in obtaining a better theoretical understanding of the Burer-Monteiro method. When the maximum allowed rank $p$ of the SDP solution is above the Barvinok-Pataki bound (where a globally optimal solution of rank at most $p$ is guaranteed to exist), a recent line of work established convergence to a global optimum for generic or smoothed instances of the problem. However, it was open whether there even exists an instance in this regime where the Burer-Monteiro method fails. We prove that the Burer-Monteiro method can fail for the Max-Cut SDP on $n$ vertices when the rank is above the Barvinok-Pataki bound ($p \ge \sqrt{2n}$). We provide a family of instances that have spurious local minima even when the rank $p = n/2$. Combined with existing guarantees, this settles the question of the existence of spurious local minima for the Max-Cut formulation in all ranges of the rank and justifies the use of beyond worst-case paradigms like smoothed analysis to obtain guarantees for the Burer-Monteiro method.
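For reference, the rank-$p$ Burer-Monteiro formulation of the Max-Cut SDP assigns each vertex $i$ a unit vector $v_i \in \mathbb{R}^p$ and maximizes $\sum_{i<j} W_{ij}(1 - \langle v_i, v_j\rangle)/2$. The following Python sketch runs plain projected gradient descent on that factorization (step size, iteration count and names are hypothetical choices, and practical solvers use more refined Riemannian methods). On the toy single-edge graph used here it reaches the optimum; the paper's point is that there exist instances where such local methods can get stuck even for large $p$.

```python
import math
import random


def _normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def burer_monteiro_maxcut(weights, p, steps=500, lr=0.1, seed=0):
    """Projected gradient descent on unit vectors v_1..v_n in R^p.

    Minimizes sum_{i<j} W_ij <v_i, v_j>, which is equivalent to maximizing
    the SDP cut value. Assumes W is symmetric with a zero diagonal.
    """
    n = len(weights)
    rng = random.Random(seed)
    V = [_normalize([rng.gauss(0.0, 1.0) for _ in range(p)]) for _ in range(n)]
    for _ in range(steps):
        # gradient of the objective with respect to v_i is sum_j W_ij v_j
        grads = [[sum(weights[i][j] * V[j][t] for j in range(n)) for t in range(p)]
                 for i in range(n)]
        # gradient step, then projection back onto the unit sphere
        V = [_normalize([V[i][t] - lr * grads[i][t] for t in range(p)]) for i in range(n)]
    return V


def sdp_cut_value(weights, V):
    n = len(weights)
    return sum(weights[i][j] * (1 - sum(a * b for a, b in zip(V[i], V[j]))) / 2
               for i in range(n) for j in range(i + 1, n))


W = [[0.0, 1.0], [1.0, 0.0]]  # a single edge: the optimal SDP value is 1
V = burer_monteiro_maxcut(W, p=2)
print(sdp_cut_value(W, V))
```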
Submitted 22 November, 2022;
originally announced November 2022.
-
Classification Protocols with Minimal Disclosure
Authors:
Jinshuo Dong,
Jason Hartline,
Aravindan Vijayaraghavan
Abstract:
We consider multi-party protocols for classification that are motivated by applications such as e-discovery in court proceedings. We identify a protocol that guarantees that the requesting party receives all responsive documents and the sending party discloses the minimal number of non-responsive documents necessary to prove that all responsive documents have been received. This protocol can be embedded in a machine learning framework that enables automated labeling of points and the resulting multi-party protocol is equivalent to the standard one-party classification problem (if the one-party classification problem satisfies a natural independence-of-irrelevant-alternatives property). Our formal guarantees focus on the case where there is a linear classifier that correctly partitions the documents.
Submitted 6 September, 2022;
originally announced September 2022.
-
Agnostic Learning of General ReLU Activation Using Gradient Descent
Authors:
Pranjal Awasthi,
Alex Tang,
Aravindan Vijayaraghavan
Abstract:
We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function with moderate bias under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario when the bias of the ReLU function is non-zero. Our main result establishes that starting from random initialization, in a polynomial number of iterations gradient descent outputs, with high probability, a ReLU function that achieves an error that is within a constant factor of the optimal error of the best ReLU function with moderate bias. We also provide finite sample guarantees, and these techniques generalize to a broader class of marginal distributions beyond Gaussians.
Submitted 3 November, 2024; v1 submitted 4 August, 2022;
originally announced August 2022.
-
Training Subset Selection for Weak Supervision
Authors:
Hunter Lang,
Aravindan Vijayaraghavan,
David Sontag
Abstract:
Existing weak supervision approaches use all the data covered by weak signals to train a classifier. We show both theoretically and empirically that this is not always optimal. Intuitively, there is a tradeoff between the amount of weakly-labeled data and the precision of the weak labels. We explore this tradeoff by combining pretrained data representations with the cut statistic (Muhlenbach et al., 2004) to select (hopefully) high-quality subsets of the weakly-labeled training data. Subset selection applies to any label model and classifier and is very simple to plug into existing weak supervision pipelines, requiring just a few lines of code. We show our subset selection method improves the performance of weak supervision for a wide range of label models, classifiers, and datasets. Using less weakly-labeled data improves the accuracy of weak supervision pipelines by up to 19% (absolute) on benchmark tasks.
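The idea behind this kind of subset selection can be sketched as follows: build a nearest-neighbour graph on the (pretrained) representations and keep the points whose weak label agrees with most of their neighbours. This Python sketch uses a simplified disagreement fraction in place of the actual cut statistic of Muhlenbach et al., which normalizes the cut weight against its expectation under a null hypothesis; the names `k` and `keep_fraction` are hypothetical parameters.

```python
import math


def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    others = sorted((math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i)
    return [j for _, j in others[:k]]


def select_confident_subset(points, weak_labels, k, keep_fraction):
    """Keep the points whose weak label agrees most with their k-NN neighbourhood."""
    scores = []
    for i in range(len(points)):
        nbrs = knn(points, i, k)
        disagree = sum(1 for j in nbrs if weak_labels[j] != weak_labels[i])
        scores.append((disagree / len(nbrs), i))
    scores.sort()  # lowest disagreement first; ties broken by index
    n_keep = max(1, int(keep_fraction * len(points)))
    return sorted(i for _, i in scores[:n_keep])


points = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
weak_labels = [0, 0, 0, 1, 1, 0]  # the last point is likely mislabeled
print(select_confident_subset(points, weak_labels, k=2, keep_fraction=0.8))
```

The point whose weak label disagrees with its whole neighbourhood is discarded first, trading a little coverage for higher label precision.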
Submitted 6 March, 2023; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations
Authors:
Pranjal Awasthi,
Alex Tang,
Aravindan Vijayaraghavan
Abstract:
We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = {a}^{\mathsf{T}}σ({W}^\mathsf{T}x+b)$, where $x$ is drawn from the Gaussian distribution, and $σ(t) := \max(t,0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher-order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.
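As a concrete reading of the model $f(x) = a^{\mathsf{T}}σ(W^{\mathsf{T}}x+b)$, the following Python sketch evaluates such a network, with $W$ stored as a $d \times m$ list of rows so that column $j$ is the weight vector of hidden unit $j$. It only illustrates the function class being learned, not the tensor-decomposition learning algorithm.

```python
def relu(t):
    return max(t, 0.0)


def depth2_network(a, W, b, x):
    """Evaluate f(x) = a^T relu(W^T x + b).

    W is a d x m matrix given as d rows of length m; column j holds the
    incoming weights of hidden unit j, and b[j] is that unit's bias.
    """
    d, m = len(W), len(W[0])
    hidden = [relu(sum(W[i][j] * x[i] for i in range(d)) + b[j]) for j in range(m)]
    return sum(a[j] * hidden[j] for j in range(m))


a = [2.0, -1.0]
b = [-1.0, 0.5]  # non-zero biases, the case the abstract focuses on
print(depth2_network(a, [[1.0, 0.0], [0.0, 1.0]], b, [3.0, 1.0]))
```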
Submitted 1 August, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Beyond Perturbation Stability: LP Recovery Guarantees for MAP Inference on Noisy Stable Instances
Authors:
Hunter Lang,
Aravind Reddy,
David Sontag,
Aravindan Vijayaraghavan
Abstract:
Several works have shown that perturbation stable instances of the MAP inference problem in Potts models can be solved exactly using a natural linear programming (LP) relaxation. However, most of these works give few (or no) guarantees for the LP solutions on instances that do not satisfy the relatively strict perturbation stability definitions. In this work, we go beyond these stability results by showing that the LP approximately recovers the MAP solution of a stable instance even after the instance is corrupted by noise. This "noisy stable" model realistically fits with practical MAP inference problems: we design an algorithm for finding "close" stable instances, and show that several real-world instances from computer vision have nearby instances that are perturbation stable. These results suggest a new theoretical explanation for the excellent performance of this LP relaxation in practice.
Submitted 26 February, 2021;
originally announced March 2021.
-
Graph cuts always find a global optimum for Potts models (with a catch)
Authors:
Hunter Lang,
David Sontag,
Aravindan Vijayaraghavan
Abstract:
We prove that the $α$-expansion algorithm for MAP inference always returns a globally optimal assignment for Markov Random Fields with Potts pairwise potentials, with a catch: the returned assignment is only guaranteed to be optimal for an instance within a small perturbation of the original problem instance. In other words, all local minima with respect to expansion moves are global minima to slightly perturbed versions of the problem. On "real-world" instances, MAP assignments of small perturbations of the problem should be very similar to the MAP assignment(s) of the original problem instance. We design an algorithm that can certify whether this is the case in practice. On several MAP inference problem instances from computer vision, this algorithm certifies that MAP solutions to all of these perturbations are very close to solutions of the original instance. These results taken together give a cohesive explanation for the good performance of "graph cuts" algorithms in practice. Every local expansion minimum is a global minimum in a small perturbation of the problem, and all of these global minima are close to the original solution.
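To make the objects in this abstract concrete, the sketch below defines a Potts-model energy and checks the "local minimum with respect to expansion moves" condition. It assumes the standard form (a unary cost per node plus a penalty w_uv for every edge whose endpoints take different labels). All function names and the toy instance are hypothetical. Both routines enumerate labelings by brute force, so they only illustrate the definitions on tiny instances. Real α-expansion solves each move with a min-cut, and the paper's certification algorithm is not reproduced here.

```python
from itertools import product

def potts_energy(labels, unary, edges, weights):
    """Energy of a labeling in a Potts model (lower is better).

    unary[v][l]  cost of assigning label l to node v
    edges[i]     pair (u, v); weights[i] >= 0 is paid when labels[u] != labels[v]
    """
    energy = sum(unary[v][labels[v]] for v in range(len(labels)))
    for (u, v), w in zip(edges, weights):
        if labels[u] != labels[v]:
            energy += w
    return energy

def brute_force_map(num_labels, unary, edges, weights):
    """Exact MAP labeling (minimum energy) found by enumerating every labeling."""
    n = len(unary)
    best = min(product(range(num_labels), repeat=n),
               key=lambda lab: potts_energy(lab, unary, edges, weights))
    return list(best)

def is_expansion_local_min(labels, num_labels, unary, edges, weights):
    """True if no alpha-expansion move (any subset of nodes switching to
    one label alpha) strictly lowers the energy."""
    current = potts_energy(labels, unary, edges, weights)
    for alpha in range(num_labels):
        for mask in product((False, True), repeat=len(labels)):
            moved = [alpha if m else l for m, l in zip(mask, labels)]
            if potts_energy(moved, unary, edges, weights) < current:
                return False
    return True

# Tiny 3-node chain with 2 labels (hypothetical example instance).
unary = [[0, 2], [1, 1], [2, 0]]
edges = [(0, 1), (1, 2)]
weights = [1, 1]
map_labels = brute_force_map(2, unary, edges, weights)
print(map_labels, potts_energy(map_labels, unary, edges, weights))  # [0, 0, 1] 2
```

On this instance the labelings [0, 0, 1] and [0, 1, 1] tie at energy 2, and `min` returns the first one it enumerates. The all-zeros labeling (energy 3) is not an expansion local minimum, because expanding label 1 onto node 2 lowers the energy.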
Submitted 14 June, 2021; v1 submitted 6 November, 2020;
originally announced November 2020.
-
Learning a mixture of two subspaces over finite fields
Authors:
Aidao Chen,
Anindya De,
Aravindan Vijayaraghavan
Abstract:
We study the problem of learning a mixture of two subspaces over $\mathbb{F}_2^n$. The goal is to recover the individual subspaces, given samples from a (weighted) mixture of samples drawn uniformly from the two subspaces $A_0$ and $A_1$.
This problem is computationally challenging, as it captures the notorious problem of "learning parities with noise" in the degenerate setting when $A_1 \subseteq A_0$. This is in contrast to the analogous problem over the reals that can be solved in polynomial time (Vidal'03). This leads to the following natural question: is Learning Parities with Noise the only computational barrier in obtaining efficient algorithms for learning mixtures of subspaces over $\mathbb{F}_2^n$?
The main result of this paper is an affirmative answer to the above question. Namely, we show the following results: 1. When the subspaces $A_0$ and $A_1$ are incomparable, i.e., $A_0$ and $A_1$ are not contained inside each other, then there is a polynomial time algorithm to recover the subspaces $A_0$ and $A_1$. 2. In the case when $A_1$ is a subspace of $A_0$ with a significant gap in the dimension, i.e., $\dim(A_1) \le α \dim(A_0)$ for $α<1$, there is an $n^{O(1/(1-α))}$ time algorithm to recover the subspaces $A_0$ and $A_1$.
Thus, our algorithms imply computational tractability of the problem of learning mixtures of two subspaces, except in the degenerate setting captured by learning parities with noise.
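The generative model in this abstract can be sketched in a few lines, assuming vectors in $\mathbb{F}_2^n$ are stored as integer bitmasks so that addition is XOR. The function names, the mixture weight, and the example subspaces are hypothetical. This only draws samples and measures rank over $\mathbb{F}_2$; recovering $A_0$ and $A_1$ from the samples is the paper's contribution and is not shown.

```python
import random

def sample_subspace(basis, rng):
    """Uniform sample from span(basis) over F_2 (vectors are int bitmasks).

    Each basis vector is included independently with probability 1/2; this is
    uniform on the span when the basis vectors are linearly independent.
    """
    x = 0
    for b in basis:
        if rng.random() < 0.5:
            x ^= b  # addition over F_2 is bitwise XOR
    return x

def sample_mixture(basis0, basis1, weight1, m, rng):
    """m samples: from span(basis1) with probability weight1, else from span(basis0)."""
    return [sample_subspace(basis1 if rng.random() < weight1 else basis0, rng)
            for _ in range(m)]

def rank_f2(vectors):
    """Rank over F_2 by Gaussian elimination on bitmasks."""
    pivots = {}  # leading bit -> stored vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = v
                break
            v ^= pivots[lead]
    return len(pivots)

rng = random.Random(0)
A0 = [0b0011, 0b0101]  # span: {0000, 0011, 0101, 0110}
A1 = [0b1000]          # span: {0000, 1000}; incomparable with span(A0)
samples = sample_mixture(A0, A1, 0.3, 20, rng)
print(sorted(set(samples)), rank_f2(samples))
```

If `A1` were instead chosen inside the span of `A0`, the two components would be nested, which is the degenerate setting the abstract relates to learning parities with noise.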
Submitted 15 February, 2021; v1 submitted 6 October, 2020;
originally announced October 2020.
-
Efficient Tensor Decomposition
Authors:
Aravindan Vijayaraghavan
Abstract:
This chapter studies the problem of decomposing a tensor into a sum of constituent rank one tensors. While tensor decompositions are very useful in designing learning algorithms and data analysis, they are NP-hard in the worst-case. We will see how to design efficient algorithms with provable guarantees under mild assumptions, and using beyond worst-case frameworks like smoothed analysis.
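As a reminder of the object being decomposed, the sketch below builds a third-order tensor $T = \sum_i a_i \otimes b_i \otimes c_i$ from given rank-one factors using plain nested lists. The helper name is hypothetical. It shows only the easy forward direction; recovering the factors from $T$ is the hard problem the chapter studies, and none of its algorithms are reproduced here.

```python
def rank_one_sum(factors):
    """Build T = sum_i a_i (x) b_i (x) c_i as nested lists.

    factors: list of (a, b, c) vectors; T[x][y][z] = sum_i a_i[x] * b_i[y] * c_i[z].
    All a's (resp. b's, c's) are assumed to have the same length.
    """
    a0, b0, c0 = factors[0]
    T = [[[0.0] * len(c0) for _ in b0] for _ in a0]
    for a, b, c in factors:
        for x, ax in enumerate(a):
            for y, by in enumerate(b):
                for z, cz in enumerate(c):
                    T[x][y][z] += ax * by * cz
    return T

# A rank-2 tensor in R^{2 x 2 x 2} built from standard basis vectors.
T = rank_one_sum([((1, 0), (1, 0), (1, 0)), ((0, 1), (0, 1), (0, 1))])
print(T)
```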
Submitted 30 July, 2020;
originally announced July 2020.
-
Adversarial robustness via robust low rank representations
Authors:
Pranjal Awasthi,
Himanshu Jain,
Ankit Singh Rawat,
Aravindan Vijayaraghavan
Abstract:
Adversarial robustness measures the susceptibility of a classifier to imperceptible perturbations made to the inputs at test time. In this work we highlight the benefits of natural low rank representations that often exist for real data such as images, for training neural networks with certified robustness guarantees.
Our first contribution is for certified robustness to perturbations measured in $\ell_2$ norm. We exploit low rank data representations to provide improved guarantees over state-of-the-art randomized smoothing-based approaches on standard benchmark datasets such as CIFAR-10 and CIFAR-100.
Our second contribution is for the more challenging setting of certified robustness to perturbations measured in $\ell_\infty$ norm. We demonstrate empirically that natural low rank representations have inherent robustness properties that can be leveraged to provide significantly better guarantees for certified robustness to $\ell_\infty$ perturbations in those representations. Our certificate of $\ell_\infty$ robustness relies on a natural quantity involving the $\infty \to 2$ matrix operator norm associated with the representation, to translate robustness guarantees from $\ell_2$ to $\ell_\infty$ perturbations.
A key technical ingredient for our certification guarantees is a fast algorithm with provable guarantees based on the multiplicative weights update method to provide upper bounds on the above matrix norm. Our algorithmic guarantees improve upon the state of the art for this problem, and may be of independent interest.
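The quantity behind the $\ell_\infty$ certificate can be illustrated on small matrices. Because $\|Ax\|_2$ is convex in $x$, its maximum over the cube $\|x\|_\infty \le 1$ is attained at a sign vector, so $\|A\|_{\infty \to 2}$ can be computed exactly by trying all $2^n$ sign vectors. This brute force is only a hypothetical illustration with exponential cost. It is not the paper's multiplicative-weights algorithm, which produces upper bounds efficiently.

```python
import math
from itertools import product

def inf_to_2_norm(A):
    """||A||_{inf->2} = max over ||x||_inf <= 1 of ||A x||_2, by brute force.

    ||A x||_2 is convex in x, so the maximum over the cube [-1, 1]^n is
    attained at a vertex (a sign vector); all 2^n of them are tried.
    """
    n = len(A[0])
    best = 0.0
    for signs in product((-1.0, 1.0), repeat=n):
        Ax = [sum(a * s for a, s in zip(row, signs)) for row in A]
        best = max(best, math.sqrt(sum(v * v for v in Ax)))
    return best

# An l_inf perturbation of size delta moves A x by at most
# delta * ||A||_{inf->2} in l_2, which is how an l_2 guarantee can be transferred.
print(inf_to_2_norm([[1, 0], [0, 1]]), inf_to_2_norm([[1, 1]]))
```

For the $2 \times 2$ identity the value is $\sqrt{2}$, larger than its spectral norm of 1, which reflects that an $\ell_\infty$ ball of radius $\delta$ contains points of $\ell_2$ norm $\delta\sqrt{n}$.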
Submitted 1 August, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
Estimating Principal Components under Adversarial Perturbations
Authors:
Pranjal Awasthi,
Xue Chen,
Aravindan Vijayaraghavan
Abstract:
Robustness is a key requirement for widespread deployment of machine learning algorithms, and has received much attention in both statistics and computer science. We study a natural model of robustness for high-dimensional statistical estimation problems that we call the adversarial perturbation model. An adversary can perturb every sample arbitrarily up to a specified magnitude $δ$ measured in some $\ell_q$ norm, say $\ell_\infty$. Our model is motivated by emerging paradigms such as low precision machine learning and adversarial training.
We study the classical problem of estimating the top-$r$ principal subspace of the Gaussian covariance matrix in high dimensions, under the adversarial perturbation model. We design a computationally efficient algorithm that given corrupted data, recovers an estimate of the top-$r$ principal subspace with error that depends on a robustness parameter $κ$ that we identify. This parameter corresponds to the $q \to 2$ operator norm of the projector onto the principal subspace, and generalizes well-studied analytic notions of sparsity. Additionally, in the absence of corruptions, our algorithmic guarantees recover existing bounds for problems such as sparse PCA and its higher rank analogs. We also prove that the above dependence on the parameter $κ$ is almost optimal asymptotically, not just in a minimax sense, but remarkably for every instance of the problem. This instance-optimal guarantee shows that the $q \to 2$ operator norm of the subspace essentially characterizes the estimation error under adversarial perturbations.
Submitted 1 June, 2020; v1 submitted 31 May, 2020;
originally announced June 2020.
-
Scheduling Precedence-Constrained Jobs on Related Machines with Communication Delay
Authors:
Biswaroop Maiti,
Rajmohan Rajaraman,
David Stalfa,
Zoya Svitkina,
Aravindan Vijayaraghavan
Abstract:
We consider the problem of scheduling $n$ precedence-constrained jobs on $m$ uniformly-related machines in the presence of an arbitrary, fixed communication delay $ρ$. We consider a model that allows job duplication, i.e., processing of the same job on multiple machines, which, as we show, can reduce the length of a schedule (i.e., its makespan) by a logarithmic factor. Our main result is an $O(\log m \log ρ/ \log \log ρ)$-approximation algorithm for minimizing makespan, assuming the minimum makespan is at least $ρ$. Our algorithm is based on rounding a linear programming relaxation for the problem, which includes carefully designed constraints capturing the interaction among communication delay, precedence requirements, varying speeds, and job duplication. Our result builds on two previous lines of work, one with communication delay but identical machines (Lepere, Rapine 2002) and the other with uniformly-related machines but no communication delay (Chudak, Shmoys 1999). We next show that the integrality gap of our mathematical program is $Ω(\sqrt{\log ρ})$. Our gap construction employs expander graphs and exploits a property of robust expansion and its generalization to paths of longer length. Finally, we quantify the advantage of duplication in scheduling with communication delay. We show that the best schedule without duplication can have makespan $Ω(ρ/\log ρ)$ or $Ω(\log m/\log\log m)$ or $Ω(\log n/\log \log n)$ times that of an optimal schedule allowing duplication. Nevertheless, we present a polynomial time algorithm to transform any schedule to a schedule without duplication at the cost of an $O(\log^2 n \log m)$ factor increase in makespan. Together with our makespan approximation algorithm for schedules allowing duplication, this also yields a polylogarithmic-approximation algorithm for the setting where duplication is not allowed.
Submitted 22 April, 2020;
originally announced April 2020.
-
Adversarially Robust Low Dimensional Representations
Authors:
Pranjal Awasthi,
Vaggos Chatziafratis,
Xue Chen,
Aravindan Vijayaraghavan
Abstract:
Many machine learning systems are vulnerable to small perturbations made to inputs either at test time or at training time. This has received much recent interest on the empirical front due to applications where reliability and security are critical. However, theoretical understanding of algorithms that are robust to adversarial perturbations is limited.
In this work we focus on Principal Component Analysis (PCA), a ubiquitous algorithmic primitive in machine learning. We formulate a natural robust variant of PCA where the goal is to find a low dimensional subspace to represent the given data with minimum projection error, and that is in addition robust to small perturbations measured in $\ell_q$ norm (say $q=\infty$). Unlike PCA, which is solvable in polynomial time, our formulation is computationally intractable to optimize, as it captures a variant of the well-studied sparse PCA objective as a special case. We show the following results:
- A polynomial time algorithm that is constant factor competitive in the worst-case with respect to the best subspace, in terms of the projection error and the robustness criterion.
- We show that our algorithmic techniques can also be made robust to adversarial training-time perturbations, in addition to yielding representations that are robust to adversarial perturbations at test time. Specifically, we design algorithms for a strong notion of training-time perturbations, where every point is adversarially perturbed up to a specified amount.
- We illustrate the broad applicability of our algorithmic techniques in addressing robustness to adversarial perturbations, both at training time and test time. In particular, our adversarially robust PCA primitive leads to computationally efficient and robust algorithms for both unsupervised and supervised learning problems such as clustering and learning adversarially robust classifiers.
Submitted 13 August, 2021; v1 submitted 29 November, 2019;
originally announced November 2019.
-
On Robustness to Adversarial Examples and Polynomial Optimization
Authors:
Pranjal Awasthi,
Abhratanu Dutta,
Aravindan Vijayaraghavan
Abstract:
We study the design of computationally efficient algorithms with provable guarantees, that are robust to adversarial (test time) perturbations. While there has been a proliferation of recent work on this topic due to its connections to test time robustness of deep networks, there is limited theoretical understanding of several basic questions like (i) when and how can one design provably robust learning algorithms? (ii) what is the price of achieving robustness to adversarial examples in a computationally efficient manner?
The main contribution of this work is to exhibit a strong connection between achieving robustness to adversarial examples, and a rich class of polynomial optimization problems, thereby making progress on the above questions. In particular, we leverage this connection to (a) design computationally efficient robust algorithms with provable guarantees for a large class of hypotheses, namely linear classifiers and degree-2 polynomial threshold functions (PTFs), (b) give a precise characterization of the price of achieving robustness in a computationally efficient manner for these classes, (c) design efficient algorithms to certify robustness and generate adversarial attacks in a principled manner for 2-layer neural networks. We empirically demonstrate the effectiveness of these attacks on real data.
Submitted 12 November, 2019;
originally announced November 2019.
-
Smoothed Analysis in Unsupervised Learning via Decoupling
Authors:
Aditya Bhaskara,
Aidao Chen,
Aidan Perreault,
Aravindan Vijayaraghavan
Abstract:
Smoothed analysis is a powerful paradigm in overcoming worst-case intractability in unsupervised learning and high-dimensional data analysis. While polynomial time smoothed analysis guarantees have been obtained for worst-case intractable problems like tensor decompositions and learning mixtures of Gaussians, such guarantees have been hard to obtain for several other important problems in unsupervised learning. A core technical challenge in analyzing algorithms is obtaining lower bounds on the least singular value for random matrix ensembles with dependent entries, that are given by low-degree polynomials of a few base underlying random variables.
In this work, we address this challenge by obtaining high-confidence lower bounds on the least singular value of new classes of structured random matrix ensembles of the above kind. We then use these bounds to design algorithms with polynomial time smoothed analysis guarantees for the following three important problems in unsupervised learning:
1. Robust subspace recovery, when the fraction $α$ of inliers in the d-dimensional subspace $T \subset \mathbb{R}^n$ is at least $α> (d/n)^\ell$ for any constant integer $\ell>0$. This contrasts with the known worst-case intractability when $α< d/n$, and the previous smoothed analysis result which needed $α> d/n$ (Hardt and Moitra, 2013).
2. Learning overcomplete hidden Markov models, where the size of the state space is any polynomial in the dimension of the observations. This gives the first polynomial time guarantees for learning overcomplete HMMs in a smoothed analysis model.
3. Higher order tensor decompositions, where we generalize the so-called FOOBI algorithm of Cardoso to find order-$\ell$ rank-one tensors in a subspace. This allows us to obtain polynomially robust decomposition algorithms for $2\ell$'th order tensors with rank $O(n^{\ell})$.
Submitted 23 April, 2019; v1 submitted 29 November, 2018;
originally announced November 2018.
-
Block Stability for MAP Inference
Authors:
Hunter Lang,
David Sontag,
Aravindan Vijayaraghavan
Abstract:
To understand the empirical success of approximate MAP inference, recent work (Lang et al., 2018) has shown that some popular approximation algorithms perform very well when the input instance is stable. The simplest stability condition assumes that the MAP solution does not change at all when some of the pairwise potentials are (adversarially) perturbed. Unfortunately, this strong condition does not seem to be satisfied in practice. In this paper, we introduce a significantly more relaxed condition that only requires blocks (portions) of an input instance to be stable. Under this block stability condition, we prove that the pairwise LP relaxation is persistent on the stable blocks. We complement our theoretical results with an empirical evaluation of real-world MAP inference instances from computer vision. We design an algorithm to find stable blocks, and find that these real instances have large stable regions. Our work gives a theoretical explanation for the widespread empirical phenomenon of persistency for this LP relaxation.
Submitted 12 November, 2020; v1 submitted 11 October, 2018;
originally announced October 2018.
-
Towards Learning Sparsely Used Dictionaries with Arbitrary Supports
Authors:
Pranjal Awasthi,
Aravindan Vijayaraghavan
Abstract:
Dictionary learning is a popular approach for inferring a hidden basis or dictionary in which data has a sparse representation. Data generated from the dictionary A (an n by m matrix, with m > n in the over-complete setting) is given by Y = AX where X is a matrix whose columns have supports chosen from a distribution over k-sparse vectors, and the non-zero values chosen from a symmetric distribution. Given Y, the goal is to recover A and X in polynomial time. Existing algorithms give polytime guarantees for recovering incoherent dictionaries, under strong distributional assumptions both on the supports of the columns of X, and on the values of the non-zero entries. In this work, we study the following question: Can we design efficient algorithms for recovering dictionaries when the supports of the columns of X are arbitrary?
To address this question while circumventing the issue of non-identifiability, we study a natural semirandom model for dictionary learning where there are a large number of samples $y=Ax$ with arbitrary k-sparse supports for x, along with a few samples where the sparse supports are chosen uniformly at random. While the few samples with random supports ensure identifiability, the support distribution can look almost arbitrary in aggregate. Hence existing algorithmic techniques seem to break down as they make strong assumptions on the supports.
Our main contribution is a new polynomial time algorithm for learning incoherent over-complete dictionaries that works under the semirandom model. Additionally the same algorithm provides polynomial time guarantees in new parameter regimes when the supports are fully random. Finally using these techniques, we also identify a minimal set of conditions on the supports under which the dictionary can be (information theoretically) recovered from polynomial samples for almost linear sparsity, i.e., $k=\tilde{O}(n)$.
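The semirandom data model described above can be sketched as a sample generator. This is a minimal sketch under stated assumptions: the non-zero entries are drawn as random signs (one choice of symmetric distribution), the dictionary `A` is a toy over-complete example, and all function names are hypothetical. It generates $y = Ax$ only; the paper's recovery algorithm is not shown.

```python
import random

def sparse_column(m, support, rng):
    """Vector in R^m with random +/-1 values on `support` and zeros elsewhere."""
    x = [0.0] * m
    for i in support:
        x[i] = rng.choice((-1.0, 1.0))  # symmetric distribution on the non-zeros
    return x

def semirandom_supports(m, k, arbitrary, num_random, rng):
    """Arbitrary (adversarial) k-sparse supports plus a few uniformly random ones."""
    return [list(s) for s in arbitrary] + [rng.sample(range(m), k) for _ in range(num_random)]

def generate_samples(A, supports, rng):
    """Pairs (y, x) with y = A x; A is an n-by-m matrix given as a list of rows."""
    m = len(A[0])
    samples = []
    for support in supports:
        x = sparse_column(m, support, rng)
        y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        samples.append((y, x))
    return samples

rng = random.Random(1)
A = [[1.0, 0.0, 0.6], [0.0, 1.0, 0.8]]  # n = 2, m = 3 (over-complete), unit-norm columns
supports = semirandom_supports(3, 1, arbitrary=[[0], [0], [0]], num_random=2, rng=rng)
samples = generate_samples(A, supports, rng)
```

Here the arbitrary supports all hit column 0, so in aggregate the support distribution is far from uniform, while the two random supports are what make the other columns visible at all.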
Submitted 8 May, 2018; v1 submitted 23 April, 2018;
originally announced April 2018.
-
Clustering Stable Instances of Euclidean k-means
Authors:
Abhratanu Dutta,
Aravindan Vijayaraghavan,
Alex Wang
Abstract:
The Euclidean k-means problem is arguably the most widely-studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst-case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm for this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances. Stable instances have unique optimal k-means solutions that do not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the k-means optimal solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we show an efficient algorithm with provable guarantees that is also robust to outliers. We complement these results by studying the amount of stability in real datasets and demonstrating that our algorithm performs well on these benchmark datasets.
Submitted 4 December, 2017;
originally announced December 2017.
-
Clustering Semi-Random Mixtures of Gaussians
Authors:
Pranjal Awasthi,
Aravindan Vijayaraghavan
Abstract:
Gaussian mixture models (GMM) are the most widely used statistical model for the $k$-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural semi-random model for $k$-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. In our model, a semi-random adversary is allowed to make arbitrary "monotone" or helpful changes to the data generated from the Gaussian mixture model.
Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd's algorithm for $k$-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching information-theoretic lower bound on the number of misclassified points incurred by any $k$-means clustering algorithm on the semi-random model.
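For reference, the textbook form of Lloyd's algorithm alternates an assignment step and a mean-update step until the assignment stops changing. The sketch below is a plain-Python version that starts from user-supplied centers. How the analyzed variant is initialized is not stated in the abstract, so the initialization here is an assumption, and the function name is hypothetical.

```python
import math

def lloyd(points, centers, iters=100):
    """Lloyd's algorithm for k-means from given initial centers.

    points, centers: lists of coordinate tuples. Returns (centers, labels).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: each center moves to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = lloyd(points, [(0, 0), (10, 10)])
print(centers, labels)  # [(0.0, 0.5), (10.0, 10.5)] [0, 0, 1, 1]
```

Empty clusters simply keep their previous center in this sketch; other conventions (reseeding, dropping the center) are equally common.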
Submitted 23 November, 2017;
originally announced November 2017.
-
Optimality of Approximate Inference Algorithms on Stable Instances
Authors:
Hunter Lang,
David Sontag,
Aravindan Vijayaraghavan
Abstract:
Approximate algorithms for structured prediction problems, such as LP relaxations and the popular alpha-expansion algorithm (Boykov et al. 2001), typically far exceed their theoretical performance guarantees on real-world instances. These algorithms often find solutions that are very close to optimal. The goal of this paper is to partially explain the performance of alpha-expansion and an LP relaxation algorithm on MAP inference in Ferromagnetic Potts models (FPMs). Our main results give stability conditions under which these two algorithms provably recover the optimal MAP solution. These theoretical results complement numerous empirical observations of good performance.
Submitted 23 April, 2018; v1 submitted 6 November, 2017;
originally announced November 2017.
-
On Learning Mixtures of Well-Separated Gaussians
Authors:
Oded Regev,
Aravindan Vijayaraghavan
Abstract:
We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of $k$ standard spherical Gaussians, and the goal is to estimate the means up to accuracy $δ$ using $poly(k,d, 1/δ)$ samples.
In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm due to Vempala and Wang [JCSS 2004] requires a separation of roughly $\min\{k,d\}^{1/4}$. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation $o(1)$, exponentially many samples are required. We address the significant gap between these two bounds, by showing the following results.
1. We show that with separation $o(\sqrt{\log k})$, super-polynomially many samples are required. In fact, this holds even when the $k$ means of the Gaussians are picked at random in $d=O(\log k)$ dimensions.
2. We show that with separation $Ω(\sqrt{\log k})$, $poly(k,d,1/δ)$ samples suffice. Note that the bound on the separation is independent of $δ$. This result is based on a new and efficient "accuracy boosting" algorithm that takes as input coarse estimates of the true means and in time $poly(k,d, 1/δ)$ outputs estimates of the means up to arbitrary accuracy $δ$ assuming the separation between the means is $Ω(\min\{\sqrt{\log k},\sqrt{d}\})$ (independently of $δ$).
We also present a computationally efficient algorithm in $d=O(1)$ dimensions with only $Ω(\sqrt{d})$ separation. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of $k$ spherical Gaussians with polynomial samples.
Submitted 31 October, 2017;
originally announced October 2017.
-
Learning Communities in the Presence of Errors
Authors:
Konstantin Makarychev,
Yury Makarychev,
Aravindan Vijayaraghavan
Abstract:
We study the problem of learning communities in the presence of modeling errors and give robust recovery algorithms for the Stochastic Block Model (SBM). This model, which is also known as the Planted Partition Model, is widely used for community detection and graph partitioning in various fields, including machine learning, statistics, and social sciences. Many algorithms exist for learning communities in the Stochastic Block Model, but they do not work well in the presence of errors.
In this paper, we initiate the study of robust algorithms for partial recovery in SBM with modeling errors or noise. We consider graphs generated according to the Stochastic Block Model and then modified by an adversary. We allow two types of adversarial errors: Feige-Kilian (or monotone) errors, and edge outlier errors. Mossel, Neeman and Sly (STOC 2015) posed an open question about whether almost exact recovery is possible when the adversary is allowed to add $o(n)$ edges. Our work answers this question affirmatively even in the case of $k>2$ communities.
We then show that our algorithms work not only when the instances come from SBM, but also when they come from any distribution of graphs that is $εm$-close to SBM in Kullback-Leibler divergence. This result also holds in the presence of adversarial errors. Finally, we present almost tight lower bounds for two communities.
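The input model can be sketched as an SBM sampler followed by a monotone adversary. This is a minimal sketch assuming the common two-parameter SBM (edge probability `p_in` inside a community and `p_out` across) and reading "monotone" changes as adding intra-community edges or deleting inter-community edges. The function names are hypothetical, and edge outlier errors and the paper's recovery algorithms are not modeled.

```python
import random

def sample_sbm(community_sizes, p_in, p_out, seed=None):
    """Sample an undirected graph from a Stochastic Block Model.

    Returns (labels, edges), where edges is a set of pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(community_sizes) for _ in range(size)]
    n = len(labels)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if labels[u] == labels[v] else p_out
            if rng.random() < p:
                edges.add((u, v))
    return labels, edges

def monotone_adversary(labels, edges, add=(), remove=()):
    """Apply 'helpful' changes only: add edges inside a community and
    remove edges between communities; any other requested change is ignored."""
    out = set(edges)
    for u, v in add:
        if labels[u] == labels[v]:
            out.add((min(u, v), max(u, v)))
    for u, v in remove:
        if labels[u] != labels[v]:
            out.discard((min(u, v), max(u, v)))
    return out

labels, edges = sample_sbm([50, 50], p_in=0.3, p_out=0.05, seed=0)
edges = monotone_adversary(labels, edges, add=[(0, 1)], remove=[(0, 99)])
```

Although each monotone change seems to make the communities more pronounced, such changes are known to break some algorithms that rely on the exact SBM statistics, which is why robustness to them is a meaningful requirement.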
Submitted 24 June, 2016; v1 submitted 10 November, 2015;
originally announced November 2015.
-
Beating the random assignment on constraint satisfaction problems of bounded degree
Authors:
Boaz Barak,
Ankur Moitra,
Ryan O'Donnell,
Prasad Raghavendra,
Oded Regev,
David Steurer,
Luca Trevisan,
Aravindan Vijayaraghavan,
David Witmer,
John Wright
Abstract:
We show that for any odd $k$ and any instance of the Max-kXOR constraint satisfaction problem, there is an efficient algorithm that finds an assignment satisfying at least a $\frac{1}{2} + Ω(1/\sqrt{D})$ fraction of constraints, where $D$ is a bound on the number of constraints that each variable occurs in. This improves both qualitatively and quantitatively on the recent work of Farhi, Goldstone, and Gutmann (2014), which gave a \emph{quantum} algorithm to find an assignment satisfying a $\frac{1}{2} + Ω(D^{-3/4})$ fraction of the equations.
For arbitrary constraint satisfaction problems, we give a similar result for "triangle-free" instances; i.e., an efficient algorithm that finds an assignment satisfying at least a $μ+ Ω(1/\sqrt{D})$ fraction of constraints, where $μ$ is the fraction that would be satisfied by a uniformly random assignment.
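The quantities in this abstract (fraction of satisfied kXOR constraints, the degree bound $D$, and the $\frac{1}{2}$ baseline of a uniformly random assignment) can be made concrete on a toy instance. The sketch below only evaluates these quantities by exhaustive enumeration; it is not the paper's algorithm, and the instance and function names are made up for illustration.

```python
from collections import Counter
from itertools import product

# A kXOR constraint (vars, rhs) asks that x[v1] XOR ... XOR x[vk] == rhs.
def satisfied_fraction(constraints, x):
    sat = sum(1 for vars_, rhs in constraints
              if sum(x[v] for v in vars_) % 2 == rhs)
    return sat / len(constraints)

def degree_bound(constraints):
    """D: the maximum number of constraints any single variable occurs in."""
    counts = Counter(v for vars_, _ in constraints for v in vars_)
    return max(counts.values())

def random_assignment_average(constraints, n):
    """Exact expected satisfied fraction of a uniformly random assignment."""
    fracs = [satisfied_fraction(constraints, x) for x in product((0, 1), repeat=n)]
    return sum(fracs) / len(fracs)

def best_fraction(constraints, n):
    """Brute-force optimum; only feasible for tiny n."""
    return max(satisfied_fraction(constraints, x) for x in product((0, 1), repeat=n))

# A toy 3XOR instance on 4 variables (hypothetical, for illustration only).
instance = [((0, 1, 2), 1), ((1, 2, 3), 0), ((0, 2, 3), 1), ((0, 1, 3), 0)]
print(degree_bound(instance))                  # 3
print(random_assignment_average(instance, 4))  # 0.5
print(best_fraction(instance, 4))              # 1.0 (this instance is satisfiable)
```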
Submitted 11 August, 2015; v1 submitted 13 May, 2015;
originally announced May 2015.
-
Learning Mixtures of Ranking Models
Authors:
Pranjal Awasthi,
Avrim Blum,
Or Sheffet,
Aravindan Vijayaraghavan
Abstract:
This work concerns learning probabilistic models for ranking data in a heterogeneous population. The specific problem we study is learning the parameters of a Mallows Mixture Model. Despite being widely studied, current heuristics for this problem do not have theoretical guarantees and can get stuck in bad local optima. We present the first polynomial-time algorithm which provably learns the parameters of a mixture of two Mallows models. A key component of our algorithm is a novel use of tensor decomposition techniques to learn the top-$k$ prefixes of both rankings. Before this work, even the question of identifiability in the case of a mixture of two Mallows models was unresolved.
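To make the generative model concrete, the sketch below samples from a Mallows model, where a ranking $π$ has probability proportional to $φ^{d(π, π_0)}$ with $d$ the Kendall tau distance to a center $π_0$, using the standard repeated-insertion construction, and then from a mixture of two such models. It illustrates only the data distribution, not the paper's tensor-based learning algorithm; the function names and parameters are assumptions.

```python
import random

def kendall_tau(pi, sigma):
    """Number of item pairs that rankings pi and sigma order differently."""
    pos = {item: i for i, item in enumerate(sigma)}
    seq = [pos[item] for item in pi]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])

def sample_mallows(center, phi, rng):
    """Repeated insertion: P(pi) is proportional to phi ** kendall_tau(pi, center)."""
    ranking = []
    for i, item in enumerate(center, start=1):
        # Inserting at 0-based slot j puts the item ahead of (i - 1 - j) items that
        # precede it in the center, creating exactly that many inversions.
        weights = [phi ** (i - 1 - j) for j in range(i)]
        j = rng.choices(range(i), weights=weights)[0]
        ranking.insert(j, item)
    return ranking

def sample_mallows_mixture(w, center1, phi1, center2, phi2, rng):
    """Mixture of two Mallows models: component 1 with probability w."""
    if rng.random() < w:
        return sample_mallows(center1, phi1, rng)
    return sample_mallows(center2, phi2, rng)

rng = random.Random(0)
items = [1, 2, 3, 4, 5]
samples = [sample_mallows_mixture(0.5, items, 0.3, items[::-1], 0.3, rng) for _ in range(3)]
print(samples)
```

Smaller $φ$ concentrates each component around its center; with $φ = 0$ the sampler always returns the center itself.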
Submitted 31 October, 2014;
originally announced October 2014.
-
Correlation Clustering with Noisy Partial Information
Authors:
Konstantin Makarychev,
Yury Makarychev,
Aravindan Vijayaraghavan
Abstract:
In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs $G$. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value $(1+δ)\,\mathrm{optcost} + O_δ(n\log^3 n)$ with high probability, where $\mathrm{optcost}$ is the value of the optimal solution (for every $δ > 0$). The second algorithm finds the ground-truth clustering with an arbitrarily small classification error $η$ (under some additional assumptions on the instance).
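The value being approximated is the usual (minimization) Correlation Clustering cost: the number of "+" edges cut between clusters plus the number of "-" edges kept inside a cluster. A minimal sketch of that objective, assuming unit edge weights and a hypothetical edge encoding `(u, v, sign)`, is below; it evaluates a given clustering and does not implement either algorithm from the paper.

```python
def correlation_clustering_cost(edges, cluster_of):
    """Count disagreements of a clustering with signed edges.

    edges: iterable of (u, v, sign) with sign in {+1, -1} (unit weights assumed).
    cluster_of: dict mapping each vertex to a cluster id.
    """
    cost = 0
    for u, v, sign in edges:
        same = cluster_of[u] == cluster_of[v]
        if (sign > 0 and not same) or (sign < 0 and same):
            cost += 1
    return cost

edges = [("a", "b", +1), ("b", "c", +1), ("a", "c", -1)]
print(correlation_clustering_cost(edges, {"a": 0, "b": 0, "c": 0}))  # 1: '-' edge a-c inside a cluster
print(correlation_clustering_cost(edges, {"a": 0, "b": 1, "c": 2}))  # 2: both '+' edges are cut
```

This small triangle shows why the problem is nontrivial: no clustering of it achieves cost 0.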
Submitted 12 May, 2015; v1 submitted 21 June, 2014;
originally announced June 2014.
-
Constant Factor Approximation for Balanced Cut in the PIE model
Authors:
Konstantin Makarychev,
Yury Makarychev,
Aravindan Vijayaraghavan
Abstract:
We propose and study a new semi-random semi-adversarial model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than planted models considered previously. Consider a set of vertices $V$ partitioned into two clusters $L$ and $R$ of equal size. Let $G$ be an arbitrary graph on $V$ with no edges between $L$ and $R$. Let $E_{random}$ be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under permutation of vertices in $L$ and in $R$). Then we say that $G + E_{random}$ is a graph with permutation-invariant random edges.
We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost $O(|E_{random}|) + n \text{polylog}(n)$ in this model. In the regime when $|E_{random}| = Ω(n \text{polylog}(n))$, this is a constant factor approximation with respect to the cost of the planted cut.
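Because $G$ has no edges between $L$ and $R$, every edge crossing the planted cut comes from $E_{random}$, so the planted cut costs at most $|E_{random}|$. The sketch below builds a PIE instance and checks this. It assumes $E_{random}$ is drawn uniformly without replacement from all vertex pairs, which is just one example of a permutation-invariant distribution; the helper names are hypothetical and this is not the paper's approximation algorithm.

```python
import random

def normalize(u, v):
    return (u, v) if u < v else (v, u)

def pie_instance(left, right, planted_edges, num_random, rng):
    """Return (edges of G + E_random, E_random) for a PIE-style instance."""
    g_edges = {normalize(u, v) for u, v in planted_edges}
    if any((u in left) != (v in left) for u, v in g_edges):
        raise ValueError("G must have no edges between L and R")
    vertices = sorted(left | right)
    all_pairs = [(u, v) for i, u in enumerate(vertices) for v in vertices[i + 1:]]
    e_random = rng.sample(all_pairs, num_random)   # uniform: one permutation-invariant choice
    return g_edges | set(e_random), e_random

def cut_cost(edges, side):
    """Number of edges with exactly one endpoint in `side`."""
    return sum(1 for u, v in edges if (u in side) != (v in side))

rng = random.Random(1)
left, right = set(range(10)), set(range(10, 20))
# G: one cycle inside L and one inside R, so no L-R edges.
planted = [(i, (i + 1) % 10) for i in range(10)] + [(10 + i, 10 + (i + 1) % 10) for i in range(10)]
edges, e_random = pie_instance(left, right, planted, 15, rng)
print(cut_cost(edges, left), "<=", len(e_random))
```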
Submitted 21 June, 2014;
originally announced June 2014.
-
Smoothed Analysis of Tensor Decompositions
Authors:
Aditya Bhaskara,
Moses Charikar,
Ankur Moitra,
Aravindan Vijayaraghavan
Abstract:
Low-rank tensor decompositions are a powerful tool for learning generative models, and uniqueness results give them a significant advantage over matrix decomposition methods. However, tensors pose significant algorithmic challenges, and tensor analogs of much of the matrix algebra toolkit are unlikely to exist because of hardness results. Efficient decomposition in the overcomplete case (where rank exceeds dimension) is particularly challenging. We introduce a smoothed analysis model for studying these questions and develop an efficient algorithm for tensor decomposition in the highly overcomplete case (rank polynomial in the dimension). In this setting, we show that our algorithm is robust to inverse polynomial error, a crucial property for applications in learning since we are only allowed a polynomial number of samples. While algorithms are known for exact tensor decomposition in some overcomplete settings, our main contribution is in analyzing their stability in the framework of smoothed analysis.
Our main technical contribution is to show that tensor products of perturbed vectors are linearly independent in a robust sense (i.e., the associated matrix has singular values that are at least an inverse polynomial). This key result paves the way for applying tensor methods to learning problems in the smoothed setting. In particular, we use it to obtain results for learning multi-view models and mixtures of axis-aligned Gaussians where there are many more "components" than dimensions. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of model parameters. We believe this is an appealing way to analyze realistic instances of learning problems, since this framework allows us to overcome many of the usual limitations of using tensor methods.
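The robust linear independence statement can be observed numerically. The sketch below, assuming the simplest case of order-2 products $a_i \otimes a_i$ and Gaussian perturbations of scale `rho` (both choices are ours, made for illustration), starts from a worst-case configuration in which all $m > d$ vectors coincide, so the matrix with columns $a_i \otimes a_i$ has rank 1, and shows that after a small random perturbation its smallest singular value moves away from zero.

```python
import numpy as np

def khatri_rao_self(A):
    """Matrix whose k-th column is the flattened a_k ⊗ a_k; shape (d*d, m)."""
    d, m = A.shape
    return np.einsum("ik,jk->ijk", A, A).reshape(d * d, m)

def min_singular_value(M):
    return np.linalg.svd(M, compute_uv=False)[-1]   # singular values are sorted descending

rng = np.random.default_rng(0)
d, m, rho = 10, 30, 0.1                      # m > d: the overcomplete regime
u = rng.standard_normal(d)
A_adversarial = np.tile(u[:, None], (1, m))  # all columns identical: rank-1 product matrix
A_perturbed = A_adversarial + rho * rng.standard_normal((d, m))

print(min_singular_value(khatri_rao_self(A_adversarial)))  # numerically ~0
print(min_singular_value(khatri_rao_self(A_perturbed)))    # bounded away from 0
```

Here $m = 30$ exceeds $d = 10$ but stays below $d(d+1)/2 = 55$, the dimension of the space spanned by symmetric products $a \otimes a$, so linear independence is possible at all.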
Submitted 20 January, 2014; v1 submitted 14 November, 2013;
originally announced November 2013.
-
Bilu-Linial Stable Instances of Max Cut and Minimum Multiway Cut
Authors:
Konstantin Makarychev,
Yury Makarychev,
Aravindan Vijayaraghavan
Abstract:
We investigate the notion of stability proposed by Bilu and Linial. We obtain an exact polynomial-time algorithm for $γ$-stable Max Cut instances with $γ\geq c\sqrt{\log n}\log\log n$ for some absolute constant $c > 0$. Our algorithm is robust: it never returns an incorrect answer; if the instance is $γ$-stable, it finds the maximum cut, otherwise, it either finds the maximum cut or certifies that the instance is not $γ$-stable. We prove that there is no robust polynomial-time algorithm for $γ$-stable instances of Max Cut when $γ< α_{SC}(n/2)$, where $α_{SC}$ is the best approximation factor for Sparsest Cut with non-uniform demands.
Our algorithm is based on semidefinite programming. We show that the standard SDP relaxation for Max Cut (with $\ell_2^2$ triangle inequalities) is integral if $γ\geq D_{\ell_2^2\to \ell_1}(n)$, where $D_{\ell_2^2\to \ell_1}(n)$ is the least distortion with which every $n$-point metric space of negative type embeds into $\ell_1$. On the negative side, we show that the SDP relaxation is not integral when $γ< D_{\ell_2^2\to \ell_1}(n/2)$. Moreover, there is no tractable convex relaxation for $γ$-stable instances of Max Cut when $γ< α_{SC}(n/2)$. This suggests that solving $γ$-stable instances with $γ=o(\sqrt{\log n})$ might be difficult or impossible.
Our results significantly improve previously known results. The best previously known algorithm for $γ$-stable instances of Max Cut required that $γ\geq c\sqrt{n}$ (for some $c > 0$) [Bilu, Daniely, Linial, and Saks]. No hardness results were known for the problem. Additionally, we present an algorithm for 4-stable instances of Minimum Multiway Cut. We also study a relaxed notion of weak stability.
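Recall that an instance is $γ$-stable in the Bilu-Linial sense if its maximum cut is unique and stays the unique maximum cut after every edge weight is multiplied by any factor in $[1, γ]$. For Max Cut this is equivalent to requiring, for every other cut $T$, that $w(\mathrm{cut}(S) \setminus \mathrm{cut}(T)) > γ \cdot w(\mathrm{cut}(T) \setminus \mathrm{cut}(S))$, since the worst perturbation scales up exactly the edges that only $T$ cuts. The exponential-time sketch below computes the resulting threshold on tiny graphs; it is a definition check, not the SDP-based algorithm of the paper, and the function names are hypothetical.

```python
from itertools import combinations

def cut_edges(edges, side):
    """Edges (u, v) with exactly one endpoint in `side`."""
    return {(u, v) for u, v, _ in edges if (u in side) != (v in side)}

def stability_threshold(n, edges):
    """Return gamma* such that the weighted graph is gamma-stable exactly for gamma < gamma*.

    edges: list of (u, v, weight) on vertices 0..n-1. Brute force over all 2^(n-1) cuts.
    """
    weight = {(u, v): w for u, v, w in edges}

    def value(cut):
        return sum(weight[e] for e in cut)

    # Fix vertex 0 on one side so each cut is enumerated once.
    sides = [frozenset((0,) + rest) for r in range(n) for rest in combinations(range(1, n), r)]
    cuts = [cut_edges(edges, s) for s in sides]
    best = max(cuts, key=value)
    ratios = []
    for cut in cuts:
        if cut == best:
            continue
        gain, loss = value(best - cut), value(cut - best)
        ratios.append(float("inf") if loss == 0 else gain / loss)
    return min(ratios)

print(stability_threshold(3, [(0, 1, 3), (1, 2, 3), (0, 2, 1)]))  # 3.0
print(stability_threshold(3, [(0, 1, 1), (1, 2, 1), (0, 2, 1)]))  # 1.0: tied optima, never stable
```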
Submitted 11 November, 2013; v1 submitted 7 May, 2013;
originally announced May 2013.
-
Uniqueness of Tensor Decompositions with Applications to Polynomial Identifiability
Authors:
Aditya Bhaskara,
Moses Charikar,
Aravindan Vijayaraghavan
Abstract:
We give a robust version of the celebrated result of Kruskal on the uniqueness of tensor decompositions: we prove that given a tensor whose decomposition satisfies a robust form of Kruskal's rank condition, it is possible to approximately recover the decomposition if the tensor is known up to a sufficiently small (inverse polynomial) error.
Kruskal's theorem has found many applications in proving the identifiability of parameters for various latent variable models and mixture models, such as hidden Markov models and topic models. Our robust version immediately implies identifiability using only polynomially many samples in many of these settings. This polynomial identifiability is an essential first step towards efficient learning algorithms for these models.
Recently, algorithms based on tensor decompositions have been used to estimate the parameters of various hidden variable models efficiently in special cases as long as they satisfy certain "non-degeneracy" properties. Our methods give a way to go beyond this non-degeneracy barrier, and establish polynomial identifiability of the parameters under much milder conditions. Given the importance of Kruskal's theorem in the tensor literature, we expect that this robust version will have several applications beyond the settings we explore in this work.
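For reference, Kruskal's (exact, non-robust) condition says that a rank-$R$ decomposition $T = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r$ with factor matrices $A, B, C$ is unique whenever $k_A + k_B + k_C \geq 2R + 2$, where the Kruskal rank $k_M$ is the largest $k$ such that every $k$ columns of $M$ are linearly independent. The brute-force sketch below computes these quantities for small matrices; it does not implement the robust version proved in the paper, and the function names are our own.

```python
from itertools import combinations

import numpy as np

def kruskal_rank(M):
    """Largest k such that every set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        if all(np.linalg.matrix_rank(M[:, list(cols)]) == size
               for cols in combinations(range(n_cols), size)):
            k = size
        else:
            break   # if some `size` columns are dependent, so are larger sets containing them
    return k

def kruskal_condition(A, B, C):
    """Kruskal's sufficient condition for uniqueness: k_A + k_B + k_C >= 2R + 2."""
    R = A.shape[1]
    return kruskal_rank(A) + kruskal_rank(B) + kruskal_rank(C) >= 2 * R + 2

I3 = np.eye(3)
print(kruskal_condition(I3, I3, I3))   # True: 3 + 3 + 3 >= 2*3 + 2
```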
Submitted 30 April, 2013;
originally announced April 2013.
-
Approximation Algorithms for Semi-random Graph Partitioning Problems
Authors:
Konstantin Makarychev,
Yury Makarychev,
Aravindan Vijayaraghavan
Abstract:
In this paper, we propose and study a new semi-random model for graph partitioning problems. We believe that it captures many properties of real-world instances. The model is more flexible than the semi-random model of Feige and Kilian and the planted random model of Bui, Chaudhuri, Leighton and Sipser.
We develop a general framework for solving semi-random instances and apply it to several problems of interest. We present constant factor bi-criteria approximation algorithms for semi-random instances of the Balanced Cut, Multicut, Min Uncut, Sparsest Cut and Small Set Expansion problems. We also show how to almost recover the optimal solution if the instance satisfies an additional expanding condition. Our algorithms work in a wider range of parameters than most algorithms for previously studied random and semi-random models.
Additionally, we study a new planted algebraic expander model and develop constant factor bi-criteria approximation algorithms for graph partitioning problems in this model.
Submitted 10 May, 2012;
originally announced May 2012.
-
Approximation Algorithms and Hardness of the k-Route Cut Problem
Authors:
Julia Chuzhoy,
Yury Makarychev,
Aravindan Vijayaraghavan,
Yuan Zhou
Abstract:
We study the k-route cut problem: given an undirected edge-weighted graph G=(V,E), a collection {(s_1,t_1),(s_2,t_2),...,(s_r,t_r)} of source-sink pairs, and an integer connectivity requirement k, the goal is to find a minimum-weight subset E' of edges to remove, such that the connectivity of every pair (s_i, t_i) falls below k. Specifically, in the edge-connectivity version, EC-kRC, the requirement is that there are at most (k-1) edge-disjoint paths connecting s_i to t_i in G \ E', while in the vertex-connectivity version, NC-kRC, the same requirement is for vertex-disjoint paths. Prior to our work, polylogarithmic approximation algorithms were known for the special case k <= 3, but no non-trivial approximation algorithms were known for any value k > 3, except in the single-source setting. We show an O(k log^{3/2} r)-approximation algorithm for EC-kRC with uniform edge weights, and several polylogarithmic bi-criteria approximation algorithms for EC-kRC and NC-kRC, where the connectivity requirement k is violated by a constant factor. We complement these upper bounds by proving that NC-kRC is hard to approximate to within a factor of k^{eps} for some fixed eps > 0.
We then turn to study a simpler version of NC-kRC, where only one source-sink pair is present. We give a simple bi-criteria approximation algorithm for this case, and show evidence that even this restricted version of the problem may be hard to approximate. For example, we prove that the single source-sink pair version of NC-kRC has no constant-factor approximation, assuming Feige's Random k-AND assumption.
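The EC-kRC feasibility condition can be checked directly: by Menger's theorem, the number of edge-disjoint s_i-t_i paths equals a unit-capacity maximum flow. The sketch below computes it with Edmonds-Karp (BFS augmenting paths) and tests whether a candidate removal set E' leaves at most k-1 paths for every pair. It only verifies feasibility of a given E' and is not one of the paper's approximation algorithms; the function names are hypothetical.

```python
from collections import defaultdict, deque

def edge_disjoint_paths(edges, s, t):
    """Max number of edge-disjoint s-t paths in an undirected graph (unit-capacity max flow)."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v in edges:
        cap[u][v] += 1   # an undirected edge can carry one unit in either direction
        cap[v][u] += 1
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        v = t
        while parent[v] is not None:              # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

def is_ec_krc_feasible(edges, removed, pairs, k):
    """True if, after deleting `removed`, every (s, t) pair has at most k - 1 edge-disjoint paths."""
    remaining = [e for e in edges if e not in removed and (e[1], e[0]) not in removed]
    return all(edge_disjoint_paths(remaining, s, t) <= k - 1 for s, t in pairs)

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(edge_disjoint_paths(k4, 0, 3))                    # 3
print(is_ec_krc_feasible(k4, {(0, 3)}, [(0, 3)], k=3))  # True: 2 paths remain
```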
Submitted 15 December, 2011; v1 submitted 15 December, 2011;
originally announced December 2011.
-
Polynomial integrality gaps for strong SDP relaxations of Densest k-subgraph
Authors:
Aditya Bhaskara,
Moses Charikar,
Venkatesan Guruswami,
Aravindan Vijayaraghavan,
Yuan Zhou
Abstract:
The densest k-subgraph (DkS) problem (i.e., find a size-k subgraph with the maximum number of edges) is one of the notorious problems in approximation algorithms. There is a significant gap between known upper and lower bounds for DkS: the current best algorithm gives an Õ(n^{1/4}) approximation, while even showing a small constant factor hardness requires significantly stronger assumptions than P ≠ NP. In addition to interest in designing better algorithms, a number of recent results have exploited the conjectured hardness of densest k-subgraph and its variants. Thus, understanding the approximability of DkS is an important challenge.
In this work, we give evidence for the hardness of approximating DkS within polynomial factors. Specifically, we expose the limitations of strong semidefinite programs from SDP hierarchies in solving densest k-subgraph. Our results include:
* A lower bound of Omega(n^{1/4}/log^3 n) on the integrality gap for Omega(log n/log log n) rounds of the Sherali-Adams relaxation for DkS. This also holds for the relaxation obtained from Sherali-Adams with an added SDP constraint. Our gap instances are in fact Erdős-Rényi random graphs.
* For every epsilon > 0, a lower bound of n^{2/53-eps} on the integrality gap of n^{Omega(eps)} rounds of the Lasserre SDP relaxation for DkS, and an n^{Omega_eps(1)} gap for n^{1-eps} rounds. Our construction proceeds via a reduction from random instances of a certain Max-CSP over large domains.
In the absence of inapproximability results for DkS, our results show that even the most powerful SDPs are unable to beat a factor of n^{Omega(1)}, and in fact even improving the best known n^{1/4} factor is a barrier for current techniques.
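The SDP relaxations discussed above are compared against the true DkS optimum. For intuition about that combinatorial objective, the sketch below computes it by exhaustive search over all k-subsets, which is only feasible on tiny graphs; it is illustrative and unrelated to the hierarchy constructions in the paper, and the function name is our own.

```python
from itertools import combinations

def densest_k_subgraph(vertices, edges, k):
    """Exact DkS by exhaustive search over all k-subsets (exponential time)."""
    edge_set = {frozenset(e) for e in edges}
    best, best_count = None, -1
    for subset in combinations(vertices, k):
        count = sum(1 for u, v in combinations(subset, 2) if frozenset((u, v)) in edge_set)
        if count > best_count:
            best, best_count = set(subset), count
    return best, best_count

# A 4-clique {0, 1, 2, 3} with a pendant path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
print(densest_k_subgraph(range(6), edges, 4))   # ({0, 1, 2, 3}, 6)
```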
Submitted 6 October, 2011;
originally announced October 2011.
-
On Quadratic Programming with a Ratio Objective
Authors:
Aditya Bhaskara,
Moses Charikar,
Rajsekar Manokaran,
Aravindan Vijayaraghavan
Abstract:
Quadratic Programming (QP) is the well-studied problem of maximizing the quadratic form $\sum_{i \neq j} a_{ij} x_i x_j$ over $x \in \{-1,1\}^n$. QP captures many known combinatorial optimization problems, and assuming the unique games conjecture, semidefinite programming techniques give optimal approximation algorithms. We extend this body of work by initiating the study of Quadratic Programming problems where the variables take values in the domain $\{-1,0,1\}$. The specific problems we study are:
QP-Ratio: $\max_{x \in \{-1,0,1\}^n} \frac{\sum_{i \neq j} a_{ij} x_i x_j}{\sum_i x_i^2}$, and Normalized QP-Ratio: $\max_{x \in \{-1,0,1\}^n} \frac{\sum_{i \neq j} a_{ij} x_i x_j}{\sum_i d_i x_i^2}$, where $d_i = \sum_j |a_{ij}|$.
We consider an SDP relaxation obtained by adding constraints to the natural eigenvalue (or SDP) relaxation for this problem. Using this, we obtain an $\tilde{O}(n^{1/3})$ approximation algorithm for QP-Ratio. We also obtain an $\tilde{O}(n^{1/4})$ approximation for bipartite graphs, and better algorithms for special cases. As with other problems with ratio objectives (e.g., uniform sparsest cut), it seems difficult to obtain inapproximability results based on P ≠ NP. We give two results that indicate that QP-Ratio is hard to approximate to within any constant factor. We also give a natural distribution on instances of QP-Ratio for which an $n^{ε}$ approximation (for $ε$ roughly 1/10) seems out of reach of current techniques.
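Both ratio objectives defined above can be evaluated exactly for very small $n$ by enumerating all $3^n - 1$ nonzero vectors in $\{-1,0,1\}^n$. The sketch below does this, assuming a symmetric matrix $A$; it is a brute-force reference for checking the definitions, not the SDP-based algorithm from the abstract, and `qp_ratio` is a hypothetical name.

```python
from itertools import product

import numpy as np

def qp_ratio(A, normalized=False):
    """Brute-force (Normalized) QP-Ratio over nonzero x in {-1,0,1}^n; tiny n only."""
    A = np.asarray(A, dtype=float)
    off_diag = A - np.diag(np.diag(A))        # the numerator sums over i != j only
    d = np.abs(A).sum(axis=1) if normalized else np.ones(len(A))
    best = -np.inf
    for x in product((-1, 0, 1), repeat=len(A)):
        x = np.array(x, dtype=float)
        denom = d @ (x ** 2)
        if denom == 0:                        # skips x = 0
            continue
        best = max(best, (x @ off_diag @ x) / denom)
    return best

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(qp_ratio(triangle))                   # 2.0, attained at x = (1, 1, 1)
print(qp_ratio(triangle, normalized=True))  # 1.0
```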
Submitted 5 December, 2011; v1 submitted 10 January, 2011;
originally announced January 2011.
-
Detecting High Log-Densities -- an O(n^1/4) Approximation for Densest k-Subgraph
Authors:
Aditya Bhaskara,
Moses Charikar,
Eden Chlamtac,
Uriel Feige,
Aravindan Vijayaraghavan
Abstract:
In the Densest k-Subgraph problem, given a graph G and a parameter k, one needs to find a subgraph of G induced on k vertices that contains the largest number of edges. There is a significant gap between the best known upper and lower bounds for this problem. It is NP-hard, and does not have a PTAS unless NP has subexponential time algorithms. On the other hand, the best known algorithm, due to Feige, Kortsarz and Peleg, gives an approximation ratio of n^(1/3-epsilon) for some specific epsilon > 0 (estimated at around 1/60).
We present an algorithm that for every epsilon > 0 approximates the Densest k-Subgraph problem within a ratio of n^(1/4+epsilon) in time n^O(1/epsilon). In particular, our algorithm achieves an approximation ratio of O(n^(1/4)) in time n^O(log n). Our algorithm is inspired by studying an average-case version of the problem where the goal is to distinguish random graphs from graphs with planted dense subgraphs. The approximation ratio we achieve for the general case matches the distinguishing ratio we obtain for this planted problem.
At a high level, our algorithms involve cleverly counting appropriately defined trees of constant size in G, and using these counts to identify the vertices of the dense subgraph. Our algorithm is based on the following principle. We say that a graph G(V,E) has log-density alpha if its average degree is Theta(|V|^alpha). The algorithmic core of our result is a family of algorithms that output k-subgraphs of nontrivial density whenever the log-density of the densest k-subgraph is larger than the log-density of the host graph.
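The log-density defined above can be computed directly once the $Θ(\cdot)$ is replaced by an equality, which is a simplification we make for illustration: if the average degree is $|V|^α$, then $α = \log(\text{avg degree}) / \log |V|$. The sketch compares a sparse host graph with a small dense induced subgraph measured on its own vertex count; the function names are hypothetical and the tree-counting algorithm itself is not shown.

```python
import math

def log_density(num_vertices, num_edges):
    """alpha with average degree = num_vertices ** alpha (Theta dropped for illustration)."""
    avg_degree = 2 * num_edges / num_vertices
    return math.log(avg_degree) / math.log(num_vertices)

def subgraph_log_density(edges, subset):
    """Log-density of the subgraph induced on `subset`, relative to its own size."""
    subset = set(subset)
    induced = sum(1 for u, v in edges if u in subset and v in subset)
    return log_density(len(subset), induced)

# Host graph: 100 vertices, 500 edges, so average degree 10 = 100 ** 0.5.
print(log_density(100, 500))                      # approximately 0.5
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(subgraph_log_density(k4_edges, range(4)))  # log 3 / log 4, about 0.79
```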
Submitted 17 January, 2010;
originally announced January 2010.
-
Approximating Matrix p-norms
Authors:
Aditya Bhaskara,
Aravindan Vijayaraghavan
Abstract:
We consider the problem of computing the $q \to p$ norm of a matrix $A$, which is defined for $p, q \ge 1$ as $\|A\|_{q \to p} = \max_{x \neq 0} \|Ax\|_p / \|x\|_q$. This is in general a non-convex optimization problem, and is a natural generalization of the well-studied question of computing singular values (this corresponds to $p = q = 2$). Different settings of parameters give rise to a variety of known interesting problems (such as the Grothendieck problem when $p = 1$ and $q = \infty$). However, very little is understood about the approximability of the problem for different values of $p, q$. Our first result is an efficient algorithm for computing the $q \to p$ norm of matrices with non-negative entries, when $q \ge p \ge 1$. The algorithm we analyze is based on a natural fixed point iteration, which can be seen as an analog of power iteration for computing eigenvalues. We then present an application of our techniques to the problem of constructing a scheme for oblivious routing in the $\ell_p$ norm. This makes constructive a recent existential result of Englert and Räcke [ER] on $O(\log n)$-competitive oblivious routing schemes (which they make constructive only for $p = 2$). On the other hand, when we do not have any restrictions on the entries (such as non-negativity), we prove that the problem is NP-hard to approximate to any constant factor, for $2 < p \le q$ and $p \le q < 2$ (these are precisely the ranges of $p, q$ with $p \le q$ where constant factor approximations are not known). In this range, our techniques also show that if NP does not have quasi-polynomial time algorithms, the $q \to p$ norm cannot be approximated to a factor $2^{(\log n)^{1-ε}}$, for any $ε > 0$.
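A minimal sketch of a fixed-point iteration of the kind described above, assuming an entrywise positive matrix $A$, $1 \le p \le q$ and $q > 1$. Stationary points of $\|Ax\|_p / \|x\|_q$ satisfy $A^\top (Ax)^{p-1} = λ\, x^{q-1}$ (powers taken entrywise), so each step sets $x \leftarrow (A^\top (Ax)^{p-1})^{1/(q-1)}$ and renormalizes to $\|x\|_q = 1$. For $p = q = 2$ this reduces to power iteration on $A^\top A$. The starting point, iteration count and function name are our assumptions, not details taken from the paper.

```python
import numpy as np

def q_to_p_norm_nonneg(A, p, q, iters=500):
    """Estimate ||A||_{q->p} for an entrywise positive A, assuming 1 <= p <= q and q > 1."""
    A = np.asarray(A, dtype=float)
    x = np.ones(A.shape[1])
    x /= np.linalg.norm(x, q)
    for _ in range(iters):
        z = A.T @ ((A @ x) ** (p - 1))   # gradient direction of ||Ax||_p^p / p
        x = z ** (1.0 / (q - 1))         # invert the entrywise map t -> t^(q-1)
        x /= np.linalg.norm(x, q)        # renormalize to the unit q-sphere
    return np.linalg.norm(A @ x, p)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
# With p = q = 2 the estimate should agree with the largest singular value.
print(q_to_p_norm_nonneg(A, 2, 2), np.linalg.norm(A, 2))
```

Restricting to non-negative vectors loses nothing here: for $A \ge 0$, $|Ax| \le A|x|$ entrywise, so the maximum is attained at some $x \ge 0$.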
Submitted 2 May, 2010; v1 submitted 15 January, 2010;
originally announced January 2010.