Showing 1–49 of 49 results for author: Dhillon, I

Searching in archive stat.
  1. arXiv:2406.11206  [pdf, other]

    cs.LG cs.CR stat.ML

    Retraining with Predicted Hard Labels Provably Increases Model Accuracy

    Authors: Rudrajit Das, Inderjit S. Dhillon, Alessandro Epasto, Adel Javanmard, Jieming Mao, Vahab Mirrokni, Sujay Sanghavi, Peilin Zhong

    Abstract: The performance of a model trained with noisy labels is often improved by simply \textit{retraining} the model with its \textit{own predicted hard labels} (i.e., 1/0 labels). Yet, a detailed theoretical characterization of this phenomenon is lacking. In this paper, we theoretically analyze retraining in a linearly separable binary classification setting with randomly corrupted labels given to us a…

    Submitted 7 May, 2025; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: To appear in ICML 2025

  2. arXiv:2402.07114  [pdf, other]

    cs.LG math.NA math.OC stat.ML

    Towards Quantifying the Preconditioning Effect of Adam

    Authors: Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a notable dearth of results characterizing the preconditioning effect of Adam and showing how it may alleviate the curse of ill-conditioning -- an issue plaguing gradient descent (GD). In this work, we perform a detailed analysis of Adam's preconditioning effect for quadratic functions and quantify to what extent Adam can mitigate the dependence on the condition number of the Hessian. Our…

    Submitted 11 February, 2024; originally announced February 2024.

  3. arXiv:2208.02362  [pdf, other]

    cs.LG stat.ML

    Bayesian regularization of empirical MDPs

    Authors: Samarth Gupta, Daniel N. Hill, Lexing Ying, Inderjit Dhillon

    Abstract: In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling f…

    Submitted 20 September, 2022; v1 submitted 3 August, 2022; originally announced August 2022.

  4. arXiv:2204.10936  [pdf, other]

    cs.IR cs.LG stat.ML

    Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion

    Authors: Adam Block, Rahul Kidambi, Daniel N. Hill, Thorsten Joachims, Inderjit S. Dhillon

    Abstract: Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions. To…

    Submitted 22 April, 2022; originally announced April 2022.

  5. arXiv:2202.10506  [pdf, other]

    math.OC cs.LG stat.ML

    Accelerating Primal-dual Methods for Regularized Markov Decision Processes

    Authors: Haoya Li, Hsiang-fu Yu, Lexing Ying, Inderjit Dhillon

    Abstract: Entropy regularized Markov decision processes have been widely used in reinforcement learning. This paper is concerned with the primal-dual formulation of the entropy regularized problems. Standard first-order methods suffer from slow convergence due to the lack of strict convexity and concavity. To address this issue, we first introduce a new quadratically convexified primal-dual formulation. The…

    Submitted 12 June, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

  6. arXiv:2110.14011  [pdf, other]

    cs.LG stat.ML

    Cluster-and-Conquer: A Framework For Time-Series Forecasting

    Authors: Reese Pathak, Rajat Sen, Nikhil Rao, N. Benjamin Erichson, Michael I. Jordan, Inderjit S. Dhillon

    Abstract: We propose a three-stage framework for forecasting high-dimensional time-series data. Our method first estimates parameters for each univariate time series. Next, we use these parameters to cluster the time series. These clusters can be viewed as multivariate time series, for which we then compute parameters. The forecasted values of a single time series can depend on the history of other time ser…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: 25 pages, 3 figures

  7. arXiv:2110.00685  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

    Authors: Jiong Zhang, Wei-cheng Chang, Hsiang-fu Yu, Inderjit S. Dhillon

    Abstract: Extreme multi-label text classification (XMC) seeks to find relevant labels from an extremely large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging and semantic search. Recently, transformer-based XMC methods, such as X-Transformer and LightXML, have shown significant improvement over other XMC…

    Submitted 28 October, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

  8. arXiv:2106.12751  [pdf, other]

    stat.ML cs.LG

    Label Disentanglement in Partition-based Extreme Multilabel Classification

    Authors: Xuanqing Liu, Wei-Cheng Chang, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon

    Abstract: Partition-based methods are increasingly used in extreme multi-label classification (XMC) problems due to their scalability to large output spaces (e.g., millions or more). However, existing methods partition the large label space into mutually exclusive clusters, which is sub-optimal when labels have multi-modality and rich semantics. For instance, the label "Apple" can be the fruit or the brand…

    Submitted 23 June, 2021; originally announced June 2021.

  9. arXiv:2106.08882  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Robust Training in High Dimensions via Block Coordinate Geometric Median Descent

    Authors: Anish Acharya, Abolfazl Hashemi, Prateek Jain, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: Geometric median (\textsc{Gm}) is a classical method in statistics for achieving a robust estimation of the uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) for high-dimensional optimization problems. In this paper, we show that by applying \textsc{G…

    Submitted 16 June, 2021; originally announced June 2021.

  10. arXiv:2106.07094  [pdf, other]

    cs.LG cs.DC eess.SP math.OC stat.ML

    On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates

    Authors: Rudrajit Das, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason for this is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold) for bounding the sensitivity of the average update to each client's update i…

    Submitted 15 April, 2022; v1 submitted 13 June, 2021; originally announced June 2021.

  11. arXiv:2102.07800  [pdf, other]

    stat.ML cs.AI cs.LG

    Top-$k$ eXtreme Contextual Bandits with Arm Hierarchy

    Authors: Rajat Sen, Alexander Rakhlin, Lexing Ying, Rahul Kidambi, Dean Foster, Daniel Hill, Inderjit Dhillon

    Abstract: Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ extreme contextual bandits problem, where the total number of arms can be enormous, and the learner is allowed to select $k$ arms and observe all or some of the rewards for the chosen arms. We first propose an algorithm for the non-extreme realizable setting, utilizing the Inverse Gap Weigh…

    Submitted 15 February, 2021; originally announced February 2021.

  12. arXiv:2012.04061  [pdf, other]

    stat.ML cs.DC cs.LG math.OC

    Faster Non-Convex Federated Learning via Global and Local Momentum

    Authors: Rudrajit Das, Anish Acharya, Abolfazl Hashemi, Sujay Sanghavi, Inderjit S. Dhillon, Ufuk Topcu

    Abstract: We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(ε^{-1.5})$ to converge to an $ε$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\bm{x})\|^2] \leq ε$) for smooth non-convex functions -- under arbitrary client heterogeneity and compressed communication -- compared to the $\mathcal{O}(ε^{-2})$ complexity of most prior works. Our key…

    Submitted 24 October, 2021; v1 submitted 7 December, 2020; originally announced December 2020.

  13. arXiv:2011.14031  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    Voting based ensemble improves robustness of defensive models

    Authors: Devvrit, Minhao Cheng, Cho-Jui Hsieh, Inderjit Dhillon

    Abstract: Developing robust models against adversarial perturbations has been an active area of research and many algorithms have been proposed to train individual robust models. Taking these pretrained robust models, we aim to study whether it is possible to create an ensemble to further improve robustness. Several previous attempts tackled this problem by ensembling the soft-label prediction and have been…

    Submitted 27 November, 2020; originally announced November 2020.

  14. arXiv:2011.10643  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization

    Authors: Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

    Abstract: In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e. averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be {\em lossy compressed} versions of the local parameters. In this paper, we show that, in such co…

    Submitted 20 November, 2020; originally announced November 2020.

  15. arXiv:2009.12947  [pdf, other]

    stat.ML cs.LG

    Learning from eXtreme Bandit Feedback

    Authors: Romain Lopez, Inderjit S. Dhillon, Michael I. Jordan

    Abstract: We study the problem of batch learning from bandit feedback in the setting of extremely large action spaces. Learning from extreme bandit feedback is ubiquitous in recommendation systems, in which billions of decisions are made over sets consisting of millions of choices in a single day, yielding massive observational data. In these large-scale real-world applications, supervised learning framewor…

    Submitted 22 February, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Journal ref: AAAI Conference on Artificial Intelligence 2021

  16. Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis

    Authors: Joyce Jiyoung Whang, Inderjit S. Dhillon

    Abstract: The goal of co-clustering is to simultaneously identify a clustering of rows as well as columns of a two dimensional data matrix. A number of co-clustering techniques have been proposed including information-theoretic co-clustering and the minimum sum-squared residue co-clustering method. However, most existing co-clustering algorithms are designed to find pairwise disjoint and exhaustive co-clust…

    Submitted 24 April, 2020; originally announced April 2020.

    Journal ref: "Non-Exhaustive, Overlapping Co-Clustering", Proceedings of the 26th ACM Conference on Information and Knowledge Management (CIKM), pages 2367-2370, November 2017

  17. arXiv:2004.00198  [pdf, other]

    cs.LG stat.ML

    Extreme Multi-label Classification from Aggregated Labels

    Authors: Yanyao Shen, Hsiang-fu Yu, Sujay Sanghavi, Inderjit Dhillon

    Abstract: Extreme multi-label classification (XMC) is the problem of finding the relevant labels for an input, from a very large universe of possible labels. We consider XMC in the setting where labels are available only for groups of samples - but not for individual ones. Current XMC approaches are not built for such multi-instance multi-label (MIML) training data, and MIML approaches do not scale to XMC s…

    Submitted 31 March, 2020; originally announced April 2020.

  18. arXiv:2003.09229  [pdf, other]

    cs.LG cs.CL stat.ML

    Learning to Encode Position for Transformer with Continuous Dynamical Model

    Authors: Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

    Abstract: We introduce a new way of learning to encode position information for non-recurrent models, such as Transformer models. Unlike RNN and LSTM, which contain inductive bias by loading the input tokens sequentially, non-recurrent models are less sensitive to position. The main reason is that position information among input units is not inherently encoded, i.e., the models are permutation equivalent;…

    Submitted 12 March, 2020; originally announced March 2020.

    Comments: Code to be released in https://github.com/xuanqing94/FLOATER

  19. arXiv:2002.06789  [pdf, other]

    cs.LG stat.ML

    CAT: Customized Adversarial Training for Improved Robustness

    Authors: Minhao Cheng, Qi Lei, Pin-Yu Chen, Inderjit Dhillon, Cho-Jui Hsieh

    Abstract: Adversarial training has become one of the most effective methods for improving robustness of neural networks. However, it often suffers from poor generalization on both clean and perturbed data. In this paper, we propose a new algorithm, named Customized Adversarial Training (CAT), which adaptively customizes the perturbation level and the corresponding label for each training sample in adversari…

    Submitted 17 February, 2020; originally announced February 2020.

  20. arXiv:1908.10408  [pdf, other]

    cs.LG cs.IR stat.ML

    Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure

    Authors: Vikas K. Garg, Inderjit S. Dhillon, Hsiang-Fu Yu

    Abstract: The architecture of Transformer is based entirely on self-attention, and has been shown to outperform models that employ recurrence on sequence transduction tasks such as machine translation. The superior performance of Transformer has been attributed to propagating signals over shorter distances, between positions in the input and the output, compared to the recurrent architectures. We establish…

    Submitted 27 August, 2019; originally announced August 2019.

    Comments: Initial version

  21. arXiv:1906.07437  [pdf, other]

    cs.LG stat.ML

    Inverting Deep Generative models, One layer at a time

    Authors: Qi Lei, Ajil Jalal, Inderjit S. Dhillon, Alexandros G. Dimakis

    Abstract: We study the problem of inverting a deep generative model with ReLU activations. Inversion corresponds to finding a latent code vector that explains observed measurements as much as possible. In most prior works this is performed by attempting to solve a non-convex optimization problem involving the generator. In this paper we obtain several novel theoretical results for the inversion problem. W…

    Submitted 19 June, 2019; v1 submitted 18 June, 2019; originally announced June 2019.

  22. arXiv:1906.02436  [pdf, other]

    cs.LG math.OC stat.ML

    Primal-Dual Block Frank-Wolfe

    Authors: Qi Lei, Jiacheng Zhuo, Constantine Caramanis, Inderjit S. Dhillon, Alexandros G. Dimakis

    Abstract: We propose a variant of the Frank-Wolfe algorithm for solving a class of sparse/low-rank optimization problems. Our formulation includes Elastic Net, regularized SVMs and phase retrieval as special cases. The proposed Primal-Dual Block Frank-Wolfe algorithm reduces the per-iteration cost while maintaining linear convergence rate. The per iteration cost of our method depends on the structural compl…

    Submitted 6 June, 2019; originally announced June 2019.

  23. arXiv:1905.03806  [pdf, other]

    stat.ML cs.LG

    Think Globally, Act Locally: A Deep Neural Network Approach to High-Dimensional Time Series Forecasting

    Authors: Rajat Sen, Hsiang-Fu Yu, Inderjit Dhillon

    Abstract: Forecasting high-dimensional time series plays a crucial role in many applications such as demand forecasting and financial predictions. Modern datasets can have millions of correlated time-series that evolve together, i.e., they are extremely high dimensional (one dimension for each individual time-series). There is a need for exploiting global patterns and coupling them with local calibration for…

    Submitted 26 October, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

  24. arXiv:1905.03381  [pdf, other]

    cs.LG cs.AI stat.ML

    AutoAssist: A Framework to Accelerate Training of Deep Neural Networks

    Authors: Jiong Zhang, Hsiang-fu Yu, Inderjit S. Dhillon

    Abstract: Deep neural networks have yielded superior performance in many applications; however, the gradient computation in a deep model with millions of instances leads to a lengthy training process even with modern GPU/TPU hardware acceleration. In this paper, we propose AutoAssist, a simple framework to accelerate training of a deep neural network. Typically, as the training procedure evolves, the amount…

    Submitted 8 May, 2019; originally announced May 2019.

  25. arXiv:1905.02331  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Taming Pretrained Transformers for Extreme Multi-label Text Classification

    Authors: Wei-Cheng Chang, Hsiang-Fu Yu, Kai Zhong, Yiming Yang, Inderjit Dhillon

    Abstract: We consider the extreme multi-label text classification (XMC) problem: given an input text, return the most relevant labels from a large label collection. For example, the input text could be a product description on Amazon.com and the labels could be product categories. XMC is an important yet challenging problem in the NLP community. Recently, deep pretrained transformer models have achieved sta…

    Submitted 23 June, 2020; v1 submitted 6 May, 2019; originally announced May 2019.

    Comments: KDD 2020 Applied Data Track

  26. arXiv:1904.03257  [pdf, ps, other]

    cs.LG cs.DB cs.DC cs.SE stat.ML

    MLSys: The New Frontier of Machine Learning Systems

    Authors: Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, et al. (44 additional authors not shown)

    Abstract: Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development and deployment profile of modern ML methods, and the range of practical concerns that come with broader adoption. We propose to foster a ne…

    Submitted 1 December, 2019; v1 submitted 29 March, 2019; originally announced April 2019.

  27. arXiv:1901.04684  [pdf, other]

    stat.ML cs.CR cs.CV cs.LG

    The Limitations of Adversarial Training and the Blind-Spot Attack

    Authors: Huan Zhang, Hongge Chen, Zhao Song, Duane Boning, Inderjit S. Dhillon, Cho-Jui Hsieh

    Abstract: The adversarial training procedure proposed by Madry et al. (2018) is one of the most effective methods to defend against adversarial examples in deep neural networks (DNNs). In our paper, we shed some light on the practicality and the hardness of adversarial training by showing that the effectiveness (robustness on test set) of adversarial training has a strong correlation with the distance betw…

    Submitted 15 January, 2019; originally announced January 2019.

    Comments: Accepted by International Conference on Learning Representations (ICLR) 2019. Huan Zhang and Hongge Chen contributed equally

  28. arXiv:1812.00151  [pdf, other]

    cs.LG cs.CR math.OC stat.ML

    Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

    Authors: Qi Lei, Lingfei Wu, Pin-Yu Chen, Alexandros G. Dimakis, Inderjit S. Dhillon, Michael Witbrock

    Abstract: Adversarial examples are carefully constructed modifications to an input that completely change the output of a classifier but are imperceptible to humans. Despite these successful attacks for continuous data (such as image and audio samples), generating adversarial examples for discrete structures such as text has proven significantly more challenging. In this paper we formulate the attacks with…

    Submitted 4 April, 2019; v1 submitted 1 December, 2018; originally announced December 2018.

    Comments: In SysML 2019

  29. arXiv:1811.00641  [pdf, other]

    cs.LG cs.CL math.NA stat.ML

    Online Embedding Compression for Text Classification using Low Rank Matrix Factorization

    Authors: Anish Acharya, Rahul Goel, Angeliki Metallinou, Inderjit Dhillon

    Abstract: Deep learning models have become state of the art for natural language processing (NLP) tasks; however, deploying these models in production systems poses significant memory constraints. Existing compression methods are either lossy or introduce significant latency. We propose a compression method that leverages low rank matrix factorization during training to compress the word embedding layer which…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: Accepted in Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019)

  30. arXiv:1805.10477  [pdf, other]

    cs.LG stat.ML

    Nonlinear Inductive Matrix Completion based on One-layer Neural Networks

    Authors: Kai Zhong, Zhao Song, Prateek Jain, Inderjit S. Dhillon

    Abstract: The goal of a recommendation system is to predict the interest of a user in a given item by exploiting the existing set of ratings as well as certain user/item features. A standard approach to modeling this problem is Inductive Matrix Completion where the predicted rating is modeled as an inner product of the user and the item features projected onto a latent space. In order to learn the parameter…

    Submitted 26 May, 2018; originally announced May 2018.

  31. arXiv:1804.09699  [pdf, other]

    stat.ML cs.CR cs.CV cs.LG

    Towards Fast Computation of Certified Robustness for ReLU Networks

    Authors: Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane Boning, Inderjit S. Dhillon, Luca Daniel

    Abstract: Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Currently available methods of computing such a bound are either time-consuming or delivering low qua…

    Submitted 2 October, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

    Comments: Tsui-Wei Weng and Huan Zhang contributed equally

  32. arXiv:1803.09327  [pdf, other]

    cs.LG stat.ML

    Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization

    Authors: Jiong Zhang, Qi Lei, Inderjit S. Dhillon

    Abstract: Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long range dependencies in recurrent neural networks~(RNNs). In this paper, we present an efficient parametrization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by it…

    Submitted 25 March, 2018; originally announced March 2018.

    Comments: main text 13 pages, 22 pages including reference and appendix

  33. arXiv:1803.06585  [pdf, other]

    cs.LG stat.ML

    Learning Long Term Dependencies via Fourier Recurrent Units

    Authors: Jiong Zhang, Yibo Lin, Zhao Song, Inderjit S. Dhillon

    Abstract: It is a known fact that training recurrent neural networks for tasks that have long term dependencies is challenging. One of the main reasons is the vanishing or exploding gradient problem, which prevents gradient information from propagating to early layers. In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients that arise in its…

    Submitted 17 March, 2018; originally announced March 2018.

  34. arXiv:1711.03440  [pdf, other]

    cs.LG cs.DS stat.ML

    Learning Non-overlapping Convolutional Neural Networks with Multiple Kernels

    Authors: Kai Zhong, Zhao Song, Inderjit S. Dhillon

    Abstract: In this paper, we consider parameter recovery for non-overlapping convolutional neural networks (CNNs) with multiple kernels. We show that when the inputs follow a Gaussian distribution and the sample size is sufficiently large, the squared loss of such CNNs is $\mathit{~locally~strongly~convex}$ in a basin of attraction near the global optima for most popular activation functions, like ReLU, Leaky…

    Submitted 8 November, 2017; originally announced November 2017.

    Comments: arXiv admin note: text overlap with arXiv:1706.03175

  35. arXiv:1706.03175  [pdf, other]

    cs.LG cs.DS stat.ML

    Recovery Guarantees for One-hidden-layer Neural Networks

    Authors: Kai Zhong, Zhao Song, Prateek Jain, Peter L. Bartlett, Inderjit S. Dhillon

    Abstract: In this paper, we consider regression problems with one-hidden-layer neural networks (1NNs). We distill some properties of activation functions that lead to $\mathit{local~strong~convexity}$ in the neighborhood of the ground-truth parameters for the 1NN squared-loss objective. Most popular nonlinear activation functions satisfy the distilled properties, including rectified linear units (ReLUs), le…

    Submitted 9 June, 2017; originally announced June 2017.

    Comments: ICML 2017

  36. arXiv:1608.01976  [pdf, other]

    stat.ML cs.LG

    Kernel Ridge Regression via Partitioning

    Authors: Rashish Tandon, Si Si, Pradeep Ravikumar, Inderjit Dhillon

    Abstract: In this paper, we investigate a divide and conquer approach to Kernel Ridge Regression (KRR). Given n samples, the division step involves separating the points based on some underlying disjoint partition of the input space (possibly via clustering), and then computing a KRR estimate for each partition. The conquering step is simple: for each partition, we only consider its own local estimate for p…

    Submitted 5 August, 2016; originally announced August 2016.

    Comments: 40 pages

  37. arXiv:1606.00813  [pdf, other]

    stat.ML

    Generalized Root Models: Beyond Pairwise Graphical Models for Univariate Exponential Families

    Authors: David I. Inouye, Pradeep Ravikumar, Inderjit S. Dhillon

    Abstract: We present a novel k-way high-dimensional graphical model called the Generalized Root Model (GRM) that explicitly models dependencies between variable sets of size k > 2---where k = 2 is the standard pairwise graphical model. This model is based on taking the k-th root of the original sufficient statistics of any univariate exponential family with positive sufficient statistics, including the Pois…

    Submitted 2 June, 2016; originally announced June 2016.

  38. arXiv:1605.09499  [pdf, other]

    stat.ML

    Extreme Stochastic Variational Inference: Distributed and Asynchronous

    Authors: Jiong Zhang, Parameswaran Raman, Shihao Ji, Hsiang-Fu Yu, S. V. N. Vishwanathan, Inderjit S. Dhillon

    Abstract: Stochastic variational inference (SVI), the state-of-the-art algorithm for scaling variational inference to large datasets, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this is problematic when the number of parameters is in the billions. In this paper, we propose extreme stochastic variational inference (ESVI), an asynchronous and lock-free al…

    Submitted 3 August, 2018; v1 submitted 31 May, 2016; originally announced May 2016.

  39. arXiv:1603.03629  [pdf, other]

    stat.ML

    Square Root Graphical Models: Multivariate Generalizations of Univariate Exponential Families that Permit Positive Dependencies

    Authors: David I. Inouye, Pradeep Ravikumar, Inderjit S. Dhillon

    Abstract: We develop Square Root Graphical Models (SQR), a novel class of parametric graphical models that provides multivariate generalizations of univariate exponential family distributions. Previous multivariate graphical models [Yang et al. 2015] did not allow positive dependencies for the exponential and Poisson generalizations. However, in many real-world datasets, variables clearly have positive depe…

    Submitted 10 June, 2016; v1 submitted 11 March, 2016; originally announced March 2016.

    Journal ref: ICML 2016

  40. arXiv:1602.06042  [pdf, ps, other]

    stat.ML cs.LG

    Structured Sparse Regression via Greedy Hard-Thresholding

    Authors: Prateek Jain, Nikhil Rao, Inderjit Dhillon

    Abstract: Several learning applications require solving high-dimensional regression problems where the relevant features belong to a small number of (overlapping) groups. For very large datasets and under standard sparsity constraints, hard thresholding methods have proven to be extremely efficient, but such methods require NP-hard projections when dealing with overlapping groups. In this paper, we show tha…

    Submitted 27 May, 2016; v1 submitted 18 February, 2016; originally announced February 2016.

  41. arXiv:1509.08333  [pdf, other]

    cs.LG stat.ML

    High-dimensional Time Series Prediction with Missing Values

    Authors: Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon

    Abstract: High-dimensional time series prediction is needed in applications as diverse as demand forecasting and climatology. Often, such applications require methods that are both highly scalable, and deal with noisy data in terms of corruptions or missing values. Classical time series methods usually fall short of handling both these issues. In this paper, we propose to adapt matrix completion appr…

    Submitted 16 February, 2016; v1 submitted 28 September, 2015; originally announced September 2015.

  42. arXiv:1509.01404  [pdf, ps, other]

    math.NA cs.CV cs.LG math.OC stat.ML

    Coordinate Descent Methods for Symmetric Nonnegative Matrix Factorization

    Authors: Arnaud Vandaele, Nicolas Gillis, Qi Lei, Kai Zhong, Inderjit Dhillon

    Abstract: Given a symmetric nonnegative matrix $A$, symmetric nonnegative matrix factorization (symNMF) is the problem of finding a nonnegative matrix $H$, usually with far fewer columns than $A$, such that $A \approx HH^T$. SymNMF can be used for data analysis and in particular for various clustering tasks. In this paper, we propose simple and very efficient coordinate descent schemes to solve this proble…

    Submitted 31 May, 2016; v1 submitted 4 September, 2015; originally announced September 2015.

    Comments: 25 pages, 5 figures, 7 tables. Main changes: comparison with another symNMF algorithm (namely, BetaSNMF), and correction of an error in the convergence proof

    Journal ref: IEEE Transactions on Signal Processing 64 (21), pp. 5571-5584, 2016

  43. arXiv:1507.04457  [pdf, other]

    stat.ML cs.LG

    Preference Completion: Large-scale Collaborative Ranking from Pairwise Comparisons

    Authors: Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, Inderjit S. Dhillon

    Abstract: In this paper we consider the collaborative ranking setting: a pool of users each provides a small number of pairwise preferences between $d$ possible items; from these we need to predict preferences of the users for items they have not yet seen. We do so by fitting a rank $r$ score matrix to the pairwise data, and provide two main contributions: (a) we show that an algorithm based on convex optim…

    Submitted 16 July, 2015; originally announced July 2015.

  44. arXiv:1505.01802  [pdf, ps, other]

    cs.LG stat.ML

    Optimal Decision-Theoretic Classification Using Non-Decomposable Performance Metrics

    Authors: Nagarajan Natarajan, Oluwasanmi Koyejo, Pradeep Ravikumar, Inderjit S. Dhillon

    Abstract: We provide a general theoretical analysis of expected out-of-sample utility, also referred to as decision-theoretic classification, for non-decomposable binary classification metrics such as F-measure and Jaccard coefficient. Our key result is that the expected out-of-sample utility for many performance metrics is provably optimized by a classifier which is equivalent to a signed thresholding of t…

    Submitted 7 May, 2015; originally announced May 2015.

  45. arXiv:1411.6081  [pdf, other]

    cs.LG math.NA stat.ML

    PU Learning for Matrix Completion

    Authors: Cho-Jui Hsieh, Nagarajan Natarajan, Inderjit S. Dhillon

    Abstract: In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only "likes" or "friendships" are observed. The problem of learning from only positive and…

    Submitted 21 November, 2014; originally announced November 2014.

  46. arXiv:1406.7321  [pdf, other]

    stat.ML

    Proximal Quasi-Newton for Computationally Intensive L1-regularized M-estimators

    Authors: Kai Zhong, Ian E. H. Yen, Inderjit S. Dhillon, Pradeep Ravikumar

    Abstract: We consider the class of optimization problems arising from computationally intensive L1-regularized M-estimators, where the function or gradient values are very expensive to compute. A particular instance of interest is the L1-regularized MLE for learning Conditional Random Fields (CRFs), which are a popular class of statistical models for varied structured prediction problems such as sequence la…

    Submitted 23 January, 2015; v1 submitted 27 June, 2014; originally announced June 2014.

  47. arXiv:1306.3212  [pdf, ps, other]

    cs.LG stat.ML

    Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation

    Authors: Cho-Jui Hsieh, Matyas A. Sustik, Inderjit S. Dhillon, Pradeep Ravikumar

    Abstract: The L1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program.…

    Submitted 13 June, 2013; originally announced June 2013.

  48. arXiv:1306.0626  [pdf, other]

    cs.LG cs.IT stat.ML

    Provable Inductive Matrix Completion

    Authors: Prateek Jain, Inderjit S. Dhillon

    Abstract: Consider a movie recommendation system where apart from the ratings information, side information such as user's age or movie's genre is also available. Unlike standard matrix completion, in this setting one should be able to predict inductively on new users/movies. In this paper, we study the problem of inductive matrix completion in the exact recovery setting. That is, we assume that the ratings…

    Submitted 3 June, 2013; originally announced June 2013.

  49. arXiv:1106.2774  [pdf, ps, other]

    cs.IT stat.ML

    Orthogonal Matching Pursuit with Replacement

    Authors: Prateek Jain, Ambuj Tewari, Inderjit S. Dhillon

    Abstract: In this paper, we consider the problem of compressed sensing where the goal is to recover almost all the sparse vectors using a small number of fixed linear measurements. For this problem, we propose a novel partial hard-thresholding operator that leads to a general family of iterative algorithms. While one extreme of the family yields well known hard thresholding algorithms like ITI (Iterative Th…

    Submitted 14 June, 2011; originally announced June 2011.