Showing 1–33 of 33 results for author: Zaheer, M

  1. arXiv:2302.01576  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    ResMem: Learn what you can and memorize the rest

    Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a ne…

    Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  2. arXiv:2210.02415  [pdf, other]

    cs.LG cs.DS stat.ML

    A Fourier Approach to Mixture Learning

    Authors: Mingda Qiao, Guru Guruganesh, Ankit Singh Rawat, Avinava Dubey, Manzil Zaheer

    Abstract: We revisit the problem of learning mixtures of spherical Gaussians. Given samples from mixture $\frac{1}{k}\sum_{j=1}^{k}\mathcal{N}(μ_j, I_d)$, the goal is to estimate the means $μ_1, μ_2, \ldots, μ_k \in \mathbb{R}^d$ up to a small error. The hardness of this learning problem can be measured by the separation $Δ$ defined as the minimum distance between all pairs of means. Regev and Vijayaraghava…

    Submitted 5 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: To appear at NeurIPS 2022; v2 corrected author information

  3. arXiv:2204.03758  [pdf, other]

    cs.LG cs.PL stat.ML

    Compositional Generalization and Decomposition in Neural Program Synthesis

    Authors: Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

    Abstract: When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, what we can measure is whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve…

    Submitted 7 April, 2022; originally announced April 2022.

    Comments: Published at the Deep Learning for Code (DL4C) Workshop at ICLR 2022

  4. arXiv:2202.05963  [pdf, other]

    cs.LG cs.CR stat.ML

    Private Adaptive Optimization with Side Information

    Authors: Tian Li, Manzil Zaheer, Sashank J. Reddi, Virginia Smith

    Abstract: Adaptive optimization methods have become the default solvers for many machine learning tasks. Unfortunately, the benefits of adaptivity may degrade when training with differential privacy, as the noise added to ensure privacy reduces the effectiveness of the adaptive preconditioner. To this end, we propose AdaDPS, a general framework that uses non-sensitive side information to precondition the gr…

    Submitted 24 June, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  5. arXiv:2202.01454  [pdf, other]

    cs.LG stat.ML

    Deep Hierarchy in Bandits

    Authors: Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

    Abstract: Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as the preferences of a user for recommended products and their categories. To maximize statistical efficiency, it is important to leverage these correlations when learning. We formulate a bandit variant of this problem where the correlations of mean action rewards are represented…

    Submitted 3 February, 2022; originally announced February 2022.

  6. arXiv:2202.00980  [pdf, other]

    cs.LG stat.ML

    Robust Training of Neural Networks Using Scale Invariant Architectures

    Authors: Zhiyuan Li, Srinadh Bhojanapalli, Manzil Zaheer, Sashank J. Reddi, Sanjiv Kumar

    Abstract: In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models. However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve…

    Submitted 18 July, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: 36 pages, 7 figures; ICML 2022

  7. arXiv:2106.05608  [pdf, other]

    cs.LG cs.AI stat.ML

    Thompson Sampling with a Mixture Prior

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and…

    Submitted 5 March, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics

  8. arXiv:2104.07061  [pdf, other]

    cs.LG cs.DS physics.data-an stat.ML

    Exact and Approximate Hierarchical Clustering Using A*

    Authors: Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Avinava Dubey, Patrick Flaherty, Manzil Zaheer, Amr Ahmed, Kyle Cranmer, Andrew McCallum

    Abstract: Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: 30 pages, 9 figures

  9. arXiv:2102.06129  [pdf, other]

    cs.LG stat.ML

    Meta-Thompson Sampling

    Authors: Branislav Kveton, Mikhail Konobeev, Manzil Zaheer, Chih-wei Hsu, Martin Mladenov, Craig Boutilier, Csaba Szepesvari

    Abstract: Efficient exploration in bandits is a fundamental online learning problem. We propose a variant of Thompson sampling that learns to explore better as it interacts with bandit instances drawn from an unknown prior. The algorithm meta-learns the prior and thus we call it MetaTS. We propose several efficient implementations of MetaTS and analyze it in Gaussian bandits. Our analysis shows the benefit…

    Submitted 23 June, 2021; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Proceedings of the 38th International Conference on Machine Learning

  10. arXiv:2011.08474  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Federated Composite Optimization

    Authors: Honglin Yuan, Manzil Zaheer, Sashank Reddi

    Abstract: Federated Learning (FL) is a distributed learning paradigm that scales on-device learning collaboratively and privately. Standard FL algorithms such as FedAvg are primarily geared towards smooth unconstrained settings. In this paper, we study the Federated Composite Optimization (FCO) problem, in which the loss function contains a non-smooth regularizer. Such problems arise naturally in FL applica…

    Submitted 5 June, 2021; v1 submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted to ICML 2021. Code repository see https://github.com/hongliny/FCO-ICML21

  11. arXiv:2009.06851  [pdf, other]

    cs.CL cs.LG stat.ML

    Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes

    Authors: Xinyuan Zhang, Ruiyi Zhang, Manzil Zaheer, Amr Ahmed

    Abstract: High-quality dialogue-summary paired data is expensive to produce and domain-sensitive, making abstractive dialogue summarization a challenging task. In this work, we propose the first unsupervised abstractive dialogue summarization model for tete-a-tetes (SuTaT). Unlike standard text summarization, a dialogue summarization method should consider the multi-speaker scenario where the speakers have…

    Submitted 14 September, 2020; originally announced September 2020.

  12. arXiv:2007.14062  [pdf, other]

    cs.LG cs.CL stat.ML

    Big Bird: Transformers for Longer Sequences

    Authors: Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed

    Abstract: Transformer-based models, such as BERT, have been among the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that…

    Submitted 8 January, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

    Journal ref: Neural Information Processing Systems (NeurIPS) 2020

  13. arXiv:2006.08714  [pdf, other]

    cs.LG cs.AI stat.ML

    Latent Bandits Revisited

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed, Craig Boutilier

    Abstract: A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state. The primary goal of the agent is to identify the latent state, after which it can act optimally. This setting is a natural midpoint between online and offline learning---complex models can be learned offline with the agent identifying latent state online---…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: 16 pages, 2 figures

  14. arXiv:2006.08236  [pdf, other]

    cs.LG cs.AI stat.ML

    Non-Stationary Off-Policy Optimization

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Yinlam Chow, Amr Ahmed

    Abstract: Off-policy learning is a framework for evaluating and optimizing policies without deploying them, from data collected by another policy. Real-world environments are typically non-stationary and the offline learned policies should adapt to these changes. To address this challenge, we study the novel problem of off-policy optimization in piecewise-stationary contextual bandits. Our proposed solution…

    Submitted 4 April, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: AISTATS 2021; 16 pages, 2 figures

  15. arXiv:2006.05094  [pdf, other]

    cs.LG stat.ML

    Meta-Learning Bandit Policies by Gradient Ascent

    Authors: Branislav Kveton, Martin Mladenov, Chih-Wei Hsu, Manzil Zaheer, Csaba Szepesvari, Craig Boutilier

    Abstract: Most bandit policies are designed to either minimize regret in any problem instance, making very few assumptions about the underlying environment, or in a Bayesian sense, assuming a prior distribution over environment parameters. The former are often too conservative in practical settings, while the latter require assumptions that are hard to verify in practice. We study bandit problems that fall…

    Submitted 5 January, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

  16. arXiv:2004.05465  [pdf, other]

    cs.LG stat.ML

    Robust Large-Margin Learning in Hyperbolic Space

    Authors: Melanie Weber, Manzil Zaheer, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar

    Abstract: Recently, there has been a surge of interest in representation learning in hyperbolic spaces, driven by their ability to represent hierarchical data with significantly fewer dimensions than standard Euclidean spaces. However, the viability and benefits of hyperbolic spaces for downstream machine learning tasks have received less attention. In this paper, we present, to our knowledge, the first the…

    Submitted 1 November, 2022; v1 submitted 11 April, 2020; originally announced April 2020.

    Comments: Revision corrects error in section 3.1

  17. arXiv:2003.08197  [pdf, other]

    cs.LG cs.CL stat.ML

    Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

    Authors: Paul Pu Liang, Manzil Zaheer, Yuan Wang, Amr Ahmed

    Abstract: Learning continuous representations of discrete objects such as text, users, movies, and URLs lies at the heart of many applications including language and user modeling. When using discrete objects as input to neural networks, we often ignore the underlying structures (e.g., natural groupings and similarities) and embed the objects independently into individual vectors. As a result, existing meth…

    Submitted 11 March, 2021; v1 submitted 18 March, 2020; originally announced March 2020.

    Comments: ICLR 2021, code can be found at http://github.com/pliang279/sparse_discrete

  18. arXiv:2003.00295  [pdf, other]

    cs.LG cs.DC math.OC stat.ML

    Adaptive Federated Optimization

    Authors: Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, H. Brendan McMahan

    Abstract: Federated learning is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) are often difficult to tune and exhibit unfavorable convergence behavior. In non-federated settings, adaptive optimization methods have…

    Submitted 8 September, 2021; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: Published as a conference paper at ICLR 2021

  19. arXiv:2002.06772  [pdf, other]

    cs.LG stat.ML

    Differentiable Bandit Exploration

    Authors: Craig Boutilier, Chih-Wei Hsu, Branislav Kveton, Martin Mladenov, Csaba Szepesvari, Manzil Zaheer

    Abstract: Exploration policies in Bayesian bandits maximize the average reward over problem instances drawn from some distribution $\mathcal{P}$. In this work, we learn such policies for an unknown distribution $\mathcal{P}$ using samples from $\mathcal{P}$. Our approach is a form of meta-learning and exploits properties of $\mathcal{P}$ without making strong assumptions about its form. To do this, we param…

    Submitted 9 June, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

  20. arXiv:2002.02778  [pdf, other]

    cs.LG cs.CG stat.ML

    PLLay: Efficient Topological Layer based on Persistence Landscapes

    Authors: Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Sik Kim, Frederic Chazal, Larry Wasserman

    Abstract: We propose PLLay, a novel topological layer for general deep learning models based on persistence landscapes, in which we can efficiently exploit the underlying topological features of the input data structure. In this work, we show differentiability with respect to layer inputs, for a general persistent homology with arbitrary filtration. Thus, our proposed layer can be placed anywhere in the net…

    Submitted 17 January, 2021; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: 29 pages, 7 figures

    Journal ref: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  21. arXiv:2001.01920  [pdf, other]

    cs.LG stat.ML

    FedDANE: A Federated Newton-Type Method

    Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

    Abstract: Federated learning aims to jointly learn statistical models over massively distributed remote devices. In this work, we propose FedDANE, an optimization method that we adapt from DANE, a method for classical distributed optimization, to handle the practical constraints of federated learning. We provide convergence guarantees for this method when learning over both convex and non-convex functions.…

    Submitted 7 January, 2020; originally announced January 2020.

    Comments: Asilomar Conference on Signals, Systems, and Computers 2019

  22. arXiv:1908.07587  [pdf, other]

    cs.LG cs.AI cs.GR stat.ML

    Developing Creative AI to Generate Sculptural Objects

    Authors: Songwei Ge, Austin Dill, Eunsu Kang, Chun-Liang Li, Lingyao Zhang, Manzil Zaheer, Barnabas Poczos

    Abstract: We explore the intersection of human and machine creativity by generating sculptural objects through machine learning. This research raises questions about both the technical details of automatic art generation and the interaction between AI and people, as both artists and the audience of art. We introduce two algorithms for generating 3D point clouds and then discuss their actualization as sculpt…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: In the Proceedings of International Symposium on Electronic Art (ISEA 2019)

  23. arXiv:1908.01760  [pdf, other]

    cs.LG stat.ML

    The Myths of Our Time: Fake News

    Authors: Vít Růžička, Eunsu Kang, David Gordon, Ankita Patel, Jacqui Fashimpaur, Manzil Zaheer

    Abstract: While the purpose of most fake news is misinformation and political propaganda, our team sees it as a new type of myth that is created by people in the age of internet identities and artificial intelligence. Seeking insights on the fear and desire hidden underneath these modified or generated stories, we use machine learning methods to generate fake articles and present them in the form of an onli…

    Submitted 5 August, 2019; originally announced August 2019.

    Comments: 5 pages, 5 figures, in proceedings of International Symposium on Electronic Art 2019 (ISEA)

    Journal ref: Proceedings of International Symposium on Electronic Art 2019 (ISEA), pages 494-498

  24. arXiv:1907.04651  [pdf, other]

    cs.LG cs.AI stat.ML

    Incrementally Learning Functions of the Return

    Authors: Brendan Bennett, Wesley Chung, Muhammad Zaheer, Vincent Liu

    Abstract: Temporal difference methods enable efficient estimation of value functions in reinforcement learning in an incremental fashion, and are of broader interest because they correspond to learning as observed in biological systems. Standard value functions correspond to the expected value of a sum of discounted returns. While this formulation is often sufficient for many purposes, it would often be useful…

    Submitted 5 July, 2019; originally announced July 2019.

  25. arXiv:1906.08947  [pdf, other]

    cs.LG stat.ML

    Randomized Exploration in Generalized Linear Bandits

    Authors: Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the num…

    Submitted 10 July, 2023; v1 submitted 21 June, 2019; originally announced June 2019.

    Comments: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics

  26. arXiv:1902.01967  [pdf, other]

    cs.LG stat.ML

    Exchangeable Generative Models with Flow Scans

    Authors: Christopher Bender, Kevin O'Connor, Yang Li, Juan Jose Garcia, Manzil Zaheer, Junier Oliva

    Abstract: In this work, we develop a new approach to generative density estimation for exchangeable, non-i.i.d. data. The proposed framework, FlowScan, combines invertible flow transformations with a sorted scan to flexibly model the data while preserving exchangeability. Unlike most existing methods, FlowScan exploits the intradependencies within sets to learn both global and local structure. FlowScan repr…

    Submitted 18 September, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

  27. arXiv:1812.06127  [pdf, other]

    cs.LG stat.ML

    Federated Optimization in Heterogeneous Networks

    Authors: Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith

    Abstract: Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedPr…

    Submitted 21 April, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

    Comments: MLSys 2020

  28. arXiv:1810.05795  [pdf, other]

    cs.LG stat.ML

    Point Cloud GAN

    Authors: Chun-Liang Li, Manzil Zaheer, Yang Zhang, Barnabas Poczos, Ruslan Salakhutdinov

    Abstract: Generative Adversarial Networks (GAN) can achieve promising performance on learning complex data distributions on different types of data. In this paper, we first show that a straightforward extension of the existing GAN algorithm is not applicable to point clouds, because the constraint required for discriminators is undefined for set data. We propose a two-fold modification to the GAN algorithm for learning…

    Submitted 13 October, 2018; originally announced October 2018.

  29. arXiv:1805.08836  [pdf, other]

    math.ST cs.IT stat.ML

    Nonparametric Density Estimation under Adversarial Losses

    Authors: Shashank Singh, Ananya Uppal, Boyue Li, Chun-Liang Li, Manzil Zaheer, Barnabás Póczos

    Abstract: We study minimax convergence rates of nonparametric density estimation under a large class of loss functions called "adversarial losses", which, besides classical $\mathcal{L}^p$ losses, includes maximum mean discrepancy (MMD), Wasserstein distance, and total variation distance. These losses are closely related to the losses encoded by discriminator networks in generative adversarial networks (GAN…

    Submitted 28 October, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

  30. arXiv:1801.09819  [pdf, other]

    stat.ML

    Transformation Autoregressive Networks

    Authors: Junier B. Oliva, Avinava Dubey, Manzil Zaheer, Barnabás Póczos, Ruslan Salakhutdinov, Eric P. Xing, Jeff Schneider

    Abstract: The fundamental task of general density estimation $p(x)$ has been of keen interest to machine learning. In this work, we attempt to systematically characterize methods for density estimation. Broadly speaking, most of the existing methods can be categorized into either using: \textit{a}) autoregressive models to estimate the conditional factors of the chain rule, $p(x_{i}\, |\, x_{i-1}, \ldots)$;…

    Submitted 23 October, 2018; v1 submitted 29 January, 2018; originally announced January 2018.

    Journal ref: ICML 2018

  31. arXiv:1711.11179  [pdf, other]

    cs.LG stat.ML

    State Space LSTM Models with Particle MCMC Inference

    Authors: Xun Zheng, Manzil Zaheer, Amr Ahmed, Yuan Wang, Eric P Xing, Alexander J Smola

    Abstract: Long Short-Term Memory (LSTM) is one of the most powerful sequence models. Despite its strong performance, however, it lacks the interpretability of state space models. In this paper, we present a way to combine the best of both worlds by introducing State Space LSTM (SSL) models that generalize the earlier work \cite{zaheer2017latent} of combining topic models with LSTM. However, unlike…

    Submitted 29 November, 2017; originally announced November 2017.

  32. arXiv:1704.00003  [pdf, other]

    cs.LG stat.ML

    Spectral Methods for Nonparametric Models

    Authors: Hsiao-Yu Fish Tung, Chao-Yuan Wu, Manzil Zaheer, Alexander J. Smola

    Abstract: Nonparametric models are a versatile, albeit computationally expensive, tool for modeling mixture models. In this paper, we introduce spectral methods for the two most popular nonparametric models: the Indian Buffet Process (IBP) and the Hierarchical Dirichlet Process (HDP). We show that using spectral methods for the inference of nonparametric models is computationally and statistically efficient.…

    Submitted 30 March, 2017; originally announced April 2017.

    Comments: Keywords: Spectral Methods, Indian Buffet Process, Hierarchical Dirichlet Process

  33. arXiv:1703.06114  [pdf, other]

    cs.LG stat.ML

    Deep Sets

    Authors: Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhutdinov, Alexander Smola

    Abstract: We study the problem of designing models for machine learning tasks defined on \emph{sets}. In contrast to the traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics \cite{poczos13aistats}, to anomaly detection in piezometer data of…

    Submitted 14 April, 2018; v1 submitted 10 March, 2017; originally announced March 2017.

    Comments: NIPS 2017