Showing 1–29 of 29 results for author: Kusner, M J

Searching in archive stat.
  1. arXiv:2412.16475  [pdf, other]

    cs.LG cs.AI stat.ML

    When Can Proxies Improve the Sample Complexity of Preference Learning?

    Authors: Yuchen Zhu, Daniel Augusto de Souza, Zhengyan Shi, Mengyue Yang, Pasquale Minervini, Alexander D'Amour, Matt J. Kusner

    Abstract: We address the problem of reward hacking, where maximising a proxy reward does not necessarily increase the true reward. This is a key concern for Large Language Models (LLMs), as they are often fine-tuned on human preferences that may not accurately reflect a true objective. Existing work uses various tricks such as regularisation, tweaks to the reward model, and reward hacking detectors, to limi…

    Submitted 20 December, 2024; originally announced December 2024.

  2. arXiv:2403.07442  [pdf, other]

    cs.LG stat.ML

    Proxy Methods for Domain Adaptation

    Authors: Katherine Tsai, Stephen R. Pfohl, Olawale Salaudeen, Nicole Chiou, Matt J. Kusner, Alexander D'Amour, Sanmi Koyejo, Arthur Gretton

    Abstract: We study the problem of domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels. In this setting, neither the covariate shift nor the label shift assumptions apply. Our approach to adaptation employs proximal causal learning, a technique for estimating causal effects in se…

    Submitted 12 March, 2024; originally announced March 2024.

  3. arXiv:2301.11898  [pdf, other]

    cs.LG cs.AI stat.ML

    DAG Learning on the Permutahedron

    Authors: Valentina Zantedeschi, Luca Franceschi, Jean Kaddour, Matt J. Kusner, Vlad Niculae

    Abstract: We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimiz…

    Submitted 10 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: The Eleventh International Conference on Learning Representations

  4. arXiv:2212.11254  [pdf, other]

    stat.ML cs.AI cs.LG

    Adapting to Latent Subgroup Shifts via Concepts and Proxies

    Authors: Ibrahim Alabdulmohsin, Nicole Chiou, Alexander D'Amour, Arthur Gretton, Sanmi Koyejo, Matt J. Kusner, Stephen R. Pfohl, Olawale Salaudeen, Jessica Schrouff, Katherine Tsai

    Abstract: We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variabl…

    Submitted 21 December, 2022; originally announced December 2022.

    Comments: Authors listed in alphabetical order

  5. arXiv:2206.15475  [pdf, other]

    cs.LG stat.ME

    Causal Machine Learning: A Survey and Open Problems

    Authors: Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva

    Abstract: Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals). We categorize work in CausalML into five groups according to the problems the…

    Submitted 21 July, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

    Comments: 191 pages. v02. Work in progress. Feedback and comments are highly appreciated!

  6. arXiv:2202.00661  [pdf, other]

    cs.LG stat.ML

    When Do Flat Minima Optimizers Work?

    Authors: Jean Kaddour, Linqing Liu, Ricardo Silva, Matt J. Kusner

    Abstract: Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods, have been shown to improve a neural network's generalization performance over stochastic and adaptive gradient-based optimizers. Two methods have received significant attention due to their scalability: 1. Stochastic Weight Averaging (SWA), and 2. Sharpness-Aware Minimization (SAM). However, there has been l…

    Submitted 27 January, 2023; v1 submitted 1 February, 2022; originally announced February 2022.

  7. arXiv:2201.11872  [pdf, other]

    cs.LG stat.ML

    Local Latent Space Bayesian Optimization over Structured Inputs

    Authors: Natalie Maus, Haydn T. Jones, Juston S. Moore, Matt J. Kusner, John Bradshaw, Jacob R. Gardner

    Abstract: Bayesian optimization over the latent spaces of deep autoencoder models (DAEs) has recently emerged as a promising new approach for optimizing challenging black-box functions over structured, discrete, hard-to-enumerate search spaces (e.g., molecules). Here the DAE dramatically simplifies the search space by mapping inputs into a continuous latent space where familiar Bayesian optimization tools c…

    Submitted 22 February, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

  8. arXiv:2106.05074  [pdf, other]

    cs.LG stat.ME

    Operationalizing Complex Causes: A Pragmatic View of Mediation

    Authors: Limor Gultchin, David S. Watson, Matt J. Kusner, Ricardo Silva

    Abstract: We examine the problem of causal response estimation for complex objects (e.g., text, images, genomics). In this setting, classical atomic interventions are often not available (e.g., changes to characters, pixels, DNA base-pairs). Instead, we only have access to indirect or crude interventions (e.g., enrolling in a writing program, modifying a scene, applying a gene therapy). In thi…

    Submitted 10 June, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Journal ref: International Conference on Machine Learning 2021

  9. arXiv:2106.01939  [pdf, other]

    cs.LG stat.ML

    Causal Effect Inference for Structured Treatments

    Authors: Jean Kaddour, Yuchen Zhu, Qi Liu, Matt J. Kusner, Ricardo Silva

    Abstract: We address the estimation of conditional average treatment effects (CATEs) for structured treatments (e.g., graphs, images, texts). Given a weak condition on the effect, we propose the generalized Robinson decomposition, which (i) isolates the causal estimand (reducing regularization bias), (ii) allows one to plug in arbitrary models for learning, and (iii) possesses a quasi-oracle convergence gua…

    Submitted 27 October, 2021; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Camera-Ready submission

  10. arXiv:2010.04627  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Binary Decision Trees by Argmin Differentiation

    Authors: Valentina Zantedeschi, Matt J. Kusner, Vlad Niculae

    Abstract: We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters,…

    Submitted 14 June, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

  11. arXiv:2006.06366  [pdf, other]

    cs.LG stat.ML

    A Class of Algorithms for General Instrumental Variable Models

    Authors: Niki Kilbertus, Matt J. Kusner, Ricardo Silva

    Abstract: Causal treatment effect estimation is a key problem that arises in a variety of real-world settings, from personalized medicine to governmental policy making. There has been a flurry of recent work in machine learning on estimating causal effects when one has access to an instrument. However, to achieve identifiability, they in general require one-size-fits-all assumptions such as an additive erro…

    Submitted 21 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Appeared at Neural Information Processing Systems (NeurIPS) 2020; Code at https://github.com/nikikilbertus/general-iv-models

  12. arXiv:2003.01461  [pdf, other]

    cs.LG stat.ML

    Differentiable Causal Backdoor Discovery

    Authors: Limor Gultchin, Matt J. Kusner, Varun Kanade, Ricardo Silva

    Abstract: Discovering the causal effect of a decision is critical to nearly all forms of decision-making. In particular, it is a key quantity in drug development, in crafting government policy, and when implementing a real-world machine learning system. Given only observational data, confounders often obscure the true causal effect. Luckily, in some cases, it is possible to recover the causal effect by usin…

    Submitted 3 March, 2020; originally announced March 2020.

    Comments: Published in the Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS) 2020, Palermo, Italy

  13. arXiv:1911.04227  [pdf, other]

    physics.ao-ph cs.CV cs.LG stat.ML

    Cumulo: A Dataset for Learning Cloud Classes

    Authors: Valentina Zantedeschi, Fabrizio Falasca, Alyson Douglas, Richard Strange, Matt J. Kusner, Duncan Watson-Parris

    Abstract: One of the greatest sources of uncertainty in future climate projections comes from limitations in modelling clouds and in understanding how different cloud types interact with the climate system. A key first step in reducing this uncertainty is to accurately classify cloud types at high spatial and temporal resolution. In this paper, we introduce Cumulo, a benchmark dataset for training and evalu…

    Submitted 13 October, 2022; v1 submitted 5 November, 2019; originally announced November 2019.

    Journal ref: Tackling Climate Change with Machine Learning Workshop, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

  14. arXiv:1907.01040  [pdf, other]

    cs.LG cs.CY stat.ML

    The Sensitivity of Counterfactual Fairness to Unmeasured Confounding

    Authors: Niki Kilbertus, Philip J. Ball, Matt J. Kusner, Adrian Weller, Ricardo Silva

    Abstract: Causal approaches to fairness have seen substantial recent interest, both from the machine learning community and from wider parties interested in ethical prediction algorithms. In no small part, this has been due to the fact that causal models allow one to simultaneously leverage data and expert knowledge to remove discriminatory effects from predictions. However, one of the primary assumptions i…

    Submitted 1 July, 2019; originally announced July 2019.

    Comments: published at UAI 2019

  15. arXiv:1906.05221  [pdf, other]

    cs.LG physics.comp-ph stat.ML

    A Model to Search for Synthesizable Molecules

    Authors: John Bradshaw, Brooks Paige, Matt J. Kusner, Marwin H. S. Segler, José Miguel Hernández-Lobato

    Abstract: Deep generative models are able to suggest new organic molecules by generating strings, trees, and graphs representing their structure. While such models allow one to generate molecules with desirable properties, they give no guarantees that the molecules can actually be synthesized in practice. We propose a new molecule generation model, mirroring a more realistic real-world process, where (a) re…

    Submitted 4 December, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: To appear in Advances in Neural Information Processing Systems 2019

  16. arXiv:1901.04065  [pdf, other]

    cs.LG stat.ML

    Gradient Regularized Budgeted Boosting

    Authors: Zhixiang Eddie Xu, Matt J. Kusner, Kilian Q. Weinberger, Alice X. Zheng

    Abstract: As machine learning transitions increasingly towards real world applications controlling the test-time cost of algorithms becomes more and more crucial. Recent work, such as the Greedy Miser and Speedboost, incorporate test-time budget constraints into the training procedure and learn classifiers that provably stay within budget (in expectation). However, so far, these algorithms are limited to th…

    Submitted 26 January, 2019; v1 submitted 13 January, 2019; originally announced January 2019.

  17. arXiv:1806.03461  [pdf, other]

    cs.CR cs.LG stat.ML

    TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service

    Authors: Amartya Sanyal, Matt J. Kusner, Adrià Gascón, Varun Kanade

    Abstract: Machine learning methods are widely used for a variety of prediction problems. Prediction as a service is a paradigm in which service providers with technological expertise and computational resources may perform predictions for clients. However, data privacy severely restricts the applicability of such services, unless measures to keep client data private (even from the service provider) a…

    Submitted 9 June, 2018; originally announced June 2018.

    Comments: Accepted at International Conference in Machine Learning (ICML), 2018

  18. arXiv:1806.03281  [pdf, other]

    stat.ML cs.CR cs.CY cs.LG

    Blind Justice: Fairness with Encrypted Sensitive Attributes

    Authors: Niki Kilbertus, Adrià Gascón, Matt J. Kusner, Michael Veale, Krishna P. Gummadi, Adrian Weller

    Abstract: Recent work has explored how to train machine learning models which do not discriminate against any subgroup of the population as determined by sensitive attributes such as gender or race. To avoid disparate treatment, sensitive attributes should not be considered. On the other hand, in order to avoid disparate impact, sensitive attributes must be examined, e.g., in order to learn a fair model, or…

    Submitted 8 June, 2018; originally announced June 2018.

    Comments: published at ICML 2018

    Journal ref: Proceedings of the 35th International Conference on Machine Learning, PMLR 80:2630-2639, 2018

  19. arXiv:1806.02380  [pdf, other]

    stat.ML cs.AI cs.LG

    Causal Interventions for Fairness

    Authors: Matt J. Kusner, Chris Russell, Joshua R. Loftus, Ricardo Silva

    Abstract: Most approaches in algorithmic fairness constrain machine learning methods so the resulting predictions satisfy one of several intuitive notions of fairness. While this may help private companies comply with non-discrimination laws or avoid negative publicity, we believe it is often too little, too late. By the time the training data is collected, individuals in disadvantaged groups have already s…

    Submitted 6 June, 2018; originally announced June 2018.

  20. arXiv:1805.10970  [pdf, other]

    physics.chem-ph cs.LG stat.ML

    A Generative Model For Electron Paths

    Authors: John Bradshaw, Matt J. Kusner, Brooks Paige, Marwin H. S. Segler, José Miguel Hernández-Lobato

    Abstract: Chemical reactions can be described as the stepwise redistribution of electrons in molecules. As such, reactions are often depicted using 'arrow-pushing' diagrams which show this movement as a sequence of arrows. We propose an electron path prediction model (ELECTRO) to learn these sequences directly from raw reaction data. Instead of predicting product molecules directly from reactant molecules i…

    Submitted 20 March, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

  21. arXiv:1712.01664  [pdf, other]

    stat.ML cs.LG

    Learning a Generative Model for Validity in Complex Discrete Structures

    Authors: David Janz, Jos van der Westhuizen, Brooks Paige, Matt J. Kusner, José Miguel Hernández-Lobato

    Abstract: Deep generative models have been successfully used to learn representations for high-dimensional discrete spaces by representing discrete objects as sequences and employing powerful sequence-based deep models. Unfortunately, these sequence-based models often produce invalid sequences: sequences which do not represent any underlying discrete structure; invalid sequences hinder the utility of such m…

    Submitted 1 November, 2018; v1 submitted 5 December, 2017; originally announced December 2017.

    Comments: Conference paper at ICLR 2018. Code available online

  22. arXiv:1703.06856  [pdf, other]

    stat.ML cs.CY cs.LG

    Counterfactual Fairness

    Authors: Matt J. Kusner, Joshua R. Loftus, Chris Russell, Ricardo Silva

    Abstract: Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be bias…

    Submitted 8 March, 2018; v1 submitted 20 March, 2017; originally announced March 2017.

  23. arXiv:1703.01925  [pdf, other]

    stat.ML

    Grammar Variational Autoencoder

    Authors: Matt J. Kusner, Brooks Paige, José Miguel Hernández-Lobato

    Abstract: Deep generative models have been wildly successful at learning coherent latent representations for continuous data such as video and audio. However, generative modeling of discrete data such as arithmetic expressions and molecular structures still poses significant challenges. Crucially, state-of-the-art methods often produce outputs that are not valid. We make the key observation that frequently,…

    Submitted 6 March, 2017; originally announced March 2017.

  24. arXiv:1611.04051  [pdf, other]

    stat.ML cs.LG

    GANS for Sequences of Discrete Elements with the Gumbel-softmax Distribution

    Authors: Matt J. Kusner, José Miguel Hernández-Lobato

    Abstract: Generative Adversarial Networks (GAN) have limitations when the goal is to generate sequences of discrete elements. The reason for this is that samples from a distribution on discrete objects such as the multinomial are not differentiable with respect to the distribution parameters. This problem can be avoided by using the Gumbel-softmax distribution, which is a continuous approximation to a multi…

    Submitted 12 November, 2016; originally announced November 2016.

  25. arXiv:1512.05469  [pdf, other]

    stat.ML

    Private Causal Inference

    Authors: Matt J. Kusner, Yu Sun, Karthik Sridharan, Kilian Q. Weinberger

    Abstract: Causal inference deals with identifying which random variables "cause" or control other random variables. Recent advances on the topic of causal inference based on tools from statistical estimation and machine learning have resulted in practical algorithms for causal inference. Causal inference has the potential to have significant impact on medical research, prevention and control of diseases, an…

    Submitted 20 August, 2016; v1 submitted 17 December, 2015; originally announced December 2015.

  26. arXiv:1511.06421  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Manifold Traversal: Changing Labels with Convolutional Features

    Authors: Jacob R. Gardner, Paul Upchurch, Matt J. Kusner, Yixuan Li, Kilian Q. Weinberger, Kavita Bala, John E. Hopcroft

    Abstract: Many tasks in computer vision can be cast as a "label changing" problem, where the goal is to make a semantic change to the appearance of an image or some subject in an image in order to alter the class membership. Although successful task-specific methods have been developed for some label changing applications, to date no general purpose method exists. Motivated by this we propose deep manifold…

    Submitted 17 March, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  27. arXiv:1501.04080  [pdf, other]

    stat.ML

    Differentially Private Bayesian Optimization

    Authors: Matt J. Kusner, Jacob R. Gardner, Roman Garnett, Kilian Q. Weinberger

    Abstract: Bayesian optimization is a powerful tool for fine-tuning the hyper-parameters of a wide variety of machine learning models. The success of machine learning has led practitioners in diverse real-world settings to learn classifiers for practical problems. As machine learning becomes commonplace, Bayesian optimization becomes an attractive method for practitioners to automate the process of classifie…

    Submitted 22 February, 2015; v1 submitted 16 January, 2015; originally announced January 2015.

  28. arXiv:1412.1740  [pdf, other]

    stat.ML cs.CV cs.LG

    Image Data Compression for Covariance and Histogram Descriptors

    Authors: Matt J. Kusner, Nicholas I. Kolkin, Stephen Tyree, Kilian Q. Weinberger

    Abstract: Covariance and histogram image descriptors provide an effective way to capture information about images. Both excel when used in combination with special purpose distance metrics. For covariance descriptors these metrics measure the distance along the non-Euclidean Riemannian manifold of symmetric positive definite matrices. For histogram descriptors the Earth Mover's distance measures the optimal…

    Submitted 23 May, 2015; v1 submitted 4 December, 2014; originally announced December 2014.

  29. arXiv:1210.2771  [pdf, other]

    stat.ML cs.LG

    Cost-Sensitive Tree of Classifiers

    Authors: Zhixiang Xu, Matt J. Kusner, Kilian Q. Weinberger, Minmin Chen

    Abstract: Recently, machine learning algorithms have successfully entered large-scale real-world industrial applications (e.g. search engines and email spam filters). Here, the CPU cost during test time must be budgeted and accounted for. In this paper, we address the challenge of balancing the test-time cost and the classifier accuracy in a principled fashion. The test-time cost of a classifier is often do…

    Submitted 22 April, 2013; v1 submitted 9 October, 2012; originally announced October 2012.