Showing 1–9 of 9 results for author: Gershman, S J

Searching in archive stat.
  1. arXiv:2310.06110  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Grokking as the Transition from Lazy to Rich Training Dynamics

    Authors: Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

    Abstract: We propose that the grokking phenomenon, where the train loss of a neural network decreases much earlier than its test loss, can arise due to a neural network transitioning from lazy training dynamics to a rich, feature learning regime. To illustrate this mechanism, we study the simple setting of vanilla gradient descent on a polynomial regression problem with a two layer neural network which exhi…

    Submitted 11 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Adding new experiments on higher degree Hermite polynomials, multi-index targets, removed DMFT analysis from this version

  2. arXiv:2012.15814  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Language-Mediated, Object-Centric Representation Learning

    Authors: Ruocheng Wang, Jiayuan Mao, Samuel J. Gershman, Jiajun Wu

    Abstract: We present Language-mediated, Object-centric Representation Learning (LORL), a paradigm for learning disentangled, object-centric scene representations from vision and language. LORL builds upon recent advances in unsupervised object discovery and segmentation, notably MONet and Slot Attention. While these algorithms learn an object-centric representation just by reconstructing the input image, LO…

    Submitted 8 June, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: ACL 2021 Findings. First two authors contributed equally; last two authors contributed equally. Project page: https://lang-orl.github.io/

  3. arXiv:1909.05885  [pdf, other]

    cs.CL cs.LG stat.ML

    Analyzing machine-learned representations: A natural language case study

    Authors: Ishita Dasgupta, Demi Guo, Samuel J. Gershman, Noah D. Goodman

    Abstract: As modern deep networks become more complex, and get closer to human-like capabilities in certain domains, the question arises of how the representations and decision rules they learn compare to the ones in humans. In this work, we study representations of sentences in one such artificial system for natural language processing. We first present a diagnostic test dataset to examine the degree of ab…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: This article supersedes a previous article arXiv:1802.04302

  4. arXiv:1805.11571  [pdf, other]

    stat.ML cs.LG

    Human-in-the-Loop Interpretability Prior

    Authors: Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez

    Abstract: We often desire our models to be interpretable as well as accurate. Prior work on optimizing models for interpretability has relied on easy-to-quantify proxies for interpretability, such as sparsity or the number of operations required. In this work, we optimize for interpretability by directly including humans in the optimization loop. We develop an algorithm that minimizes the number of user stu…

    Submitted 30 October, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

    Comments: To appear at NIPS 2018, selected for a spotlight. 13 pages (incl references and appendix)

  5. arXiv:1802.04302  [pdf, other]

    cs.CL stat.ML

    Evaluating Compositionality in Sentence Embeddings

    Authors: Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller, Samuel J. Gershman, Noah D. Goodman

    Abstract: An important challenge for human-like AI is compositional semantics. Recent research has attempted to address this by using deep neural networks to learn vector space embeddings of sentences, which then serve as input to other tasks. We present a new dataset for one such task, `natural language inference' (NLI), that cannot be solved using only word-level knowledge and requires some compositionali…

    Submitted 17 May, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

  6. arXiv:1606.02396  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Deep Successor Reinforcement Learning

    Authors: Tejas D. Kulkarni, Ardavan Saeedi, Simanta Gautam, Samuel J. Gershman

    Abstract: Learning robust value functions given raw observations and rewards is now possible with model-free and model-based deep reinforcement learning algorithms. There is a third alternative, called Successor Representations (SR), which decomposes the value function into two components -- a reward predictor and a successor map. The successor map represents the expected future state occupancy from any giv…

    Submitted 8 June, 2016; originally announced June 2016.

    Comments: 10 pages, 6 figures

  7. arXiv:1604.00289  [pdf, other]

    cs.AI cs.CV cs.LG cs.NE stat.ML

    Building Machines That Learn and Think Like People

    Authors: Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, Samuel J. Gershman

    Abstract: Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achieveme…

    Submitted 2 November, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentary

  8. arXiv:1110.5454  [pdf, other]

    stat.ML math.ST

    Distance Dependent Infinite Latent Feature Models

    Authors: Samuel J. Gershman, Peter I. Frazier, David M. Blei

    Abstract: Latent feature models are widely used to decompose data into a small number of components. Bayesian nonparametric variants of these models, which use the Indian buffet process (IBP) as a prior over latent features, allow the number of features to be determined from the data. We present a generalization of the IBP, the distance dependent Indian buffet process (dd-IBP), for modeling non-exchangeable…

    Submitted 10 September, 2012; v1 submitted 25 October, 2011; originally announced October 2011.

    Comments: 28 pages, 9 figures

  9. arXiv:1106.2697  [pdf, other]

    stat.ML stat.ME

    A Tutorial on Bayesian Nonparametric Models

    Authors: Samuel J. Gershman, David M. Blei

    Abstract: A key problem in statistical modeling is model selection, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data…

    Submitted 4 August, 2011; v1 submitted 14 June, 2011; originally announced June 2011.

    Comments: 28 pages, 8 figures