
Showing 1–50 of 330 results for author: Tenenbaum, J

Searching in archive cs.
  1. arXiv:2509.26255 [pdf, ps, other]

    cs.AI cs.CV cs.LG

    ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning

    Authors: Yichao Liang, Dat Nguyen, Cambridge Yang, Tianyang Li, Joshua B. Tenenbaum, Carl Edward Rasmussen, Adrian Weller, Zenna Tavares, Tom Silver, Kevin Ellis

    Abstract: Long-horizon embodied planning is challenging because the world does not only change through an agent's actions: exogenous processes (e.g., water heating, dominoes cascading) unfold concurrently with the agent's actions. We propose a framework for abstract world models that jointly learns (i) symbolic state representations and (ii) causal processes for both endogenous actions and exogenous mechani…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: 41 pages. The last two authors contributed equally in co-advising

  2. arXiv:2509.06952 [pdf, ps, other]

    cs.CL

    On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts

    Authors: Linlu Qiu, Cedegao E. Zhang, Joshua B. Tenenbaum, Yoon Kim, Roger P. Levy

    Abstract: Language use is shaped by pragmatics -- i.e., reasoning about communicative goals and norms in context. As language models (LMs) are increasingly used as conversational agents, it becomes ever more important to understand their pragmatic reasoning abilities. We propose an evaluation framework derived from Wavelength, a popular communication game where a speaker and a listener communicate about a b…

    Submitted 26 September, 2025; v1 submitted 8 September, 2025; originally announced September 2025.

    Comments: EMNLP 2025 (Main)

  3. arXiv:2509.00074 [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Language and Experience: A Computational Model of Social Learning in Complex Tasks

    Authors: Cédric Colas, Tracey Mills, Ben Prystawski, Michael Henry Tessler, Noah Goodman, Jacob Andreas, Joshua Tenenbaum

    Abstract: The ability to combine linguistic guidance from others with direct experience is central to human development, enabling safe and rapid learning in new environments. How do people integrate these two sources of knowledge, and how might AI systems? We present a computational framework that models social learning as joint probabilistic inference over structured, executable world models given sensorim…

    Submitted 26 August, 2025; originally announced September 2025.

  4. arXiv:2508.10914 [pdf, ps, other]

    cs.HC

    Generation and Evaluation in the Human Invention Process through the Lens of Game Design

    Authors: Katherine M. Collins, Graham Todd, Cedegao E. Zhang, Adrian Weller, Julian Togelius, Junyi Chu, Lionel Wong, Thomas L. Griffiths, Joshua B. Tenenbaum

    Abstract: The human ability to learn rules and solve problems has been a central concern of cognitive science research since the field's earliest days. But we do not just follow rules and solve problems given to us by others: we modify those rules, create new problems, and set new goals and tasks for ourselves and others. Arguably, even more than rule following and problem solving, human intelligence is abo…

    Submitted 31 July, 2025; originally announced August 2025.

    Comments: CogSci conference non-archival paper

  5. arXiv:2507.21081 [pdf, ps, other]

    cs.HC cs.AI

    Empathy in Explanation

    Authors: Katherine M. Collins, Kartik Chandra, Adrian Weller, Jonathan Ragan-Kelley, Joshua B. Tenenbaum

    Abstract: Why do we give the explanations we do? Recent work has suggested that we should think of explanation as a kind of cooperative social interaction, between a why-question-asker and an explainer. Here, we apply this perspective to consider the role that emotion plays in this social interaction. We develop a computational framework for modeling explainers who consider the emotional impact an explanati…

    Submitted 16 June, 2025; originally announced July 2025.

    Comments: CogSci non-archival conference paper

  6. arXiv:2507.12821 [pdf, ps, other]

    cs.AI cs.LG

    Assessing Adaptive World Models in Machines with Novel Games

    Authors: Lance Ying, Katherine M. Collins, Prafull Sharma, Cedric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J. Gershman, Jacob D. Andreas, Thomas L. Griffiths, Francois Chollet, Kelsey R. Allen, Joshua B. Tenenbaum

    Abstract: Human intelligence exhibits a remarkable capacity for rapid adaptation and effective problem-solving in novel and unfamiliar contexts. We argue that this profound adaptability is fundamentally linked to the efficient construction and refinement of internal representations of the environment, commonly referred to as world models, and we refer to this adaptation mechanism as world model induction. H…

    Submitted 22 July, 2025; v1 submitted 17 July, 2025; originally announced July 2025.

    Comments: 17 pages, 4 figures

  7. arXiv:2507.12547 [pdf, ps, other]

    cs.CL cs.AI cs.PL

    Modeling Open-World Cognition as On-Demand Synthesis of Probabilistic Models

    Authors: Lionel Wong, Katherine M. Collins, Lance Ying, Cedegao E. Zhang, Adrian Weller, Tobias Gerstenberg, Timothy O'Donnell, Alexander K. Lew, Jacob D. Andreas, Joshua B. Tenenbaum, Tyler Brooke-Wilson

    Abstract: When faced with novel situations, people are able to marshal relevant considerations from a wide range of background knowledge and put these to use in inferences and predictions. What permits us to draw in globally relevant information and reason over it coherently? Here, we explore the hypothesis that people use a combination of distributed and symbolic representations to construct bespoke mental…

    Submitted 18 July, 2025; v1 submitted 16 July, 2025; originally announced July 2025.

    Comments: Presented at CogSci 2025

  8. arXiv:2507.09409 [pdf, ps, other]

    cs.MA

    Adaptive Social Learning using Theory of Mind

    Authors: Lance Ying, Ryan Truong, Joshua B. Tenenbaum, Samuel J. Gershman

    Abstract: Social learning is a powerful mechanism through which agents learn about the world from others. However, humans don't always choose to observe others, since social learning can carry time and cognitive resource costs. How do people balance social and non-social learning? In this paper, we propose a rational mentalizing model of the decision to engage in social learning. This model estimates the ut…

    Submitted 12 July, 2025; originally announced July 2025.

    Comments: 7 pages, 4 figures; paper published at CogSci 2025

  9. arXiv:2506.21695 [pdf, ps, other]

    cs.LG

    Unimodal Strategies in Density-Based Clustering

    Authors: Oron Nir, Jay Tenenbaum, Ariel Shamir

    Abstract: Density-based clustering methods often surpass centroid-based counterparts, when addressing data with noise or arbitrary data distributions common in real-world problems. In this study, we reveal a key property intrinsic to density-based clustering methods regarding the relation between the number of clusters and the neighborhood radius of core points - we empirically show that it is nearly unimod…

    Submitted 26 June, 2025; originally announced June 2025.

  10. arXiv:2506.17434 [pdf, ps, other]

    cs.AI

    Resource Rational Contractualism Should Guide AI Alignment

    Authors: Sydney Levine, Matija Franklin, Tan Zhi-Xuan, Secil Yanik Guyot, Lionel Wong, Daniel Kilov, Yejin Choi, Joshua B. Tenenbaum, Noah Goodman, Seth Lazar, Iason Gabriel

    Abstract: AI systems will soon have to navigate human environments and make decisions that affect people and other AI agents whose goals and values diverge. Contractualist alignment proposes grounding those decisions in agreements that diverse stakeholders would endorse under the right conditions, yet securing such agreement at scale remains costly and slow -- even for advanced AI. We therefore propose Reso…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 24 pages, 10 figures

  11. arXiv:2506.16755 [pdf, ps, other]

    cs.CL cs.AI

    Language-Informed Synthesis of Rational Agent Models for Grounded Theory-of-Mind Reasoning On-The-Fly

    Authors: Lance Ying, Ryan Truong, Katherine M. Collins, Cedegao E. Zhang, Megan Wei, Tyler Brooke-Wilson, Tan Zhi-Xuan, Lionel Wong, Joshua B. Tenenbaum

    Abstract: Drawing real world social inferences usually requires taking into account information from multiple modalities. Language is a particularly powerful source of information in social settings, especially in novel situations where language can provide both abstract information about the environment dynamics and concrete specifics about an agent that cannot be easily visually observed. In this paper, w…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 5 figures, 19 pages

  12. arXiv:2506.15623 [pdf, ps, other]

    cs.CL cs.CY

    Minding the Politeness Gap in Cross-cultural Communication

    Authors: Yuka Machino, Matthias Hofer, Max Siegel, Joshua B. Tenenbaum, Robert D. Hawkins

    Abstract: Misunderstandings in cross-cultural communication often arise from subtle differences in interpretation, but it is unclear whether these differences arise from the literal meanings assigned to words or from more general pragmatic factors such as norms around politeness and brevity. In this paper, we report three experiments examining how speakers of British and American English interpret intensifi…

    Submitted 18 June, 2025; originally announced June 2025.

  13. arXiv:2506.14212 [pdf, ps, other]

    cs.AI

    What's in the Box? Reasoning about Unseen Objects from Multimodal Cues

    Authors: Lance Ying, Daniel Xu, Alicia Zhang, Katherine M. Collins, Max H. Siegel, Joshua B. Tenenbaum

    Abstract: People regularly make inferences about objects in the world that they cannot see by flexibly integrating information from multiple sources: auditory and visual cues, language, and our prior beliefs and knowledge about the scene. How are we able to so flexibly integrate many sources of information to make sense of the world around us, even if we have no direct knowledge? In this work, we propose a…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Paper published at CogSci 2025

  14. arXiv:2505.23931 [pdf, ps, other]

    cs.CL cs.AI

    Scaling up the think-aloud method

    Authors: Daniel Wurgaft, Ben Prystawski, Kanishk Gandhi, Cedegao E. Zhang, Joshua B. Tenenbaum, Noah D. Goodman

    Abstract: The think-aloud method, where participants voice their thoughts as they solve a task, is a valuable source of rich data about human reasoning processes. Yet, it has declined in popularity in contemporary cognitive science, largely because labor-intensive transcription and annotation preclude large sample sizes. Here, we develop methods to automate the transcription and annotation of verbal reports…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures. Daniel Wurgaft and Ben Prystawski contributed equally

  15. arXiv:2505.19376 [pdf, ps, other]

    cs.CL

    Belief Attribution as Mental Explanation: The Role of Accuracy, Informativity, and Causality

    Authors: Lance Ying, Almog Hillel, Ryan Truong, Vikash K. Mansinghka, Joshua B. Tenenbaum, Tan Zhi-Xuan

    Abstract: A key feature of human theory-of-mind is the ability to attribute beliefs to other agents as mentalistic explanations for their behavior. But given the wide variety of beliefs that agents may hold about the world and the rich language we can use to express them, which specific beliefs are people inclined to attribute to others? In this paper, we investigate the hypothesis that people prefer to att…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: 8 pages, 3 figures; oral presentation at CogSci 2025

  16. arXiv:2505.06191 [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    Neuro-Symbolic Concepts

    Authors: Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: This article presents a concept-centric paradigm for building agents that can learn continually and reason flexibly. The concept-centric agent utilizes a vocabulary of neuro-symbolic concepts. These concepts, such as object, relation, and action concepts, are grounded on sensory inputs and actuation outputs. They are also compositional, allowing for the creation of novel concepts through their str…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: To appear in Communications of the ACM

  17. arXiv:2505.02216 [pdf, ps, other]

    cs.AI

    LLM-Guided Probabilistic Program Induction for POMDP Model Estimation

    Authors: Aidan Curtis, Hao Tang, Thiago Veloso, Kevin Ellis, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: Partially Observable Markov Decision Processes (POMDPs) model decision making under uncertainty. While there are many approaches to approximately solving POMDPs, we aim to address the problem of learning such models. In particular, we are interested in a subclass of POMDPs wherein the components of the model, including the observation function, reward function, transition function, and initial sta…

    Submitted 11 May, 2025; v1 submitted 4 May, 2025; originally announced May 2025.

  18. arXiv:2504.07081 [pdf, ps, other]

    cs.CL cs.AI

    Self-Steering Language Models

    Authors: Gabriel Grand, Joshua B. Tenenbaum, Vikash K. Mansinghka, Alexander K. Lew, Jacob Andreas

    Abstract: While test-time reasoning enables language models (LMs) to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure--both how to verify solutions and how to search for them. This paper introduces DisCIPL, a m…

    Submitted 8 August, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted to COLM 2025

  19. arXiv:2503.20124 [pdf, ps, other]

    cs.AI

    Synthesizing world models for bilevel planning

    Authors: Zergham Ahmed, Joshua B. Tenenbaum, Christopher J. Bates, Samuel J. Gershman

    Abstract: Modern reinforcement learning (RL) systems have demonstrated remarkable capabilities in complex environments, such as video games. However, they still fall short of achieving human-like sample efficiency and adaptability when learning new domains. Theory-based reinforcement learning (TBRL) is an algorithmic framework specifically designed to address this gap. Modeled on cognitive theories, TBRL le…

    Submitted 13 July, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: Accepted to TMLR

  20. arXiv:2502.20502 [pdf, other]

    cs.AI

    On Benchmarking Human-Like Intelligence in Machines

    Authors: Lance Ying, Katherine M. Collins, Lionel Wong, Ilia Sucholutsky, Ryan Liu, Adrian Weller, Tianmin Shu, Thomas L. Griffiths, Joshua B. Tenenbaum

    Abstract: Recent benchmark studies have claimed that AI has approached or even surpassed human-level performances on various cognitive tasks. However, this position paper argues that current AI evaluation paradigms are insufficient for assessing human-like cognitive capabilities. We identify a set of key shortcomings: a lack of human-validated labels, inadequate representation of human response variability…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 18 pages, 5 figures

  21. arXiv:2502.15678 [pdf, ps, other]

    cs.LG

    Testing the Limits of Fine-Tuning for Improving Visual Cognition in Vision Language Models

    Authors: Luca M. Schulze Buschoff, Konstantinos Voudouris, Elif Akata, Matthias Bethge, Joshua B. Tenenbaum, Eric Schulz

    Abstract: Pre-trained vision language models still fall short of human visual cognition. In an effort to improve visual cognition and align models with human behavior, we introduce visual stimuli and human judgments on visual cognition tasks, allowing us to systematically evaluate performance across cognitive domains under a consistent environment. We fine-tune models on ground truth data for intuitive phys…

    Submitted 30 May, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  22. arXiv:2502.11881 [pdf, ps, other]

    cs.AI cs.CL

    Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models

    Authors: Hyunwoo Kim, Melanie Sclar, Tan Zhi-Xuan, Lance Ying, Sydney Levine, Yang Liu, Joshua B. Tenenbaum, Yejin Choi

    Abstract: Existing LLM reasoning methods have shown impressive capabilities across various tasks, such as solving math and coding problems. However, applying these methods to scenarios without ground-truth answers or rule-based verification methods - such as tracking the mental states of an agent - remains challenging. Inspired by the sequential Monte Carlo algorithm, we introduce thought-tracing, an infere…

    Submitted 8 August, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: COLM 2025. For code and data, see https://hyunw.kim/thought-tracing

  23. arXiv:2501.05707 [pdf, other]

    cs.CL cs.AI cs.LG

    Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains

    Authors: Vighnesh Subramaniam, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Shuang Li, Igor Mordatch

    Abstract: Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data. To improve models beyond the training data, recent works have explored how LLMs can be used to generate synthetic data for autonomous self-improvement. However, successive steps of self-improvement can reach a point of diminishing returns. In this work, w…

    Submitted 3 March, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

    Comments: ICLR 2025; 22 pages, 13 figures, 7 tables; Project page at https://llm-multiagent-ft.github.io/

  24. arXiv:2412.21149 [pdf, other]

    cs.LG

    Functional Risk Minimization

    Authors: Ferran Alet, Clement Gehring, Tomás Lozano-Pérez, Kenji Kawaguchi, Joshua B. Tenenbaum, Leslie Pack Kaelbling

    Abstract: The field of Machine Learning has changed significantly since the 1970s. However, its most basic principle, Empirical Risk Minimization (ERM), remains unchanged. We propose Functional Risk Minimization (FRM), a general framework where losses compare functions rather than outputs. This results in better performance in supervised, unsupervised, and RL experiments. In the FRM paradigm, for each data…

    Submitted 30 December, 2024; originally announced December 2024.

  25. arXiv:2412.09115 [pdf, other]

    q-bio.NC cs.CV cs.LG cs.NE

    Vision CNNs trained to estimate spatial latents learned similar ventral-stream-aligned representations

    Authors: Yudi Xie, Weichen Huang, Esther Alter, Jeremy Schwartz, Joshua B. Tenenbaum, James J. DiCarlo

    Abstract: Studies of the functional role of the primate ventral visual stream have traditionally focused on object categorization, often ignoring -- despite much prior evidence -- its role in estimating "spatial" latents such as object position and pose. Most leading ventral stream models are derived by optimizing networks for object categorization, which seems to imply that the ventral stream is also deriv…

    Submitted 17 February, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

    Comments: 30 pages, 21 figures, ICLR 2025

  26. arXiv:2411.11196 [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    PickScan: Object discovery and reconstruction from handheld interactions

    Authors: Vincent van der Brugge, Marc Pollefeys, Joshua B. Tenenbaum, Ayush Tewari, Krishna Murthy Jatavallabhula

    Abstract: Reconstructing compositional 3D representations of scenes, where each object is represented with its own 3D model, is a highly desirable capability in robotics and augmented reality. However, most existing methods rely heavily on strong appearance priors for object discovery, therefore only working on those classes of objects on which the method has been trained, or do not allow for object manipul…

    Submitted 17 November, 2024; originally announced November 2024.

    Comments: 7 pages, 8 figures, published in the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

    ACM Class: I.4.5

  27. arXiv:2411.09627 [pdf, other]

    cs.RO cs.AI cs.CV

    One-Shot Manipulation Strategy Learning by Making Contact Analogies

    Authors: Yuyao Liu, Jiayuan Mao, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: We present a novel approach, MAGIC (manipulation analogies for generalizable intelligent contacts), for one-shot learning of manipulation strategies with fast and extensive generalization to novel objects. By leveraging a reference action trajectory, MAGIC effectively identifies similar contact points and sequences of actions on novel objects to replicate a demonstrated strategy, such as using dif…

    Submitted 23 March, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

    Comments: ICRA 2025; CoRL LEAP Workshop, 2024

  28. arXiv:2411.04987 [pdf, other]

    cs.AI cs.LG cs.RO

    Few-Shot Task Learning through Inverse Generative Modeling

    Authors: Aviv Netanyahu, Yilun Du, Antonia Bronars, Jyothish Pari, Joshua Tenenbaum, Tianmin Shu, Pulkit Agrawal

    Abstract: Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning and present our approach, Few-Shot Task Learning through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. The core idea is to pretrain a generative m…

    Submitted 13 January, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

    Comments: Added acknowledgment

  29. arXiv:2410.23254 [pdf, other]

    cs.RO cs.AI cs.CV

    Keypoint Abstraction using Large Models for Object-Relative Imitation Learning

    Authors: Xiaolin Fang, Bo-Ruei Huang, Jiayuan Mao, Jasmine Shone, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: Generalization to novel object configurations and instances across diverse tasks and environments is a critical challenge in robotics. Keypoint-based representations have been proven effective as a succinct representation for capturing essential object features, and for establishing a reference frame in action prediction, enabling data-efficient learning of robot skills. However, their manual desi…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: CoRL LangRob Workshop, 2024

  30. arXiv:2410.23156 [pdf, other]

    cs.AI cs.CV cs.LG cs.RO

    VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning

    Authors: Yichao Liang, Nishanth Kumar, Hao Tang, Adrian Weller, Joshua B. Tenenbaum, Tom Silver, João F. Henriques, Kevin Ellis

    Abstract: Broadly intelligent agents should form task-specific abstractions that selectively expose the essential elements of a task, while abstracting away the complexity of the raw sensorimotor space. In this work, we present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations. We outline an online algorithm for inventi…

    Submitted 28 February, 2025; v1 submitted 30 October, 2024; originally announced October 2024.

    Comments: ICLR 2025 (Spotlight)

  31. arXiv:2410.10101 [pdf, other]

    cs.LG cs.AI cs.CL cs.DS

    Learning Linear Attention in Polynomial Time

    Authors: Morris Yau, Ekin Akyürek, Jiayuan Mao, Joshua B. Tenenbaum, Stefanie Jegelka, Jacob Andreas

    Abstract: Previous research has explored the computational expressivity of Transformer models in simulating Boolean circuits or Turing machines. However, the learnability of these simulators from observational data has remained an open question. Our study addresses this gap by providing the first polynomial-time learnability results (specifically strong, agnostic PAC learning) for single-layer Transformers…

    Submitted 18 October, 2024; v1 submitted 13 October, 2024; originally announced October 2024.

  32. arXiv:2409.13507 [pdf, other]

    cs.GR cs.CL cs.HC cs.SD eess.AS

    Sketching With Your Voice: "Non-Phonorealistic" Rendering of Sounds via Vocal Imitation

    Authors: Matthew Caren, Kartik Chandra, Joshua B. Tenenbaum, Jonathan Ragan-Kelley, Karima Ma

    Abstract: We present a method for automatically producing human-like vocal imitations of sounds: the equivalent of "sketching," but for auditory rather than visual representation. Starting with a simulated model of the human vocal tract, we first try generating vocal imitations by tuning the model's control parameters to make the synthesized vocalization match the target sound in terms of perceptually-salie…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: SIGGRAPH Asia 2024

    ACM Class: I.3.8

    Journal ref: SIGGRAPH Asia 2024

  33. arXiv:2409.10849 [pdf, other]

    cs.RO cs.AI cs.HC cs.MA

    SIFToM: Robust Spoken Instruction Following through Theory of Mind

    Authors: Lance Ying, Jason Xinyu Liu, Shivam Aarya, Yizirui Fang, Stefanie Tellex, Joshua B. Tenenbaum, Tianmin Shu

    Abstract: Spoken language instructions are ubiquitous in agent collaboration. However, in human-robot collaboration, recognition accuracy for human speech is often influenced by various speech and environmental factors, such as background noise, the speaker's accents, and mispronunciation. When faced with noisy or unfamiliar auditory inputs, humans use context and prior knowledge to disambiguate the stimulu…

    Submitted 16 September, 2024; originally announced September 2024.

    Comments: 7 pages, 4 figures

  34. arXiv:2409.08202 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    What Makes a Maze Look Like a Maze?

    Authors: Joy Hsu, Jiayuan Mao, Joshua B. Tenenbaum, Noah D. Goodman, Jiajun Wu

    Abstract: A unique aspect of human visual understanding is the ability to flexibly interpret abstract concepts: acquiring lifted rules explaining what they symbolize, grounding them across familiar and unfamiliar contexts, and making predictions or reasoning about them. While off-the-shelf vision-language models excel at making literal interpretations of images (e.g., recognizing object categories such as t…

    Submitted 17 February, 2025; v1 submitted 12 September, 2024; originally announced September 2024.

    Comments: ICLR 2025

  35. arXiv:2409.05862 [pdf, other]

    cs.CV

    Evaluating Multiview Object Consistency in Humans and Image Models

    Authors: Tyler Bonnen, Stephanie Fu, Yutong Bai, Thomas O'Connell, Yoni Friedman, Nancy Kanwisher, Joshua B. Tenenbaum, Alexei A. Efros

    Abstract: We introduce a benchmark to directly evaluate the alignment between human observers and vision models on a 3D shape inference task. We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape: given a set of images, participants identify which contain the same/different objects, despite considerable viewpoint variation. We draw from…

    Submitted 9 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: Project page: https://tzler.github.io/MOCHI/ Code: https://github.com/tzler/mochi_code Huggingface dataset: https://huggingface.co/datasets/tzler/MOCHI

  36. arXiv:2408.12022 [pdf, other]

    cs.CL cs.AI

    Understanding Epistemic Language with a Language-augmented Bayesian Theory of Mind

    Authors: Lance Ying, Tan Zhi-Xuan, Lionel Wong, Vikash Mansinghka, Joshua B. Tenenbaum

    Abstract: How do people understand and evaluate claims about others' beliefs, even though these beliefs cannot be directly observed? In this paper, we introduce a cognitive model of epistemic language interpretation, grounded in Bayesian inferences about other agents' goals, beliefs, and intentions: a language-augmented Bayesian theory-of-mind (LaBToM). By translating natural language into an epistemic "la…

    Submitted 18 April, 2025; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 23 pages; Published at the Transactions of the Association for Computational Linguistics (TACL); Presented at NAACL 2025

  37. arXiv:2408.08313 [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Can Large Language Models Understand Symbolic Graphics Programs?

    Authors: Zeju Qiu, Weiyang Liu, Haiwen Feng, Zhen Liu, Tim Z. Xiao, Katherine M. Collins, Joshua B. Tenenbaum, Adrian Weller, Michael J. Black, Bernhard Schölkopf

    Abstract: Against the backdrop of enthusiasm for large language models (LLMs), there is a growing need to scientifically assess their capabilities and shortcomings. This is nontrivial in part because it is difficult to find tasks which the models have not encountered during training. Utilizing symbolic graphics programs, we propose a domain well-suited to test multiple spatial-semantic reasoning skills of L…

    Submitted 27 May, 2025; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: ICLR 2025 Spotlight (v4: 47 pages, 26 figures, project page: https://sgp-bench.github.io/)

  38. arXiv:2408.03943 [pdf, other]

    cs.HC cs.AI cs.LG

    Building Machines that Learn and Think with People

    Authors: Katherine M. Collins, Ilia Sucholutsky, Umang Bhatt, Kartik Chandra, Lionel Wong, Mina Lee, Cedegao E. Zhang, Tan Zhi-Xuan, Mark Ho, Vikash Mansinghka, Adrian Weller, Joshua B. Tenenbaum, Thomas L. Griffiths

    Abstract: What do we want from machine intelligence? We envision machines that are not just tools for thought, but partners in thought: reasonable, insightful, knowledgeable, reliable, and trustworthy systems that think with us. Current artificial intelligence (AI) systems satisfy some of these criteria, some of the time. In this Perspective, we show how the science of collaborative cognition can be put to…

    Submitted 21 July, 2024; originally announced August 2024.

  39. arXiv:2408.02687 [pdf, other]

    cs.CV

    Compositional Physical Reasoning of Objects and Events from Videos

    Authors: Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan

    Abstract: Understanding and reasoning about objects' physical properties in the natural world is a fundamental challenge in artificial intelligence. While some properties like colors and shapes can be directly observed, others, such as mass and electric charge, are hidden from the objects' visual appearance. This paper addresses the unique challenge of inferring these hidden physical properties from objects…

    Submitted 26 May, 2025; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by TPAMI 2025. arXiv admin note: text overlap with arXiv:2205.01089

  40. arXiv:2407.16770 [pdf, other]

    cs.AI

    Infinite Ends from Finite Samples: Open-Ended Goal Inference as Top-Down Bayesian Filtering of Bottom-Up Proposals

    Authors: Tan Zhi-Xuan, Gloria Kang, Vikash Mansinghka, Joshua B. Tenenbaum

    Abstract: The space of human goals is tremendously vast; and yet, from just a few moments of watching a scene or reading a story, we seem to spontaneously infer a range of plausible motivations for the people and characters involved. What explains this remarkable capacity for intuiting other agents' goals, despite the infinitude of ends they might pursue? And how does this cohere with our understanding of o…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at CogSci 2024. 6 pages, 4 figures. (Appendix: 5 pages, 6 figures, 2 tables)

  41. arXiv:2407.14095  [pdf, other]

    cs.GT cs.AI q-bio.NC

    People use fast, goal-directed simulation to reason about novel games

    Authors: Cedegao E. Zhang, Katherine M. Collins, Lionel Wong, Mauricio Barba, Adrian Weller, Joshua B. Tenenbaum

    Abstract: People can evaluate features of problems and their potential solutions well before we can effectively solve them. When considering a game we have never played, for instance, we might infer whether it is likely to be challenging, fair, or fun simply from hearing the game rules, prior to deciding whether to invest time in learning the game or trying to play it well. Many studies of game play have fo…

    Submitted 7 February, 2025; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted at CogSci 2024 as a talk

  42. arXiv:2407.06169  [pdf, other]

    cs.RO cs.CV cs.LG

    Potential Based Diffusion Motion Planning

    Authors: Yunhao Luo, Chen Sun, Joshua B. Tenenbaum, Yilun Du

    Abstract: Effective motion planning in high dimensional spaces is a long-standing open problem in robotics. One class of traditional motion planning algorithms corresponds to potential-based motion planning. An advantage of potential based motion planning is composability -- different motion constraints can be easily combined by adding corresponding potentials. However, constructing motion paths from potent…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICML 2024. Project page and code at https://energy-based-model.github.io/potential-motion-plan/

  43. arXiv:2406.19298  [pdf, other]

    cs.CV cs.LG

    Compositional Image Decomposition with Diffusion Models

    Authors: Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du

    Abstract: Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a sce…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: ICML 2024, Webpage: https://energy-based-model.github.io/decomp-diffusion

  44. arXiv:2406.15736  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads

    Authors: Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Joanna Matthiesen, Kevin Smith, Joshua B. Tenenbaum

    Abstract: Recent years have seen significant progress in the general-purpose problem solving abilities of large vision and language models (LVLMs), such as ChatGPT, Gemini, etc.; some of these breakthroughs even seem to enable AI models to outperform human abilities in varied tasks that demand higher-order cognitive skills. Are the current large AI models indeed capable of generalized problem solving as h…

    Submitted 5 December, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted at NeurIPS 2024 (Datasets and Benchmarks Track)

  45. arXiv:2406.11179  [pdf, other]

    cs.LG cs.AI

    Learning Iterative Reasoning through Energy Diffusion

    Authors: Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum

    Abstract: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference ba…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICML 2024, website: https://energy-based-model.github.io/ired/

  46. arXiv:2406.04302  [pdf, other]

    cs.LG

    Representational Alignment Supports Effective Machine Teaching

    Authors: Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby, Weiyang Liu, Theodore R. Sumers, Michalis Korakakis, Umang Bhatt, Mark Ho, Joshua B. Tenenbaum, Brad Love, Zachary A. Pardos, Adrian Weller, Thomas L. Griffiths

    Abstract: A good teacher should not only be knowledgeable, but should also be able to communicate in a way that the student understands -- to share the student's representation of the world. In this work, we introduce a new controlled experimental setting, GRADE, to study pedagogy and representational alignment. We use GRADE through a series of machine-machine and machine-human teaching experiments to chara…

    Submitted 4 February, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Preprint

  47. arXiv:2405.20510  [pdf, other]

    cs.CV

    Physically Compatible 3D Object Modeling from a Single Image

    Authors: Minghao Guo, Bohan Wang, Pingchuan Ma, Tianyuan Zhang, Crystal Elaine Owens, Chuang Gan, Joshua B. Tenenbaum, Kaiming He, Wojciech Matusik

    Abstract: We present a computational framework that transforms single images into 3D physical objects. The visual geometry of a physical object in an image is determined by three orthogonal attributes: mechanical properties, external forces, and rest-shape geometry. Existing single-view 3D reconstruction methods often overlook this underlying composition, presuming rigidity or neglecting external forces. Co…

    Submitted 31 December, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  48. arXiv:2405.09783  [pdf, other]

    cs.LG cs.AI cs.CE

    LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

    Authors: Pingchuan Ma, Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan, Wojciech Matusik

    Abstract: Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulati…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  49. arXiv:2405.09711  [pdf, other]

    cs.AI cs.CL cs.CV

    STAR: A Benchmark for Situated Reasoning in Real-World Videos

    Authors: Bo Wu, Shoubin Yu, Zhenfang Chen, Joshua B Tenenbaum, Chuang Gan

    Abstract: Reasoning in the real world is not divorced from situations. How to capture the present knowledge from surrounding situations and perform reasoning accordingly is crucial and challenging for machine intelligence. This paper introduces a new benchmark that evaluates the situated reasoning ability via situation abstraction and logic-grounded question answering for real-world videos, called Situated…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: NeurIPS

  50. arXiv:2405.09605  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models

    Authors: Anna A. Ivanova, Aalok Sathe, Benjamin Lipkin, Unnathi Kumar, Setayesh Radkani, Thomas H. Clark, Carina Kauf, Jennifer Hu, R. T. Pramod, Gabriel Grand, Vivian Paulun, Maria Ryskina, Ekin Akyürek, Ethan Wilcox, Nafisa Rashid, Leshem Choshen, Roger Levy, Evelina Fedorenko, Joshua Tenenbaum, Jacob Andreas

    Abstract: The ability to build and reason about models of the world is essential for situated language understanding. But evaluating world modeling capabilities in modern AI systems -- especially those based on language models -- has proven challenging, in large part because of the difficulty of disentangling conceptual knowledge about the world from knowledge of surface co-occurrence statistics. This paper…

    Submitted 3 July, 2025; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted to Transactions of the ACL (TACL). Contains 25 pages (14 main), 6 figures. Visit http://ewok-core.github.io for data and code. Authors Anna Ivanova, Aalok Sathe, Benjamin Lipkin contributed equally