Showing 1–5 of 5 results for author: Kao, J T

Searching in archive cs.

  1. arXiv:2506.09943  [pdf, ps, other]

    cs.CV cs.AI

    CausalVQA: A Physically Grounded Causal Reasoning Benchmark for Video Models

    Authors: Aaron Foss, Chloe Evans, Sasha Mitts, Koustuv Sinha, Ammar Rizvi, Justine T. Kao

    Abstract: We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models' understanding of causality in the physical world. Existing VQA benchmarks either tend to focus on surface perceptual understanding of real-world videos, or on narrow physical reasoning questions created using simulation environments. CausalVQA fills an important gap b…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: 35 pages, 3 figures, Submitted to NeurIPS2025 benchmark track

    ACM Class: I.2.10; I.4.8

  2. arXiv:2506.09849  [pdf, ps, other]

    cs.CV

    IntPhys 2: Benchmarking Intuitive Physics Understanding In Complex Synthetic Environments

    Authors: Florian Bordes, Quentin Garrido, Justine T Kao, Adina Williams, Michael Rabbat, Emmanuel Dupoux

    Abstract: We present IntPhys 2, a video benchmark designed to evaluate the intuitive physics understanding of deep learning models. Building on the original IntPhys benchmark, IntPhys 2 focuses on four core principles related to macroscopic objects: Permanence, Immutability, Spatio-Temporal Continuity, and Solidity. These conditions are inspired by research into intuitive physical understanding emerging dur…

    Submitted 11 June, 2025; originally announced June 2025.

  3. arXiv:2411.13904  [pdf, other]

    cs.CL

    Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning

    Authors: Song Jiang, Da JU, Andrew Cohen, Sasha Mitts, Aaron Foss, Justine T Kao, Xian Li, Yuandong Tian

    Abstract: How are LLM-based agents used in the future? While many of the existing work on agents has focused on improving the performance of a specific family of objective and challenging tasks, in this work, we take a different perspective by thinking about full delegation: agents take over humans' routine decision-making processes and are trusted by humans to find solutions that fit people's personalized…

    Submitted 21 November, 2024; originally announced November 2024.

  4. arXiv:2410.16456  [pdf, other]

    cs.CL

    To the Globe (TTG): Towards Language-Driven Guaranteed Travel Planning

    Authors: Da JU, Song Jiang, Andrew Cohen, Aaron Foss, Sasha Mitts, Arman Zharmagambetov, Brandon Amos, Xian Li, Justine T Kao, Maryam Fazel-Zarandi, Yuandong Tian

    Abstract: Travel planning is a challenging and time-consuming task that aims to find an itinerary which satisfies multiple, interdependent constraints regarding flights, accommodations, attractions, and other travel arrangements. In this paper, we propose To the Globe (TTG), a real-time demo system that takes natural language requests from users, translates it to symbolic form via a fine-tuned Large Languag…

    Submitted 21 October, 2024; originally announced October 2024.

    Journal ref: EMNLP 2024 Demo Track

  5. arXiv:1908.11404  [pdf]

    cs.LG stat.ML

    Active Learning for Domain Classification in a Commercial Spoken Personal Assistant

    Authors: Xi C. Chen, Adithya Sagar, Justine T. Kao, Tony Y. Li, Christopher Klein, Stephen Pulman, Ashish Garg, Jason D. Williams

    Abstract: We describe a method for selecting relevant new training data for the LSTM-based domain selection component of our personal assistant system. Adding more annotated training data for any ML system typically improves accuracy, but only if it provides examples not already adequately covered in the existing data. However, obtaining, selecting, and labeling relevant data is expensive. This work present…

    Submitted 29 August, 2019; originally announced August 2019.