-
Adaptive Stress Testing Black-Box LLM Planners
Authors:
Neeloy Chakraborty,
John Pohovey,
Melkior Ornik,
Katherine Driggs-Campbell
Abstract:
Large language models (LLMs) have recently demonstrated success in generalizing across decision-making tasks including planning, control and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. We argue that detecting such failures is necessary, especially in safety-critical scenarios. Existing black-box methods often detect hallucinations by identifying inconsis…
▽ More
Large language models (LLMs) have recently demonstrated success in generalizing across decision-making tasks including planning, control and prediction, but their tendency to hallucinate unsafe and undesired outputs poses risks. We argue that detecting such failures is necessary, especially in safety-critical scenarios. Existing black-box methods often detect hallucinations by identifying inconsistencies across multiple samples. Many of these approaches typically introduce prompt perturbations like randomizing detail order or generating adversarial inputs, with the intuition that a confident model should produce stable outputs. We first perform a manual case study showing that other forms of perturbations (e.g., adding noise, removing sensor details) cause LLMs to hallucinate in a driving environment. We then propose a novel method for efficiently searching the space of prompt perturbations using Adaptive Stress Testing (AST) with Monte-Carlo Tree Search (MCTS). Our AST formulation enables discovery of scenarios and prompts that cause language models to act with high uncertainty. By generating MCTS prompt perturbation trees across diverse scenarios, we show that offline analyses can be used at runtime to automatically generate prompts that influence model uncertainty, and to inform real-time trust assessments of an LLM.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
LIT: Large Language Model Driven Intention Tracking for Proactive Human-Robot Collaboration -- A Robot Sous-Chef Application
Authors:
Zhe Huang,
John Pohovey,
Ananya Yammanuru,
Katherine Driggs-Campbell
Abstract:
Large Language Models (LLM) and Vision Language Models (VLM) enable robots to ground natural language prompts into control actions to achieve tasks in an open world. However, when applied to a long-horizon collaborative task, this formulation results in excessive prompting for initiating or clarifying robot actions at every step of the task. We propose Language-driven Intention Tracking (LIT), lev…
▽ More
Large Language Models (LLM) and Vision Language Models (VLM) enable robots to ground natural language prompts into control actions to achieve tasks in an open world. However, when applied to a long-horizon collaborative task, this formulation results in excessive prompting for initiating or clarifying robot actions at every step of the task. We propose Language-driven Intention Tracking (LIT), leveraging LLMs and VLMs to model the human user's long-term behavior and to predict the next human intention to guide the robot for proactive collaboration. We demonstrate smooth coordination between a LIT-based collaborative robot and the human user in collaborative cooking tasks.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
Neural Informed RRT*: Learning-based Path Planning with Point Cloud State Representations under Admissible Ellipsoidal Constraints
Authors:
Zhe Huang,
Hongyu Chen,
John Pohovey,
Katherine Driggs-Campbell
Abstract:
Sampling-based planning algorithms like Rapidly-exploring Random Tree (RRT) are versatile in solving path planning problems. RRT* offers asymptotic optimality but requires growing the tree uniformly over the free space, which leaves room for efficiency improvement. To accelerate convergence, rule-based informed approaches sample states in an admissible ellipsoidal subset of the space determined by…
▽ More
Sampling-based planning algorithms like Rapidly-exploring Random Tree (RRT) are versatile in solving path planning problems. RRT* offers asymptotic optimality but requires growing the tree uniformly over the free space, which leaves room for efficiency improvement. To accelerate convergence, rule-based informed approaches sample states in an admissible ellipsoidal subset of the space determined by the current path cost. Learning-based alternatives model the topology of the free space and infer the states close to the optimal path to guide planning. We propose Neural Informed RRT* to combine the strengths from both sides. We define point cloud representations of free states. We perform Neural Focus, which constrains the point cloud within the admissible ellipsoidal subset from Informed RRT*, and feeds into PointNet++ for refined guidance state inference. In addition, we introduce Neural Connect to build connectivity of the guidance state set and further boost performance in challenging planning problems. Our method surpasses previous works in path planning benchmarks while preserving probabilistic completeness and asymptotic optimality. We deploy our method on a mobile robot and demonstrate real world navigation around static obstacles and dynamic humans. Code is available at https://github.com/tedhuang96/nirrt_star.
△ Less
Submitted 7 March, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.