-
Learning from Observation: A Survey of Recent Advances
Authors:
Returaj Burnwal,
Hriday Mehta,
Nirav Pravinbhai Bhatt,
Balaraman Ravindran
Abstract:
Imitation Learning (IL) algorithms offer an efficient way to train an agent by mimicking an expert's behavior without requiring a reward function. IL algorithms often necessitate access to state and action information from expert demonstrations. Although expert actions can provide detailed guidance, requiring such action information may prove impractical for real-world applications where expert actions are difficult to obtain. To address this limitation, the concept of learning from observation (LfO) or state-only imitation learning (SOIL) has recently gained attention, wherein the imitator only has access to expert state visitation information. In this paper, we present a framework for LfO and use it to survey and classify existing LfO methods in terms of their trajectory construction, assumptions, and algorithmic design choices. This survey also draws connections between several related fields like offline RL, model-based RL and hierarchical RL. Finally, we use our framework to identify open problems and suggest future research directions.
Submitted 20 September, 2025;
originally announced September 2025.
-
VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation
Authors:
Neel P. Bhatt,
Yunhao Yang,
Rohan Siva,
Pranay Samineni,
Daniel Milan,
Zhangyang Wang,
Ufuk Topcu
Abstract:
Rapid adaptation in unseen environments is essential for scalable real-world autonomy, yet existing approaches rely on exhaustive exploration or rigid navigation policies that fail to generalize. We present VLN-Zero, a two-phase vision-language navigation framework that leverages vision-language models to efficiently construct symbolic scene graphs and enable zero-shot neurosymbolic navigation. In the exploration phase, structured prompts guide VLM-based search toward informative and diverse trajectories, yielding compact scene graph representations. In the deployment phase, a neurosymbolic planner reasons over the scene graph and environmental observations to generate executable plans, while a cache-enabled execution module accelerates adaptation by reusing previously computed task-location trajectories. By combining rapid exploration, symbolic reasoning, and cache-enabled execution, the proposed framework overcomes the computational inefficiency and poor generalization of prior vision-language navigation methods, enabling robust and scalable decision-making in unseen environments. VLN-Zero achieves 2x higher success rate compared to state-of-the-art zero-shot models, outperforms most fine-tuned baselines, and reaches goal locations in half the time with 55% fewer VLM calls on average compared to state-of-the-art models across diverse environments. Codebase, datasets, and videos for VLN-Zero are available at: https://vln-zero.github.io/.
Submitted 22 September, 2025;
originally announced September 2025.
-
Functional Groups are All you Need for Chemically Interpretable Molecular Property Prediction
Authors:
Roshan Balaji,
Joe Bobby,
Nirav Pravinbhai Bhatt
Abstract:
Molecular property prediction using deep learning (DL) models has accelerated drug and materials discovery, but the resulting DL models often lack interpretability, hindering their adoption by chemists. This work proposes developing molecule representations using the concept of Functional Groups (FG) in chemistry. We introduce the Functional Group Representation (FGR) framework, a novel approach to encoding molecules based on their fundamental chemical substructures. Our method integrates two types of functional groups: those curated from established chemical knowledge (FG), and those mined from a large molecular corpus using sequential pattern mining (MFG). The resulting FGR framework encodes molecules into a lower-dimensional latent space by leveraging pre-training on a large dataset of unlabeled molecules. Furthermore, the proposed framework allows the inclusion of 2D structure-based descriptors of molecules. We demonstrate that the FGR framework achieves state-of-the-art performance on a diverse range of 33 benchmark datasets spanning physical chemistry, biophysics, quantum mechanics, biological activity, and pharmacokinetics while enabling chemical interpretability. Crucially, the model's representations are intrinsically aligned with established chemical principles, allowing chemists to directly link predicted properties to specific functional groups and facilitating novel insights into structure-property relationships. Our work presents a significant step toward developing high-performing, chemically interpretable DL models for molecular discovery.
Submitted 11 September, 2025;
originally announced September 2025.
-
NovoMolGen: Rethinking Molecular Language Model Pretraining
Authors:
Kamran Chitsaz,
Roshan Balaji,
Quentin Fournier,
Nirav Pravinbhai Bhatt,
Sarath Chandar
Abstract:
Designing de-novo molecules with desired property profiles requires efficient exploration of the vast chemical space ranging from $10^{23}$ to $10^{60}$ possible synthesizable candidates. While various deep generative models have been developed to design small molecules using diverse input representations, Molecular Large Language Models (Mol-LLMs) based on string representations have emerged as a scalable approach capable of exploring billions of molecules. However, there remains limited understanding regarding how standard language modeling practices such as textual representations, tokenization strategies, model size, and dataset scale impact molecular generation performance. In this work, we systematically investigate these critical aspects by introducing NovoMolGen, a family of transformer-based foundation models pretrained on 1.5 billion molecules for de-novo molecule generation. Through extensive empirical analyses, we identify a weak correlation between performance metrics measured during pretraining and actual downstream performance, revealing important distinctions between molecular and general NLP training dynamics. NovoMolGen establishes new state-of-the-art results, substantially outperforming prior Mol-LLMs and specialized generative models in both unconstrained and goal-directed molecular generation tasks, thus providing a robust foundation for advancing efficient and effective molecular modeling strategies.
Submitted 22 August, 2025; v1 submitted 18 August, 2025;
originally announced August 2025.
-
Foundation Models for Logistics: Toward Certifiable, Conversational Planning Interfaces
Authors:
Yunhao Yang,
Neel P. Bhatt,
Christian Ellis,
Alvaro Velasquez,
Zhangyang Wang,
Ufuk Topcu
Abstract:
Logistics operators, from battlefield coordinators rerouting airlifts ahead of a storm to warehouse managers juggling late trucks, often face life-critical decisions that demand both domain expertise and rapid and continuous replanning. While popular methods like integer programming yield logistics plans that satisfy user-defined logical constraints, they are slow and assume an idealized mathematical model of the environment that does not account for uncertainty. On the other hand, large language models (LLMs) can handle uncertainty and promise to accelerate replanning while lowering the barrier to entry by translating free-form utterances into executable plans, yet they remain prone to misinterpretations and hallucinations that jeopardize safety and cost. We introduce a neurosymbolic framework that pairs the accessibility of natural-language dialogue with verifiable guarantees on goal interpretation. It converts user requests into structured planning specifications, quantifies its own uncertainty at the field and token level, and invokes an interactive clarification loop whenever confidence falls below an adaptive threshold. A lightweight model, fine-tuned on just 100 uncertainty-filtered examples, surpasses the zero-shot performance of GPT-4.1 while cutting inference latency by nearly 50%. These preliminary results highlight a practical path toward certifiable, real-time, and user-aligned decision-making for complex logistics.
Submitted 15 July, 2025;
originally announced July 2025.
-
Real-Time Privacy Preservation for Robot Visual Perception
Authors:
Minkyu Choi,
Yunhao Yang,
Neel P. Bhatt,
Kushagra Gupta,
Sahil Shah,
Aditya Rai,
David Fridovich-Keil,
Ufuk Topcu,
Sandeep P. Chinchali
Abstract:
Many robots (e.g., iRobot's Roomba) operate based on visual observations from live video streams, and such observations may inadvertently include privacy-sensitive objects, such as personal identifiers. Existing approaches for preserving privacy rely on deep learning models, differential privacy, or cryptography. They lack guarantees for the complete concealment of all sensitive objects. Guaranteeing concealment requires post-processing techniques and thus is inadequate for real-time video streams. We develop a method for privacy-constrained video streaming, PCVS, that conceals sensitive objects within real-time video streams. PCVS takes a logical specification constraining the existence of privacy-sensitive objects, e.g., never show faces when a person exists. It uses a detection model to evaluate the existence of these objects in each incoming frame. Then, it blurs out a subset of objects such that the existence of the remaining objects satisfies the specification. We then propose a conformal prediction approach to (i) establish a theoretical lower bound on the probability of the existence of these objects in a sequence of frames satisfying the specification and (ii) update the bound with the arrival of each subsequent frame. Quantitative evaluations show that PCVS achieves over 95 percent specification satisfaction rate in multiple datasets, significantly outperforming other methods. The satisfaction rate is consistently above the theoretical bounds across all datasets, indicating that the established bounds hold. Additionally, we deploy PCVS on robots in real-time operation and show that the robots operate normally without being compromised when PCVS conceals objects.
Submitted 7 May, 2025;
originally announced May 2025.
-
XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
Authors:
Adam Štorek,
Mukur Gupta,
Noopur Bhatt,
Aditya Gupta,
Janie Kim,
Prashast Srivastava,
Suman Jana
Abstract:
AI coding assistants are widely used for tasks like code generation. These tools now require large and complex contexts, automatically sourced from various origins (across files, projects, and contributors), forming part of the prompt fed to underlying LLMs. This automatic context-gathering introduces new vulnerabilities, allowing attackers to subtly poison input to compromise the assistant's outputs, potentially generating vulnerable code or introducing critical errors. We propose a novel attack, Cross-Origin Context Poisoning (XOXO), that is challenging to detect as it relies on adversarial code modifications that are semantically equivalent. Traditional program analysis techniques struggle to identify these perturbations since the semantics of the code remains correct, making it appear legitimate. This allows attackers to manipulate coding assistants into producing incorrect outputs, while shifting the blame to the victim developer. We introduce a novel, task-agnostic, black-box attack algorithm GCGS that systematically searches the transformation space using a Cayley Graph, achieving a 75.72% attack success rate on average across five tasks and eleven models, including GPT 4.1 and Claude 3.5 Sonnet v2 used by popular AI coding assistants. Furthermore, defenses like adversarial fine-tuning are ineffective against our attack, underscoring the need for new security measures in LLM-powered coding tools.
Submitted 20 May, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
CodeSCM: Causal Analysis for Multi-Modal Code Generation
Authors:
Mukur Gupta,
Noopur Bhatt,
Suman Jana
Abstract:
In this paper, we propose CodeSCM, a Structural Causal Model (SCM) for analyzing multi-modal code generation using large language models (LLMs). By applying interventions to CodeSCM, we measure the causal effects of different prompt modalities, such as natural language, code, and input-output examples, on the model. CodeSCM introduces latent mediator variables to separate the code and natural language semantics of a multi-modal code generation prompt. Using the principles of Causal Mediation Analysis on these mediators we quantify direct effects representing the model's spurious leanings. We find that, in addition to natural language instructions, input-output examples significantly influence code generation.
Submitted 7 February, 2025;
originally announced February 2025.
-
Know Where You're Uncertain When Planning with Multimodal Foundation Models: A Formal Framework
Authors:
Neel P. Bhatt,
Yunhao Yang,
Rohan Siva,
Daniel Milan,
Ufuk Topcu,
Zhangyang Wang
Abstract:
Multimodal foundation models offer a promising framework for robotic perception and planning by processing sensory inputs to generate actionable plans. However, addressing uncertainty in both perception (sensory interpretation) and decision-making (plan generation) remains a critical challenge for ensuring task reliability. We present a comprehensive framework to disentangle, quantify, and mitigate these two forms of uncertainty. We first introduce a framework for uncertainty disentanglement, isolating perception uncertainty arising from limitations in visual understanding and decision uncertainty relating to the robustness of generated plans.
To quantify each type of uncertainty, we propose methods tailored to the unique properties of perception and decision-making: we use conformal prediction to calibrate perception uncertainty and introduce Formal-Methods-Driven Prediction (FMDP) to quantify decision uncertainty, leveraging formal verification techniques for theoretical guarantees. Building on this quantification, we implement two targeted intervention mechanisms: an active sensing process that dynamically re-observes high-uncertainty scenes to enhance visual input quality and an automated refinement procedure that fine-tunes the model on high-certainty data, improving its capability to meet task specifications. Empirical validation in real-world and simulated robotic tasks demonstrates that our uncertainty disentanglement framework reduces variability by up to 40% and enhances task success rates by 5% compared to baselines. These improvements are attributed to the combined effect of both interventions and highlight the importance of uncertainty disentanglement, which facilitates targeted interventions that enhance the robustness and reliability of autonomous systems. Fine-tuned models, code, and datasets are available at https://uncertainty-in-planning.github.io/.
Submitted 16 April, 2025; v1 submitted 3 November, 2024;
originally announced November 2024.
-
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability
Authors:
Kevin Wang,
Junbo Li,
Neel P. Bhatt,
Yihan Xi,
Qiang Liu,
Ufuk Topcu,
Zhangyang Wang
Abstract:
Recent advancements in Large Language Models (LLMs) have showcased their ability to perform complex reasoning tasks, but their effectiveness in planning remains underexplored. In this study, we evaluate the planning capabilities of OpenAI's o1 models across a variety of benchmark tasks, focusing on three key aspects: feasibility, optimality, and generalizability. Through empirical evaluations on constraint-heavy tasks (e.g., $\textit{Barman}$, $\textit{Tyreworld}$) and spatially complex environments (e.g., $\textit{Termes}$, $\textit{Floortile}$), we highlight o1-preview's strengths in self-evaluation and constraint-following, while also identifying bottlenecks in decision-making and memory management, particularly in tasks requiring robust spatial reasoning. Our results reveal that o1-preview outperforms GPT-4 in adhering to task constraints and managing state transitions in structured environments. However, the model often generates suboptimal solutions with redundant actions and struggles to generalize effectively in spatially complex tasks. This pilot study provides foundational insights into the planning limitations of LLMs, offering key directions for future research on improving memory management, decision-making, and generalization in LLM-based planning. Code available at https://github.com/VITA-Group/o1-planning.
Submitted 13 October, 2024; v1 submitted 29 September, 2024;
originally announced September 2024.
-
Artificial Intelligence in Gastrointestinal Bleeding Analysis for Video Capsule Endoscopy: Insights, Innovations, and Prospects (2008-2023)
Authors:
Tanisha Singh,
Shreshtha Jha,
Nidhi Bhatt,
Palak Handa,
Nidhi Goel,
Sreedevi Indu
Abstract:
The escalating global mortality and morbidity rates associated with gastrointestinal (GI) bleeding, compounded by the complexities and limitations of traditional endoscopic methods, underscore the urgent need for a critical review of current methodologies used for addressing this condition. With an estimated 300,000 annual deaths worldwide, the demand for innovative diagnostic and therapeutic strategies is paramount. The introduction of Video Capsule Endoscopy (VCE) has marked a significant advancement, offering a comprehensive, non-invasive visualization of the digestive tract that is pivotal for detecting bleeding sources unattainable by traditional methods. Despite its benefits, the efficacy of VCE is hindered by diagnostic challenges, including time-consuming analysis and susceptibility to human error. This backdrop sets the stage for exploring Machine Learning (ML) applications in automating GI bleeding detection within capsule endoscopy, aiming to enhance diagnostic accuracy, reduce manual labor, and improve patient outcomes. Through an exhaustive analysis of 113 papers published between 2008 and 2023, this review assesses the current state of ML methodologies in bleeding detection, highlighting their effectiveness, challenges, and prospective directions. It contributes an in-depth examination of AI techniques in VCE frame analysis, offering insights into open-source datasets, mathematical performance metrics, and technique categorization. The paper sets a foundation for future research to overcome existing challenges, advancing gastrointestinal diagnostics through interdisciplinary collaboration and innovation in ML applications.
Submitted 1 September, 2024;
originally announced September 2024.
-
MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements
Authors:
Lisong C. Sun,
Neel P. Bhatt,
Jonathan C. Liu,
Zhiwen Fan,
Zhangyang Wang,
Todd E. Humphreys,
Ufuk Topcu
Abstract:
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage: https://vita-group.github.io/MM3DGS-SLAM
Submitted 1 April, 2024;
originally announced April 2024.
-
Comp4D: LLM-Guided Compositional 4D Scene Generation
Authors:
Dejia Xu,
Hanwen Liang,
Neel P. Bhatt,
Hezhen Hu,
Hanxue Liang,
Konstantinos N. Plataniotis,
Zhangyang Wang
Abstract:
Recent advancements in diffusion models for 2D and 3D content creation have sparked a surge of interest in generating 4D content. However, the scarcity of 3D scene datasets constrains current methodologies to primarily object-centric generation. To overcome this limitation, we present Comp4D, a novel framework for Compositional 4D Generation. Unlike conventional methods that generate a singular 4D representation of the entire scene, Comp4D innovatively constructs each 4D object within the scene separately. Utilizing Large Language Models (LLMs), the framework begins by decomposing an input text prompt into distinct entities and mapping out their trajectories. It then constructs the compositional 4D scene by accurately positioning these objects along their designated paths. To refine the scene, our method employs a compositional score distillation technique guided by the pre-defined trajectories, utilizing pre-trained diffusion models across text-to-image, text-to-video, and text-to-3D domains. Extensive experiments demonstrate our outstanding 4D content creation capability compared to prior art, showcasing superior visual quality, motion fidelity, and enhanced object interactions.
Submitted 25 March, 2024;
originally announced March 2024.
-
WATonoBus: Field-Tested All-Weather Autonomous Shuttle Technology
Authors:
Neel P. Bhatt,
Ruihe Zhang,
Minghao Ning,
Ahmad Reza Alghooneh,
Joseph Sun,
Pouya Panahandeh,
Ehsan Mohammadbagher,
Ted Ecclestone,
Ben MacCallum,
Ehsan Hashemi,
Amir Khajepour
Abstract:
All-weather autonomous vehicle operation poses significant challenges, encompassing modules from perception and decision-making to path planning and control. The complexity arises from the need to address adverse weather conditions such as rain, snow, and fog across the autonomy stack. Conventional model-based single-module approaches often lack holistic integration with upstream or downstream tasks. We tackle this problem by proposing a multi-module and modular system architecture with considerations for adverse weather across the perception level, through features such as snow-covered curb detection, to decision-making and safety monitoring. Through daily weekday service on the WATonoBus platform for almost two years, we demonstrate that our proposed approach is capable of addressing adverse weather conditions and provide valuable insights from edge cases observed during operation.
Submitted 14 August, 2024; v1 submitted 1 December, 2023;
originally announced December 2023.
-
Fine-Tuning Language Models Using Formal Methods Feedback
Authors:
Yunhao Yang,
Neel P. Bhatt,
Tyler Ingebrand,
William Ward,
Steven Carr,
Zhangyang Wang,
Ufuk Topcu
Abstract:
Although pre-trained language models encode generic knowledge beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks. Existing fine-tuning methods use human feedback to address this limitation; however, sourcing human feedback is labor intensive and costly. We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems, bridging the gap between generic knowledge and domain-specific requirements while reducing cost. The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers with high compliance with the desired specifications receive higher ranks, guiding the iterative fine-tuning process. We provide quantitative evidence, primarily in autonomous driving, to demonstrate the method's effectiveness across multiple tasks. The results indicate an improvement in the percentage of specifications satisfied by the controller from 60% to 90%.
Submitted 27 October, 2023;
originally announced October 2023.
-
GAN-MPC: Training Model Predictive Controllers with Parameterized Cost Functions using Demonstrations from Non-identical Experts
Authors:
Returaj Burnwal,
Anirban Santara,
Nirav P. Bhatt,
Balaraman Ravindran,
Gaurav Aggarwal
Abstract:
Model predictive control (MPC) is a popular approach for trajectory optimization in practical robotics applications. MPC policies can optimize trajectory parameters under kinodynamic and safety constraints and provide guarantees on safety, optimality, generalizability, interpretability, and explainability. However, some behaviors are complex and it is difficult to hand-craft an MPC objective function. A special class of MPC policies called Learnable-MPC addresses this difficulty using imitation learning from expert demonstrations. However, they require the demonstrator and the imitator agents to be identical, which is hard to satisfy in many real-world applications of robotics. In this paper, we address the practical problem of training Learnable-MPC policies when the demonstrator and the imitator do not share the same dynamics and their state spaces may have a partial overlap. We propose a novel approach that uses a generative adversarial network (GAN) to minimize the Jensen-Shannon divergence between the state-trajectory distributions of the demonstrator and the imitator. We evaluate our approach on a variety of simulated robotics tasks from the DeepMind Control Suite and demonstrate the efficacy of our approach at learning the demonstrator's behavior without having to copy their actions.
Submitted 7 June, 2023; v1 submitted 30 May, 2023;
originally announced May 2023.
-
What Happens When Pneu-Net Soft Robotic Actuators Get Fatigued?
Authors:
Jacqueline Libby,
Aniket A. Somwanshi,
Federico Stancati,
Gayatri Tyagi,
Aadit Patel,
Naigam Bhatt,
JohnRoss Rizzo,
S. Farokh Atashzar
Abstract:
Soft actuators have attracted a great deal of interest in the context of rehabilitative and assistive robots for increasing safety and lowering costs as compared to rigid-body robotic systems. During actuation, soft actuators experience high levels of deformation, which can lead to microscale fractures in their elastomeric structure; these fatigue the system over time and eventually lead to macroscale damage and failure. This paper reports finite element modeling (FEM) of pneu-nets at high angles, along with repetitive experimentation at high deformation rates, in order to study the effect and behavior of fatigue in soft robotic actuators, which would result in deviation from the ideal behavior. Comparing the FEM model and experimental data, we show that FEM can model the performance of the actuator before fatigue to a bending angle of 167 degrees with ~96% accuracy. We also show that the FEM model performance will drop to 80% due to fatigue after repetitive high-angle bending. The results of this paper objectively highlight the emergence of fatigue over cyclic activation of the system and the resulting deviation from the computational FEM model. Such behavior can be considered in future controllers to adapt the system to the time-variable and non-autonomous response dynamics of soft robots.
Submitted 6 December, 2022;
originally announced December 2022.
-
Forecasting Solar Power Generation on the basis of Predictive and Corrective Maintenance Activities
Authors:
Soham Vyas,
Yuvraj Goyal,
Neel Bhatt,
Sanskar Bhuwania,
Hardik Patel,
Shakti Mishra,
Brijesh Tripathi
Abstract:
Solar energy forecasting has seen tremendous growth in the last decade using historical time series collected from a weather station, covering weather variables such as wind speed and direction, solar radiance, and temperature. It helps in the overall management of solar power plants. However, a solar power plant regularly requires preventive and corrective maintenance activities that further impact energy production. This paper presents a novel approach for forecasting solar power energy production based on maintenance activities, problems observed at a power plant, and weather data. The results are obtained on a dataset from the 1 MW solar power plant of PDEU (our university), which contains 13 columns of daily entries from 2012 to 2020. There are 12 structured columns and one unstructured column with manual text entries about different maintenance activities, problems observed, and daily weather conditions. The unstructured column is used to create a new feature column vector using Hash Map, flag words, and stop words. The final dataset comprises five important feature vector columns based on correlation and causality analysis.
Submitted 17 May, 2022;
originally announced May 2022.
-
Unsupervised Detection of Lung Nodules in Chest Radiography Using Generative Adversarial Networks
Authors:
Nitish Bhatt,
David Ramon Prados,
Nedim Hodzic,
Christos Karanassios,
H. R. Tizhoosh
Abstract:
Lung nodules are commonly missed in chest radiographs. We propose and evaluate P-AnoGAN, an unsupervised anomaly detection approach for lung nodules in radiographs. P-AnoGAN modifies the fast anomaly detection generative adversarial network (f-AnoGAN) by utilizing a progressive GAN and a convolutional encoder-decoder-encoder pipeline. Model training uses only unlabelled healthy lung patches extracted from the Indiana University Chest X-Ray Collection. External validation and testing are performed using healthy and unhealthy patches extracted from the ChestX-ray14 and Japanese Society for Radiological Technology datasets, respectively. Our model robustly identifies patches containing lung nodules in external validation and test data with ROC-AUC of 91.17% and 87.89%, respectively. These results show unsupervised methods may be useful in challenging tasks such as lung nodule detection in radiographs.
Submitted 4 August, 2021;
originally announced August 2021.
-
Contrastive Semi-Supervised Learning for 2D Medical Image Segmentation
Authors:
Prashant Pandey,
Ajey Pai,
Nisarg Bhatt,
Prasenjit Das,
Govind Makharia,
Prathosh AP,
Mausam
Abstract:
Contrastive Learning (CL) is a recent representation learning approach, which encourages inter-class separability and intra-class compactness in learned image representations. Since medical images often contain multiple semantic classes in an image, using CL to learn representations of local features (as opposed to global) is important. In this work, we present a novel semi-supervised 2D medical segmentation solution that applies CL on image patches, instead of full images. These patches are meaningfully constructed using the semantic information of different classes obtained via pseudo labeling. We also propose a novel consistency regularization (CR) scheme, which works in synergy with CL. It addresses the problem of confirmation bias, and encourages better clustering in the feature space. We evaluate our method on four public medical segmentation datasets and a novel histopathology dataset that we introduce. Our method obtains consistent improvements over state-of-the-art semi-supervised segmentation approaches for all datasets.
Submitted 6 August, 2021; v1 submitted 12 June, 2021;
originally announced June 2021.
-
Soft Constrained Autonomous Vehicle Navigation using Gaussian Processes and Instance Segmentation
Authors:
Bruno H. Groenner Barbosa,
Neel P. Bhatt,
Amir Khajepour,
Ehsan Hashemi
Abstract:
This paper presents a generic feature-based navigation framework for autonomous vehicles using a soft constrained Particle Filter. Selected map features, such as road and landmark locations, and vehicle states are used for designing soft constraints. After obtaining features of mapped landmarks in instance-based segmented images acquired from a monocular camera, vehicle-to-landmark distances are predicted using Gaussian Process Regression (GPR) models in a mixture of experts approach. Both mean and variance outputs of GPR models are used for implementing adaptive constraints. Experimental results confirm that the use of image segmentation features notably improves the vehicle-to-landmark distance prediction, and that the proposed soft constrained approach reliably localizes the vehicle even with a reduced number of landmarks and noisy observations.
Submitted 18 January, 2021;
originally announced January 2021.
-
Automated Repair of Resource Leaks in Android Applications
Authors:
Bhargav Nagaraja Bhatt,
Carlo A. Furia
Abstract:
Resource leaks -- a program does not release resources it previously acquired -- are a common kind of bug in Android applications. Even with the help of existing techniques to automatically detect leaks, writing a leak-free program remains tricky. One of the reasons is Android's event-driven programming model, which complicates the understanding of an application's overall control flow.
In this paper, we present PlumbDroid: a technique to automatically detect and fix resource leaks in Android applications. PlumbDroid uses static analysis to find execution traces that may leak a resource. The information built for detection also undergirds automatically building a fix -- consisting of release operations performed at appropriate locations -- that removes the leak and does not otherwise affect the application's usage of the resource.
An empirical evaluation on resource leaks from the DroidLeaks curated collection demonstrates that PlumbDroid's approach is scalable, precise, and produces correct fixes for a variety of resource leak bugs: PlumbDroid automatically found and repaired 50 leaks that affect 9 widely used resources of the Android system, including all those collected by DroidLeaks for those resources; on average, it took just 2 minutes to detect and repair a leak. PlumbDroid also compares favorably to Relda2/RelFix -- the only other fully automated approach to repair Android resource leaks -- since it usually detects more leaks with higher precision and produces smaller fixes. These results indicate that PlumbDroid can provide valuable support to enhance the quality of Android applications in practice.
Submitted 28 June, 2022; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Learning Conserved Networks from Flows
Authors:
Satya Jayadev P.,
Shankar Narasimhan,
Nirav Bhatt
Abstract:
A challenging problem in complex networks is the reconstruction of a network from data. This work deals with a class of networks denoted as conserved networks, in which a flow is associated with every edge and the flows are conserved at all non-source and non-sink nodes. We propose a novel polynomial-time algorithm to reconstruct conserved networks from flow data by exploiting graph-theoretic properties of conserved networks combined with learning techniques. We prove that exact network reconstruction is possible for arborescence networks. We also extend the methodology to reconstructing networks from noisy data and explore the reconstruction performance on arborescence networks with different structural characteristics.
Submitted 12 April, 2020; v1 submitted 21 May, 2019;
originally announced May 2019.
-
Identifying Topology of Power Distribution Networks Based on Smart Meter Data
Authors:
Jayadev P Satya,
Nirav Bhatt,
Ramkrishna Pasumarthy,
Aravind Rajeswaran
Abstract:
In a power distribution network, the network topology information is essential for efficient operation of the network. This network connectivity information is not accurately available at the low voltage level, due to uninformed changes that happen from time to time. In this paper, we propose a novel data-driven approach to identify the underlying network topology, including the load phase connectivity, from time series of energy measurements. The proposed method involves the application of Principal Component Analysis (PCA) and its graph-theoretic interpretation to infer the topology from smart meter energy measurements. The method is demonstrated through simulation on randomly generated networks and also on the IEEE-recognized Roy Billinton distribution test system.
Submitted 9 September, 2016;
originally announced September 2016.
-
A Novel Approach for Phase Identification in Smart Grids Using Graph Theory and Principal Component Analysis
Authors:
P Satya Jayadev,
Aravind Rajeswaran,
Nirav P Bhatt,
Ramkrishna Pasumarthy
Abstract:
Consumers with low demand, like households, are generally supplied single-phase power by connecting their service mains to one of the phases of a distribution transformer. Distribution companies face the problem of keeping a record of consumer connectivity to a phase, due to uninformed changes that happen. Exact phase connectivity information is important for the efficient operation and control of the distribution system. We propose a new data-driven approach to the problem based on Principal Component Analysis (PCA) and its graph-theoretic interpretations, using energy measurements in equally timed short intervals, generated from smart meters. We propose an algorithm for inferring phase connectivity from noisy measurements. The algorithm is demonstrated using simulated data for phase connectivities in distribution networks.
Submitted 7 June, 2016; v1 submitted 19 November, 2015;
originally announced November 2015.
-
Deconstructing Principal Component Analysis Using a Data Reconciliation Perspective
Authors:
Shankar Narasimhan,
Nirav Bhatt
Abstract:
Data reconciliation (DR) and Principal Component Analysis (PCA) are two popular data analysis techniques in process industries. Data reconciliation is used to obtain accurate and consistent estimates of variables and parameters from erroneous measurements. PCA is primarily used as a method for reducing the dimensionality of high dimensional data and as a preprocessing technique for denoising measurements. These techniques have been developed and deployed independently of each other. The primary purpose of this article is to elucidate the close relationship between these two seemingly disparate techniques. This leads to a unified framework for applying PCA and DR. Further, we show how the two techniques can be deployed together in a collaborative and consistent manner to process data. The framework has been extended to deal with partially measured systems and to incorporate partial knowledge available about the process model.
Submitted 2 May, 2015;
originally announced May 2015.
-
Monotonous (Semi-)Nonnegative Matrix Factorization
Authors:
Nirav Bhatt,
Arun Ayyar
Abstract:
Nonnegative matrix factorization (NMF) factorizes a non-negative matrix into the product of two non-negative matrices, namely a signal matrix and a mixing matrix. NMF suffers from scale and ordering ambiguities. Often, the source signals can be monotonous in nature. For example, in a source separation problem, the source signals can be monotonously increasing or decreasing while the mixing matrix can have nonnegative entries. NMF methods may not be effective for such cases, as they suffer from the ordering ambiguity. This paper proposes an approach to incorporate the notion of monotonicity in NMF, labeled as monotonous NMF. An algorithm based on alternating least-squares is proposed for recovering monotonous signals from a data matrix. Further, the assumption on the mixing matrix is relaxed to extend monotonous NMF to data matrices with real numbers as entries. The approach is illustrated using synthetic noisy data. The results obtained by monotonous NMF are compared with standard NMF algorithms in the literature, and it is shown that monotonous NMF estimates source signals well in comparison to standard NMF algorithms when the underlying source signals are monotonous.
Submitted 1 May, 2015;
originally announced May 2015.