-
DAGDiff: Guiding Dual-Arm Grasp Diffusion to Stable and Collision-Free Grasps
Authors:
Md Faizal Karim,
Vignesh Vembar,
Keshab Patra,
Gaurav Singh,
K Madhava Krishna
Abstract:
Reliable dual-arm grasping is essential for manipulating large and complex objects but remains a challenging problem due to stability, collision, and generalization requirements. Prior methods typically decompose the task into two independent grasp proposals, relying on region priors or heuristics that limit generalization and provide no principled guarantee of stability. We propose DAGDiff, an end-to-end framework that directly denoises to grasp pairs in the SE(3) x SE(3) space. Our key insight is that stability and collision can be enforced more effectively by guiding the diffusion process with classifier signals, rather than relying on explicit region detection or object priors. To this end, DAGDiff integrates geometry-, stability-, and collision-aware guidance terms that steer the generative process toward grasps that are physically valid and force-closure compliant. We comprehensively evaluate DAGDiff through analytical force-closure checks, collision analysis, and large-scale physics-based simulations, showing consistent improvements over previous work on these metrics. Finally, we demonstrate that our framework generates dual-arm grasps directly on real-world point clouds of previously unseen objects, which are executed on a heterogeneous dual-arm setup where two manipulators reliably grasp and lift them.
Submitted 29 September, 2025; v1 submitted 25 September, 2025;
originally announced September 2025.
-
Continuous Donoho-Elad Spark Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Donoho and Elad \textit{[Proc. Natl. Acad. Sci. USA, 2003]} introduced the important notion of the spark of a frame, using which they derived a fundamental uncertainty principle. Based on spark, they also provided a necessary and sufficient condition for the uniqueness of sparse solutions to the NP-hard $\ell_0$-minimization problem. In this nano note, we show that the notion of spark can be extended to linear maps whose domains are measure spaces. Using this generalization, we derive an uncertainty principle and provide a sufficient condition for the existence of sparse solutions to linear systems on measure spaces.
Submitted 1 August, 2025;
originally announced September 2025.
-
MonoMPC: Monocular Vision Based Navigation with Learned Collision Model and Risk-Aware Model Predictive Control
Authors:
Basant Sharma,
Prajyot Jadhav,
Pranjal Paul,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Navigating unknown environments with a single RGB camera is challenging, as the lack of depth information prevents reliable collision-checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments.
We propose an alternative approach: instead of using noisy estimated depth for direct collision-checking, we use it as a rich context input to a learned collision model. This model predicts the distribution of minimum obstacle clearance that the robot can expect for a given control sequence. At inference, these predictions inform a risk-aware MPC planner that minimizes estimated collision risk. Our joint learning pipeline co-trains the collision model and risk metric using both safe and unsafe trajectories. Crucially, our joint-training ensures optimal variance in our collision model that improves navigation in highly cluttered environments. Consequently, real-world experiments show 9x and 7x improvements in success rates over NoMaD and the ROS stack, respectively. Ablation studies further validate the effectiveness of our design choices.
Submitted 10 August, 2025;
originally announced August 2025.
-
Diffusion-FS: Multimodal Free-Space Prediction via Diffusion for Autonomous Driving
Authors:
Keshav Gupta,
Tejas S. Stanley,
Pranjal Paul,
Arun K. Singh,
K. Madhava Krishna
Abstract:
Drivable free-space prediction is a fundamental and crucial problem in autonomous driving. Recent works have addressed the problem by representing the entire non-obstacle road region as the free space. In contrast, our aim is to estimate the driving corridors that are a navigable subset of the entire road region. Unfortunately, existing corridor estimation methods directly assume a BEV-centric representation, which is hard to obtain. We instead frame drivable free-space corridor prediction as a pure image perception task, using only monocular camera input. However, such a formulation poses several challenges, as one does not have the corresponding data for such free-space corridor segments in the image. Consequently, we develop a novel self-supervised approach for free-space sample generation by leveraging future ego trajectories and front-view camera images, making the process of visual corridor estimation dependent on the ego trajectory. We then employ a diffusion process to model the distribution of such segments in the image. However, the existing binary mask-based representation for a segment poses many limitations. Therefore, we introduce ContourDiff, a specialized diffusion-based architecture that denoises over contour points rather than relying on binary mask representations, enabling structured and interpretable free-space predictions. We evaluate our approach qualitatively and quantitatively on both nuScenes and CARLA, demonstrating its effectiveness in accurately predicting safe multimodal navigable corridors in the image.
Submitted 24 July, 2025;
originally announced July 2025.
-
p-adic Ghobber-Jaming Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\{τ_j\}_{j=1}^n$ and $\{ω_k\}_{k=1}^n$ be two orthonormal bases for a finite dimensional p-adic Hilbert space $\mathcal{X}$. Let $M,N\subseteq \{1, \dots, n\}$ be such that \begin{align*} \displaystyle \max_{j \in M, k \in N}|\langle τ_j, ω_k \rangle|<1, \end{align*} where $o(M)$ is the cardinality of $M$. Then for all $x \in \mathcal{X}$, we show that \begin{align} (1) \quad \quad \quad \quad \|x\|\leq \left(\frac{1}{1-\displaystyle \max_{j \in M, k \in N}|\langle τ_j, ω_k \rangle|}\right)\max\left\{\displaystyle \max_{j \in M^c}|\langle x, τ_j\rangle |, \displaystyle \max_{k \in N^c}|\langle x, ω_k\rangle |\right\}. \end{align}
We call Inequality (1) the \textbf{p-adic Ghobber-Jaming Uncertainty Principle}. Inequality (1) is the p-adic version of the uncertainty principle obtained by Ghobber and Jaming \textit{[Linear Algebra Appl., 2011]}. We also derive analogues of Inequality (1) for non-Archimedean Banach spaces.
Submitted 2 June, 2025;
originally announced June 2025.
-
SparseLoc: Sparse Open-Set Landmark-based Global Localization for Autonomous Navigation
Authors:
Pranjal Paul,
Vineeth Bhat,
Tejas Salian,
Mohammad Omama,
Krishna Murthy Jatavallabhula,
Naveen Arulselvan,
K. Madhava Krishna
Abstract:
Global localization is a critical problem in autonomous navigation, enabling precise positioning without reliance on GPS. Modern global localization techniques often depend on dense LiDAR maps, which, while precise, require extensive storage and computational resources. Recent approaches have explored alternative methods, such as sparse maps and learned features, but they suffer from poor robustness and generalization. We propose SparseLoc, a global localization framework that leverages vision-language foundation models to generate sparse, semantic-topometric maps in a zero-shot manner. It combines this map representation with a Monte Carlo localization scheme enhanced by a novel late optimization strategy, ensuring improved pose estimation. By constructing compact yet highly discriminative maps and refining localization through a carefully designed optimization schedule, SparseLoc overcomes the limitations of existing techniques, offering a more efficient and robust solution for global localization. Our system achieves over a 5X improvement in localization accuracy compared to existing sparse mapping techniques. Despite utilizing only 1/500th of the points of dense mapping methods, it achieves comparable performance, maintaining an average global localization error below 5m and 2 degrees on KITTI sequences.
Submitted 28 July, 2025; v1 submitted 30 March, 2025;
originally announced March 2025.
-
DG16M: A Large-Scale Dataset for Dual-Arm Grasping with Force-Optimized Grasps
Authors:
Md Faizal Karim,
Mohammed Saad Hashmi,
Shreya Bollimuntha,
Mahesh Reddy Tapeti,
Gaurav Singh,
Nagamanikandan Govindan,
K Madhava Krishna
Abstract:
Dual-arm robotic grasping is crucial for handling large objects that require stable and coordinated manipulation. While single-arm grasping has been extensively studied, datasets tailored for dual-arm settings remain scarce. We introduce a large-scale dataset of 16 million dual-arm grasps, evaluated under improved force-closure constraints. Additionally, we develop a benchmark dataset containing 300 objects with approximately 30,000 grasps, evaluated in a physics simulation environment, providing a better grasp quality assessment for dual-arm grasp synthesis methods. Finally, we demonstrate the effectiveness of our dataset by training a Dual-Arm Grasp Classifier network that outperforms the state-of-the-art methods by 15\%, achieving higher grasp success rates and improved generalization across objects.
Submitted 27 July, 2025; v1 submitted 11 March, 2025;
originally announced March 2025.
-
Swarm-Gen: Fast Generation of Diverse Feasible Swarm Behaviors
Authors:
Simon Idoko,
B. Bhanu Teja,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Coordination behavior in robot swarms is inherently multi-modal in nature. That is, there are numerous ways in which a swarm of robots can avoid inter-agent collisions and reach their respective goals. However, the problem of generating diverse and feasible swarm behaviors in a scalable manner remains largely unaddressed. In this paper, we fill this gap by combining generative models with a safety-filter (SF). Specifically, we sample diverse trajectories from a learned generative model, which are subsequently projected onto the feasible set using the SF. We experiment with two choices for generative models, namely: Conditional Variational Autoencoder (CVAE) and Vector-Quantized Variational Autoencoder (VQ-VAE). We highlight the trade-offs these two models provide in terms of computation time and trajectory diversity. We develop a custom solver for our SF and equip it with a neural network that predicts context-specific initialization. The initialization network is trained in a self-supervised manner, taking advantage of the differentiability of the SF solver. We provide two sets of empirical results. First, we demonstrate that we can generate a large set of multi-modal, feasible trajectories, simulating diverse swarm behaviors, within a few tens of milliseconds. Second, we show that our initialization network provides faster convergence of our SF solver vis-a-vis other alternative heuristics.
Submitted 31 January, 2025;
originally announced January 2025.
-
3-Heisenberg-Robertson-Schrodinger Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\mathcal{X}$ be a 3-product space. Let $A: \mathcal{D}(A)\subseteq \mathcal{X}\to \mathcal{X}$, $B: \mathcal{D}(B)\subseteq \mathcal{X}\to \mathcal{X}$ and $C: \mathcal{D}(C)\subseteq \mathcal{X}\to \mathcal{X}$ be possibly unbounded 3-self-adjoint operators. Then for all \begin{align*}
x \in \mathcal{D}(ABC)\cap\mathcal{D}(ACB) \cap \mathcal{D}(BAC)\cap\mathcal{D}(BCA) \cap \mathcal{D}(CAB)\cap\mathcal{D}(CBA) \end{align*} with $\langle x, x, x \rangle =1$, we show that \begin{align*} (1)\quad \quad Δ_x(3, A) Δ_x(3, B) Δ_x(3, C)\geq |\langle (ABC-a BC-b AC-c AB)x, x, x\rangle +2abc|, \end{align*} where \begin{align*}
Δ_x(3, A):= \|Ax-\langle Ax, x, x \rangle x \|, \quad a:= \langle Ax, x, x \rangle, \quad b := \langle Bx, x, x \rangle, \quad c := \langle Cx, x, x \rangle. \end{align*} We call Inequality (1) the 3-Heisenberg-Robertson-Schrodinger uncertainty principle. The classical Heisenberg-Robertson-Schrodinger uncertainty principle (by Schrodinger in 1930) considers two operators, whereas Inequality (1) considers three operators.
Submitted 1 December, 2024;
originally announced December 2024.
-
MetricGold: Leveraging Text-To-Image Latent Diffusion Models for Metric Depth Estimation
Authors:
Ansh Shah,
K Madhava Krishna
Abstract:
Recovering metric depth from a single image remains a fundamental challenge in computer vision, requiring both scene understanding and accurate scaling. While deep learning has advanced monocular depth estimation, current models often struggle with unfamiliar scenes and layouts, particularly in zero-shot scenarios and when predicting scale-ergodic metric depth. We present MetricGold, a novel approach that harnesses the rich priors of generative diffusion models to improve metric depth estimation. Building upon recent advances in MariGold, DDVM and Depth Anything V2 respectively, our method combines latent diffusion, log-scaled metric depth representation, and synthetic data training. MetricGold achieves efficient training on a single RTX 3090 within two days using photo-realistic synthetic data from HyperSIM, VirtualKitti, and TartanAir. Our experiments demonstrate robust generalization across diverse datasets, producing sharper and higher quality metric depth estimates compared to existing approaches.
Submitted 5 December, 2024; v1 submitted 16 November, 2024;
originally announced November 2024.
-
Imagine-2-Drive: Leveraging High-Fidelity World Models via Multi-Modal Diffusion Policies
Authors:
Anant Garg,
K Madhava Krishna
Abstract:
World Model-based Reinforcement Learning (WMRL) enables sample efficient policy learning by reducing the need for online interactions which can potentially be costly and unsafe, especially for autonomous driving. However, existing world models often suffer from low prediction fidelity and compounding one-step errors, leading to policy degradation over long horizons. Additionally, traditional RL policies, often deterministic or single Gaussian-based, fail to capture the multi-modal nature of decision-making in complex driving scenarios. To address these challenges, we propose Imagine-2-Drive, a novel WMRL framework that integrates a high-fidelity world model with a multi-modal diffusion-based policy actor. It consists of two key components: DiffDreamer, a diffusion-based world model that generates future observations simultaneously, mitigating error accumulation, and DPA (Diffusion Policy Actor), a diffusion-based policy that models diverse and multi-modal trajectory distributions. By training DPA within DiffDreamer, our method enables robust policy learning with minimal online interactions. We evaluate our method in CARLA using standard driving benchmarks and demonstrate that it outperforms prior world model baselines, improving Route Completion and Success Rate by 15% and 20% respectively.
Submitted 9 March, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
Product Entropic Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Motivated by the Deutsch entropic uncertainty principle and several product uncertainty principles, we derive an uncertainty principle for the product of entropies using functions.
Submitted 17 October, 2024;
originally announced November 2024.
-
DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control
Authors:
Md Faizal Karim,
Shreya Bollimuntha,
Mohammed Saad Hashmi,
Autrio Das,
Gaurav Singh,
Srinath Sridhar,
Arun Kumar Singh,
Nagamanikandan Govindan,
K Madhava Krishna
Abstract:
Dual-arm manipulation is an area of growing interest in the robotics community. Enabling robots to perform tasks that require the coordinated use of two arms is essential for complex manipulation tasks such as handling large objects, assembling components, and performing human-like interactions. However, achieving effective dual-arm manipulation is challenging due to the need for precise coordination, dynamic adaptability, and the ability to manage interaction forces between the arms and the objects being manipulated. We propose a novel pipeline that combines the advantages of policy learning based on environment feedback and gradient-based optimization to learn controller gains required for the control outputs. This allows the robotic system to dynamically modulate its impedance in response to task demands, ensuring stability and dexterity in dual-arm operations. We evaluate our pipeline on a trajectory-tracking task involving a variety of large, complex objects with different masses and geometries. The performance is then compared to three other established methods for controlling dual-arm robots, demonstrating superior results.
Submitted 25 October, 2024;
originally announced October 2024.
-
Imagine2Servo: Intelligent Visual Servoing with Diffusion-Driven Goal Generation for Robotic Tasks
Authors:
Pranjali Pathre,
Gunjan Gupta,
M. Nomaan Qureshi,
Mandyam Brunda,
Samarth Brahmbhatt,
K. Madhava Krishna
Abstract:
Visual servoing, the method of controlling robot motion through feedback from visual sensors, has seen significant advancements with the integration of optical flow-based methods. However, its application remains limited by inherent challenges, such as the necessity for a target image at test time, the requirement of substantial overlap between initial and target images, and the reliance on feedback from a single camera. This paper introduces Imagine2Servo, an innovative approach leveraging diffusion-based image editing techniques to enhance visual servoing algorithms by generating intermediate goal images. This methodology allows for the extension of visual servoing applications beyond traditional constraints, enabling tasks like long-range navigation and manipulation without predefined goal images. We propose a pipeline that synthesizes subgoal images grounded in the task at hand, facilitating servoing in scenarios with minimal initial and target image overlap and integrating multi-camera feedback for comprehensive task execution. Our contributions demonstrate a novel application of image generation to robotic control, significantly broadening the capabilities of visual servoing systems. Real-world experiments validate the effectiveness and versatility of the Imagine2Servo framework in accomplishing a variety of tasks, marking a notable advancement in the field of visual servoing.
Submitted 7 December, 2024; v1 submitted 16 October, 2024;
originally announced October 2024.
-
CrowdSurfer: Sampling Optimization Augmented with Vector-Quantized Variational AutoEncoder for Dense Crowd Navigation
Authors:
Naman Kumar,
Antareep Singha,
Laksh Nanwani,
Dhruv Potdar,
Tarun R,
Fatemeh Rastgar,
Simon Idoko,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the prior computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by just improving the local planner. Our approach combines generative modelling with inference time optimization to generate sophisticated long-horizon local plans at interactive rates. More specifically, we train a Vector Quantized Variational AutoEncoder to learn a prior over the expert trajectory distribution conditioned on the perception input. At run-time, this is used as an initialization for a sampling-based optimizer for further refinement. Our approach does not require any sophisticated prediction of dynamic obstacles and yet provides state-of-the-art performance. In particular, we compare against the recent DRL-VO approach and show a 40% improvement in success rate and a 6% improvement in travel time.
Submitted 7 March, 2025; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Authors:
Aneesh Chavan,
Vaibhav Agrawal,
Vineeth Bhat,
Sarthak Chittawar,
Siddharth Srivastava,
Chetan Arora,
K Madhava Krishna
Abstract:
Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D datasets. Our approach demonstrates significant improvements in both object instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM-RGBD), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets have been made publicly available.
Submitted 1 May, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Noncommutative Donoho-Elad-Gribonval-Nielsen-Fuchs Sparsity Theorem
Authors:
K. Mahesh Krishna
Abstract:
The breakthrough Sparsity Theorem, derived independently by Donoho and Elad \textit{[Proc. Natl. Acad. Sci. USA, 2003]}, Gribonval and Nielsen \textit{[IEEE Trans. Inform. Theory, 2003]} and Fuchs \textit{[IEEE Trans. Inform. Theory, 2004]}, says that the unique sparse solution to the NP-Hard $\ell_0$-minimization problem can be obtained using the unique solution of the P-Type $\ell_1$-minimization problem. In this paper, we derive a noncommutative version of their result using frames for Hilbert C*-modules.
Submitted 1 September, 2024;
originally announced September 2024.
-
Modular Deutsch Entropic Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Khosravi, Drnovšek and Moslehian [\textit{Filomat, 2012}] derived the Buzano inequality for Hilbert C*-modules. Using this inequality, we derive the Deutsch entropic uncertainty principle for Hilbert C*-modules over commutative unital C*-algebras.
Submitted 8 August, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Continuous Krishna-Parthasarathy Entropic Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
In 2002, Krishna and Parthasarathy [\textit{Sankhyā Ser. A}] derived a discrete quantum version of the Maassen-Uffink [\textit{Phys. Rev. Lett., 1988}] entropic uncertainty principle. In this paper, using the notion of continuous operator-valued frames, we derive an entropic uncertainty principle for an arbitrary family of operators indexed by measure spaces having finite measure. We give an application to the special case of compact groups.
Submitted 7 May, 2024;
originally announced May 2024.
-
Open-Set 3D Semantic Instance Maps for Vision Language Navigation -- O3D-SIM
Authors:
Laksh Nanwani,
Kumaraditya Gupta,
Aditya Mathur,
Swayam Agrawal,
A. H. Abdul Hafez,
K. Madhava Krishna
Abstract:
Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work SI Maps [1] showed that having instance-level information and the semantic understanding of an environment helps significantly improve performance for language-guided tasks. We extend this instance-level approach to 3D while increasing the pipeline's robustness and improving quantitative and qualitative results. Our method leverages foundational models for object recognition, image segmentation, and feature extraction. We propose a representation that results in a 3D point cloud map with instance-level embeddings, which bring in the semantic understanding that natural language commands can query. Quantitatively, the work improves upon the success rate of language-guided tasks. At the same time, we qualitatively observe the ability to identify instances more clearly and leverage the foundational models and language and image-aligned embeddings to identify objects that, otherwise, a closed-set approach wouldn't be able to identify.
Submitted 27 April, 2024;
originally announced April 2024.
-
Constrained 6-DoF Grasp Generation on Complex Shapes for Improved Dual-Arm Manipulation
Authors:
Gaurav Singh,
Sanket Kalwar,
Md Faizal Karim,
Bipasha Sen,
Nagamanikandan Govindan,
Srinath Sridhar,
K Madhava Krishna
Abstract:
Efficiently generating grasp poses tailored to specific regions of an object is vital for various robotic manipulation tasks, especially in a dual-arm setup. This scenario presents a significant challenge due to the complex geometries involved, requiring a deep understanding of the local geometry to generate grasps efficiently on the specified constrained regions. Existing methods only explore settings involving table-top/small objects and require augmented datasets to train, limiting their performance on complex objects. We propose CGDF: Constrained Grasp Diffusion Fields, a diffusion-based grasp generative model that generalizes to objects with arbitrary geometries, as well as generates dense grasps on the target regions. CGDF uses a part-guided diffusion approach that enables it to get high sample efficiency in constrained grasping without explicitly training on massive constraint-augmented datasets. We provide qualitative and quantitative comparisons using analytical metrics and in simulation, in both unconstrained and constrained settings to show that our method can generalize to generate stable grasps on complex objects, especially useful for dual-arm manipulation settings, while existing methods struggle to do so.
Submitted 15 July, 2024; v1 submitted 6 April, 2024;
originally announced April 2024.
-
Bi-level Trajectory Optimization on Uneven Terrains with Differentiable Wheel-Terrain Interaction Model
Authors:
Amith Manoharan,
Aditya Sharma,
Himani Belsare,
Kaustab Pal,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Navigation of wheeled vehicles on uneven terrain necessitates going beyond 2D approaches to trajectory planning. Specifically, it is essential to incorporate the full 6-DoF variation of vehicle pose and its associated stability cost in the planning process. To this end, most recent works aim to learn a neural network model to predict the vehicle evolution. However, such approaches are data-intensive and fraught with generalization issues. In this paper, we present a purely model-based approach that requires only the digital elevation information of the terrain. Specifically, we express the wheel-terrain interaction and 6-DoF pose prediction as a non-linear least squares (NLS) problem. As a result, trajectory planning can be viewed as a bi-level optimization. The inner optimization layer predicts the pose on the terrain along a given trajectory, while the outer layer deforms the trajectory itself to reduce the stability and kinematic costs of the pose. We improve the state-of-the-art in the following respects. First, we show that our NLS-based pose prediction closely matches the output from a high-fidelity physics engine. This result, coupled with the fact that we can query gradients of the NLS solver, makes our pose predictor a differentiable wheel-terrain interaction model. We further leverage this differentiability to efficiently solve the proposed bi-level trajectory optimization problem. Finally, we perform extensive experiments and a comparison with a baseline to showcase the effectiveness of our approach in obtaining smooth, stable trajectories.
Submitted 22 November, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
Unexpected Uncertainty Principle for Disc Banach Spaces
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_n\}_{n=1}^\infty, \{τ_n\}_{n=1}^\infty)$ and $(\{g_n\}_{n=1}^\infty, \{ω_n\}_{n=1}^\infty)$ be unbounded continuous p-Schauder frames ($0<p<1$) for a disc Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB} (1) \quad \quad \quad \quad \|θ_f x\|_0\|θ_g x\|_0 \geq \frac{1}{\left(\displaystyle\sup_{n,m \in \mathbb{N} }|f_n(ω_m)|\right)^p\left(\displaystyle\sup_{n, m \in \mathbb{N}}|g_m(τ_n)|\right)^p}, \end{align} where \begin{align*} & θ_f: \mathcal{D}(θ_f) \ni x \mapsto θ_fx := \{f_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}), \quad θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx := \{g_n(x)\}_{n=1}^\infty\in \ell^p(\mathbb{N}). \end{align*} Inequality (1) is unexpectedly different from both bounded uncertainty principle arXiv:2308.00312v1 and unbounded uncertainty principle arXiv:2312.00366v1 for Banach spaces.
Submitted 1 April, 2024;
originally announced April 2024.
-
LeGo-Drive: Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving
Authors:
Pranjal Paul,
Anant Garg,
Tushar Choudhary,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Existing Vision-Language models (VLMs) estimate either long-term trajectory waypoints or a set of control actions as a reactive solution for closed-loop planning based on their rich scene comprehension. However, these estimations are coarse and are subject to their "world understanding", which may generate sub-optimal decisions due to perception errors. In this paper, we introduce LeGo-Drive, which aims to address this issue by estimating a goal location based on the given language command as an intermediate representation in an end-to-end setting. The estimated goal might fall in a non-desirable region, like on top of a car for a parking-like command, leading to inadequate planning. Hence, we propose to train the architecture in an end-to-end manner, resulting in iterative refinement of both the goal and the trajectory collectively. We validate the effectiveness of our method through comprehensive experiments conducted in diverse simulated environments. We report significant improvements in standard autonomous driving metrics, with a goal-reaching Success Rate of 81%. We further showcase the versatility of LeGo-Drive across different driving scenarios and linguistic inputs, underscoring its potential for practical deployment in autonomous vehicles and intelligent transportation systems.
Submitted 29 March, 2024;
originally announced March 2024.
-
Nonlinear Heisenberg-Robertson-Schrodinger Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces.
Submitted 8 August, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Nonlinear Maccone-Pati Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
We show that one of the two important uncertainty principles derived by Maccone and Pati \textit{[Phys. Rev. Lett., 2014]} can be derived for arbitrary maps defined on subsets of $\mathcal{L}^p$ spaces for $1< p<\infty$. Our main tool is the Clarkson inequalities. We also derive a nonlinear uncertainty principle for weak parallelogram spaces and Type-p Banach spaces.
Submitted 1 February, 2024;
originally announced February 2024.
-
Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $\mathcal{X}$ be a Banach space. Let $\{τ_j\}_{j=1}^n, \{ω_k\}_{k=1}^m\subseteq \mathcal{X}$ and $\{f_j\}_{j=1}^n$, $\{g_k\}_{k=1}^m\subseteq \mathcal{X}^*$ satisfy $ |f_j(τ_j)|\geq 1$ for all $ 1\leq j \leq n$, $|g_k(ω_k)|\geq 1 $ for all $1\leq k \leq m$. If $x \in \mathcal{X}\setminus \{0\}$ is such that $x=θ_τθ_f x=θ_ωθ_g x$, then we show that \begin{align}\label{FKDB} (1) \quad\quad\quad\quad \|θ_fx\|_0\|θ_gx\|_0\geq \frac{\bigg[1-(\|θ_fx\|_0-1)\max\limits_{1\leq j,r \leq n,j\neq r}|f_j(τ_r)|\bigg]^+\bigg[1-(\|θ_g x\|_0-1)\max\limits_{1\leq k,s \leq m,k\neq s}|g_k(ω_s)|\bigg]^+}{\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|f_j(ω_k)|\right)\left(\displaystyle\max_{1\leq j \leq n, 1\leq k \leq m}|g_k(τ_j)|\right)}. \end{align}
We call Inequality (1) the \textbf{Functional Kuppinger-Durisi-Bölcskei Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Kuppinger, Durisi and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]} (which improved the Donoho-Stark-Elad-Bruckstein uncertainty principle \textit{[SIAM J. Appl. Math. (1989), IEEE Trans. Inform. Theory (2002)]}). We also derive a functional form of the uncertainty principle obtained by Studer, Kuppinger, Pope and Bölcskei \textit{[IEEE Trans. Inform. Theory (2012)]}.
Submitted 1 January, 2024;
originally announced February 2024.
-
ATPPNet: Attention based Temporal Point cloud Prediction Network
Authors:
Kaustab Pal,
Aditya Sharma,
Avinash Sharma,
K. Madhava Krishna
Abstract:
Point cloud prediction is an important yet challenging task in the field of autonomous driving. The goal is to predict future point cloud sequences that maintain object structures while accurately representing their temporal motion. These predicted point clouds help in subsequent tasks like object trajectory estimation for collision avoidance or estimating locations with the least odometry drift. In this work, we present ATPPNet, a novel architecture that predicts future point cloud sequences given a sequence of point clouds from previous time steps obtained with a LiDAR sensor. ATPPNet leverages Conv-LSTM along with channel-wise and spatial attention, complemented by a 3D-CNN branch, to extract an enhanced spatio-temporal context and recover high-fidelity predictions of future point clouds. We conduct extensive experiments on publicly available datasets and report performance that exceeds that of existing methods. We also conduct a thorough ablation study of the proposed architecture and provide an application study that highlights the potential of our model for tasks like odometry estimation.
Submitted 30 January, 2024;
originally announced January 2024.
-
Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principles
Authors:
K. Mahesh Krishna
Abstract:
Let $(Ω, μ)$, $(Δ, ν)$ be measure spaces and $p=1$ or $p=\infty$. Let $(\{f_α\}_{α\in Ω}, \{τ_α\}_{α\in Ω})$ and $(\{g_β\}_{β\in Δ}, \{ω_β\}_{β\in Δ})$ be unbounded continuous p-Schauder frames for a Banach space $\mathcal{X}$. Then for every $x \in ( \mathcal{D}(θ_f) \cap\mathcal{D}(θ_g))\setminus\{0\}$, we show that \begin{align}\label{UB}
(1) \quad \quad \quad \quad μ(\operatorname{supp}(θ_f x))ν(\operatorname{supp}(θ_g x)) \geq \frac{1}{\left(\displaystyle\sup_{α\in Ω, β\in Δ}|f_α(ω_β)|\right)\left(\displaystyle\sup_{α\in Ω, β\in Δ}|g_β(τ_α)|\right)}, \end{align} where \begin{align*} &θ_f:\mathcal{D}(θ_f) \ni x \mapsto θ_fx \in \mathcal{L}^p(Ω, μ); \quad θ_fx: Ω\ni α\mapsto (θ_fx) (α):= f_α(x) \in \mathbb{K},\\ &θ_g: \mathcal{D}(θ_g) \ni x \mapsto θ_gx \in \mathcal{L}^p(Δ, ν); \quad θ_gx: Δ\ni β\mapsto (θ_gx) (β):= g_β(x) \in \mathbb{K}. \end{align*} We call Inequality (1) as \textbf{Unbounded Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Along with recent \textbf{Functional Continuous Uncertainty Principle} [arXiv:2308.00312], Inequality (1) also improves Ricaud-Torrésani uncertainty principle [IEEE Trans. Inform. Theory, 2013]. In particular, it improves Elad-Bruckstein uncertainty principle [IEEE Trans. Inform. Theory, 2002] and Donoho-Stark uncertainty principle [SIAM J. Appl. Math., 1989].
Submitted 1 December, 2023;
originally announced December 2023.
-
Automated Detection and Counting of Windows using UAV Imagery based Remote Sensing
Authors:
Dhruv Patel,
Shivani Chepuri,
Sarvesh Thakur,
K. Harikumar,
Ravi Kiran S.,
K. Madhava Krishna
Abstract:
Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and count the windows of a building by deploying an Unmanned Aerial Vehicle (UAV) based remote sensing system is proposed. The proposed two-stage method automates the identification and counting of windows by developing computer vision pipelines that utilize data from the UAV's onboard camera and other sensors. Quantitative and qualitative results show the effectiveness of our proposed approach in accurately detecting and counting the windows compared to the existing method.
Submitted 24 November, 2023;
originally announced November 2023.
-
NeuroSMPC: A Neural Network guided Sampling Based MPC for On-Road Autonomous Driving
Authors:
Kaustab Pal,
Aditya Sharma,
Mohd Omama,
Parth N. Shah,
K. Madhava Krishna
Abstract:
In this paper we show an effective means of integrating data driven frameworks to sampling based optimal control to vastly reduce the compute time for easy adoption and adaptation to real time applications such as on-road autonomous driving in the presence of dynamic actors. Presented with training examples, a spatio-temporal CNN learns to predict the optimal mean control over a finite horizon that precludes further resampling, an iterative process that makes sampling based optimal control formulations difficult to adopt in real time settings. Generating control samples around the network-predicted optimal mean retains the advantage of sample diversity while enabling real time rollout of trajectories that avoids multiple dynamic obstacles in an on-road navigation setting. Further the 3D CNN architecture implicitly learns the future trajectories of the dynamic agents in the scene resulting in successful collision free navigation despite no explicit future trajectory prediction. We show performance gain over multiple baselines in a number of on-road scenes through closed loop simulations in CARLA. We also showcase the real world applicability of our system by running it on our custom Autonomous Driving Platform (AutoDP).
Submitted 19 October, 2023;
originally announced October 2023.
-
Hilbert Space Embedding-based Trajectory Optimization for Multi-Modal Uncertain Obstacle Trajectory Prediction
Authors:
Basant Sharma,
Aditya Sharma,
K. Madhava Krishna,
Arun Kumar Singh
Abstract:
Safe autonomous driving critically depends on how well the ego-vehicle can predict the trajectories of neighboring vehicles. To this end, several trajectory prediction algorithms have been presented in the existing literature. Many of these approaches output a multi-modal distribution of obstacle trajectories instead of a single deterministic prediction to account for the underlying uncertainty. However, existing planners cannot handle the multi-modality based on just sample-level information of the predictions. With this motivation, this paper proposes a trajectory optimizer that can leverage the distributional aspects of the prediction in a computationally tractable and sample-efficient manner. Our optimizer can work with arbitrarily complex distributions and thus can be used with output distribution represented as a deep neural network. The core of our approach is built on embedding distribution in Reproducing Kernel Hilbert Space (RKHS), which we leverage in two ways. First, we propose an RKHS embedding approach to select probable samples from the obstacle trajectory distribution. Second, we rephrase chance-constrained optimization as distribution matching in RKHS and propose a novel sampling-based optimizer for its solution. We validate our approach with hand-crafted and neural network-based predictors trained on real-world datasets and show improvement over the existing stochastic optimization approaches in safety metrics.
Submitted 12 October, 2023;
originally announced October 2023.
-
DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions
Authors:
Sanket Kalwar,
Mihir Ungarala,
Shruti Jain,
Aaron Monis,
Krishna Reddy Konda,
Sourav Garg,
K Madhava Krishna
Abstract:
Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed $\nabla$HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios. Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach. Project page at https://diffprompter.github.io.
Submitted 26 March, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving
Authors:
Tushar Choudhary,
Vikrant Dewangan,
Shivam Chandhok,
Shubham Priyadarshan,
Anushka Jain,
Arun K. Singh,
Siddharth Srivastava,
Krishna Murthy Jatavallabhula,
K. Madhava Krishna
Abstract:
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries, and in grounding these queries to the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
Submitted 14 November, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Functional Donoho-Stark Approximate Support Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. If $ x \in \mathcal{X}\setminus\{0\}$ is such that $θ_fx$ is $\varepsilon$-supported on $M\subseteq \{1,\dots, n\}$ w.r.t. p-norm and $θ_gx$ is $δ$-supported on $N\subseteq \{1,\dots, n\}$ w.r.t. p-norm, then we show that \begin{align}\label{ME} (1) \quad \quad \quad \quad &o(M)^\frac{1}{p}o(N)^\frac{1}{q}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|f_j(ω_k) |}\max \{1-\varepsilon-δ, 0\},\\ (2) \quad \quad \quad \quad&o(M)^\frac{1}{q}o(N)^\frac{1}{p}\geq \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}\max \{1-\varepsilon-δ, 0\},\label{ME2} \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^n \in \ell^p([n]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequalities (1) and (2) as \textbf{Functional Donoho-Stark Approximate Support Uncertainty Principle}. Inequalities (1) and (2) improve the finite approximate support uncertainty principle obtained by Donoho and Stark \textit{[SIAM J. Appl. Math., 1989]}.
Submitted 1 July, 2023;
originally announced July 2023.
-
HyP-NeRF: Learning Improved NeRF Priors using a HyperNetwork
Authors:
Bipasha Sen,
Gaurav Singh,
Aditya Agarwal,
Rohith Agaram,
K Madhava Krishna,
Srinath Sridhar
Abstract:
Neural Radiance Fields (NeRF) have become an increasingly popular representation to capture high-quality appearance and shape of scenes and objects. However, learning generalizable NeRF priors over categories of scenes or objects has been challenging due to the high dimensionality of network weight space. To address the limitations of existing work on generalization, multi-view consistency and to improve quality, we propose HyP-NeRF, a latent conditioning method for learning generalizable category-level NeRF priors using hypernetworks. Rather than using hypernetworks to estimate only the weights of a NeRF, we estimate both the weights and the multi-resolution hash encodings resulting in significant quality gains. To improve quality even further, we incorporate a denoise and finetune strategy that denoises images rendered from NeRFs estimated by the hypernetwork and finetunes it while retaining multiview consistency. These improvements enable us to use HyP-NeRF as a generalizable prior for multiple downstream tasks including NeRF reconstruction from single-view or cluttered scenes and text-to-NeRF. We provide qualitative comparisons and evaluate HyP-NeRF on three tasks: generalization, compression, and retrieval, demonstrating our state-of-the-art results.
Submitted 23 December, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
UAP-BEV: Uncertainty Aware Planning using Bird's Eye View generated from Surround Monocular Images
Authors:
Vikrant Dewangan,
Basant Sharma,
Tushar Choudhary,
Sarthak Sharma,
Aakash Aanegola,
Arun K. Singh,
K. Madhava Krishna
Abstract:
Autonomous driving requires accurate reasoning about the location of objects from raw sensor data. Recent end-to-end learning methods go from raw sensor data to a trajectory output via Bird's Eye View (BEV) segmentation as an interpretable intermediate representation. Motion planning over cost maps generated via BEV segmentation has emerged as a prominent approach in autonomous driving. However, the current approaches have two critical gaps. First, the optimization process is simplistic and involves just evaluating a fixed set of trajectories over the cost map. The trajectory samples are not adapted based on their associated cost values. Second, the existing cost maps do not account for the uncertainty that can arise due to noise in RGB images and BEV annotations. As a result, these approaches can struggle in challenging scenarios involving abrupt cut-in, stopping, overtaking, merging, etc. by neighboring vehicles.
In this paper, we propose UAP-BEV: A novel approach that models the noise in Spatio-Temporal BEV predictions to create an uncertainty-aware occupancy grid map. Using queries of the distance to the closest occupied cell, we obtain a sample estimate of the collision probability of the ego-vehicle. Subsequently, our approach uses gradient-free sampling-based optimization to compute low-cost trajectories over the cost map. Importantly, the sampling distribution is adapted based on the optimal cost values of the sampled trajectories. By explicitly modeling probabilistic collision avoidance in the BEV space, our approach is able to outperform the cost-map-based baselines in collision avoidance, route completion, time to completion, and smoothness. To further validate our method, we also show results on the real-world dataset NuScenes, where we report improvements in collision avoidance and smoothness.
Submitted 8 June, 2023;
originally announced June 2023.
-
Functional Ghobber-Jaming Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^n, \{ω_k\}_{k=1}^n)$ be two p-orthonormal bases for a finite dimensional Banach space $\mathcal{X}$. Let $M,N\subseteq \{1, \dots, n\}$ be such that \begin{align*}
o(M)^\frac{1}{q}o(N)^\frac{1}{p}< \frac{1}{\displaystyle \max_{1\leq j,k\leq n}|g_k(τ_j) |}, \end{align*} where $q$ is the conjugate index of $p$. Then for all $x \in \mathcal{X}$, we show that \begin{align}\label{FGJU} (1) \quad \quad \quad \quad \|x\|\leq \left(1+\frac{1}{1-o(M)^\frac{1}{q}o(N)^\frac{1}{p}\displaystyle\max_{1\leq j,k\leq n}|g_k(τ_j)|}\right)\left[\left(\sum_{j\in M^c}|f_j(x)|^p\right)^\frac{1}{p}+\left(\sum_{k\in N^c}|g_k(x) |^p\right)^\frac{1}{p}\right]. \end{align}
We call Inequality (1) as \textbf{Functional Ghobber-Jaming Uncertainty Principle}. Inequality (1) improves the uncertainty principle obtained by Ghobber and Jaming \textit{[Linear Algebra Appl., 2011]}.
Submitted 1 June, 2023;
originally announced June 2023.
-
Instance-Level Semantic Maps for Vision Language Navigation
Authors:
Laksh Nanwani,
Anmol Agarwal,
Kanishk Jain,
Raghav Prabhakar,
Aaron Monis,
Aditya Mathur,
Krishna Murthy,
Abdul Hafez,
Vineet Gandhi,
K. Madhava Krishna
Abstract:
Humans have a natural ability to perform semantic associations with the surrounding objects in the environment. This allows them to create a mental map of the environment, allowing them to navigate on-demand when given linguistic instructions. A natural goal in Vision Language Navigation (VLN) research is to impart autonomous agents with similar capabilities. Recent works take a step towards this goal by creating a semantic spatial map representation of the environment without any labeled data. However, their representations are limited for practical applicability as they do not distinguish between different instances of the same object. In this work, we address this limitation by integrating instance-level information into spatial map representation using a community detection algorithm and utilizing word ontology learned by large language models (LLMs) to perform open-set semantic associations in the mapping representation. The resulting map representation improves the navigation performance by two-fold (233%) on realistic language commands with instance-specific descriptions compared to the baseline. We validate the practicality and effectiveness of our approach through extensive qualitative and quantitative experiments.
Submitted 1 July, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle
Authors:
K. Mahesh Krishna
Abstract:
Let $(\{f_j\}_{j=1}^n, \{τ_j\}_{j=1}^n)$ and $(\{g_k\}_{k=1}^m, \{ω_k\}_{k=1}^m)$ be p-Schauder frames for a finite dimensional Banach space $\mathcal{X}$. Then for every $x \in \mathcal{X}\setminus\{0\}$, we show that \begin{align} (1) \quad \|θ_f x\|_0^\frac{1}{p}\|θ_g x\|_0^\frac{1}{q} \geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|f_j(ω_k)|}\quad \text{and} \quad \|θ_g x\|_0^\frac{1}{p}\|θ_f x\|_0^\frac{1}{q}\geq \frac{1}{\displaystyle\max_{1\leq j\leq n, 1\leq k\leq m}|g_k(τ_j)|}. \end{align} where \begin{align*} θ_f: \mathcal{X} \ni x \mapsto (f_j(x) )_{j=1}^n \in \ell^p([n]); \quad θ_g: \mathcal{X} \ni x \mapsto (g_k(x) )_{k=1}^m \in \ell^p([m]) \end{align*} and $q$ is the conjugate index of $p$. We call Inequality (1) the \textbf{Functional Donoho-Stark-Elad-Bruckstein-Ricaud-Torrésani Uncertainty Principle}. Inequality (1) improves the Ricaud-Torrésani uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2013]}. In particular, it improves the Elad-Bruckstein uncertainty principle \textit{[IEEE Trans. Inform. Theory, 2002]} and the Donoho-Stark uncertainty principle \textit{[SIAM J. Appl. Math., 1989]}.
Submitted 5 April, 2023;
originally announced April 2023.
-
FinderNet: A Data Augmentation Free Canonicalization aided Loop Detection and Closure technique for Point clouds in 6-DOF separation
Authors:
Sudarshan S Harithas,
Gurkirat Singh,
Aneesh Chavan,
Sarthak Sharma,
Suraj Patni,
Chetan Arora,
K. Madhava Krishna
Abstract:
We focus on the problem of LiDAR point cloud based loop detection (or Finding) and closure (LDC) in a multi-agent setting. State-of-the-art (SOTA) techniques directly generate learned embeddings of a given point cloud, require large data transfers, and are not robust to wide variations in 6 Degrees-of-Freedom (DOF) viewpoint. Moreover, the absence of strong priors in an unstructured point cloud leads to highly inaccurate LDC. In this original approach, we propose independent roll and pitch canonicalization of the point clouds using a common dominant ground plane. Discretization of the canonicalized point cloud along the axis perpendicular to the ground plane leads to an image similar to Digital Elevation Maps (DEMs), which exposes strong spatial priors in the scene. Our experiments show that LDC based on learnt embeddings of such DEMs is not only data efficient but also significantly more robust and generalizable than the current SOTA. We report significant performance gains in terms of Average Precision for loop detection and absolute translation/rotation error for relative pose estimation (or loop closure) on Kitti, GPR and Oxford Robot Car over multiple SOTA LDC methods. Our encoder technique compresses the original point cloud by over 830 times. To further test the robustness of our technique, we create and open-source a custom dataset called Lidar-UrbanFly Dataset (LUF), which consists of point clouds obtained from a LiDAR mounted on a quadrotor.
Submitted 3 April, 2023;
originally announced April 2023.
-
MVRackLay: Monocular Multi-View Layout Estimation for Warehouse Racks and Shelves
Authors:
Pranjali Pathre,
Anurag Sahu,
Ashwin Rao,
Avinash Prabhu,
Meher Shashwat Nigam,
Tanvi Karandikar,
Harit Pandya,
K. Madhava Krishna
Abstract:
In this paper, we propose and showcase, for the first time, monocular multi-view layout estimation for warehouse racks and shelves. Unlike typical layout estimation methods, MVRackLay estimates multi-layered layouts, wherein each layer corresponds to the layout of a shelf within a rack. Given a sequence of images of a warehouse scene, a dual-headed Convolutional-LSTM architecture outputs segmented racks and the front and top view layouts of each shelf within a rack. With minimal effort, such an output is transformed into a 3D rendering of all racks, shelves and objects on the shelves, giving an accurate 3D depiction of the entire warehouse scene in terms of racks, shelves and the number of objects on each shelf. MVRackLay generalizes to a diverse set of warehouse scenes with varying numbers of objects on each shelf and varying numbers of shelves, and in the presence of other such racks in the background. Further, MVRackLay shows superior performance vis-a-vis its single view counterpart, RackLay, in layout accuracy, quantified in terms of the mean IoU and mAP metrics. We also showcase a multi-view stitching of the 3D layouts resulting in a representation of the warehouse scene with respect to a global reference frame akin to a rendering of the scene from a SLAM pipeline. To the best of our knowledge, this is the first such work to portray a 3D rendering of a warehouse scene in terms of its semantic components - Racks, Shelves and Objects - all from a single monocular camera.
Submitted 30 November, 2022;
originally announced November 2022.
-
Non-Archimedean Welch Bounds and Non-Archimedean Zauner Conjecture
Authors:
K. Mahesh Krishna
Abstract:
Let $\mathbb{K}$ be a non-Archimedean (complete) valued field satisfying \begin{align*} \left|\sum_{j=1}^{n}λ_j^2\right|=\max_{1\leq j \leq n}|λ_j|^2, \quad \forall λ_j \in \mathbb{K}, 1\leq j \leq n, \forall n \in \mathbb{N}. \end{align*} For $d\in \mathbb{N}$, let $\mathbb{K}^d$ be the standard $d$-dimensional non-Archimedean Hilbert space. Let $m \in \mathbb{N}$ and $\text{Sym}^m(\mathbb{K}^d)$ be the non-Archimedean Hilbert space of symmetric m-tensors. We prove the following result. If $\{τ_j\}_{j=1}^n$ is a collection in $\mathbb{K}^d$ satisfying $\langle τ_j, τ_j\rangle =1$ for all $1\leq j \leq n$ and the operator $\text{Sym}^m(\mathbb{K}^d)\ni x \mapsto \sum_{j=1}^n\langle x, τ_j^{\otimes m}\rangle τ_j^{\otimes m} \in \text{Sym}^m(\mathbb{K}^d)$ is diagonalizable, then \begin{align} (1) \quad \quad \quad \max_{1\leq j,k \leq n, j \neq k}\{|n|, |\langle τ_j, τ_k\rangle|^{2m} \}\geq \frac{|n|^2}{\left|{d+m-1 \choose m}\right| }. \end{align} We call Inequality (1) the non-Archimedean version of the Welch bounds obtained by Welch [\textit{IEEE Transactions on Information Theory, 1974}]. We formulate the non-Archimedean Zauner conjecture.
Submitted 28 August, 2022;
originally announced October 2022.
-
GDIP: Gated Differentiable Image Processing for Object-Detection in Adverse Conditions
Authors:
Sanket Kalwar,
Dhruv Patel,
Aakash Aanegola,
Krishna Reddy Konda,
Sourav Garg,
K Madhava Krishna
Abstract:
Detecting objects under adverse weather and lighting conditions is crucial for the safe and continuous operation of an autonomous vehicle, and remains an unsolved problem. We present a Gated Differentiable Image Processing (GDIP) block, a domain-agnostic network architecture, which can be plugged into existing object detection networks (e.g., Yolo) and trained end-to-end with adverse condition images such as those captured under fog and low lighting. Our proposed GDIP block learns to enhance images directly through the downstream object detection loss. This is achieved by learning parameters of multiple image pre-processing (IP) techniques that operate concurrently, with their outputs combined using weights learned through a novel gating mechanism. We further improve GDIP through a multi-stage guidance procedure for progressive image enhancement. Finally, trading off accuracy for speed, we propose a variant of GDIP that can be used as a regularizer for training Yolo, which eliminates the need for GDIP-based image enhancement during inference, resulting in higher throughput and plausible real-world deployment. We demonstrate significant improvement in detection performance over several state-of-the-art methods through quantitative and qualitative studies on synthetic datasets such as PascalVOC, and real-world foggy (RTTS) and low-lighting (ExDark) datasets.
Submitted 29 September, 2022;
originally announced September 2022.
-
UAV-based Visual Remote Sensing for Automated Building Inspection
Authors:
Kushagra Srivastava,
Dhruv Patel,
Aditya Kumar Jha,
Mohhit Kumar Jha,
Jaskirat Singh,
Ravi Kiran Sarvadevabhatla,
Pradeep Kumar Ramancharla,
Harikumar Kandath,
K. Madhava Krishna
Abstract:
Unmanned Aerial Vehicle (UAV) based remote sensing systems that incorporate computer vision have demonstrated potential for assisting building construction and disaster management, such as damage assessment during earthquakes. The vulnerability of a building to earthquakes can be assessed through inspection that takes into account the expected damage progression of the associated component and the component's contribution to structural system performance. Most of these inspections are done manually, leading to high utilization of manpower, time, and cost. This paper proposes a methodology to automate these inspections through UAV-based image data collection and a software library for post-processing that helps in estimating the seismic structural parameters. The key parameters considered here are the distances between adjacent buildings, building plan-shape, building plan area, objects on the rooftop and rooftop layout. The accuracy of the proposed methodology in estimating the above-mentioned parameters is verified through field measurements taken using a distance measuring sensor and also from the data obtained through Google Earth. Additional details and code can be accessed from https://uvrsabi.github.io/ .
Submitted 27 September, 2022;
originally announced September 2022.
-
Ground then Navigate: Language-guided Navigation in Dynamic Scenes
Authors:
Kanishk Jain,
Varun Chhangani,
Amogh Tiwari,
K. Madhava Krishna,
Vineet Gandhi
Abstract:
We investigate the Vision-and-Language Navigation (VLN) problem in the context of autonomous driving in outdoor settings. We solve the problem by explicitly grounding the navigable regions corresponding to the textual command. At each timestamp, the model predicts a segmentation mask corresponding to the intermediate or the final navigable region. Our work contrasts with existing efforts in VLN, which pose this task as a node selection problem, given a discrete connected graph corresponding to the environment. We do not assume the availability of such a discretised map. Our work moves towards continuity in action space, provides interpretability through visual feedback and allows VLN on commands requiring finer manoeuvres like "park between the two cars". Furthermore, we propose a novel meta-dataset CARLA-NAV to allow efficient training and validation. The dataset comprises pre-recorded training sequences and a live environment for validation and testing. We provide extensive qualitative and quantitative empirical results to validate the efficacy of the proposed approach.
Submitted 24 September, 2022;
originally announced September 2022.
-
Real-Time Heuristic Framework for Safe Landing of UAVs in Dynamic Scenarios
Authors:
Jaskirat Singh,
Neel Adwani,
Harikumar Kandath,
K. Madhava Krishna
Abstract:
The world we live in is full of technology, and with each passing day the advancement and usage of UAVs increase. As a result of the many application scenarios, there are some missions where UAVs are vulnerable to external disruptions, such as a ground station's loss of connectivity, security missions, safety concerns, and delivery-related missions. Therefore, depending on the scenario, this could affect operations and require the safe landing of UAVs. Hence, this paper presents a heuristic approach towards the safe landing of multi-rotor UAVs in dynamic environments. The aim of this approach is to detect safe potential landing zones (PLZs) and find the best one to land in. The PLZs are initially detected by processing an image through the Canny edge algorithm, and then diameter-area estimation is applied to each region with minimal edges. The spots that have a larger area than the vehicle's clearance are labeled as safe PLZs. In the second phase of this approach, the velocities of dynamic obstacles moving towards the PLZs are calculated and their times to reach the zones are taken into consideration. The ETA of the UAV is calculated, and dynamic obstacle avoidance is executed during the UAV's descent. The approach, tested in real-world environments, shows better results than existing work.
Submitted 11 September, 2022;
originally announced September 2022.
-
Leveraging Distributional Bias for Reactive Collision Avoidance under Uncertainty: A Kernel Embedding Approach
Authors:
Anish Gupta,
Arun Kumar Singh,
K. Madhava Krishna
Abstract:
Many commodity sensors that measure the robot and dynamic obstacle's state have non-Gaussian noise characteristics. Yet, many current approaches treat the underlying uncertainty in motion and perception as Gaussian, primarily to ensure computational tractability. On the other hand, existing planners working with non-Gaussian uncertainty do not shed light on leveraging distributional characteristics of motion and perception noise, such as bias, for efficient collision avoidance.
This paper fills this gap by interpreting reactive collision avoidance as a distribution matching problem between the collision constraint violations and Dirac Delta distribution. To ensure fast reactivity in the planner, we embed each distribution in Reproducing Kernel Hilbert Space and reformulate the distribution matching as minimizing the Maximum Mean Discrepancy (MMD) between the two distributions. We show that evaluating the MMD for a given control input boils down to just matrix-matrix products. We leverage this insight to develop a simple control sampling approach for reactive collision avoidance with dynamic and uncertain obstacles.
We advance the state-of-the-art in two respects. First, we conduct an extensive empirical study to show that our planner can infer distributional bias from sample-level information. Consequently, it uses this insight to guide the robot to good homotopy. We also highlight how a Gaussian approximation of the underlying uncertainty can lose the bias estimate and guide the robot to unfavorable states with a high collision probability. Second, we show tangible comparative advantages of the proposed distribution matching approach for collision avoidance with previous non-parametric and Gaussian approximated methods of reactive collision avoidance.
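The distribution-matching idea in this abstract can be sketched in a few lines. The sketch assumes a scalar constraint violation `max(0, r - d)`, a Gaussian (RBF) kernel, and the biased sample estimate of the squared MMD against a Dirac delta at zero. The function names, the candidate controls, and the distance samples are all hypothetical; the paper's own formulation uses matrix-matrix products over kernel matrices.

```python
import math


def rbf(a, b, bandwidth=1.0):
    """Gaussian kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def violation(distance, safety_radius):
    """Collision constraint violation: zero when the sample is safe."""
    return max(0.0, safety_radius - distance)


def mmd_sq_to_dirac(samples, bandwidth=1.0):
    """Biased estimate of MMD^2 between the samples and a Dirac delta at 0."""
    n = len(samples)
    k_xx = sum(rbf(a, b, bandwidth) for a in samples for b in samples) / (n * n)
    k_x0 = sum(rbf(a, 0.0, bandwidth) for a in samples) / n
    return k_xx - 2.0 * k_x0 + 1.0  # k(0, 0) = 1 for the RBF kernel


# Hypothetical sampled robot-obstacle distances under two candidate controls.
candidates = {
    "brake": [2.0, 1.8, 2.2, 1.5],
    "straight": [0.4, 1.2, 0.6, 0.9],
}
scores = {
    name: mmd_sq_to_dirac([violation(d, 1.0) for d in dists])
    for name, dists in candidates.items()
}
best = min(scores, key=scores.get)
```

A control whose violation samples all sit at zero matches the Dirac delta exactly and scores zero, so the selection favours controls whose sampled outcomes are collision-free.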
Submitted 22 September, 2022; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Flow Synthesis Based Visual Servoing Frameworks for Monocular Obstacle Avoidance Amidst High-Rises
Authors:
Harshit K. Sankhla,
M. Nomaan Qureshi,
Shankara Narayanan V.,
Vedansh Mittal,
Gunjan Gupta,
Harit Pandya,
K. Madhava Krishna
Abstract:
We propose a novel flow synthesis based visual servoing framework enabling long-range obstacle avoidance for Micro Air Vehicles (MAV) flying amongst tall skyscrapers. Recent deep learning based frameworks use optical flow to do high-precision visual servoing. In this paper, we explore the question: can we design a surrogate flow for these high-precision visual-servoing methods, which leads to obstacle avoidance? We revisit the concept of saliency for identifying high-rise structures in/close to the line of attack amongst other competing skyscrapers and buildings as a collision obstacle. A synthesised flow is used to displace the salient object segmentation mask. This flow is computed such that the visual servoing controller maneuvers the MAV safely around the obstacle. In this approach, we use a multi-step Cross-Entropy Method (CEM) based servo control to achieve flow convergence, resulting in obstacle avoidance. We use this novel pipeline to successfully and persistently maneuver around high-rises and reach the goal in simulated and photo-realistic real-world scenes. We conduct extensive experimentation and compare our approach with optical flow and short-range depth-based obstacle avoidance methods to demonstrate the proposed framework's merit. Additional visualisations can be found at https://sites.google.com/view/monocular-obstacle/home
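For readers unfamiliar with the Cross-Entropy Method mentioned above, a generic single-stage sketch follows. It minimises a toy quadratic cost over a two-dimensional velocity command; the cost, the target, and all parameter values are assumptions for illustration and stand in for the paper's flow-based servoing objective, which is not specified in the abstract.

```python
import random
import statistics


def cem_minimize(cost, mean, std, n_samples=64, n_elite=8, n_iters=30, seed=0):
    """Generic CEM: sample, keep the lowest-cost elites, refit the Gaussian."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(n_iters):
        samples = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(n_samples)
        ]
        samples.sort(key=cost)
        elite = samples[:n_elite]
        for i in range(len(mean)):
            column = [e[i] for e in elite]
            mean[i] = statistics.fmean(column)
            # Small floor keeps the sampling distribution from collapsing.
            std[i] = statistics.pstdev(column) + 1e-6
    return mean


def toy_cost(v):
    """Hypothetical stand-in cost with its minimum at (1.0, -2.0)."""
    return (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2


solution = cem_minimize(toy_cost, mean=[0.0, 0.0], std=[2.0, 2.0])
```

In a servoing loop the optimiser would typically be re-run at every control step with the previous solution as the initial mean, which is one plausible reading of "multi-step" here.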
Submitted 7 July, 2022;
originally announced July 2022.
-
Approaches and Challenges in Robotic Perception for Table-top Rearrangement and Planning
Authors:
Aditya Agarwal,
Bipasha Sen,
Shankara Narayanan V,
Vishal Reddy Mandadi,
Brojeshwar Bhowmick,
K Madhava Krishna
Abstract:
Table-top Rearrangement and Planning is a challenging problem that relies heavily on an excellent perception stack. The perception stack involves observing and registering the 3D scene on the table, detecting what objects are on the table, and determining how to manipulate them. Consequently, it greatly influences the system's task-planning and motion-planning stacks that follow. We present a comprehensive overview and discuss the different challenges associated with the perception module. This work is a result of our extensive involvement in the ICRA 2022 Open Cloud Robot Table Organization Challenge, in which we placed third in the final rankings.
Submitted 3 June, 2022; v1 submitted 9 May, 2022;
originally announced May 2022.