Search | arXiv e-print repository

Active Illumination Control in Low-Light Environments using NightHawk

Authors: Yash Turkar, Youngjin Kim, Karthik Dantu

Abstract: Subterranean environments such as culverts present significant challenges to robot vision due to dim lighting and lack of distinctive features. Although onboard illumination can help, it introduces issues such as specular reflections, overexposure, and increased power consumption. We propose NightHawk, a framework that combines active illumination with exposure control to optimize image quality in… ▽ More Subterranean environments such as culverts present significant challenges to robot vision due to dim lighting and lack of distinctive features. Although onboard illumination can help, it introduces issues such as specular reflections, overexposure, and increased power consumption. We propose NightHawk, a framework that combines active illumination with exposure control to optimize image quality in these settings. NightHawk formulates an online Bayesian optimization problem to determine the best light intensity and exposure-time for a given scene. We propose a novel feature detector-based metric to quantify image utility and use it as the cost function for the optimizer. We built NightHawk as an event-triggered recursive optimization pipeline and deployed it on a legged robot navigating a culvert beneath the Erie Canal. Results from field experiments demonstrate improvements in feature detection and matching by 47-197% enabling more reliable visual estimation in challenging lighting conditions. △ Less

Submitted 5 June, 2025; originally announced June 2025.

arXiv:2505.16513 [pdf, ps, other]

Detailed Evaluation of Modern Machine Learning Approaches for Optic Plastics Sorting

Authors: Vaishali Maheshkar, Aadarsh Anantha Ramakrishnan, Charuvahan Adhivarahan, Karthik Dantu

Abstract: According to the EPA, only 25% of waste is recycled, and just 60% of U.S. municipalities offer curbside recycling. Plastics fare worse, with a recycling rate of only 8%; an additional 16% is incinerated, while the remaining 76% ends up in landfills. The low plastic recycling rate stems from contamination, poor economic incentives, and technical difficulties, making efficient recycling a challenge.… ▽ More According to the EPA, only 25% of waste is recycled, and just 60% of U.S. municipalities offer curbside recycling. Plastics fare worse, with a recycling rate of only 8%; an additional 16% is incinerated, while the remaining 76% ends up in landfills. The low plastic recycling rate stems from contamination, poor economic incentives, and technical difficulties, making efficient recycling a challenge. To improve recovery, automated sorting plays a critical role. Companies like AMP Robotics and Greyparrot utilize optical systems for sorting, while Materials Recovery Facilities (MRFs) employ Near-Infrared (NIR) sensors to detect plastic types. Modern optical sorting uses advances in computer vision such as object recognition and instance segmentation, powered by machine learning. Two-stage detectors like Mask R-CNN use region proposals and classification with deep backbones like ResNet. Single-stage detectors like YOLO handle detection in one pass, trading some accuracy for speed. While such methods excel under ideal conditions with a large volume of labeled training data, challenges arise in realistic scenarios, emphasizing the need to further examine the efficacy of optic detection for automated sorting. In this study, we compiled novel datasets totaling 20,000+ images from varied sources. Using both public and custom machine learning pipelines, we assessed the capabilities and limitations of optical recognition for sorting. Grad-CAM, saliency maps, and confusion matrices were employed to interpret model behavior. We perform this analysis on our custom trained models from the compiled datasets. To conclude, our findings are that optic recognition methods have limited success in accurate sorting of real-world plastics at MRFs, primarily because they rely on physical properties such as color and shape. △ Less

Submitted 22 May, 2025; originally announced May 2025.

Comments: Accepted at the 2024 REMADE Circular Economy Tech Summit and Conference, https://remadeinstitute.org/2024-conference/

MSC Class: 68T45 ACM Class: I.4.9; I.4.6

arXiv:2501.13725 [pdf, other]

You Only Crash Once v2: Perceptually Consistent Strong Features for One-Stage Domain Adaptive Detection of Space Terrain

Authors: Timothy Chase Jr, Christopher Wilson, Karthik Dantu

Abstract: The in-situ detection of planetary, lunar, and small-body surface terrain is crucial for autonomous spacecraft applications, where learning-based computer vision methods are increasingly employed to enable intelligence without prior information or human intervention. However, many of these methods remain computationally expensive for spacecraft processors and prevent real-time operation. Training… ▽ More The in-situ detection of planetary, lunar, and small-body surface terrain is crucial for autonomous spacecraft applications, where learning-based computer vision methods are increasingly employed to enable intelligence without prior information or human intervention. However, many of these methods remain computationally expensive for spacecraft processors and prevent real-time operation. Training of such algorithms is additionally complex due to the scarcity of labeled data and reliance on supervised learning approaches. Unsupervised Domain Adaptation (UDA) offers a promising solution by facilitating model training with disparate data sources such as simulations or synthetic scenes, although UDA is difficult to apply to celestial environments where challenging feature spaces are paramount. To alleviate such issues, You Only Crash Once (YOCOv1) has studied the integration of Visual Similarity-based Alignment (VSA) into lightweight one-stage object detection architectures to improve space terrain UDA. Although proven effective, the approach faces notable limitations, including performance degradations in multi-class and high-altitude scenarios. Building upon the foundation of YOCOv1, we propose novel additions to the VSA scheme that enhance terrain detection capabilities under UDA, and our approach is evaluated across both simulated and real-world data. Our second YOCO rendition, YOCOv2, is capable of achieving state-of-the-art UDA performance on surface terrain detection, where we showcase improvements upwards of 31% compared with YOCOv1 and terrestrial state-of-the-art. We demonstrate the practical utility of YOCOv2 with spacecraft flight hardware performance benchmarking and qualitative evaluation of NASA mission data. △ Less

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.12654 [pdf, other]

AnyNav: Visual Neuro-Symbolic Friction Learning for Off-road Navigation

Authors: Taimeng Fu, Zitong Zhan, Zhipeng Zhao, Shaoshu Su, Xiao Lin, Ehsan Tarkesh Esfahani, Karthik Dantu, Souma Chowdhury, Chen Wang

Abstract: Off-road navigation is essential for a wide range of applications in field robotics such as planetary exploration and disaster response. However, it remains an unresolved challenge due to the unstructured environments and inherent complexity of terrain-vehicle interactions. Traditional physics-based methods struggle to accurately model the nonlinear dynamics of these interactions, while data-drive… ▽ More Off-road navigation is essential for a wide range of applications in field robotics such as planetary exploration and disaster response. However, it remains an unresolved challenge due to the unstructured environments and inherent complexity of terrain-vehicle interactions. Traditional physics-based methods struggle to accurately model the nonlinear dynamics of these interactions, while data-driven approaches often suffer from overfitting to specific motion patterns, vehicle sizes, and types, limiting their generalizability. To overcome these challenges, we introduce a vision-based friction estimation framework grounded in neuro-symbolic principles, integrating neural networks for visual perception with symbolic reasoning for physical modeling. This enables significantly improved generalization abilities through explicit physical reasoning incorporating the predicted friction. Additionally, we develop a physics-informed planner that leverages the learned friction coefficient to generate physically feasible and efficient paths, along with corresponding speed profiles. We refer to our approach as AnyNav and evaluate it in both simulation and real-world experiments, demonstrating its utility and robustness across various off-road scenarios and multiple types of four-wheeled vehicles. These results mark an important step toward developing neuro-symbolic spatial intelligence to reason about complex, unstructured environments and enable autonomous off-road navigation in challenging scenarios. Video demonstrations are available at https://sairlab.org/anynav/, where the source code will also be released. △ Less

Submitted 22 January, 2025; originally announced January 2025.

arXiv:2501.01430 [pdf, other]

TERA: A Simulation Environment for Terrain Excavation Robot Autonomy

Authors: Christo Aluckal, Roopesh Vinodh Kumar Lal, Sean Courtney, Yash Turkar, Yashom Dighe, Young-Jin Kim, Jake Gemerek, Karthik Dantu

Abstract: Developing excavation autonomy is challenging given the environments where excavators operate, the complexity of physical interaction and the degrees of freedom of operation of the excavator itself. Simulation is a useful tool to build parts of the autonomy without the complexity of experimentation. Traditional excavator simulators are geared towards high fidelity interactions between the joints o… ▽ More Developing excavation autonomy is challenging given the environments where excavators operate, the complexity of physical interaction and the degrees of freedom of operation of the excavator itself. Simulation is a useful tool to build parts of the autonomy without the complexity of experimentation. Traditional excavator simulators are geared towards high fidelity interactions between the joints or between the terrain but do not incorporate other challenges such as perception required for end to end autonomy. A complete simulator should be capable of supporting real time operation while providing high fidelity simulation of the excavator(s), the environment, and their interaction. In this paper we present TERA (Terrain Excavation Robot Autonomy), a simulator geared towards autonomous excavator applications based on Unity3D and AGX that provides the extensibility and scalability required to study full autonomy. It provides the ability to configure the excavator and the environment per the user requirements. We also demonstrate realistic dynamics by incorporating a time-varying model that introduces variations in the system's responses. The simulator is then evaluated with different scenarios such as track deformation, velocities on different terrains, similarity of the system with the real excavator and the overall path error to show the capabilities of the simulation. △ Less

Submitted 13 December, 2024; originally announced January 2025.

Comments: 7 pages, 9 figures

arXiv:2410.05182 [pdf, other]

MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain

Authors: Timothy Chase Jr, Karthik Dantu

Abstract: The visual detection and tracking of surface terrain is required for spacecraft to safely land on or navigate within close proximity to celestial objects. Current approaches rely on template matching with pre-gathered patch-based features, which are expensive to obtain and a limiting factor in perceptual capability. While recent literature has focused on in-situ detection methods to enhance naviga… ▽ More The visual detection and tracking of surface terrain is required for spacecraft to safely land on or navigate within close proximity to celestial objects. Current approaches rely on template matching with pre-gathered patch-based features, which are expensive to obtain and a limiting factor in perceptual capability. While recent literature has focused on in-situ detection methods to enhance navigation and operational autonomy, robust description is still needed. In this work, we explore metric learning as the lightweight feature description mechanism and find that current solutions fail to address inter-class similarity and multi-view observational geometry. We attribute this to the view-unaware attention mechanism and introduce Multi-view Attention Regularizations (MARs) to constrain the channel and spatial attention across multiple feature views, regularizing the what and where of attention focus. We thoroughly analyze many modern metric learning losses with and without MARs and demonstrate improved terrain-feature recognition performance by upwards of 85%. We additionally introduce the Luna-1 dataset, consisting of Moon crater landmarks and reference navigation frames from NASA mission data to support future research in this difficult task. Luna-1 and source code are publicly available at https://droneslab.github.io/mars/. △ Less

Submitted 7 October, 2024; originally announced October 2024.

Comments: ECCV 2024. Project page available at https://droneslab.github.io/mars/

arXiv:2409.16566 [pdf, other]

PANOS: Payload-Aware Navigation in Offroad Scenarios

Authors: Kartikeya Singh, Yash Turkar, Christo Aluckal, Charuvarahan Adhivarahan, Karthik Dantu

Abstract: Nature has evolved humans to walk on different terrains by developing a detailed understanding of their physical characteristics. Similarly, legged robots need to develop their capability to walk on complex terrains with a variety of task-dependent payloads to achieve their goals. However, conventional terrain adaptation methods are susceptible to failure with varying payloads. In this work, we in… ▽ More Nature has evolved humans to walk on different terrains by developing a detailed understanding of their physical characteristics. Similarly, legged robots need to develop their capability to walk on complex terrains with a variety of task-dependent payloads to achieve their goals. However, conventional terrain adaptation methods are susceptible to failure with varying payloads. In this work, we introduce PANOS, a weakly supervised approach that integrates proprioception and exteroception from onboard sensing to achieve a stable gait while walking by a legged robot over various terrains. Our work also provides evidence of its adaptability over varying payloads. We evaluate our method on multiple terrains and payloads using a legged robot. PANOS improves the stability up to 44% without any payload and 53% with 15 lbs payload. We also notice a reduction in the vibration cost of 20% with the payload for various terrain types when compared to state-of-the-art methods. △ Less

Submitted 24 September, 2024; originally announced September 2024.

arXiv:2409.13151 [pdf, other]

Learning Visual Information Utility with PIXER

Authors: Yash Turkar, Timothy Chase Jr, Christo Aluckal, Karthik Dantu

Abstract: Accurate feature detection is fundamental for various computer vision tasks, including autonomous robotics, 3D reconstruction, medical imaging, and remote sensing. Despite advancements in enhancing the robustness of visual features, no existing method measures the utility of visual information before processing by specific feature-type algorithms. To address this gap, we introduce PIXER and the co… ▽ More Accurate feature detection is fundamental for various computer vision tasks, including autonomous robotics, 3D reconstruction, medical imaging, and remote sensing. Despite advancements in enhancing the robustness of visual features, no existing method measures the utility of visual information before processing by specific feature-type algorithms. To address this gap, we introduce PIXER and the concept of "Featureness," which reflects the inherent interest and reliability of visual information for robust recognition, independent of any specific feature type. Leveraging a generalization on Bayesian learning, our approach quantifies both the probability and uncertainty of a pixel's contribution to robust visual utility in a single-shot process, avoiding costly operations such as Monte Carlo sampling and permitting customizable featureness definitions adaptable to a wide range of applications. We evaluate PIXER on visual odometry with featureness selectivity, achieving an average of 31% improvement in RMSE trajectory with 49% fewer features. △ Less

Submitted 19 September, 2024; originally announced September 2024.

arXiv:2310.06249 [pdf, other]

l-dyno: framework to learn consistent visual features using robot's motion

Authors: Kartikeya Singh, Charuvaran Adhivarahan, Karthik Dantu

Abstract: Historically, feature-based approaches have been used extensively for camera-based robot perception tasks such as localization, mapping, tracking, and others. Several of these approaches also combine other sensors (inertial sensing, for example) to perform combined state estimation. Our work rethinks this approach; we present a representation learning mechanism that identifies visual features that… ▽ More Historically, feature-based approaches have been used extensively for camera-based robot perception tasks such as localization, mapping, tracking, and others. Several of these approaches also combine other sensors (inertial sensing, for example) to perform combined state estimation. Our work rethinks this approach; we present a representation learning mechanism that identifies visual features that best correspond to robot motion as estimated by an external signal. Specifically, we utilize the robot's transformations through an external signal (inertial sensing, for example) and give attention to image space that is most consistent with the external signal. We use a pairwise consistency metric as a representation to keep the visual features consistent through a sequence with the robot's relative pose transformations. This approach enables us to incorporate information from the robot's perspective instead of solely relying on the image attributes. We evaluate our approach on real-world datasets such as KITTI & EuRoC and compare the refined features with existing feature descriptors. We also evaluate our method using our real robot experiment. We notice an average of 49% reduction in the image search space without compromising the trajectory estimation accuracy. Our method reduces the execution time of visual odometry by 4.3% and also reduces reprojection errors. We demonstrate the need to select only the most important features and show the competitiveness using various feature detection baselines. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: 7 pages, 6 figures

arXiv:2309.13035 [pdf, other]

PyPose v0.6: The Imperative Programming Interface for Robotics

Authors: Zitong Zhan, Xiangfu Li, Qihang Li, Haonan He, Abhinav Pandey, Haitao Xiao, Yangmengfei Xu, Xiangyu Chen, Kuan Xu, Kun Cao, Zhipeng Zhao, Zihan Wang, Huan Xu, Zihang Fang, Yutian Chen, Wentao Wang, Xu Fang, Yi Du, Tianhao Wu, Xiao Lin, Yuheng Qiu, Fan Yang, Jingnan Shi, Shaoshu Su, Yiren Lu , et al. (11 additional authors not shown)

Abstract: PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, inco… ▽ More PyPose is an open-source library for robot learning. It combines a learning-based approach with physics-based optimization, which enables seamless end-to-end robot learning. It has been used in many tasks due to its meticulously designed application programming interface (API) and efficient implementation. From its initial launch in early 2022, PyPose has experienced significant enhancements, incorporating a wide variety of new features into its platform. To satisfy the growing demand for understanding and utilizing the library and reduce the learning curve of new users, we present the fundamental design principle of the imperative programming interface, and showcase the flexible usage of diverse functionalities and modules using an extremely simple Dubins car example. We also demonstrate that the PyPose can be easily used to navigate a real quadruped robot with a few lines of code. △ Less

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.12429 [pdf, other]

DIOR: Dataset for Indoor-Outdoor Reidentification -- Long Range 3D/2D Skeleton Gait Collection Pipeline, Semi-Automated Gait Keypoint Labeling and Baseline Evaluation Methods

Authors: Yuyang Chen, Praveen Raj Masilamani, Bhavin Jawade, Srirangaraj Setlur, Karthik Dantu

Abstract: In recent times, there is an increased interest in the identification and re-identification of people at long distances, such as from rooftop cameras, UAV cameras, street cams, and others. Such recognition needs to go beyond face and use whole-body markers such as gait. However, datasets to train and test such recognition algorithms are not widely prevalent, and fewer are labeled. This paper intro… ▽ More In recent times, there is an increased interest in the identification and re-identification of people at long distances, such as from rooftop cameras, UAV cameras, street cams, and others. Such recognition needs to go beyond face and use whole-body markers such as gait. However, datasets to train and test such recognition algorithms are not widely prevalent, and fewer are labeled. This paper introduces DIOR -- a framework for data collection, semi-automated annotation, and also provides a dataset with 14 subjects and 1.649 million RGB frames with 3D/2D skeleton gait labels, including 200 thousands frames from a long range camera. Our approach leverages advanced 3D computer vision techniques to attain pixel-level accuracy in indoor settings with motion capture systems. Additionally, for outdoor long-range settings, we remove the dependency on motion capture systems and adopt a low-cost, hybrid 3D computer vision and learning pipeline with only 4 low-cost RGB cameras, successfully achieving precise skeleton labeling on far-away subjects, even when their height is limited to a mere 20-25 pixels within an RGB frame. On publication, we will make our pipeline open for others to use. △ Less

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2308.09075 [pdf, other]

Fast Decision Support for Air Traffic Management at Urban Air Mobility Vertiports using Graph Learning

Authors: Prajit KrisshnaKumar, Jhoel Witter, Steve Paul, Hanvit Cho, Karthik Dantu, Souma Chowdhury

Abstract: Urban Air Mobility (UAM) promises a new dimension to decongested, safe, and fast travel in urban and suburban hubs. These UAM aircraft are conceived to operate from small airports called vertiports each comprising multiple take-off/landing and battery-recharging spots. Since they might be situated in dense urban areas and need to handle many aircraft landings and take-offs each hour, managing this… ▽ More Urban Air Mobility (UAM) promises a new dimension to decongested, safe, and fast travel in urban and suburban hubs. These UAM aircraft are conceived to operate from small airports called vertiports each comprising multiple take-off/landing and battery-recharging spots. Since they might be situated in dense urban areas and need to handle many aircraft landings and take-offs each hour, managing this schedule in real-time becomes challenging for a traditional air-traffic controller but instead calls for an automated solution. This paper provides a novel approach to this problem of Urban Air Mobility - Vertiport Schedule Management (UAM-VSM), which leverages graph reinforcement learning to generate decision-support policies. Here the designated physical spots within the vertiport's airspace and the vehicles being managed are represented as two separate graphs, with feature extraction performed through a graph convolutional network (GCN). Extracted features are passed onto perceptron layers to decide actions such as continue to hover or cruise, continue idling or take-off, or land on an allocated vertiport spot. Performance is measured based on delays, safety (no. of collisions) and battery consumption. Through realistic simulations in AirSim applied to scaled down multi-rotor vehicles, our results demonstrate the suitability of using graph reinforcement learning to solve the UAM-VSM problem and its superiority to basic reinforcement learning (with graph embeddings) or random choice baselines. △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: Accepted for presentation in proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems 2023

arXiv:2306.03660 [pdf, other]

Empir3D : A Framework for Multi-Dimensional Point Cloud Assessment

Authors: Yash Turkar, Pranay Meshram, Christo Aluckal, Charuvahan Adhivarahan, Karthik Dantu

Abstract: Advancements in sensors, algorithms, and compute hardware have made 3D perception feasible in real time. Current methods to compare and evaluate the quality of a 3D model, such as Chamfer, Hausdorff, and Earth-Mover's distance, are uni-dimensional and have limitations, including an inability to capture coverage, local variations in density and error, and sensitivity to outliers. In this paper, we… ▽ More Advancements in sensors, algorithms, and compute hardware have made 3D perception feasible in real time. Current methods to compare and evaluate the quality of a 3D model, such as Chamfer, Hausdorff, and Earth-Mover's distance, are uni-dimensional and have limitations, including an inability to capture coverage, local variations in density and error, and sensitivity to outliers. In this paper, we propose an evaluation framework for point clouds (Empir3D) that consists of four metrics: resolution to quantify the ability to distinguish between individual parts in the point cloud, accuracy to measure registration error, coverage to evaluate the portion of missing data, and artifact score to characterize the presence of artifacts. Through detailed analysis, we demonstrate the complementary nature of each of these dimensions and the improvements they provide compared to the aforementioned uni-dimensional measures. Furthermore, we illustrate the utility of Empir3D by comparing our metrics with uni-dimensional metrics for two 3D perception applications (SLAM and point cloud completion). We believe that Empir3D advances our ability to reason about point clouds and helps better debug 3D perception applications by providing a richer evaluation of their performance. Our implementation of Empir3D, custom real-world datasets, evaluations on learning methods, and detailed documentation on how to integrate the pipeline will be made available upon publication. △ Less

Submitted 19 September, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2303.04891 [pdf, other]

You Only Crash Once: Improved Object Detection for Real-Time, Sim-to-Real Hazardous Terrain Detection and Classification for Autonomous Planetary Landings

Authors: Timothy Chase Jr, Chris Gnam, John Crassidis, Karthik Dantu

Abstract: The detection of hazardous terrain during the planetary landing of spacecraft plays a critical role in assuring vehicle safety and mission success. A cheap and effective way of detecting hazardous terrain is through the use of visual cameras, which ensure operational ability from atmospheric entry through touchdown. Plagued by resource constraints and limited computational power, traditional techn… ▽ More The detection of hazardous terrain during the planetary landing of spacecraft plays a critical role in assuring vehicle safety and mission success. A cheap and effective way of detecting hazardous terrain is through the use of visual cameras, which ensure operational ability from atmospheric entry through touchdown. Plagued by resource constraints and limited computational power, traditional techniques for visual hazardous terrain detection focus on template matching and registration to pre-built hazard maps. Although successful on previous missions, this approach is restricted to the specificity of the templates and limited by the fidelity of the underlying hazard map, which both require extensive pre-flight cost and effort to obtain and develop. Terrestrial systems that perform a similar task in applications such as autonomous driving utilize state-of-the-art deep learning techniques to successfully localize and classify navigation hazards. Advancements in spacecraft co-processors aimed at accelerating deep learning inference enable the application of these methods in space for the first time. In this work, we introduce You Only Crash Once (YOCO), a deep learning-based visual hazardous terrain detection and classification technique for autonomous spacecraft planetary landings. Through the use of unsupervised domain adaptation we tailor YOCO for training by simulation, removing the need for real-world annotated data and expensive mission surveying phases. We further improve the transfer of representative terrain knowledge between simulation and the real world through visual similarity clustering. We demonstrate the utility of YOCO through a series of terrestrial and extraterrestrial simulation-to-real experiments and show substantial improvements toward the ability to both detect and accurately classify instances of planetary terrain. △ Less

Submitted 8 March, 2023; originally announced March 2023.

Comments: To be published in proceedings of AAS/AIAA Astrodynamics Specialist Conference 2022

arXiv:2302.14334 [pdf, other]

doi 10.1109/TRO.2024.3371885

Design of an Adaptive Lightweight LiDAR to Decouple Robot-Camera Geometry

Authors: Yuyang Chen, Dingkang Wang, Lenworth Thomas, Karthik Dantu, Sanjeev J. Koppal

Abstract: A fundamental challenge in robot perception is the coupling of the sensor pose and robot pose. This has led to research in active vision where robot pose is changed to reorient the sensor to areas of interest for perception. Further, egomotion such as jitter, and external effects such as wind and others affect perception requiring additional effort in software such as image stabilization. This eff… ▽ More A fundamental challenge in robot perception is the coupling of the sensor pose and robot pose. This has led to research in active vision where robot pose is changed to reorient the sensor to areas of interest for perception. Further, egomotion such as jitter, and external effects such as wind and others affect perception requiring additional effort in software such as image stabilization. This effect is particularly pronounced in micro-air vehicles and micro-robots who typically are lighter and subject to larger jitter but do not have the computational capability to perform stabilization in real-time. We present a novel microelectromechanical (MEMS) mirror LiDAR system to change the field of view of the LiDAR independent of the robot motion. Our design has the potential for use on small, low-power systems where the expensive components of the LiDAR can be placed external to the small robot. We show the utility of our approach in simulation and on prototype hardware mounted on a UAV. We believe that this LiDAR and its compact movable scanning design provide mechanisms to decouple robot and sensor geometry allowing us to simplify robot perception. We also demonstrate examples of motion compensation using IMU and external odometry feedback in hardware. △ Less

Submitted 29 February, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

Comments: This paper is published in IEEE Transactions on Robotics

Journal ref: IEEE Transactions on Robotics, 2024.

arXiv:2302.05849 [pdf, other]

Graph Learning Based Decision Support for Multi-Aircraft Take-Off and Landing at Urban Air Mobility Vertiports

Authors: Prajit KrisshnaKumar, Jhoel Witter, Steve Paul, Karthik Dantu, Souma Chowdhury

Abstract: Majority of aircraft under the Urban Air Mobility (UAM) concept are expected to be of the electric vertical takeoff and landing (eVTOL) vehicle type, which will operate out of vertiports. While this is akin to the relationship between general aviation aircraft and airports, the conceived location of vertiports within dense urban environments presents unique challenges in managing the air traffic s… ▽ More Majority of aircraft under the Urban Air Mobility (UAM) concept are expected to be of the electric vertical takeoff and landing (eVTOL) vehicle type, which will operate out of vertiports. While this is akin to the relationship between general aviation aircraft and airports, the conceived location of vertiports within dense urban environments presents unique challenges in managing the air traffic served by a vertiport. This challenge becomes pronounced within increasing frequency of scheduled landings and take-offs. This paper assumes a centralized air traffic controller (ATC) to explore the performance of a new AI driven ATC approach to manage the eVTOLs served by the vertiport. Minimum separation-driven safety and delays are the two important considerations in this case. The ATC problem is modeled as a task allocation problem, and uncertainties due to communication disruptions (e.g., poor link quality) and inclement weather (e.g., high gust effects) are added as a small probability of action failures. To learn the vertiport ATC policy, a novel graph-based reinforcement learning (RL) solution called "Urban Air Mobility- Vertiport Schedule Management (UAM-VSM)" is developed. This approach uses graph convolutional networks (GCNs) to abstract the vertiport space and eVTOL space as graphs, and aggregate information for a centralized ATC agent to help generalize the environment. Unreal Engine combined with Airsim is used as the simulation environment over which training and testing occurs. Uncertainties are considered only during testing, due to the high cost of Mc sampling over such realistic simulations. The proposed graph RL method demonstrates significantly better performance on the test scenarios when compared against a feasible random decision-making baseline and a first come first serve (FCFS) baseline, including the ability to generalize to unseen scenarios and with uncertainties. △ Less

Submitted 11 February, 2023; originally announced February 2023.

Comments: Presented at AIAA Scitech Forum 2022

arXiv:2209.15428 [pdf, other]

PyPose: A Library for Robot Learning with Physics-based Optimization

Authors: Chen Wang, Dasong Gao, Kuan Xu, Junyi Geng, Yaoyu Hu, Yuheng Qiu, Bowen Li, Fan Yang, Brady Moon, Abhinav Pandey, Aryan, Jiahe Xu, Tianhao Wu, Haonan He, Daning Huang, Zhongqiang Ren, Shibo Zhao, Taimeng Fu, Pranay Reddy, Xiao Lin, Wenshan Wang, Jingnan Shi, Rajat Talak, Kun Cao, Yi Du , et al. (12 additional authors not shown)

Abstract: Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and reliance on manual parametric tuning. To take advantage of these two co… ▽ More Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and reliance on manual parametric tuning. To take advantage of these two complementary worlds, we present PyPose: a robotics-oriented, PyTorch-based library that combines deep perceptual models with physics-based optimization. PyPose's architecture is tidy and well-organized, it has an imperative style interface and is efficient and user-friendly, making it easy to integrate into real-world robotic applications. Besides, it supports parallel computing of any order gradients of Lie groups and Lie algebras and $2^{\text{nd}}$-order optimizers, such as trust region methods. Experiments show that PyPose achieves more than $10\times$ speedup in computation compared to the state-of-the-art libraries. To boost future research, we provide concrete examples for several fields of robot learning, including SLAM, planning, control, and inertial navigation. △ Less

Submitted 24 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: Project Website: https://pypose.org Documentation: https://pypose.org/docs/ Tutorial: https://pypose.org/tutorials/ Source code: https://github.com/pypose/pypose

Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

arXiv:2109.12055 [pdf, other]

Using Physiological Information to Classify Task Difficulty in Human-Swarm Interaction

Authors: Joseph P. Distefano, Hemanth Manjunatha, Souma Chowdhury, Karthik Dantu, David Doermann, Ehsan T. Esfahani

Abstract: Human-swarm interaction has recently gained attention due to its plethora of new applications in disaster relief, surveillance, rescue, and exploration. However, if the task difficulty increases, the performance of the human operator decreases, thereby decreasing the overall efficacy of the human-swarm team. Thus, it is critical to identify the task difficulty and adaptively allocate the task to t… ▽ More Human-swarm interaction has recently gained attention due to its plethora of new applications in disaster relief, surveillance, rescue, and exploration. However, if the task difficulty increases, the performance of the human operator decreases, thereby decreasing the overall efficacy of the human-swarm team. Thus, it is critical to identify the task difficulty and adaptively allocate the task to the human operator to maintain optimal performance. In this direction, we study the classification of task difficulty in a human-swarm interaction experiment performing a target search mission. The human may control platoons of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to search a partially observable environment during the target search mission. The mission complexity is increased by introducing adversarial teams that humans may only see when the environment is explored. While the human is completing the mission, their brain activity is recorded using an electroencephalogram (EEG), which is used to classify the task difficulty. We have used two different approaches for classification: A feature-based approach using coherence values as input and a deep learning-based approach using raw EEG as input. Both approaches can classify the task difficulty well above the chance. The results showed the importance of the occipital lobe (O1 and O2) coherence feature with the other brain regions. Moreover, we also study individual differences (expert vs. novice) in the classification results. The analysis revealed that the temporal lobe in experts (T4 and T3) is predominant for task difficulty classification compared with novices. △ Less

Submitted 24 September, 2021; originally announced September 2021.

arXiv:2109.05663 [pdf, other]

Learning Robot Swarm Tactics over Complex Adversarial Environments

Authors: Amir Behjat, Hemanth Manjunatha, Prajit KrisshnaKumar, Apurv Jani, Leighton Collins, Payam Ghassemi, Joseph Distefano, David Doermann, Karthik Dantu, Ehsan Esfahani, Souma Chowdhury

Abstract: To accomplish complex swarm robotic missions in the real world, one needs to plan and execute a combination of single robot behaviors, group primitives such as task allocation, path planning, and formation control, and mission-specific objectives such as target search and group coverage. Most such missions are designed manually by teams of robotics experts. Recent work in automated approaches to l… ▽ More To accomplish complex swarm robotic missions in the real world, one needs to plan and execute a combination of single robot behaviors, group primitives such as task allocation, path planning, and formation control, and mission-specific objectives such as target search and group coverage. Most such missions are designed manually by teams of robotics experts. Recent work in automated approaches to learning swarm behavior has been limited to individual primitives with sparse work on learning complete missions. This paper presents a systematic approach to learn tactical mission-specific policies that compose primitives in a swarm to accomplish the mission efficiently using neural networks with special input and output encoding. To learn swarm tactics in an adversarial environment, we employ a combination of 1) map-to-graph abstraction, 2) input/output encoding via Pareto filtering of points of interest and clustering of robots, and 3) learning via neuroevolution and policy gradient approaches. We illustrate this combination as critical to providing tractable learning, especially given the computational cost of simulating swarm missions of this scale and complexity. Successful mission completion outcomes are demonstrated with up to 60 robots. In addition, a close match in the performance statistics in training and testing scenarios shows the potential generalizability of the proposed framework. △ Less

Submitted 12 September, 2021; originally announced September 2021.

Comments: Accepted to IEEE International Symposium on Multi-Robot and Multi-Agent Systems 2021

arXiv:2103.14709 [pdf, other]

Scalable Coverage Path Planning of Multi-Robot Teams for Monitoring Non-Convex Areas

Authors: Leighton Collins, Payam Ghassemi, Ehsan T. Esfahani, David Doermann, Karthik Dantu, Souma Chowdhury

Abstract: This paper presents a novel multi-robot coverage path planning (CPP) algorithm - aka SCoPP - that provides a time-efficient solution, with workload balanced plans for each robot in a multi-robot system, based on their initial states. This algorithm accounts for discontinuities (e.g., no-fly zones) in a specified area of interest, and provides an optimized ordered list of way-points per robot using… ▽ More This paper presents a novel multi-robot coverage path planning (CPP) algorithm - aka SCoPP - that provides a time-efficient solution, with workload balanced plans for each robot in a multi-robot system, based on their initial states. This algorithm accounts for discontinuities (e.g., no-fly zones) in a specified area of interest, and provides an optimized ordered list of way-points per robot using a discrete, computationally efficient, nearest neighbor path planning algorithm. This algorithm involves five main stages, which include the transformation of the user's input as a set of vertices in geographical coordinates, discretization, load-balanced partitioning, auctioning of conflict cells in a discretized space, and a path planning procedure. To evaluate the effectiveness of the primary algorithm, a multi-unmanned aerial vehicle (UAV) post-flood assessment application is considered, and the performance of the algorithm is tested on three test maps of varying sizes. Additionally, our method is compared with a state-of-the-art method created by Guasella et al. Further analyses on scalability and computational time of SCoPP are conducted. The results show that SCoPP is superior in terms of mission completion time; its computing time is found to be under 2 mins for a large map covered by a 150-robot team, thereby demonstrating its computationally scalability. △ Less

Submitted 26 March, 2021; originally announced March 2021.

Comments: Accepted for publication in the proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2102.08542 [pdf, other]

Active Face Frontalization using Commodity Unmanned Aerial Vehicles

Authors: Nagashri Lakshminarayana, Yifang Liu, Karthik Dantu, Venu Govindaraju, Nils Napp

Abstract: This paper describes a system by which Unmanned Aerial Vehicles (UAVs) can gather high-quality face images that can be used in biometric identification tasks. Success in face-based identification depends in large part on the image quality, and a major factor is how frontal the view is. Face recognition software pipelines can improve identification rates by synthesizing frontal views from non-front… ▽ More This paper describes a system by which Unmanned Aerial Vehicles (UAVs) can gather high-quality face images that can be used in biometric identification tasks. Success in face-based identification depends in large part on the image quality, and a major factor is how frontal the view is. Face recognition software pipelines can improve identification rates by synthesizing frontal views from non-frontal views by a process call {\em frontalization}. Here we exploit the high mobility of UAVs to actively gather frontal images using components of a synthetic frontalization pipeline. We define a frontalization error and show that it can be used to guide an UAVs to capture frontal views. Further, we show that the resulting image stream improves matching quality of a typical face recognition similarity metric. The system is implemented using an off-the-shelf hardware and software components and can be easily transfered to any ROS enabled UAVs. △ Less

Submitted 16 February, 2021; originally announced February 2021.

arXiv:2102.07020 [pdf, other]

Understanding Bounding Functions in Safety-Critical UAV Software

Authors: Xiaozhou Liang, John Henry Burns, Joseph Sanchez, Karthik Dantu, Lukasz Ziarek, Yu David Liu

Abstract: Unmanned Aerial Vehicles (UAVs) are an emerging computation platform known for their safety-critical need. In this paper, we conduct an empirical study on a widely used open-source UAV software framework, Paparazzi, with the goal of understanding the safety-critical concerns of UAV software from a bottom-up developer-in-the-field perspective. We set our focus on the use of Bounding Functions (BFs)… ▽ More Unmanned Aerial Vehicles (UAVs) are an emerging computation platform known for their safety-critical need. In this paper, we conduct an empirical study on a widely used open-source UAV software framework, Paparazzi, with the goal of understanding the safety-critical concerns of UAV software from a bottom-up developer-in-the-field perspective. We set our focus on the use of Bounding Functions (BFs), the runtime checks injected by Paparazzi developers on the range of variables. Through an in-depth analysis on BFs in the Paparazzi autopilot software, we found a large number of them (109 instances) are used to bound safety-critical variables essential to the cyber-physical nature of the UAV, such as its thrust, its speed, and its sensor values. The novel contributions of this study are two fold. First, we take a static approach to classify all BF instances, presenting a novel datatype-based 5-category taxonomy with fine-grained insight on the role of BFs in ensuring the safety of UAV systems. Second, we dynamically evaluate the impact of the BF uses through a differential approach, establishing the UAV behavioral difference with and without BFs. The two-pronged static and dynamic approach together illuminates a rarely studied design space of safety-critical UAV software systems. △ Less

Submitted 13 February, 2021; originally announced February 2021.

Comments: 12 pages, 7 figures, to be published in ICSE 2021

arXiv:1903.06687 [pdf, ps, other]

doi 10.1007/s10514-019-09874-z

Augmenting Visual SLAM with Wi-Fi Sensing For Indoor Applications

Authors: Zakieh S. Hashemifar, Charuvahan Adhivarahan, Anand Balakrishnan, Karthik Dantu

Abstract: Recent trends have accelerated the development of spatial applications on mobile devices and robots. These include navigation, augmented reality, human-robot interaction, and others. A key enabling technology for such applications is the understanding of the device's location and the map of the surrounding environment. This generic problem, referred to as Simultaneous Localization and Mapping (SLA… ▽ More Recent trends have accelerated the development of spatial applications on mobile devices and robots. These include navigation, augmented reality, human-robot interaction, and others. A key enabling technology for such applications is the understanding of the device's location and the map of the surrounding environment. This generic problem, referred to as Simultaneous Localization and Mapping (SLAM), is an extensively researched topic in robotics. However, visual SLAM algorithms face several challenges including perceptual aliasing and high computational cost. These challenges affect the accuracy, efficiency, and viability of visual SLAM algorithms, especially for long-term SLAM, and their use in resource-constrained mobile devices. A parallel trend is the ubiquity of Wi-Fi routers for quick Internet access in most urban environments. Most robots and mobile devices are equipped with a Wi-Fi radio as well. We propose a method to utilize Wi-Fi received signal strength to alleviate the challenges faced by visual SLAM algorithms. To demonstrate the utility of this idea, this work makes the following contributions: (i) We propose a generic way to integrate Wi-Fi sensing into visual SLAM algorithms, (ii) We integrate such sensing into three well-known SLAM algorithms, (iii) Using four distinct datasets, we demonstrate the performance of such augmentation in comparison to the original visual algorithms and (iv) We compare our work to Wi-Fi augmented FABMAP algorithm. Overall, we show that our approach can improve the accuracy of visual SLAM algorithms by 11% on average and reduce computation time on average by 15% to 25%. △ Less

Submitted 15 March, 2019; originally announced March 2019.

Comments: 16 pages, 19 figures, Autonomous Robots Journal submission (AuRo)

Journal ref: Autonomous Robots, 43, 2019, 2245-2260

Showing 1–23 of 23 results for author: Dantu, K