-
Steppability-informed Quadrupedal Contact Planning through Deep Visual Search Heuristics
Authors:
Max Asselmeier,
Ye Zhao,
Patricio A. Vela
Abstract:
In this work, we introduce a method for predicting environment steppability -- the ability of a legged robot platform to place a foothold at a particular location in the local environment -- in the image space. This novel environment representation captures this critical geometric property of the local terrain while allowing us to exploit the computational benefits of sensing and planning in the image space. We adapt a primitive shapes-based synthetic data generation scheme to create geometrically rich and diverse simulation scenes and extract ground truth semantic information in order to train a steppability model. We then integrate this steppability model into an existing interleaved graph search and trajectory optimization-based footstep planner to demonstrate how this steppability paradigm can inform footstep planning in complex, unknown environments. We analyze the steppability model performance to demonstrate its validity, and we deploy the perception-informed footstep planner both in offline and online settings to experimentally verify planning performance.
Submitted 30 December, 2024;
originally announced January 2025.
-
Socio-Emotional Response Generation: A Human Evaluation Protocol for LLM-Based Conversational Systems
Authors:
Lorraine Vanel,
Ariel R. Ramos Vela,
Alya Yacoubi,
Chloé Clavel
Abstract:
Conversational systems are now capable of producing impressive and generally relevant responses. However, we have neither visibility into nor control over the socio-emotional strategies behind state-of-the-art Large Language Models (LLMs), which poses a problem in terms of their transparency and thus their trustworthiness for critical applications. Another issue is that current automated metrics are not able to properly evaluate the quality of generated responses beyond the dataset's ground truth. In this paper, we propose a neural architecture that includes an intermediate step in planning socio-emotional strategies before response generation. We compare the performance of open-source baseline LLMs to the outputs of these same models augmented with our planning module. We also contrast the outputs obtained from automated metrics and evaluation results provided by human annotators. We describe a novel evaluation protocol that includes a coarse-grained consistency evaluation, as well as a finer-grained annotation of the responses on various social and emotional criteria. Our study shows that predicting a sequence of expected strategy labels and using this sequence to generate a response yields better results than a direct end-to-end generation scheme. It also highlights the divergences and the limits of current evaluation metrics for generated content. The code for the annotation platform and the annotated data are made publicly available for the evaluation of future models.
Submitted 26 November, 2024;
originally announced December 2024.
-
OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB
Authors:
Yunzhi Lin,
Yipu Zhao,
Fu-Jen Chu,
Xingyu Chen,
Weiyao Wang,
Hao Tang,
Patricio A. Vela,
Matt Feiszli,
Kevin Liang
Abstract:
To address the challenge of short-term object pose tracking in dynamic environments with monocular RGB input, we introduce a large-scale synthetic dataset OmniPose6D, crafted to mirror the diversity of real-world conditions. We additionally present a benchmarking framework for a comprehensive comparison of pose tracking algorithms. We propose a pipeline featuring an uncertainty-aware keypoint refinement network, employing probabilistic modeling to refine pose estimation. Comparative evaluations demonstrate that our approach achieves performance superior to existing baselines on real datasets, underscoring the effectiveness of our synthetic dataset and refinement technique in enhancing tracking precision in dynamic contexts. Our contributions set a new precedent for the development and assessment of object pose tracking methodologies in complex scenes.
Submitted 9 October, 2024;
originally announced October 2024.
-
Task-driven SLAM Benchmarking For Robot Navigation
Authors:
Yanwei Du,
Shiyu Feng,
Carlton G. Cort,
Patricio A. Vela
Abstract:
A critical use case of SLAM for mobile assistive robots is to support localization during a navigation-based task. Current SLAM benchmarks overlook the significance of repeatability (precision), despite its importance in real-world deployments. To address this gap, we propose a task-driven approach to SLAM benchmarking, TaskSLAM-Bench. It employs precision as a key metric, accounts for SLAM's mapping capabilities, and has easy-to-meet implementation requirements. Simulated and real-world testing scenarios of SLAM methods provide insights into the navigation performance properties of modern visual and LiDAR SLAM solutions. The outcomes show that passive stereo SLAM operates at a level of precision comparable to LiDAR SLAM in typical indoor environments. TaskSLAM-Bench complements existing benchmarks and offers richer assessment of SLAM performance in navigation-focused scenarios. Publicly available code permits in-situ SLAM testing in custom environments with properly equipped robots.
Submitted 9 March, 2025; v1 submitted 24 September, 2024;
originally announced September 2024.
-
Hierarchical Experience-informed Navigation for Multi-modal Quadrupedal Rebar Grid Traversal
Authors:
Max Asselmeier,
Jane Ivanova,
Ziyi Zhou,
Patricio A. Vela,
Ye Zhao
Abstract:
This study focuses on a layered, experience-based, multi-modal contact planning framework for agile quadrupedal locomotion over a constrained rebar environment. To this end, our hierarchical planner incorporates locomotion-specific modules into the high-level contact sequence planner and solves kinodynamically-aware trajectory optimization as the low-level motion planner. Through quantitative analysis of the experience accumulation process and experimental validation of the kinodynamic feasibility of the generated locomotion trajectories, we demonstrate that the experience planning heuristic offers an effective way of providing candidate footholds for a legged contact planner. Additionally, we introduce a guiding torso path heuristic at the global planning level to enhance the navigation success rate in the presence of environmental obstacles. Our results indicate that the torso-path guided experience accumulation requires significantly fewer offline trials to successfully reach the goal compared to regular experience accumulation. Finally, our planning framework is validated in both dynamics simulations and real hardware implementations on a quadrupedal robot provided by Skymul Inc.
Submitted 13 April, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Multi-gait Locomotion Planning and Tracking for Tendon-actuated Terrestrial Soft Robot (TerreSoRo)
Authors:
Arun Niddish Mahendran,
Caitlin Freeman,
Alexander H. Chang,
Michael McDougall,
Patricio A. Vela,
Vishesh Vikas
Abstract:
The adaptability of soft robots makes them ideal candidates to maneuver through unstructured environments. However, locomotion challenges arise due to complexities in modeling the body mechanics, actuation, and robot-environment dynamics. These factors contribute to the gap between their potential and actual autonomous field deployment. A closed-loop path planning framework for soft robot locomotion is critical to close the real-world realization gap. This paper presents a generic path planning framework applied to TerreSoRo (Tetra-Limb Terrestrial Soft Robot) with pose feedback. It employs a gait-based, lattice trajectory planner to facilitate navigation in the presence of obstacles. The locomotion gaits are synthesized using a data-driven optimization approach that allows for learning from the environment. The trajectory planner employs a greedy breadth-first search strategy to obtain a collision-free trajectory. The synthesized trajectory is a sequence of rotate-then-translate gait pairs. The control architecture integrates high-level and low-level controllers with real-time localization (using an overhead webcam). TerreSoRo successfully navigates environments with obstacles where path re-planning is performed. To the best of our knowledge, this is the first instance of real-time, closed-loop path planning of a non-pneumatic soft robot.
Submitted 30 July, 2023;
originally announced July 2023.
-
Planning with Sequence Models through Iterative Energy Minimization
Authors:
Hongyi Chen,
Yilun Du,
Yiye Chen,
Joshua Tenenbaum,
Patricio A. Vela
Abstract:
Recent works have shown that sequence modeling can be effectively used to train reinforcement learning (RL) policies. However, the success of applying existing sequence models to planning, in which we wish to obtain a trajectory of actions to reach some goal, is less straightforward. The typical autoregressive generation procedures of sequence models preclude sequential refinement of earlier steps, which limits the effectiveness of a predicted plan. In this paper, we suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization, and illustrate how such a procedure leads to improved RL performance across different tasks. We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy. We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments. We further demonstrate unique benefits of our iterative optimization procedure, involving new task generalization, test-time constraints adaptation, and the ability to compose plans together. Project website: https://hychen-naza.github.io/projects/LEAP
Submitted 28 March, 2023;
originally announced March 2023.
-
Safe Hierarchical Navigation in Crowded Dynamic Uncertain Environments
Authors:
Hongyi Chen,
Shiyu Feng,
Ye Zhao,
Changliu Liu,
Patricio A. Vela
Abstract:
This paper describes a hierarchical solution consisting of a multi-phase planner and a low-level safe controller to jointly solve the safe navigation problem in crowded, dynamic, and uncertain environments. The planner employs dynamic gap analysis and trajectory optimization to achieve collision avoidance with respect to the predicted trajectories of dynamic agents within the sensing and planning horizon and with robustness to agent uncertainty. To address uncertainty over the planning horizon and real-time safety, a fast reactive safe set algorithm (SSA) is adopted, which monitors and modifies the unsafe control during trajectory tracking. Compared to other existing methods, our approach offers theoretical guarantees of safety and achieves collision-free navigation with higher probability in uncertain environments, as demonstrated in scenarios with 20 and 50 dynamic agents. Project website: https://hychen-naza.github.io/projects/HDAGap/.
Submitted 24 March, 2023;
originally announced March 2023.
-
Safer Gap: A Gap-based Local Planner for Safe Navigation with Nonholonomic Mobile Robots
Authors:
Shiyu Feng,
Ahmad Abuaish,
Patricio A. Vela
Abstract:
This paper extends the gap-based navigation technique in Potential Gap by guaranteeing safety for nonholonomic robots for all tiers of the local planner hierarchy; the resulting planner is called Safer Gap. The first tier generates a Bezier-based collision-free path through gaps. A subset of navigable free-space from the robot through a gap, called the keyhole, is defined to be the union of the largest collision-free disc centered on the robot and a trapezoidal region directed through the gap. It is encoded by a shallow neural network zeroing barrier function (ZBF). Nonlinear model predictive control (NMPC), with Keyhole ZBF constraints and output tracking of the Bezier path, synthesizes a safe kinematically-feasible trajectory. Low-level use of the Keyhole ZBF within a point-wise optimization-based safe control synthesis module serves as a final safety layer. Simulation and experimental validation of Safer Gap confirm its collision-free navigation properties.
Submitted 14 March, 2023;
originally announced March 2023.
-
WDiscOOD: Out-of-Distribution Detection via Whitened Linear Discriminant Analysis
Authors:
Yiye Chen,
Yunzhi Lin,
Ruinian Xu,
Patricio A. Vela
Abstract:
Deep neural networks are susceptible to generating overconfident yet erroneous predictions when presented with data beyond known concepts. This challenge underscores the importance of detecting out-of-distribution (OOD) samples in the open world. In this work, we propose a novel feature-space OOD detection score based on class-specific and class-agnostic information. Specifically, the approach utilizes Whitened Linear Discriminant Analysis to project features into two subspaces - the discriminative and residual subspaces - for which the in-distribution (ID) classes are maximally separated and closely clustered, respectively. The OOD score is then determined by combining the deviation from the input data to the ID pattern in both subspaces. The efficacy of our method, named WDiscOOD, is verified on the large-scale ImageNet-1k benchmark, with six OOD datasets that cover a variety of distribution shifts. WDiscOOD demonstrates superior performance on deep classifiers with diverse backbone architectures, including CNN and vision transformer. Furthermore, we also show that WDiscOOD more effectively detects novel concepts in representation spaces trained with contrastive objectives, including supervised contrastive loss and multi-modality contrastive loss.
Submitted 29 August, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Zero-Shot Object Searching Using Large-scale Object Relationship Prior
Authors:
Hongyi Chen,
Ruinian Xu,
Shuo Cheng,
Patricio A. Vela,
Danfei Xu
Abstract:
Home-assistant robots have been a long-standing research topic, and one of the biggest challenges is searching for required objects in housing environments. Previous object-goal navigation requires the robot to search for a target object category in an unexplored environment, which may not be suitable for home-assistant robots that typically have some level of semantic knowledge of the environment, such as the location of static furniture. In our approach, we leverage this knowledge and the fact that a target object may be located close to its related objects for efficient navigation. To achieve this, we train a graph neural network using the Visual Genome dataset to learn the object co-occurrence relationships and formulate the searching process as iteratively predicting the possible areas where the target object may be located. This approach is entirely zero-shot, meaning it doesn't require new accurate object correlation in the test environment. We empirically show that our method outperforms prior correlational object search algorithms. As our ultimate goal is to build fully autonomous assistant robots for everyday use, we further integrate the task planner for parsing natural language and generating task-completing plans with object navigation to execute human instructions. We demonstrate the effectiveness of our proposed pipeline in both the AI2-THOR simulator and a Stretch robot in a real-world environment.
Submitted 10 March, 2023;
originally announced March 2023.
-
KGNv2: Separating Scale and Pose Prediction for Keypoint-based 6-DoF Grasp Synthesis on RGB-D input
Authors:
Yiye Chen,
Ruinian Xu,
Yunzhi Lin,
Hongyi Chen,
Patricio A. Vela
Abstract:
We propose a new 6-DoF grasp pose synthesis approach from 2D/2.5D input based on keypoints. A keypoint-based grasp detector operating on image input demonstrated promising results in a previous study, where the additional visual information provided by color images compensates for the noisy depth perception. However, it relies heavily on accurately predicting the location of keypoints in the image space. In this paper, we devise a new grasp generation network that reduces the dependency on precise keypoint estimation. Given an RGB-D input, our network estimates both the grasp pose from keypoint detection and the scale towards the camera. We further re-design the keypoint output space in order to mitigate the negative impact of keypoint prediction noise on the Perspective-n-Point (PnP) algorithm. Experiments show that the proposed method outperforms the baseline by a large margin, validating the efficacy of our approach. Finally, despite being trained on simple synthetic objects, our method demonstrates sim-to-real capacity by showing competitive results in real-world robot experiments.
Submitted 1 May, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Improving Graph Neural Networks at Scale: Combining Approximate PageRank and CoreRank
Authors:
Ariel R. Ramos Vela,
Johannes F. Lutzeyer,
Anastasios Giovanidis,
Michalis Vazirgiannis
Abstract:
Graph Neural Networks (GNNs) have achieved great successes in many learning tasks performed on graph structures. Nonetheless, to propagate information GNNs rely on a message passing scheme which can become prohibitively expensive when working with industrial-scale graphs. Inspired by the PPRGo model, we propose the CorePPR model, a scalable solution that utilises a learnable convex combination of the approximate personalised PageRank and the CoreRank to diffuse multi-hop neighbourhood information in GNNs. Additionally, we incorporate a dynamic mechanism to select the most influential neighbours for a particular node which reduces training time while preserving the performance of the model. Overall, we demonstrate that CorePPR outperforms PPRGo, particularly on large graphs where selecting the most influential nodes is particularly relevant for scalability. Our code is publicly available at: https://github.com/arielramos97/CorePPR.
Submitted 8 November, 2022;
originally announced November 2022.
-
Parallel Inversion of Neural Radiance Fields for Robust Pose Estimation
Authors:
Yunzhi Lin,
Thomas Müller,
Jonathan Tremblay,
Bowen Wen,
Stephen Tyree,
Alex Evans,
Patricio A. Vela,
Stan Birchfield
Abstract:
We present a parallelized optimization method based on fast Neural Radiance Fields (NeRF) for estimating 6-DoF pose of a camera with respect to an object or scene. Given a single observed RGB image of the target, we can predict the translation and rotation of the camera by minimizing the residual between pixels rendered from a fast NeRF model and pixels in the observed image. We integrate a momentum-based camera extrinsic optimization procedure into Instant Neural Graphics Primitives, a recent exceptionally fast NeRF implementation. By introducing parallel Monte Carlo sampling into the pose estimation task, our method overcomes local minima and improves efficiency in a more extensive search space. We also show the importance of adopting a more robust pixel-based loss function to reduce error. Experiments demonstrate that our method can achieve improved generalization and robustness on both synthetic and real-world benchmarks.
Submitted 10 March, 2023; v1 submitted 18 October, 2022;
originally announced October 2022.
-
Geometry of Radial Basis Neural Networks for Safety Biased Approximation of Unsafe Regions
Authors:
Ahmad Abuaish,
Mohit Srinivasan,
Patricio A. Vela
Abstract:
Barrier function-based inequality constraints are a means to enforce safety specifications for control systems. When used in conjunction with a convex optimization program, they provide a computationally efficient method to enforce safety for the general class of control-affine systems. One of the main assumptions when taking this approach is the a priori knowledge of the barrier function itself, i.e., knowledge of the safe set. In the context of navigation through unknown environments where the locally safe set evolves with time, such knowledge does not exist. This manuscript focuses on the synthesis of a zeroing barrier function characterizing the safe set based on safe and unsafe sample measurements, e.g., from perception data in navigation applications. Prior work formulated a supervised machine learning algorithm whose solution guaranteed the construction of a zeroing barrier function with specific level-set properties. However, it did not explore the geometry of the neural network design used for the synthesis process. This manuscript describes the specific geometry of the neural network used for zeroing barrier function synthesis, and shows how the network provides the necessary representation for splitting the state space into safe and unsafe regions.
Submitted 28 March, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
Dynamic Gap: Safe Gap-based Navigation in Dynamic Environments
Authors:
Max Asselmeier,
Dhruv Ahuja,
Abdel Zaro,
Ahmad Abuaish,
Ye Zhao,
Patricio A. Vela
Abstract:
This paper extends the family of gap-based local planners to unknown dynamic environments through generating provable collision-free properties for hierarchical navigation systems. Existing perception-informed local planners that operate in dynamic environments rely on emergent or empirical robustness for collision avoidance as opposed to performing formal analysis of dynamic obstacles. In addition to this, the obstacle tracking that is performed in these existent planners is often achieved with respect to a global inertial frame, subjecting such tracking estimates to transformation errors from odometry drift. The proposed local planner, dynamic gap, shifts the tracking paradigm to modeling how the free space, represented as gaps, evolves over time. Gap crossing and closing conditions are developed to aid in determining the feasibility of passage through gaps, and a breadth of simulation benchmarking is performed against other navigation planners in the literature where the proposed dynamic gap planner achieves the highest success rate out of all planners tested in all environments.
Submitted 18 September, 2024; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Keypoint-Based Category-Level Object Pose Tracking from an RGB Sequence with Uncertainty Estimation
Authors:
Yunzhi Lin,
Jonathan Tremblay,
Stephen Tyree,
Patricio A. Vela,
Stan Birchfield
Abstract:
We propose a single-stage, category-level 6-DoF pose estimation algorithm that simultaneously detects and tracks instances of objects within a known category. Our method takes as input the previous and current frame from a monocular RGB video, as well as predictions from the previous frame, to predict the bounding cuboid and 6-DoF pose (up to scale). Internally, a deep network predicts distributions over object keypoints (vertices of the bounding cuboid) in image coordinates, after which a novel probabilistic filtering process integrates across estimates before computing the final pose using PnP. Our framework allows the system to take previous uncertainties into consideration when predicting the current frame, resulting in predictions that are more accurate and stable than single frame methods. Extensive experiments show that our method outperforms existing approaches on the challenging Objectron benchmark of annotated object videos. We also demonstrate the usability of our work in an augmented reality setting.
Submitted 23 May, 2022;
originally announced May 2022.
-
In-Place Rotation for Enhancing Snake-like Robot Mobility
Authors:
Alexander H. Chang,
Patricio A. Vela
Abstract:
Gaits engineered for snake-like robots to rotate in-place instrumentally fill a gap in the set of locomotive gaits that have traditionally prioritized translation. This paper designs a Turn-in-Place gait and demonstrates the ability of a shape-centric modeling framework to capture the gait's locomotive properties. Shape modeling for turning involves a time-varying continuous body curve described by a standing wave. Presumed viscous robot-ground frictional interactions lead to body dynamics conditioned on the time-varying shape model. The dynamic equations describing the Turn-in-Place gait are validated by an articulated snake-like robot using a physics-based simulator and a physical robot. The results affirm the shape-centric modeling framework's capacity to model a variety of snake-like robot gaits with fundamentally different body-ground contact patterns. As an applied demonstration, example locomotion scenarios partner the shape-centric Turn-in-Place gait with a Rectilinear gait for maneuvering through constrained environments based on a multi-modal locomotive planning strategy. Unified shape-centric modeling facilitates trajectory planning and tracking for a snake-like robot to successfully negotiate non-trivial obstacle configurations.
Submitted 9 March, 2022;
originally announced March 2022.
-
SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following
Authors:
Ruinian Xu,
Hongyi Chen,
Yunzhi Lin,
Patricio A. Vela
Abstract:
This paper investigates robot manipulation based on human instruction with ambiguous requests. The intent is to compensate for imperfect natural language via visual observations. Early symbolic methods, based on manually defined symbols, built modular frameworks consisting of semantic parsing and task planning for producing sequences of actions from natural language requests. Modern connectionist methods employ deep neural networks to automatically learn visual and linguistic features and map to a sequence of low-level actions, in an end-to-end fashion. These two approaches are blended to create a hybrid, modular framework: it formulates instruction following as symbolic goal learning via deep neural networks followed by task planning via symbolic planners. Connectionist and symbolic modules are bridged with Planning Domain Definition Language. The vision-and-language learning network predicts its goal representation, which is sent to a planner for producing a task-completing action sequence. For improving the flexibility of natural language, we further incorporate implicit human intents with explicit human instructions. To learn generic features for vision and language, we propose to separately pretrain vision and language encoders on scene graph parsing and semantic textual similarity tasks. Benchmarking evaluates the impacts of different components of, or options for, the vision-and-language learning model and shows the effectiveness of pretraining strategies. Manipulation experiments conducted in the simulator AI2THOR show the robustness of the framework to novel scenarios.
Submitted 25 February, 2022;
originally announced February 2022.
-
Impacts of Students Academic Performance Trajectories on Final Academic Success
Authors:
Shahab Boumi,
Adan Vela
Abstract:
Many studies in the field of education analytics have identified student grade point averages (GPA) as an important indicator and predictor of students' final academic outcomes (graduate or halt). And while semester-to-semester fluctuations in GPA are considered normal, significant changes in academic performance may warrant more thorough investigation and consideration, particularly with regard to final academic outcomes. However, such an approach is challenging due to the difficulties of representing complex academic trajectories over an academic career. In this study, we apply a Hidden Markov Model (HMM) to provide a standard and intuitive classification over students' academic-performance levels, which leads to a compact representation of academic-performance trajectories. Next, we explore the relationship between different academic-performance trajectories and their correspondence to final academic success. Based on student transcript data from the University of Central Florida, our proposed HMM is trained using sequences of students' course grades for each semester. Through the HMM, our analysis follows the expected finding that higher academic performance levels correlate with lower halt rates. However, in this paper, we identify that there exist many scenarios in which both improving and worsening academic-performance trajectories actually correlate to higher graduation rates. This counter-intuitive finding is made possible through the proposed and developed HMM.
Submitted 21 January, 2022;
originally announced January 2022.
-
Primitive Shape Recognition for Object Grasping
Authors:
Yunzhi Lin,
Chao Tang,
Fu-Jen Chu,
Ruinian Xu,
Patricio A. Vela
Abstract:
Shape informs how an object should be grasped, both in terms of where and how. As such, this paper describes a segmentation-based architecture for decomposing objects sensed with a depth camera into multiple primitive shapes, along with a post-processing pipeline for robotic grasping. Segmentation employs a deep network, called PS-CNN, trained on synthetic data with 6 classes of primitive shapes and generated using a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape region. The grasps are rank ordered, with the first feasible one chosen for execution. For task-free grasping of individual objects, the method achieves a 94.2% success rate placing it amongst the top performing grasp methods when compared to top-down and SE(3)-based approaches. Additional tests involving variable viewpoints and clutter demonstrate robustness to setup. For task-oriented grasping, PS-CNN achieves a 93.0% success rate. Overall, the outcomes support the hypothesis that explicitly encoding shape primitives within a grasping pipeline should boost grasping performance, including task-free and task-relevant grasp prediction.
Submitted 3 January, 2022;
originally announced January 2022.
-
Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image
Authors:
Yunzhi Lin,
Jonathan Tremblay,
Stephen Tyree,
Patricio A. Vela,
Stan Birchfield
Abstract:
Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the MobilePose single-stage approach and 7.1% higher than the related two-stage approach).
Submitted 12 May, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
GKNet: grasp keypoint network for grasp candidates detection
Authors:
Ruinian Xu,
Fu-Jen Chu,
Patricio A. Vela
Abstract:
Contemporary grasp detection approaches employ deep learning to achieve robustness to sensor and object model uncertainty. The two dominant approaches design either grasp-quality scoring or anchor-based grasp recognition networks. This paper presents a different approach to grasp detection by treating it as keypoint detection in image-space. The deep network detects each grasp candidate as a pair of keypoints, convertible to the grasp representation g = {x, y, w, θ}^T, rather than a triplet or quartet of corner points. Decreasing the detection difficulty by grouping keypoints into pairs boosts performance. To promote capturing dependencies between keypoints, a non-local module is incorporated into the network design. A final filtering strategy based on discrete and continuous orientation prediction removes false correspondences and further improves grasp detection performance. GKNet, the approach presented here, achieves a good balance between accuracy and speed on the Cornell and the abridged Jacquard datasets (96.9% and 98.39% at 41.67 and 23.26 fps). Follow-up experiments on a manipulator evaluate GKNet using 4 types of grasping experiments reflecting different nuisance sources: static grasping, dynamic grasping, grasping at varied camera angles, and bin picking. GKNet outperforms reference baselines in static and dynamic grasping experiments while showing robustness to varied camera viewpoints and moderate clutter. The results confirm the hypothesis that grasp keypoints are an effective output representation for deep grasp networks that provide robustness to expected nuisance factors.
Submitted 14 December, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
A Joint Network for Grasp Detection Conditioned on Natural Language Commands
Authors:
Yiye Chen,
Ruinian Xu,
Yunzhi Lin,
Patricio A. Vela
Abstract:
We consider the task of grasping a target object based on a natural language command query. Previous work primarily focused on localizing the object given the query, which requires a separate grasp detection module to grasp it. The cascaded application of two pipelines incurs errors in overlapping multi-object cases due to ambiguity in the individual outputs. This work proposes a model named Command Grasping Network (CGNet) to directly output command-satisficing grasps from RGB image and textual command inputs. A dataset of ground truth (image, command, grasps) tuples is generated based on the VMRD dataset to train the proposed network. Experimental results on the generated test set show that CGNet outperforms a cascaded object-retrieval and grasp detection baseline by a large margin. Three physical experiments demonstrate the functionality and performance of CGNet.
Submitted 1 April, 2021;
originally announced April 2021.
-
Multi-View Fusion for Multi-Level Robotic Scene Understanding
Authors:
Yunzhi Lin,
Jonathan Tremblay,
Stephen Tyree,
Patricio A. Vela,
Stan Birchfield
Abstract:
We present a system for multi-level scene awareness for robotic manipulation. Given a sequence of camera-in-hand RGB images, the system calculates three types of information: 1) a point cloud representation of all the surfaces in the scene, for the purpose of obstacle avoidance; 2) the rough pose of unknown objects from categories corresponding to primitive shapes (e.g., cuboids and cylinders); and 3) full 6-DoF pose of known objects. By developing and fusing recent techniques in these domains, we provide a rich scene representation for robot awareness. We demonstrate the importance of each of these modules, their complementary nature, and the potential benefits of the system in the context of robotic manipulation.
Submitted 14 October, 2021; v1 submitted 24 March, 2021;
originally announced March 2021.
-
Potential Gap: Using Reactive Policies to Guarantee Safe Navigation
Authors:
Ruoyang Xu,
Shiyu Feng,
Patricio A. Vela
Abstract:
This paper considers the integration of gap-based local navigation methods with artificial potential field (APF) methods to derive a local planning module for hierarchical navigation systems that has provable collision-free properties. Given that APF theory applies to idealized robot models, the provable properties are lost when applied to more realistic models. We describe a set of algorithm modifications that correct for these errors and enhance robustness to non-ideal models. Central to the construction of the local planner is the use of sensory-derived local free-space models that detect gaps and use them for the synthesis of the APF. Modifications are given for a nonholonomic robot model. Integration of the local planner, called potential gap, into a hierarchical navigation system provides the local goals and trajectories needed for collision-free navigation through unknown environments. Monte Carlo experiments in benchmark worlds confirm the asserted safety and robustness properties by testing under various robot models.
Submitted 21 March, 2021;
originally announced March 2021.
-
NavTuner: Learning a Scene-Sensitive Family of Navigation Policies
Authors:
Haoxin Ma,
Justin S. Smith,
Patricio A. Vela
Abstract:
The advent of deep learning has inspired research into end-to-end learning for a variety of problem domains in robotics. For navigation, the resulting methods may not have the generalization properties desired let alone match the performance of traditional methods. Instead of learning a navigation policy, we explore learning an adaptive policy in the parameter space of an existing navigation module. Having adaptive parameters provides the navigation module with a family of policies that can be dynamically reconfigured based on the local scene structure, and addresses the common assertion in machine learning that engineered solutions are inflexible. Of the methods tested, reinforcement learning (RL) is shown to provide a significant performance boost to a modern navigation method through reduced sensitivity of its success rate to environmental clutter. The outcomes indicate that RL as a meta-policy learner, or dynamic parameter tuner, effectively robustifies algorithms sensitive to external, measurable nuisance factors.
Submitted 1 March, 2021;
originally announced March 2021.
-
Image-Based Trajectory Tracking through Unknown Environments without Absolute Positioning
Authors:
Shiyu Feng,
Zixuan Wu,
Yipu Zhao,
Patricio A. Vela
Abstract:
This paper describes a stereo image-based visual servoing system for trajectory tracking by a non-holonomic robot without externally derived pose information nor a known visual map of the environment. It is called trajectory servoing. The critical component is a feature-based, indirect Simultaneous Localization And Mapping (SLAM) method to provide a pool of available features with estimated depth, so that they may be propagated forward in time to generate image feature trajectories for visual servoing. Short and long distance experiments show the benefits of trajectory servoing for navigating unknown areas without absolute positioning. Empirically, trajectory servoing has better trajectory tracking performance than pose-based feedback when both rely on the same underlying SLAM system.
Submitted 14 June, 2022; v1 submitted 26 February, 2021;
originally announced March 2021.
-
Good Graph to Optimize: Cost-Effective, Budget-Aware Bundle Adjustment in Visual SLAM
Authors:
Yipu Zhao,
Justin S. Smith,
Patricio A. Vela
Abstract:
The cost-efficiency of visual(-inertial) SLAM (VSLAM) is a critical characteristic of resource-limited applications. While hardware and algorithm advances have significantly improved the cost-efficiency of VSLAM front-ends, the cost-efficiency of VSLAM back-ends remains a bottleneck. This paper describes a novel, rigorous method to improve the cost-efficiency of local BA in a BA-based VSLAM back-end. An efficient algorithm, called Good Graph, is developed to select size-reduced graphs optimized in local BA with condition preservation. To better suit BA-based VSLAM back-ends, the Good Graph predicts future estimation needs, dynamically assigns an appropriate size budget, and selects a condition-maximized subgraph for BA estimation. Evaluations are conducted on two scenarios: 1) VSLAM as a standalone process, and 2) VSLAM as part of a closed-loop navigation system. Results from the first scenario show Good Graph improves accuracy and robustness of VSLAM estimation when computational limits exist. Results from the second scenario indicate that Good Graph benefits the trajectory tracking performance of VSLAM-based closed-loop navigation systems, which is a primary application of VSLAM.
Submitted 23 August, 2020;
originally announced August 2020.
-
Quantifying the relationship between student enrollment patterns and student performance
Authors:
Shahab Boumi,
Adan Vela,
Jacquelyn Chini
Abstract:
Simplified categorizations have often led to college students being labeled as full-time or part-time students. However, at many universities student enrollment patterns can be much more complicated, as it is not uncommon for students to alternate between full-time and part-time enrollment each semester based on finances, scheduling, or family needs. While prior research has established that full-time students maintain better outcomes than their part-time counterparts, limited study has examined the impact of enrollment patterns or strategies on academic outcomes. In this paper, we apply a Hidden Markov Model to identify and cluster students' enrollment strategies into three different categories: full-time, part-time, and mixed-enrollment strategies. Based on the enrollment strategies, we investigate and compare the academic performance outcomes of each group, taking into account differences between first-time-in-college students and transfer students. Analysis of data collected from the University of Central Florida from 2008 to 2017 indicates that first-time-in-college students that apply a mixed enrollment strategy are closer in performance to full-time students, as compared to part-time students. More importantly, during their part-time semesters, mixed-enrollment students significantly outperform part-time students. Similarly, analysis of transfer students shows that a mixed-enrollment strategy is correlated with graduation rates similar to those of the full-time enrollment strategy, and more than double the graduation rate associated with part-time enrollment. Such a finding suggests that increased engagement through the occasional full-time enrollment leads to better overall outcomes.
Submitted 7 November, 2020; v1 submitted 21 March, 2020;
originally announced March 2020.
-
Closed-Loop Benchmarking of Stereo Visual-Inertial SLAM Systems: Understanding the Impact of Drift and Latency on Tracking Accuracy
Authors:
Yipu Zhao,
Justin S. Smith,
Sambhu H. Karumanchi,
Patricio A. Vela
Abstract:
Visual-inertial SLAM is essential for robot navigation in GPS-denied environments, e.g., indoors or underground. Conventionally, the performance of visual-inertial SLAM is evaluated with open-loop analysis, with a focus on the drift level of SLAM systems. In this paper, we raise the question of the importance of visual estimation latency in closed-loop navigation tasks, such as accurate trajectory tracking. To understand the impact of both drift and latency on visual-inertial SLAM systems, a closed-loop benchmarking simulation is conducted, where a robot is commanded to follow a desired trajectory using the feedback from visual-inertial estimation. By extensively evaluating the trajectory tracking performance of representative state-of-the-art visual-inertial SLAM systems, we reveal the importance of latency reduction in the visual estimation module of these systems. The findings suggest directions for future improvements to visual-inertial SLAM.
Submitted 7 March, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.
-
Good Feature Matching: Towards Accurate, Robust VO/VSLAM with Low Latency
Authors:
Yipu Zhao,
Patricio A. Vela
Abstract:
Analysis of state-of-the-art VO/VSLAM systems exposes a gap in balancing performance (accuracy & robustness) and efficiency (latency). Feature-based systems exhibit good performance, yet have higher latency due to explicit data association; direct & semidirect systems have lower latency, but are inapplicable in some target scenarios or exhibit lower accuracy than feature-based ones. This paper aims to fill the performance-efficiency gap with an enhancement applied to feature-based VSLAM. We present good feature matching, an active map-to-frame feature matching method. Feature matching effort is tied to submatrix selection, which has combinatorial time complexity and requires choosing a scoring metric. Via simulation, the Max-logDet matrix revealing metric is shown to perform best. For real-time applicability, the combination of deterministic selection and randomized acceleration is studied. The proposed algorithm is integrated into monocular & stereo feature-based VSLAM systems. Extensive evaluations on multiple benchmarks and compute hardware quantify the latency reduction and the accuracy & robustness preservation.
Submitted 2 January, 2020;
originally announced January 2020.
-
Using Synthetic Data and Deep Networks to Recognize Primitive Shapes for Object Grasping
Authors:
Yunzhi Lin,
Chao Tang,
Fu-Jen Chu,
Patricio A. Vela
Abstract:
A segmentation-based architecture is proposed to decompose objects into multiple primitive shapes from monocular depth input for robotic manipulation. The backbone deep network is trained on synthetic data with 6 classes of primitive shapes generated by a simulation engine. Each primitive shape is designed with parametrized grasp families, permitting the pipeline to identify multiple grasp candidates per shape primitive region. The grasps are priority ordered via a proposed ranking algorithm, with the first feasible one chosen for execution. On task-free grasping of individual objects, the method achieves a 94% success rate. On task-oriented grasping, it achieves a 76% success rate. Overall, the method supports the hypothesis that shape primitives can support task-free and task-relevant grasp prediction.
Submitted 12 September, 2019;
originally announced September 2019.
-
Recognizing Object Affordances to Support Scene Reasoning for Manipulation Tasks
Authors:
Fu-Jen Chu,
Ruinian Xu,
Chao Tang,
Patricio A. Vela
Abstract:
Affordance information about a scene provides important clues as to what actions may be executed in pursuit of meeting a specified goal state. Thus, integrating affordance-based reasoning into symbolic action planning pipelines would enhance the flexibility of robot manipulation. Unfortunately, the top performing affordance recognition methods use object category priors to boost the accuracy of affordance detection and segmentation. Object priors limit generalization to unknown object categories. This paper describes an affordance recognition pipeline based on a category-agnostic region proposal network for proposing instance regions of an image across categories. To guide affordance learning in the absence of category priors, the training process includes the auxiliary task of explicitly inferring existing affordances within a proposal. Secondly, a self-attention mechanism trained to interpret each proposal learns to capture rich contextual dependencies through the region. Visual benchmarking shows that the trained network, called AffContext, reduces the performance gap between object-agnostic and object-informed affordance recognition. AffContext is linked to the Planning Domain Definition Language (PDDL) with an augmented state keeper for action planning across temporally spaced goal-oriented tasks. Manipulation experiments show that AffContext can successfully parse scene content to seed a symbolic planner problem specification, whose execution completes the target task. Additionally, task-oriented grasping for cutting and pounding actions demonstrates the exploitation of multiple affordances for a given object to complete specified tasks.
Submitted 12 September, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Autonomous, Monocular, Vision-Based Snake Robot Navigation and Traversal of Cluttered Environments using Rectilinear Gait Motion
Authors:
Alexander H. Chang,
Shiyu Feng,
Yipu Zhao,
Justin S. Smith,
Patricio A. Vela
Abstract:
Rectilinear forms of snake-like robotic locomotion are anticipated to be an advantage in obstacle-strewn scenarios characterizing urban disaster zones, subterranean collapses, and other natural environments. The elongated, laterally-narrow footprint associated with these motion strategies is well-suited to traversal of confined spaces and narrow pathways. Navigation and path planning in the absence of global sensing, however, remains a pivotal challenge to be addressed prior to practical deployment of these robotic mechanisms. Several challenges related to visual processing and localization need to be resolved to enable navigation. As a first pass in this direction, we equip a wireless, monocular color camera to the head of a robotic snake. Visual odometry and mapping from ORB-SLAM permit self-localization in planar, obstacle-strewn environments. Ground plane traversability segmentation in conjunction with perception-space collision detection permits path planning for navigation. A previously presented dynamical reduction of rectilinear snake locomotion to a non-holonomic kinematic vehicle informs both SLAM and planning. The simplified motion model is then applied to track planned trajectories through an obstacle configuration. This navigational framework enables a snake-like robotic platform to autonomously navigate and traverse unknown scenarios with only monocular vision.
Submitted 19 August, 2019;
originally announced August 2019.
-
Characterizing SLAM Benchmarks and Methods for the Robust Perception Age
Authors:
Wenkai Ye,
Yipu Zhao,
Patricio A. Vela
Abstract:
The diversity of SLAM benchmarks affords extensive testing of SLAM algorithms to understand their performance, individually or in relative terms. The ad-hoc creation of these benchmarks does not necessarily illuminate the particular weak points of a SLAM algorithm when performance is evaluated. In this paper, we propose to use a decision tree to identify challenging benchmark properties for state-of-the-art SLAM algorithms and important components within the SLAM pipeline regarding their ability to handle these challenges. Establishing what factors of a particular sequence lead to track failure or degradation relative to these characteristics is important if we are to arrive at a strong understanding of the core computational needs of a robust SLAM algorithm. Likewise, we argue that it is important to profile the computational performance of the individual SLAM components for use when benchmarking. In particular, we advocate the use of time-dilation during ROS bag playback, or what we refer to as slo-mo playback. Using slo-mo to benchmark SLAM instantiations can provide clues to how SLAM implementations should be improved at the computational component level. Three prevalent VO/SLAM algorithms and two low-latency algorithms of our own are tested on selected typical sequences, which are generated from benchmark characterization, to further demonstrate the benefits achieved from computationally efficient components.
Submitted 19 May, 2019;
originally announced May 2019.
-
Good Feature Selection for Least Squares Pose Optimization in VO/VSLAM
Authors:
Yipu Zhao,
Patricio A. Vela
Abstract:
This paper aims to select features that contribute most to the pose estimation in VO/VSLAM. Unlike existing feature selection works that are focused on efficiency only, our method significantly improves the accuracy of pose tracking, while introducing little overhead. By studying the impact of feature selection towards least squares pose optimization, we demonstrate the applicability of improving accuracy via good feature selection. To that end, we introduce the Max-logDet metric to guide the feature selection, which is connected to the conditioning of least squares pose optimization problem. We then describe an efficient algorithm for approximately solving the NP-hard Max-logDet problem. Integrating Max-logDet feature selection into a state-of-the-art visual SLAM system leads to accuracy improvements with low overhead, as demonstrated via evaluation on a public benchmark.
Submitted 19 May, 2019;
originally announced May 2019.
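The Max-logDet selection described in the abstract above can be illustrated with a small sketch. It assumes that each feature contributes an information block `J_i^T J_i` to the least squares pose problem and uses a plain greedy loop as a stand-in; the paper's own efficient approximation algorithm is not reproduced here. The names `cholesky_logdet`, `add_outer`, and `greedy_max_logdet`, as well as the `eps` regularizer, are hypothetical.

```python
import math


def cholesky_logdet(a):
    """Return log(det(a)) for a symmetric positive-definite matrix (list of lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return float("-inf")  # not positive definite
                low[i][j] = math.sqrt(s)
                logdet += 2.0 * math.log(low[i][j])
            else:
                low[i][j] = s / low[j][j]
    return logdet


def add_outer(m, rows):
    """Return m + J^T J, where J is given as a list of row vectors."""
    n = len(m)
    out = [r[:] for r in m]
    for row in rows:
        for i in range(n):
            for j in range(n):
                out[i][j] += row[i] * row[j]
    return out


def greedy_max_logdet(jacobians, k, dim, eps=1e-6):
    """Greedily pick k features maximizing logdet(eps*I + sum of J_i^T J_i).

    Assumption: plain greedy selection, used only for illustration.
    """
    info = [[eps if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    remaining = set(range(len(jacobians)))
    selected = []
    for _ in range(min(k, len(jacobians))):
        # Pick the feature whose addition gives the largest log-determinant.
        best = max(
            sorted(remaining),
            key=lambda i: cholesky_logdet(add_outer(info, jacobians[i])),
        )
        selected.append(best)
        remaining.remove(best)
        info = add_outer(info, jacobians[best])
    return selected
```

In a toy 2-D case with two features constraining the same direction and one constraining the orthogonal direction, the greedy loop prefers the feature that improves conditioning over the redundant one.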
-
Low-latency Visual SLAM with Appearance-Enhanced Local Map Building
Authors:
Yipu Zhao,
Wenkai Ye,
Patricio A. Vela
Abstract:
A local map module is often implemented in modern VO/VSLAM systems to improve data association and pose estimation. Conventionally, the local map contents are determined by co-visibility. While co-visibility is cheap to establish, it utilizes the relatively-weak temporal prior (i.e. seen before, likely to be seen now), therefore admitting more features into the local map than necessary. This paper describes an enhancement to co-visibility local map building by incorporating a strong appearance prior, which leads to a more compact local map and latency reduction in downstream data association. The appearance prior collected from the current image influences the local map contents: only the map features visually similar to the current measurements are potentially useful for data association. To that end, mapped features are indexed and queried with Multi-index Hashing (MIH). An online hash table selection algorithm is developed to further reduce the query overhead of MIH and the local map size. The proposed appearance-based local map building method is integrated into a state-of-the-art VO/VSLAM system. When evaluated on two public benchmarks, the size of the local map, as well as the latency of real-time pose tracking in VO/VSLAM are significantly reduced. Meanwhile, the VO/VSLAM mean performance is preserved or improves.
Submitted 19 May, 2019;
originally announced May 2019.
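As a rough illustration of how Multi-index Hashing (MIH) can index binary descriptors for appearance-based queries, the sketch below splits each code into disjoint substrings and stores one hash table per substring. It is a minimal version under stated assumptions: the class name `MultiIndexHash`, the 16-bit codes in the usage note, and the exact-substring probing strategy are hypothetical, and the paper's online hash table selection is not reproduced.

```python
from collections import defaultdict


def hamming(a, b):
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")


class MultiIndexHash:
    """Toy multi-index hash over fixed-width binary codes (illustrative only)."""

    def __init__(self, bits=32, tables=4):
        if bits % tables != 0:
            raise ValueError("bits must be divisible by tables")
        self.m = tables
        self.chunk = bits // tables
        self.tables = [defaultdict(list) for _ in range(tables)]
        self.codes = []

    def _substrings(self, code):
        mask = (1 << self.chunk) - 1
        return [(code >> (i * self.chunk)) & mask for i in range(self.m)]

    def add(self, code):
        """Insert a code and return its index."""
        idx = len(self.codes)
        self.codes.append(code)
        for table, key in zip(self.tables, self._substrings(code)):
            table[key].append(idx)
        return idx

    def query(self, code, radius):
        """Return indices of stored codes within the given Hamming radius.

        With radius < number of tables, any true neighbor must match the
        query exactly in at least one substring (pigeonhole argument), so
        exact-substring probing finds every neighbor.
        """
        if radius >= self.m:
            raise ValueError("this sketch only supports radius < tables")
        candidates = set()
        for table, key in zip(self.tables, self._substrings(code)):
            candidates.update(table.get(key, ()))
        # Verify candidates with the full Hamming distance.
        return sorted(i for i in candidates if hamming(self.codes[i], code) <= radius)
```

For example, after adding `0xAAAA`, `0xAAAB`, `0x5555`, and `0xAAFF` to `MultiIndexHash(bits=16, tables=4)`, a query for `0xAAAA` with radius 2 only verifies the candidates sharing a substring with the query, and the full-distance check then discards `0xAAFF`.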
-
Inferring demand from partially observed data to address the mismatch between demand and supply of taxis in the presence of rain
Authors:
Seyyed Yousef Oleyaei-Motlagh,
Adan Ernesto Vela
Abstract:
Analyzing the mismatch between the supply of and demand for taxis is an important step toward understanding passengers' demand. In this paper, we analyze the effect of rain on the demand for yellow taxis both city-wide and at a point of interest in New York City. Because a pickup event is a realized demand, we study empty travel time, the number of pickups per driver, and the average amount of income per drive as indices to infer demand from 2013 taxi data. The findings highlight that demand is higher during rain because of many short trips. This paper illustrates the change in passengers' demand brought about by the onset of a weather condition.
Submitted 15 March, 2019;
originally announced March 2019.
-
Real-world Multi-object, Multi-grasp Detection
Authors:
Fu-Jen Chu,
Ruinian Xu,
Patricio A. Vela
Abstract:
A deep learning architecture is proposed to predict graspable locations for robotic manipulation. It considers situations where no, one, or multiple object(s) are seen. By defining the learning problem to be classification with null hypothesis competition instead of regression, the deep neural network with RGB-D image input predicts multiple grasp candidates for a single object or multiple objects, in a single shot. The method outperforms state-of-the-art approaches on the Cornell dataset with 96.0% and 96.1% accuracy on image-wise and object-wise splits, respectively. Evaluation on a multi-object dataset illustrates the generalization capability of the architecture. Grasping experiments achieve 96.0% grasp localization and 88.0% grasping success rates on a test set of household objects. The real-time process takes less than 0.25 s from image to plan.
Submitted 20 July, 2018; v1 submitted 1 February, 2018;
originally announced February 2018.
-
The Helping Hand: An Assistive Manipulation Framework Using Augmented Reality and a Tongue-Drive Interfaces
Authors:
Fu-Jen Chu,
Ruinian Xu,
Zhenxuan Zhang,
Patricio A. Vela,
Maysam Ghovanloo
Abstract:
A human-in-the-loop system is proposed to enable collaborative manipulation tasks for persons with physical disabilities. Studies show that the cognitive burden on the subject decreases as the autonomy of the assistive system increases. Our framework obtains high-level intent from the user to specify manipulation tasks. The system processes sensor input to interpret the user's environment. Augmented reality glasses provide ego-centric visual feedback of the interpretation and summarize robot affordances on a menu. A tongue drive system serves as the input modality for triggering a robotic arm to execute the tasks. Assistance experiments compare the system to Cartesian control and to state-of-the-art approaches. Our system achieves competitive results with faster completion time by simplifying manipulation tasks.
Submitted 24 August, 2018; v1 submitted 1 February, 2018;
originally announced February 2018.
-
Learning to Navigate: Exploiting Deep Networks to Inform Sample-Based Planning During Vision-Based Navigation
Authors:
Justin S. Smith,
Jin-Ha Hwang,
Fu-Jen Chu,
Patricio A. Vela
Abstract:
Recent applications of deep learning to navigation have generated end-to-end navigation solutions whereby visual sensor input is mapped to control signals or to motion primitives. The resulting visual navigation strategies work very well at collision avoidance and have performance that matches traditional reactive navigation algorithms while operating in real-time. It is accepted that these solutions cannot provide the same level of performance as a global planner. However, it is less clear how such end-to-end systems should be integrated into a full navigation pipeline. We evaluate the typical end-to-end solution within a full navigation pipeline in order to expose its weaknesses. Doing so illuminates how to better integrate deep learning methods into the navigation pipeline. In particular, we show that they are an efficient means to provide informed samples for sample-based planners. Controlled simulations with comparison against traditional planners show that the number of samples can be reduced by an order of magnitude while preserving navigation performance. Implementation on a mobile robot matches the simulated performance outcomes.
Submitted 16 January, 2018;
originally announced January 2018.
-
Bendable Cuboid Robot Path Planning with Collision Avoidance using Generalized $L_p$ Norms
Authors:
Nak-seung P. Hyun,
Patricio A. Vela,
Erik I. Verriest
Abstract:
Optimal path planning problems for rigid and deformable (bendable) cuboid robots are considered by providing an analytic safety constraint using generalized $L_p$ norms. For regular cuboid robots, level sets of weighted $L_p$ norms generate implicit approximations of their surfaces. For bendable cuboid robots, a weighted $L_p$ norm in polar coordinates implicitly approximates the surface boundary through a specified level set. Obstacle volumes in the environment to be navigated are presumed to be approximately described as sub-level sets of weighted $L_p$ norms. Using these approximate surface models, the optimal safe path planning problem is reformulated as a two-stage optimization problem, where the safety constraint depends on the point on the robot that is closest to the obstacle in the obstacle's distance metric. A set of equality and inequality constraints is derived to replace the closest point problem, which then defines additional analytic constraints on the original path planning problem. Combining all the analytic constraints with logical AND operations leads to a general optimal safe path planning problem. Numerically solving the problem involves conversion to a nonlinear programming problem. Simulations for rigid and bendable cuboid robots verify the proposed method.
Submitted 16 December, 2017;
originally announced December 2017.
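To make the level-set idea in the abstract above concrete, here is a minimal sketch of a weighted $L_p$ norm whose unit level set approximates an axis-aligned cuboid. The function names, the choice of half-extents as weights, and the default exponent `p=8` are assumptions for illustration; the paper's exact weighting and its polar-coordinate form for bendable robots are not reproduced.

```python
def weighted_lp_norm(point, half_extents, p):
    """Weighted L_p norm: (sum_i |x_i / a_i|^p)^(1/p).

    The level set {norm == 1} approximates the surface of an axis-aligned
    cuboid with half-extents a_i; the approximation tightens as p grows.
    """
    return sum(abs(x / a) ** p for x, a in zip(point, half_extents)) ** (1.0 / p)


def inside_cuboid_approx(point, center, half_extents, p=8):
    """Sub-level-set test: True if the point lies in the approximate volume."""
    local = [x - c for x, c in zip(point, center)]
    return weighted_lp_norm(local, half_extents, p) <= 1.0
```

The sub-level set rounds off the corners of the true cuboid: the exact corner evaluates to `3 ** (1 / p)`, which is slightly above 1 and approaches 1 as `p` increases.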
-
Learning Binary Features Online from Motion Dynamics for Incremental Loop-Closure Detection and Place Recognition
Authors:
Guangcong Zhang,
Mason J. Lilly,
Patricio A. Vela
Abstract:
This paper proposes a simple yet effective approach to learn visual features online for improving loop-closure detection and place recognition, based on bag-of-words frameworks. The approach learns a codeword in the bag-of-words model from a pair of matched features from two consecutive frames, such that the codeword has temporally-derived perspective invariance to camera motion. The learning algorithm is efficient: the binary descriptor is generated from the mean image patch, and the mask is learned based on discriminative projection by minimizing the intra-class distances among the learned feature and the two original features. A codeword for bag-of-words models is generated by packaging the learned descriptor and mask, with a masked Hamming distance defined to measure the distance between two codewords. The geometric properties of the learned codewords are then mathematically justified. In addition, hypothesis constraints are imposed through temporal consistency in matched codewords, which improves precision. The approach, integrated in an incremental bag-of-words system, is validated on multiple benchmark data sets and compared to state-of-the-art methods. Experiments demonstrate improved precision/recall, outperforming the state of the art with little loss in runtime.
Submitted 25 January, 2016; v1 submitted 15 January, 2016;
originally announced January 2016.
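A masked Hamming distance between two codewords, each packaged as a binary descriptor plus a binary mask, might look like the sketch below. The abstract does not give the exact definition, so combining the two masks with a bitwise AND is an assumption, as is the function name `masked_hamming`.

```python
def masked_hamming(desc_a, mask_a, desc_b, mask_b):
    """Count differing descriptor bits, restricted to bits kept by both masks.

    Assumption: a bit takes part in the comparison only if it is set in
    both masks. Descriptors and masks are Python ints used as bit strings.
    """
    shared = mask_a & mask_b
    return bin((desc_a ^ desc_b) & shared).count("1")
```

With `desc_a = 0b1100`, `mask_a = 0b1110`, `desc_b = 0b1010`, and `mask_b = 0b0111`, the descriptors differ in two bits and both of those bits are kept by the shared mask, so the distance is 2.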
-
Reduced-Set Kernel Principal Components Analysis for Improving the Training and Execution Speed of Kernel Machines
Authors:
Hassan A. Kingravi,
Patricio A. Vela,
Alexandar Gray
Abstract:
This paper presents a practical, and theoretically well-founded, approach to improve the speed of kernel manifold learning algorithms relying on spectral decomposition. Utilizing recent insights in kernel smoothing and learning with integral operators, we propose Reduced Set KPCA (RSKPCA), which also suggests an easy-to-implement method to remove or replace samples with minimal effect on the empirical operator. A simple data point selection procedure is given to generate a substitute density for the data, with accuracy that is governed by a user-tunable parameter . The effect of the approximation on the quality of the KPCA solution, in terms of spectral and operator errors, can be shown directly in terms of the density estimate error and as a function of the parameter . We show in experiments that RSKPCA can improve both training and evaluation time of KPCA by up to an order of magnitude, and compares favorably to the widely-used Nystrom and density-weighted Nystrom methods.
Submitted 26 July, 2015;
originally announced July 2015.
-
A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm
Authors:
M. Emre Celebi,
Hassan A. Kingravi,
Patricio A. Vela
Abstract:
K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.
Submitted 10 September, 2012;
originally announced September 2012.
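To show what a linear-time initialization for k-means can look like, the sketch below implements the maximin (farthest-first) rule, which costs O(nk) distance evaluations. It is offered only as an example of the kind of method at issue: the abstract does not name the eight methods compared, and the function names and the `first` parameter are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def maximin_init(points, k, first=0):
    """Choose k initial centers by the maximin (farthest-first) rule.

    Starts from points[first]; each subsequent center is the point whose
    distance to its nearest already-chosen center is largest.
    Assumes 1 <= k <= number of distinct points.
    """
    centers = [points[first]]
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        # Update each point's distance to its nearest chosen center.
        nearest = [min(d, sq_dist(p, points[idx])) for d, p in zip(nearest, points)]
    return centers
```

Because the rule is deterministic once the first center is fixed, it avoids the run-to-run variance of random seeding, although it tends to favor outliers as centers.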