-
SurfR: Surface Reconstruction with Multi-scale Attention
Authors:
Siddhant Ranade,
Gonçalo Dias Pais,
Ross Tyler Whitaker,
Jacinto C. Nascimento,
Pedro Miraldo,
Srikumar Ramalingam
Abstract:
We propose a fast and accurate surface reconstruction algorithm for unorganized point clouds using an implicit representation. Recent learning methods are either single-object representations with small neural models that allow for high surface details but require per-object training or generalized representations that require larger models and generalize to newer shapes but lack details, and infe…
▽ More
We propose a fast and accurate surface reconstruction algorithm for unorganized point clouds using an implicit representation. Recent learning methods are either single-object representations with small neural models that allow for high surface details but require per-object training or generalized representations that require larger models and generalize to newer shapes but lack details, and inference is slow. We propose a new implicit representation for general 3D shapes that is faster than all the baselines at their optimum resolution, with only a marginal loss in performance compared to the state-of-the-art. We achieve the best accuracy-speed trade-off using three key contributions. Many implicit methods extract features from the point cloud to classify whether a query point is inside or outside the object. First, to speed up the reconstruction, we show that this feature extraction does not need to use the query point at an early stage (lazy query). Second, we use a parallel multi-scale grid representation to develop robust features for different noise levels and input resolutions. Finally, we show that attention across scales can provide improved reconstruction results.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
A Probability-guided Sampler for Neural Implicit Surface Rendering
Authors:
Gonçalo Dias Pais,
Valter Piedade,
Moitreya Chatterjee,
Marcus Greiff,
Pedro Miraldo
Abstract:
Several variants of Neural Radiance Fields (NeRFs) have significantly improved the accuracy of synthesized images and surface reconstruction of 3D scenes/objects. In all of these methods, a key characteristic is that none can train the neural network with every possible input data, specifically, every pixel and potential 3D point along the projection rays due to scalability issues. While vanilla N…
▽ More
Several variants of Neural Radiance Fields (NeRFs) have significantly improved the accuracy of synthesized images and surface reconstruction of 3D scenes/objects. In all of these methods, a key characteristic is that none can train the neural network with every possible input data, specifically, every pixel and potential 3D point along the projection rays due to scalability issues. While vanilla NeRFs uniformly sample both the image pixels and 3D points along the projection rays, some variants focus only on guiding the sampling of the 3D points along the projection rays. In this paper, we leverage the implicit surface representation of the foreground scene and model a probability density function in a 3D image projection space to achieve a more targeted sampling of the rays toward regions of interest, resulting in improved rendering. Additionally, a new surface reconstruction loss is proposed for improved performance. This new loss fully explores the proposed 3D image projection space model and incorporates near-to-surface and empty space components. By integrating our novel sampling strategy and novel loss into current state-of-the-art neural implicit surface renderers, we achieve more accurate and detailed 3D reconstructions and improved image rendering, especially for the regions of interest in any given scene.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
FreBIS: Frequency-Based Stratification for Neural Implicit Surface Representations
Authors:
Naoko Sawada,
Pedro Miraldo,
Suhas Lohit,
Tim K. Marks,
Moitreya Chatterjee
Abstract:
Neural implicit surface representation techniques are in high demand for advancing technologies in augmented reality/virtual reality, digital twins, autonomous navigation, and many other fields. With their ability to model object surfaces in a scene as a continuous function, such techniques have made remarkable strides recently, especially over classical 3D surface reconstruction methods, such as…
▽ More
Neural implicit surface representation techniques are in high demand for advancing technologies in augmented reality/virtual reality, digital twins, autonomous navigation, and many other fields. With their ability to model object surfaces in a scene as a continuous function, such techniques have made remarkable strides recently, especially over classical 3D surface reconstruction methods, such as those that use voxels or point clouds. However, these methods struggle with scenes that have varied and complex surfaces principally because they model any given scene with a single encoder network that is tasked to capture all of low through high-surface frequency information in the scene simultaneously. In this work, we propose a novel, neural implicit surface representation approach called FreBIS to overcome this challenge. FreBIS works by stratifying the scene based on the frequency of surfaces into multiple frequency levels, with each level (or a group of levels) encoded by a dedicated encoder. Moreover, FreBIS encourages these encoders to capture complementary information by promoting mutual dissimilarity of the encoded features via a novel, redundancy-aware weighting module. Empirical evaluations on the challenging BlendedMVS dataset indicate that replacing the standard encoder in an off-the-shelf neural surface reconstruction method with our frequency-stratified encoders yields significant improvements. These enhancements are evident both in the quality of the reconstructed 3D surfaces and in the fidelity of their renderings from any viewpoint.
△ Less
Submitted 28 April, 2025;
originally announced April 2025.
-
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling
Authors:
Xinhang Liu,
Yu-Wing Tai,
Chi-Keung Tang,
Pedro Miraldo,
Suhas Lohit,
Moitreya Chatterjee
Abstract:
Extensions of Neural Radiance Fields (NeRFs) to model dynamic scenes have enabled their near photo-realistic, free-viewpoint rendering. Although these methods have shown some potential in creating immersive experiences, two drawbacks limit their ubiquity: (i) a significant reduction in reconstruction quality when the computing budget is limited, and (ii) a lack of semantic understanding of the und…
▽ More
Extensions of Neural Radiance Fields (NeRFs) to model dynamic scenes have enabled their near photo-realistic, free-viewpoint rendering. Although these methods have shown some potential in creating immersive experiences, two drawbacks limit their ubiquity: (i) a significant reduction in reconstruction quality when the computing budget is limited, and (ii) a lack of semantic understanding of the underlying scenes. To address these issues, we introduce Gear-NeRF, which leverages semantic information from powerful image segmentation models. Our approach presents a principled way for learning a spatio-temporal (4D) semantic embedding, based on which we introduce the concept of gears to allow for stratified modeling of dynamic regions of the scene based on the extent of their motion. Such differentiation allows us to adjust the spatio-temporal sampling resolution for each region in proportion to its motion scale, achieving more photo-realistic dynamic novel view synthesis. At the same time, almost for free, our approach enables free-viewpoint tracking of objects of interest - a functionality not yet achieved by existing NeRF-based methods. Empirical studies validate the effectiveness of our method, where we achieve state-of-the-art rendering and tracking performance on multiple challenging datasets.
△ Less
Submitted 5 June, 2024;
originally announced June 2024.
-
Oriented-grid Encoder for 3D Implicit Representations
Authors:
Arihant Gaur,
G. Dias Pais,
Pedro Miraldo
Abstract:
Encoding 3D points is one of the primary steps in learning-based implicit scene representation. Using features that gather information from neighbors with multi-resolution grids has proven to be the best geometric encoder for this task. However, prior techniques do not exploit some characteristics of most objects or scenes, such as surface normals and local smoothness. This paper is the first to e…
▽ More
Encoding 3D points is one of the primary steps in learning-based implicit scene representation. Using features that gather information from neighbors with multi-resolution grids has proven to be the best geometric encoder for this task. However, prior techniques do not exploit some characteristics of most objects or scenes, such as surface normals and local smoothness. This paper is the first to exploit those 3D characteristics in 3D geometric encoders explicitly. In contrast to prior work on using multiple levels of details, regular cube grids, and trilinear interpolation, we propose 3D-oriented grids with a novel cylindrical volumetric interpolation for modeling local planar invariance. In addition, we explicitly include a local feature aggregation for feature regularization and smoothing of the cylindrical interpolation features. We evaluate our approach on ABC, Thingi10k, ShapeNet, and Matterport3D, for object and scene representation. Compared to the use of regular grids, our geometric encoder is shown to converge in fewer steps and obtain sharper 3D surfaces. When compared to the prior techniques, our method gets state-of-the-art results.
△ Less
Submitted 9 February, 2024;
originally announced February 2024.
-
BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus
Authors:
Valter Piedade,
Pedro Miraldo
Abstract:
RANSAC-based algorithms are the standard techniques for robust estimation in computer vision. These algorithms are iterative and computationally expensive; they alternate between random sampling of data, computing hypotheses, and running inlier counting. Many authors tried different approaches to improve efficiency. One of the major improvements is having a guided sampling, letting the RANSAC cycl…
▽ More
RANSAC-based algorithms are the standard techniques for robust estimation in computer vision. These algorithms are iterative and computationally expensive; they alternate between random sampling of data, computing hypotheses, and running inlier counting. Many authors tried different approaches to improve efficiency. One of the major improvements is having a guided sampling, letting the RANSAC cycle stop sooner. This paper presents a new adaptive sampling process for RANSAC. Previous methods either assume no prior information about the inlier/outlier classification of data points or use some previously computed scores in the sampling. In this paper, we derive a dynamic Bayesian network that updates individual data points' inlier scores while iterating RANSAC. At each iteration, we apply weighted sampling using the updated scores. Our method works with or without prior data point scorings. In addition, we use the updated inlier/outlier scoring for deriving a new stopping criterion for the RANSAC loop. We test our method in multiple real-world datasets for several applications and obtain state-of-the-art results. Our method outperforms the baselines in accuracy while needing less computational time.
△ Less
Submitted 25 September, 2023; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes
Authors:
Fabien Delattre,
David Dirnfeld,
Phat Nguyen,
Stephen Scarano,
Michael J. Jones,
Pedro Miraldo,
Erik Learned-Miller
Abstract:
We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. While camera rotation estimation is a well-studied problem, no previous methods exhibit both high accuracy and acceptable speed in this setting. Because the setting is not addressed well by other datasets, we provide a new dataset and benchmark, with high-accuracy, rigorously verified…
▽ More
We present an approach to estimating camera rotation in crowded, real-world scenes from handheld monocular video. While camera rotation estimation is a well-studied problem, no previous methods exhibit both high accuracy and acceptable speed in this setting. Because the setting is not addressed well by other datasets, we provide a new dataset and benchmark, with high-accuracy, rigorously verified ground truth, on 17 video sequences. Methods developed for wide baseline stereo (e.g., 5-point methods) perform poorly on monocular video. On the other hand, methods used in autonomous driving (e.g., SLAM) leverage specific sensor setups, specific motion models, or local optimization strategies (lagging batch processing) and do not generalize well to handheld video. Finally, for dynamic scenes, commonly used robustification techniques like RANSAC require large numbers of iterations, and become prohibitively slow. We introduce a novel generalization of the Hough transform on SO(3) to efficiently and robustly find the camera rotation most compatible with optical flow. Among comparably fast methods, ours reduces error by almost 50\% over the next best, and is more accurate than any method, irrespective of speed. This represents a strong new performance point for crowded scenes, an important setting for computer vision. The code and the dataset are available at https://fabiendelattre.com/robust-rotation-estimation.
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
An observer cascade for velocity and multiple line estimation
Authors:
André Mateus,
Pedro U. Lima,
Pedro Miraldo
Abstract:
Previous incremental estimation methods consider estimating a single line, requiring as many observers as the number of lines to be mapped. This leads to the need for having at least $4N$ state variables, with $N$ being the number of lines. This paper presents the first approach for multi-line incremental estimation. Since lines are common in structured environments, we aim to exploit that structu…
▽ More
Previous incremental estimation methods consider estimating a single line, requiring as many observers as the number of lines to be mapped. This leads to the need for having at least $4N$ state variables, with $N$ being the number of lines. This paper presents the first approach for multi-line incremental estimation. Since lines are common in structured environments, we aim to exploit that structure to reduce the state space. The modeling of structured environments proposed in this paper reduces the state space to $3N + 3$ and is also less susceptible to singular configurations. An assumption the previous methods make is that the camera velocity is available at all times. However, the velocity is usually retrieved from odometry, which is noisy. With this in mind, we propose coupling the camera with an Inertial Measurement Unit (IMU) and an observer cascade. A first observer retrieves the scale of the linear velocity and a second observer for the lines mapping. The stability of the entire system is analyzed. The cascade is shown to be asymptotically stable and shown to converge in experiments with simulated data.
△ Less
Submitted 3 March, 2022;
originally announced March 2022.
-
Solving the Discrete Euler-Arnold Equations for the Generalized Rigid Body Motion
Authors:
Joao R. Cardoso,
Pedro Miraldo
Abstract:
We propose three iterative methods for solving the Moser-Veselov equation, which arises in the discretization of the Euler-Arnold differential equations governing the motion of a generalized rigid body. We start by formulating the problem as an optimization problem with orthogonal constraints and proving that the objective function is convex. Then, using techniques from optimization on Riemannian…
▽ More
We propose three iterative methods for solving the Moser-Veselov equation, which arises in the discretization of the Euler-Arnold differential equations governing the motion of a generalized rigid body. We start by formulating the problem as an optimization problem with orthogonal constraints and proving that the objective function is convex. Then, using techniques from optimization on Riemannian manifolds, the three feasible algorithms are designed. The first one splits the orthogonal constraints using the Bregman method, whereas the other two methods are of the steepest-descent type. The second method uses the Cayley-transform to preserve the constraints and a Barzilai-Borwein step size, while the third one involves geodesics, with the step size computed by Armijo's rule. Finally, a set of numerical experiments are carried out to compare the performance of the proposed algorithms, suggesting that the first algorithm has the best performance in terms of accuracy and number of iterations. An essential advantage of these iterative methods is that they work even when the conditions for applicability of the direct methods available in the literature are not satisfied.
△ Less
Submitted 1 September, 2021;
originally announced September 2021.
-
On Incremental Structure-from-Motion using Lines
Authors:
André Mateus,
Omar Tahri,
A. Pedro Aguiar,
Pedro U. Lima,
Pedro Miraldo
Abstract:
Humans tend to build environments with structure, which consists of mainly planar surfaces. From the intersection of planar surfaces arise straight lines. Lines have more degrees-of-freedom than points. Thus, line-based Structure-from-Motion (SfM) provides more information about the environment. In this paper, we present solutions for SfM using lines, namely, incremental SfM. These approaches cons…
▽ More
Humans tend to build environments with structure, which consists of mainly planar surfaces. From the intersection of planar surfaces arise straight lines. Lines have more degrees-of-freedom than points. Thus, line-based Structure-from-Motion (SfM) provides more information about the environment. In this paper, we present solutions for SfM using lines, namely, incremental SfM. These approaches consist of designing state observers for a camera's dynamical visual system looking at a 3D line. We start by presenting a model that uses spherical coordinates for representing the line's moment vector. We show that this parameterization has singularities, and therefore we introduce a more suitable model that considers the line's moment and shortest viewing ray. Concerning the observers, we present two different methodologies. The first uses a memory-less state-of-the-art framework for dynamic visual systems. Since the previous states of the robotic agent are accessible -- while performing the 3D mapping of the environment -- the second approach aims at exploiting the use of memory to improve the estimation accuracy and convergence speed. The two models and the two observers are evaluated in simulation and real data, where mobile and manipulator robots are used.
△ Less
Submitted 24 May, 2021;
originally announced May 2021.
-
Mapping of Sparse 3D Data using Alternating Projection
Authors:
Siddhant Ranade,
Xin Yu,
Shantnu Kakkar,
Pedro Miraldo,
Srikumar Ramalingam
Abstract:
We propose a novel technique to register sparse 3D scans in the absence of texture. While existing methods such as KinectFusion or Iterative Closest Points (ICP) heavily rely on dense point clouds, this task is particularly challenging under sparse conditions without RGB data. Sparse texture-less data does not come with high-quality boundary signal, and this prohibits the use of correspondences fr…
▽ More
We propose a novel technique to register sparse 3D scans in the absence of texture. While existing methods such as KinectFusion or Iterative Closest Points (ICP) heavily rely on dense point clouds, this task is particularly challenging under sparse conditions without RGB data. Sparse texture-less data does not come with high-quality boundary signal, and this prohibits the use of correspondences from corners, junctions, or boundary lines. Moreover, in the case of sparse data, it is incorrect to assume that the same point will be captured in two consecutive scans. We take a different approach and first re-parameterize the point-cloud using a large number of line segments. In this re-parameterized data, there exists a large number of line intersection (and not correspondence) constraints that allow us to solve the registration task. We propose the use of a two-step alternating projection algorithm by formulating the registration as the simultaneous satisfaction of intersection and rigidity constraints. The proposed approach outperforms other top-scoring algorithms on both Kinect and LiDAR datasets. In Kinect, we can use 100X downsampled sparse data and still outperform competing methods operating on full-resolution data.
△ Less
Submitted 9 October, 2020; v1 submitted 4 October, 2020;
originally announced October 2020.
-
Active Depth Estimation: Stability Analysis and its Applications
Authors:
Romulo T. Rodrigues,
Pedro Miraldo,
Dimos V. Dimarogonas,
A. Pedro Aguiar
Abstract:
Recovering the 3D structure of the surrounding environment is an essential task in any vision-controlled Structure-from-Motion (SfM) scheme. This paper focuses on the theoretical properties of the SfM, known as the incremental active depth estimation. The term incremental stands for estimating the 3D structure of the scene over a chronological sequence of image frames. Active means that the camera…
▽ More
Recovering the 3D structure of the surrounding environment is an essential task in any vision-controlled Structure-from-Motion (SfM) scheme. This paper focuses on the theoretical properties of the SfM, known as the incremental active depth estimation. The term incremental stands for estimating the 3D structure of the scene over a chronological sequence of image frames. Active means that the camera actuation is such that it improves estimation performance. Starting from a known depth estimation filter, this paper presents the stability analysis of the filter in terms of the control inputs of the camera. By analyzing the convergence of the estimator using the Lyapunov theory, we relax the constraints on the projection of the 3D point in the image plane when compared to previous results. Nonetheless, our method is capable of dealing with the cameras' limited field-of-view constraints. The main results are validated through experiments with simulated data.
△ Less
Submitted 16 March, 2020;
originally announced March 2020.
-
A Framework for Depth Estimation and Relative Localization of Ground Robots using Computer Vision
Authors:
Romulo T. Rodrigues,
Pedro Miraldo,
Dimos V. Dimarogonas,
A. Pedro Aguiar
Abstract:
The 3D depth estimation and relative pose estimation problem within a decentralized architecture is a challenging problem that arises in missions that require coordination among multiple vision-controlled robots. The depth estimation problem aims at recovering the 3D information of the environment. The relative localization problem consists of estimating the relative pose between two robots, by se…
▽ More
The 3D depth estimation and relative pose estimation problem within a decentralized architecture is a challenging problem that arises in missions that require coordination among multiple vision-controlled robots. The depth estimation problem aims at recovering the 3D information of the environment. The relative localization problem consists of estimating the relative pose between two robots, by sensing each other's pose or sharing information about the perceived environment. Most solutions for these problems use a set of discrete data without taking into account the chronological order of the events. This paper builds on recent results on continuous estimation to propose a framework that estimates the depth and relative pose between two non-holonomic vehicles. The basic idea consists in estimating the depth of the points by explicitly considering the dynamics of the camera mounted on a ground robot, and feeding the estimates of 3D points observed by both cameras in a filter that computes the relative pose between the robots. We evaluate the convergence for a set of simulated scenarios and show experimental results validating the proposed framework.
△ Less
Submitted 1 August, 2019;
originally announced August 2019.
-
Can generalised relative pose estimation solve sparse 3D registration?
Authors:
Siddhant Ranade,
Xin Yu,
Shantnu Kakkar,
Pedro Miraldo,
Srikumar Ramalingam
Abstract:
Popular 3D scan registration projects, such as Stanford digital Michelangelo or KinectFusion, exploit the high-resolution sensor data for scan alignment. It is particularly challenging to solve the registration of sparse 3D scans in the absence of RGB components. In this case, we can not establish point correspondences since the same 3D point cannot be captured in two successive scans. In contrast…
▽ More
Popular 3D scan registration projects, such as Stanford digital Michelangelo or KinectFusion, exploit the high-resolution sensor data for scan alignment. It is particularly challenging to solve the registration of sparse 3D scans in the absence of RGB components. In this case, we can not establish point correspondences since the same 3D point cannot be captured in two successive scans. In contrast to correspondence based methods, we take a different viewpoint and formulate the sparse 3D registration problem based on the constraints from the intersection of line segments from adjacent scans. We obtain the line segments by modeling every horizontal and vertical scan-line as piece-wise linear segments. We propose a new alternating projection algorithm for solving the scan alignment problem using line intersection constraints. We develop two new minimal solvers for scan alignment in the presence of plane correspondences: 1) 3 line intersections and 1 plane correspondence, and 2) 1 line intersection and 2 plane correspondences. We outperform other competing methods on Kinect and LiDAR datasets.
△ Less
Submitted 13 June, 2019;
originally announced June 2019.
-
POSEAMM: A Unified Framework for Solving Pose Problems using an Alternating Minimization Method
Authors:
Joao Campos,
Joao R. Cardoso,
Pedro Miraldo
Abstract:
Pose estimation is one of the most important problems in computer vision. It can be divided in two different categories -- absolute and relative -- and may involve two different types of camera models: central and non-central. State-of-the-art methods have been designed to solve separately these problems. This paper presents a unified framework that is able to solve any pose problem by alternating…
▽ More
Pose estimation is one of the most important problems in computer vision. It can be divided in two different categories -- absolute and relative -- and may involve two different types of camera models: central and non-central. State-of-the-art methods have been designed to solve separately these problems. This paper presents a unified framework that is able to solve any pose problem by alternating optimization techniques between two set of parameters, rotation and translation. In order to make this possible, it is necessary to define an objective function that captures the problem at hand. Since the objective function will depend on the rotation and translation it is not possible to solve it as a simple minimization problem. Hence the use of Alternating Minimization methods, in which the function will be alternatively minimized with respect to the rotation and the translation. We show how to use our framework in three distinct pose problems. Our methods are then benchmarked with both synthetic and real data, showing their better balance between computational time and accuracy.
△ Less
Submitted 9 April, 2019;
originally announced April 2019.
-
Minimal Solvers for Mini-Loop Closures in 3D Multi-Scan Alignment
Authors:
Pedro Miraldo,
Surojit Saha,
Srikumar Ramalingam
Abstract:
3D scan registration is a classical, yet a highly useful problem in the context of 3D sensors such as Kinect and Velodyne. While there are several existing methods, the techniques are usually incremental where adjacent scans are registered first to obtain the initial poses, followed by motion averaging and bundle-adjustment refinement. In this paper, we take a different approach and develop minima…
▽ More
3D scan registration is a classical, yet a highly useful problem in the context of 3D sensors such as Kinect and Velodyne. While there are several existing methods, the techniques are usually incremental where adjacent scans are registered first to obtain the initial poses, followed by motion averaging and bundle-adjustment refinement. In this paper, we take a different approach and develop minimal solvers for jointly computing the initial poses of cameras in small loops such as 3-, 4-, and 5-cycles. Note that the classical registration of 2 scans can be done using a minimum of 3 point matches to compute 6 degrees of relative motion. On the other hand, to jointly compute the 3D registrations in n-cycles, we take 2 point matches between the first n-1 consecutive pairs (i.e., Scan 1 & Scan 2, ... , and Scan n-1 & Scan n) and 1 or 2 point matches between Scan 1 and Scan n. Overall, we use 5, 7, and 10 point matches for 3-, 4-, and 5-cycles, and recover 12, 18, and 24 degrees of transformation variables, respectively. Using simulations and real-data we show that the 3D registration using mini n-cycles are computationally efficient, and can provide alternate and better initial poses compared to standard pairwise methods.
△ Less
Submitted 8 April, 2019;
originally announced April 2019.
-
3DRegNet: A Deep Neural Network for 3D Point Registration
Authors:
G. Dias Pais,
Srikumar Ramalingam,
Venu Madhav Govindu,
Jacinto C. Nascimento,
Rama Chellappa,
Pedro Miraldo
Abstract:
We present 3DRegNet, a novel deep learning architecture for the registration of 3D scans. Given a set of 3D point correspondences, we build a deep neural network to address the following two challenges: (i) classification of the point correspondences into inliers/outliers, and (ii) regression of the motion parameters that align the scans into a common reference frame. With regard to regression, we…
▽ More
We present 3DRegNet, a novel deep learning architecture for the registration of 3D scans. Given a set of 3D point correspondences, we build a deep neural network to address the following two challenges: (i) classification of the point correspondences into inliers/outliers, and (ii) regression of the motion parameters that align the scans into a common reference frame. With regard to regression, we present two alternative approaches: (i) a Deep Neural Network (DNN) registration and (ii) a Procrustes approach using SVD to estimate the transformation. Our correspondence-based approach achieves a higher speedup compared to competing baselines. We further propose the use of a refinement network, which consists of a smaller 3DRegNet as a refinement to improve the accuracy of the registration. Extensive experiments on two challenging datasets demonstrate that we outperform other methods and achieve state-of-the-art results. The code is available.
△ Less
Submitted 7 April, 2020; v1 submitted 2 April, 2019;
originally announced April 2019.
-
OmniDRL: Robust Pedestrian Detection using Deep Reinforcement Learning on Omnidirectional Cameras
Authors:
G. Dias Pais,
Tiago J. Dias,
Jacinto C. Nascimento,
Pedro Miraldo
Abstract:
Pedestrian detection is one of the most explored topics in computer vision and robotics. The use of deep learning methods allowed the development of new and highly competitive algorithms. Deep Reinforcement Learning has proved to be within the state-of-the-art in terms of both detection in perspective cameras and robotics applications. However, for detection in omnidirectional cameras, the literat…
▽ More
Pedestrian detection is one of the most explored topics in computer vision and robotics. The use of deep learning methods allowed the development of new and highly competitive algorithms. Deep Reinforcement Learning has proved to be within the state-of-the-art in terms of both detection in perspective cameras and robotics applications. However, for detection in omnidirectional cameras, the literature is still scarce, mostly because of their high levels of distortion. This paper presents a novel and efficient technique for robust pedestrian detection in omnidirectional images. The proposed method uses deep Reinforcement Learning that takes advantage of the distortion in the image. By considering the 3D bounding boxes and their distorted projections into the image, our method is able to provide the pedestrian's position in the world, in contrast to the image positions provided by most state-of-the-art methods for perspective cameras. Our method avoids the need of pre-processing steps to remove the distortion, which is computationally expensive. Beyond the novel solution, our method compares favorably with the state-of-the-art methodologies that do not consider the underlying distortion for the detection task.
△ Less
Submitted 2 March, 2019;
originally announced March 2019.
-
Active Estimation of 3D Lines in Spherical Coordinates
Authors:
André Mateus,
Omar Tahri,
Pedro Miraldo
Abstract:
Straight lines are common features in human made environments, which makes them a frequently explored feature for control applications. Many control schemes, like Visual Servoing, require the 3D parameters of the features to be estimated. In order to obtain the 3D structure of lines, a nonlinear observer is proposed. However, to guarantee convergence, the dynamical system must be coupled with an a…
▽ More
Straight lines are common features in human made environments, which makes them a frequently explored feature for control applications. Many control schemes, like Visual Servoing, require the 3D parameters of the features to be estimated. In order to obtain the 3D structure of lines, a nonlinear observer is proposed. However, to guarantee convergence, the dynamical system must be coupled with an algebraic equation. This is achieved by using spherical coordinates to represent the line's moment vector, and a change of basis, which allows to introduce the algebraic constraint directly on the system's dynamics. Finally, a control law that attempts to optimize the convergence behavior of the observer is presented. The approach is validated in simulation, and with a real robotic platform with a camera onboard.
△ Less
Submitted 21 March, 2019; v1 submitted 1 February, 2019;
originally announced February 2019.
-
A Minimal Closed-Form Solution for Multi-Perspective Pose Estimation using Points and Lines
Authors:
Pedro Miraldo,
Tiago Dias,
Srikumar Ramalingam
Abstract:
We propose a minimal solution for pose estimation using both points and lines for a multi-perspective camera. In this paper, we treat the multi-perspective camera as a collection of rigidly attached perspective cameras. These type of imaging devices are useful for several computer vision applications that require a large coverage such as surveillance, self-driving cars, and motion-capture studios.…
▽ More
We propose a minimal solution for pose estimation using both points and lines for a multi-perspective camera. In this paper, we treat the multi-perspective camera as a collection of rigidly attached perspective cameras. These type of imaging devices are useful for several computer vision applications that require a large coverage such as surveillance, self-driving cars, and motion-capture studios. While prior methods have considered the cases using solely points or lines, the hybrid case involving both points and lines has not been solved for multi-perspective cameras. We present the solutions for two cases. In the first case, we are given 2D to 3D correspondences for two points and one line. In the later case, we are given 2D to 3D correspondences for one point and two lines. We show that the solution for the case of two points and one line can be formulated as a fourth degree equation. This is interesting because we can get a closed-form solution and thereby achieve high computational efficiency. The later case involving two lines and one point can be mapped to an eighth degree equation. We show simulations and real experiments to demonstrate the advantages and benefits over existing methods.
△ Less
Submitted 26 July, 2018;
originally announced July 2018.
-
Active Structure-from-Motion for 3D Straight Lines
Authors:
André Mateus,
Omar Tahri,
Pedro Miraldo
Abstract:
A reliable estimation of 3D parameters is a must for several applications like planning and control. Included in the latter is the Image-Based Visual Servoing, whose control scheme depends directly on 3D parameters e.g. depth of points, and depth and direction of 3D straight lines. Recently a framework for Active Structure-from-Motion was proposed, addressing the former feature type. However, stra…
▽ More
A reliable estimation of 3D parameters is a must for several applications like planning and control. Included in the latter is the Image-Based Visual Servoing, whose control scheme depends directly on 3D parameters e.g. depth of points, and depth and direction of 3D straight lines. Recently a framework for Active Structure-from-Motion was proposed, addressing the former feature type. However, straight lines were not addressed. These are 1D objects, which allow for more robust detection and tracking. In this work, the problem of Active Structure-from-Motion for 3D straight lines is addressed. An explicit representation of this type of feature is presented, and a change of variables is proposed, which allows the dynamics of the line to respect the conditions for observability of the framework. The approach is validated first in simulation for a single line, and second using a real robot. The latter set of experiments are conducted first for a single line, and then for three lines, which is the minimum required number of lines to control a 6 degree of freedom camera.
△ Less
Submitted 12 December, 2018; v1 submitted 2 July, 2018;
originally announced July 2018.
-
Analytical Modeling of Vanishing Points and Curves in Catadioptric Cameras
Authors:
Pedro Miraldo,
Francisco Eiras,
Srikumar Ramalingam
Abstract:
Vanishing points and vanishing lines are classical geometrical concepts in perspective cameras that have a lineage dating back to 3 centuries. A vanishing point is a point on the image plane where parallel lines in 3D space appear to converge, whereas a vanishing line passes through 2 or more vanishing points. While such concepts are simple and intuitive in perspective cameras, their counterparts…
▽ More
Vanishing points and vanishing lines are classical geometrical concepts in perspective cameras that have a lineage dating back to 3 centuries. A vanishing point is a point on the image plane where parallel lines in 3D space appear to converge, whereas a vanishing line passes through 2 or more vanishing points. While such concepts are simple and intuitive in perspective cameras, their counterparts in catadioptric cameras (obtained using mirrors and lenses) are more involved. For example, lines in the 3D space map to higher degree curves in catadioptric cameras. The projection of a set of 3D parallel lines converges on a single point in perspective images, whereas they converge to more than one point in catadioptric cameras. To the best of our knowledge, we are not aware of any systematic development of analytical models for vanishing points and vanishing curves in different types of catadioptric cameras. In this paper, we derive parametric equations for vanishing points and vanishing curves using the calibration parameters, mirror shape coefficients, and direction vectors of parallel lines in 3D space. We show compelling experimental results on vanishing point estimation and absolute pose estimation for a wide range of catadioptric cameras in both simulations and real experiments.
△ Less
Submitted 25 April, 2018;
originally announced April 2018.
-
Low-level Active Visual Navigation: Increasing robustness of vision-based localization using potential fields
Authors:
Romulo T. Rodrigues,
Meysam Basiri,
A. Pedro Aguiar,
Pedro Miraldo
Abstract:
This paper proposes a low-level visual navigation algorithm to improve visual localization of a mobile robot. The algorithm, based on artificial potential fields, associates each feature in the current image frame with an attractive or neutral potential energy, with the objective of generating a control action that drives the vehicle towards the goal, while still favoring feature rich areas within…
▽ More
This paper proposes a low-level visual navigation algorithm to improve visual localization of a mobile robot. The algorithm, based on artificial potential fields, associates each feature in the current image frame with an attractive or neutral potential energy, with the objective of generating a control action that drives the vehicle towards the goal, while still favoring feature rich areas within a local scope, thus improving the localization performance. One key property of the proposed method is that it does not rely on mapping, and therefore it is a lightweight solution that can be deployed on miniaturized aerial robots, in which memory and computational power are major constraints. Simulations and real experimental results using a mini quadrotor equipped with a downward looking camera demonstrate that the proposed method can effectively drive the vehicle to a designated goal through a path that prevents localization failure.
△ Less
Submitted 23 March, 2018; v1 submitted 21 January, 2018;
originally announced January 2018.
-
3D Reconstruction with Low Resolution, Small Baseline and High Radial Distortion Stereo Images
Authors:
Tiago Dias,
Helder Araujo,
Pedro Miraldo
Abstract:
In this paper we analyze and compare approaches for 3D reconstruction from low-resolution (250x250), high radial distortion stereo images, which are acquired with small baseline (approximately 1mm). These images are acquired with the system NanEye Stereo manufactured by CMOSIS/AWAIBA. These stereo cameras have also small apertures, which means that high levels of illumination are required. The goa…
▽ More
In this paper we analyze and compare approaches for 3D reconstruction from low-resolution (250x250), high radial distortion stereo images, which are acquired with small baseline (approximately 1mm). These images are acquired with the system NanEye Stereo manufactured by CMOSIS/AWAIBA. These stereo cameras have also small apertures, which means that high levels of illumination are required. The goal was to develop an approach yielding accurate reconstructions, with a low computational cost, i.e., avoiding non-linear numerical optimization algorithms. In particular we focused on the analysis and comparison of radial distortion models. To perform the analysis and comparison, we defined a baseline method based on available software and methods, such as the Bouguet toolbox [2] or the Computer Vision Toolbox from Matlab. The approaches tested were based on the use of the polynomial model of radial distortion, and on the application of the division model. The issue of the center of distortion was also addressed within the framework of the application of the division model. We concluded that the division model with a single radial distortion parameter has limitations.
△ Less
Submitted 19 September, 2017;
originally announced September 2017.
-
On the Generalized Essential Matrix Correction: An efficient solution to the problem and its applications
Authors:
Pedro Miraldo,
Joao R. Cardoso
Abstract:
This paper addresses the problem of finding the closest generalized essential matrix from a given $6\times 6$ matrix, with respect to the Frobenius norm. To the best of our knowledge, this nonlinear constrained optimization problem has not been addressed in the literature yet. Although it can be solved directly, it involves a large number of constraints, and any optimization method to solve it wou…
▽ More
This paper addresses the problem of finding the closest generalized essential matrix from a given $6\times 6$ matrix, with respect to the Frobenius norm. To the best of our knowledge, this nonlinear constrained optimization problem has not been addressed in the literature yet. Although it can be solved directly, it involves a large number of constraints, and any optimization method to solve it would require much computational effort. We start by deriving a couple of unconstrained formulations of the problem. After that, we convert the original problem into a new one, involving only orthogonal constraints, and propose an efficient algorithm of steepest descent-type to find its solution. To test the algorithms, we evaluate the methods with synthetic data and conclude that the proposed steepest descent-type approach is much faster than the direct application of general optimization techniques to the original formulation with 33 constraints and to the unconstrained ones. To further motivate the relevance of our method, we apply it in two pose problems (relative and absolute) using synthetic and real data.
△ Less
Submitted 16 March, 2020; v1 submitted 19 September, 2017;
originally announced September 2017.
-
Feature Based Potential Field for Low-level Active Visual Navigation
Authors:
Rômulo T. Rodrigues,
Meysam Basiri,
A. Pedro Aguiar,
Pedro Miraldo
Abstract:
This paper proposes a novel solution for improving visual localization in an active fashion. The solution, based on artificial potential field, associates each feature in the current image frame with an attractive or neutral potential energy. The resultant action drives the vehicle towards the goal, while still favoring feature rich areas. Experimental results with a mini quadrotor equipped with a…
▽ More
This paper proposes a novel solution for improving visual localization in an active fashion. The solution, based on artificial potential field, associates each feature in the current image frame with an attractive or neutral potential energy. The resultant action drives the vehicle towards the goal, while still favoring feature rich areas. Experimental results with a mini quadrotor equipped with a downward looking camera assess the performance of the proposed method.
△ Less
Submitted 14 September, 2017;
originally announced September 2017.
-
Efficient and Robust Pedestrian Detection using Deep Learning for Human-Aware Navigation
Authors:
Andre Mateus,
David Ribeiro,
Pedro Miraldo,
Jacinto C. Nascimento
Abstract:
This paper addresses the problem of Human-Aware Navigation (HAN), using multi camera sensors to implement a vision-based person tracking system. The main contributions of this paper are as follows: a novel and efficient Deep Learning person detection and a standardization of human-aware constraints. In the first stage of the approach, we propose to cascade the Aggregate Channel Features (ACF) dete…
▽ More
This paper addresses the problem of Human-Aware Navigation (HAN), using multi camera sensors to implement a vision-based person tracking system. The main contributions of this paper are as follows: a novel and efficient Deep Learning person detection and a standardization of human-aware constraints. In the first stage of the approach, we propose to cascade the Aggregate Channel Features (ACF) detector with a deep Convolutional Neural Network (CNN) to achieve fast and accurate Pedestrian Detection (PD). Regarding the human awareness (that can be defined as constraints associated with the robot's motion), we use a mixture of asymmetric Gaussian functions, to define the cost functions associated to each constraint. Both methods proposed herein are evaluated individually to measure the impact of each of the components. The final solution (including both the proposed pedestrian detection and the human-aware constraints) is tested in a typical domestic indoor scenario, in four distinct experiments. The results show that the robot is able to cope with human-aware constraints, defined after common proxemics and social rules.
△ Less
Submitted 13 December, 2018; v1 submitted 15 July, 2016;
originally announced July 2016.
-
A Real-Time Deep Learning Pedestrian Detector for Robot Navigation
Authors:
David Ribeiro,
Andre Mateus,
Pedro Miraldo,
Jacinto C. Nascimento
Abstract:
A real-time Deep Learning based method for Pedestrian Detection (PD) is applied to the Human-Aware robot navigation problem. The pedestrian detector combines the Aggregate Channel Features (ACF) detector with a deep Convolutional Neural Network (CNN) in order to obtain fast and accurate performance. Our solution is firstly evaluated using a set of real images taken from onboard and offboard camera…
▽ More
A real-time Deep Learning based method for Pedestrian Detection (PD) is applied to the Human-Aware robot navigation problem. The pedestrian detector combines the Aggregate Channel Features (ACF) detector with a deep Convolutional Neural Network (CNN) in order to obtain fast and accurate performance. Our solution is firstly evaluated using a set of real images taken from onboard and offboard cameras and, then, it is validated in a typical robot navigation environment with pedestrians (two distinct experiments are conducted). The results on both tests show that our pedestrian detector is robust and fast enough to be used on robot navigation applications.
△ Less
Submitted 19 September, 2017; v1 submitted 15 July, 2016;
originally announced July 2016.
-
Non-Central Catadioptric Cameras Pose Estimation using 3D Lines
Authors:
Andre Mateus,
Pedro Miraldo,
Pedro U. Lima
Abstract:
In this article we purpose a novel method for planar pose estimation of mobile robots. This method is based on an analytic solution (which we derived) for the projection of 3D straight lines, onto the mirror of Non-Central Catadioptric Cameras (NCCS). The resulting solution is rewritten as a function of the rotation and translation parameters, which is then used as an error function for a set of m…
▽ More
In this article we purpose a novel method for planar pose estimation of mobile robots. This method is based on an analytic solution (which we derived) for the projection of 3D straight lines, onto the mirror of Non-Central Catadioptric Cameras (NCCS). The resulting solution is rewritten as a function of the rotation and translation parameters, which is then used as an error function for a set of mirror points. Those should be the result of the projection of a set of points incident with the respective 3D lines. The camera's pose is given by minimizing the error function, with the associated constraints. The method is validated by experiments both with synthetic and real data. The latter was collected from a mobile robot equipped with a NCCS.
△ Less
Submitted 8 July, 2016;
originally announced July 2016.
-
Plücker Correction Problem: Analysis and Improvements in Efficiency
Authors:
João R. Cardoso,
Pedro Miraldo,
Helder Araujo
Abstract:
A given six dimensional vector represents a 3D straight line in Plucker coordinates if its coordinates satisfy the Klein quadric constraint. In many problems aiming to find the Plucker coordinates of lines, noise in the data and other type of errors contribute for obtaining 6D vectors that do not correspond to lines, because of that constraint. A common procedure to overcome this drawback is to…
▽ More
A given six dimensional vector represents a 3D straight line in Plucker coordinates if its coordinates satisfy the Klein quadric constraint. In many problems aiming to find the Plucker coordinates of lines, noise in the data and other type of errors contribute for obtaining 6D vectors that do not correspond to lines, because of that constraint. A common procedure to overcome this drawback is to find the Plucker coordinates of the lines that are closest to those vectors. This is known as the Plucker correction problem. In this article we propose a simple, closed-form, and global solution for this problem. When compared with the state-of-the-art method, one can conclude that our algorithm is easier and requires much less operations than previous techniques (it does not require Singular Value Decomposition techniques).
△ Less
Submitted 18 February, 2016;
originally announced February 2016.