-
O-MaMa @ EgoExo4D Correspondence Challenge: Learning Object Mask Matching between Egocentric and Exocentric Views
Authors:
Lorenzo Mur-Labadia,
Maria Santos-Villafranca,
Alejandro Perez-Yus,
Jesus Bermudez-Cameo,
Ruben Martinez-Cantin,
Jose J. Guerrero
Abstract:
The goal of the correspondence task is to segment specific objects across different views. This technical report re-defines cross-image segmentation by treating it as a mask matching task. Our method consists of: (1) A Mask-Context Encoder that pools dense DINOv2 semantic features to obtain discriminative object-level representations from FastSAM mask candidates, (2) an Ego$\leftrightarrow$Exo Cro…
▽ More
The goal of the correspondence task is to segment specific objects across different views. This technical report re-defines cross-image segmentation by treating it as a mask matching task. Our method consists of: (1) A Mask-Context Encoder that pools dense DINOv2 semantic features to obtain discriminative object-level representations from FastSAM mask candidates, (2) an Ego$\leftrightarrow$Exo Cross-Attention that fuses multi-perspective observations, (3) a Mask Matching contrastive loss that aligns cross-view features in a shared latent space, and (4) a Hard Negative Adjacent Mining strategy to encourage the model to better differentiate between nearby objects.
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
A Learning-Based Ansatz Satisfying Boundary Conditions in Variational Problems
Authors:
Rafael Florencio,
Julio Guerrero
Abstract:
Recently, innovative adaptations of the Ritz Method incorporating deep learning have been developed, known as the Deep Ritz Method. This approach employs a neural network as the test function for variational problems. However, the neural network does not inherently satisfy the boundary conditions of the variational problem. To resolve this issue, the Deep Ritz Method introduces a penalty term into…
▽ More
Recently, innovative adaptations of the Ritz Method incorporating deep learning have been developed, known as the Deep Ritz Method. This approach employs a neural network as the test function for variational problems. However, the neural network does not inherently satisfy the boundary conditions of the variational problem. To resolve this issue, the Deep Ritz Method introduces a penalty term into the functional of the variational problem, which can lead to misleading results during the optimization process. In this work, an ansatz is proposed that inherently satisfies the boundary conditions of the variational problem. The results demonstrate that the proposed ansatz not only eliminates misleading outcomes but also reduces complexity while maintaining accuracy, showcasing its practical effectiveness in addressing variational problems.
△ Less
Submitted 18 May, 2025;
originally announced May 2025.
-
Knowledge Distillation for Multimodal Egocentric Action Recognition Robust to Missing Modalities
Authors:
Maria Santos-Villafranca,
Dustin Carrión-Ojeda,
Alejandro Perez-Yus,
Jesus Bermudez-Cameo,
Jose J. Guerrero,
Simone Schaub-Meyer
Abstract:
Action recognition is an essential task in egocentric vision due to its wide range of applications across many fields. While deep learning methods have been proposed to address this task, most rely on a single modality, typically video. However, including additional modalities may improve the robustness of the approaches to common issues in egocentric videos, such as blurriness and occlusions. Rec…
▽ More
Action recognition is an essential task in egocentric vision due to its wide range of applications across many fields. While deep learning methods have been proposed to address this task, most rely on a single modality, typically video. However, including additional modalities may improve the robustness of the approaches to common issues in egocentric videos, such as blurriness and occlusions. Recent efforts in multimodal egocentric action recognition often assume the availability of all modalities, leading to failures or performance drops when any modality is missing. To address this, we introduce an efficient multimodal knowledge distillation approach for egocentric action recognition that is robust to missing modalities (KARMMA) while still benefiting when multiple modalities are available. Our method focuses on resource-efficient development by leveraging pre-trained models as unimodal feature extractors in our teacher model, which distills knowledge into a much smaller and faster student model. Experiments on the Epic-Kitchens and Something-Something datasets demonstrate that our student model effectively handles missing modalities while reducing its accuracy drop in this scenario.
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
Authors:
Lorenzo Mur-Labadia,
Josechu Guerrero,
Ruben Martinez-Cantin
Abstract:
Environment understanding in egocentric videos is an important step for applications like robotics, augmented reality and assistive technologies. These videos are characterized by dynamic interactions and a strong dependence on the wearer engagement with the environment. Traditional approaches often focus on isolated clips or fail to integrate rich semantic and geometric information, limiting scen…
▽ More
Environment understanding in egocentric videos is an important step for applications like robotics, augmented reality and assistive technologies. These videos are characterized by dynamic interactions and a strong dependence on the wearer engagement with the environment. Traditional approaches often focus on isolated clips or fail to integrate rich semantic and geometric information, limiting scene comprehension. We introduce Dynamic Image-Video Feature Fields (DIV FF), a framework that decomposes the egocentric scene into persistent, dynamic, and actor based components while integrating both image and video language features. Our model enables detailed segmentation, captures affordances, understands the surroundings and maintains consistent understanding over time. DIV-FF outperforms state-of-the-art methods, particularly in dynamically evolving scenarios, demonstrating its potential to advance long term, spatio temporal scene understanding.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
A Supervised Machine-Learning Approach For Turboshaft Engine Dynamic Modeling Under Real Flight Conditions
Authors:
Damiano Paniccia,
Francesco Aldo Tucci,
Joel Guerrero,
Luigi Capone,
Nicoletta Sanguini,
Tommaso Benacchio,
Luigi Bottasso
Abstract:
Rotorcraft engines are highly complex, nonlinear thermodynamic systems that operate under varying environmental and flight conditions. Simulating their dynamics is crucial for design, fault diagnostics, and deterioration control phases, and requires robust and reliable control systems to estimate engine performance throughout flight envelope. However, the development of detailed physical models of…
▽ More
Rotorcraft engines are highly complex, nonlinear thermodynamic systems that operate under varying environmental and flight conditions. Simulating their dynamics is crucial for design, fault diagnostics, and deterioration control phases, and requires robust and reliable control systems to estimate engine performance throughout flight envelope. However, the development of detailed physical models of the engine based on numerical simulations is a very challenging task due to the complex and entangled physics driving the engine. In this scenario, data-driven machine-learning techniques are of great interest to the aircraft engine community, due to their ability to describe nonlinear systems' dynamic behavior and enable online performance estimation, achieving excellent results with accuracy competitive with the state of the art. In this work, we explore different Neural Network architectures to model the turboshaft engine of Leonardo's AW189P4 prototype, aiming to predict the engine torque. The models are trained on an extensive database of real flight tests featuring a variety of operational maneuvers performed under different flight conditions, providing a comprehensive representation of the engine's performance. To complement the neural network approach, we apply Sparse Identification of Nonlinear Dynamics (SINDy) to derive a low-dimensional dynamical model from the available data, describing the relationship between fuel flow and engine torque. The resulting model showcases SINDy's capability to recover the actual physics underlying the engine dynamics and demonstrates its potential for investigating more complex aspects of the engine. The results prove that data-driven engine models can exploit a wider range of parameters than standard transfer function-based approaches, enabling the use of trained schemes to simulate nonlinear effects in different engines and helicopters.
△ Less
Submitted 23 February, 2025; v1 submitted 19 February, 2025;
originally announced February 2025.
-
Influence of field of view in visual prostheses design: Analysis with a VR system
Authors:
Melani Sanchez-Garcia,
Ruben Martinez-Cantin,
Jesus Bermudez-Cameo,
Jose J. Guerrero
Abstract:
Visual prostheses are designed to restore partial functional vision in patients with total vision loss. Retinal visual prostheses provide limited capabilities as a result of low resolution, limited field of view and poor dynamic range. Understanding the influence of these parameters in the perception results can guide prostheses research and design. In this work, we evaluate the influence of field…
▽ More
Visual prostheses are designed to restore partial functional vision in patients with total vision loss. Retinal visual prostheses provide limited capabilities as a result of low resolution, limited field of view and poor dynamic range. Understanding the influence of these parameters in the perception results can guide prostheses research and design. In this work, we evaluate the influence of field of view with respect to spatial resolution in visual prostheses, measuring the accuracy and response time in a search and recognition task. Twenty-four normally sighted participants were asked to find and recognize usual objects, such as furniture and home appliance in indoor room scenes. For the experiment, we use a new simulated prosthetic vision system that allows simple and effective experimentation. Our system uses a virtual-reality environment based on panoramic scenes. The simulator employs a head-mounted display which allows users to feel immersed in the scene by perceiving the entire scene all around. Our experiments use public image datasets and a commercial head-mounted display. We have also released the virtual-reality software for replicating and extending the experimentation. Results show that the accuracy and response time decrease when the field of view is increased. Furthermore, performance appears to be correlated with the angular resolution, but showing a diminishing return even with a resolution of less than 2.3 phosphenes per degree. Our results seem to indicate that, for the design of retinal prostheses, it is better to concentrate the phosphenes in a small area, to maximize the angular resolution, even if that implies sacrificing field of view.
△ Less
Submitted 28 January, 2025;
originally announced January 2025.
-
Extracting Object Heights From LiDAR & Aerial Imagery
Authors:
Jesus Guerrero
Abstract:
This work shows a procedural method for extracting object heights from LiDAR and aerial imagery. We discuss how to get heights and the future of LiDAR and imagery processing. SOTA object segmentation allows us to take get object heights with no deep learning background. Engineers will be keeping track of world data across generations and reprocessing them. They will be using older procedural metho…
▽ More
This work shows a procedural method for extracting object heights from LiDAR and aerial imagery. We discuss how to get heights and the future of LiDAR and imagery processing. SOTA object segmentation allows us to take get object heights with no deep learning background. Engineers will be keeping track of world data across generations and reprocessing them. They will be using older procedural methods like this paper and newer ones discussed here. SOTA methods are going beyond analysis and into generative AI. We cover both a procedural methodology and the newer ones performed with language models. These include point cloud, imagery and text encoding allowing for spatially aware AI.
△ Less
Submitted 1 August, 2024;
originally announced August 2024.
-
Towards a Robotic Intrusion Prevention System: Combining Security and Safety in Cognitive Social Robots
Authors:
Francisco Martín,
Enrique Soriano-Salvador,
José Miguel Guerrero,
Gorka Guardiola Múzquiz,
Juan Carlos Manzanares,
Francisco J. Rodríguez
Abstract:
Social Robots need to be safe and reliable to share their space with humans. This paper reports on the first results of a research project that aims to create more safe and reliable, intelligent autonomous robots by investigating the implications and interactions between cybersecurity and safety. We propose creating a robotic intrusion prevention system (RIPS) that follows a novel approach to dete…
▽ More
Social Robots need to be safe and reliable to share their space with humans. This paper reports on the first results of a research project that aims to create more safe and reliable, intelligent autonomous robots by investigating the implications and interactions between cybersecurity and safety. We propose creating a robotic intrusion prevention system (RIPS) that follows a novel approach to detect and mitigate intrusions in cognitive social robot systems and other cyber-physical systems. The RIPS detects threats at the robotic communication level and enables mitigation of the cyber-physical threats by using System Modes to define what part of the robotic system reduces or limits its functionality while the system is compromised. We demonstrate the validity of our approach by applying it to a cognitive architecture running in a real social robot that preserves the privacy and safety of humans while facing several cyber attack situations.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
Authors:
Lorenzo Mur-Labadia,
Ruben Martinez-Cantin,
Josechu Guerrero,
Giovanni Maria Farinella,
Antonino Furnari
Abstract:
Short-Term object-interaction Anticipation consists of detecting the location of the next-active objects, the noun and verb categories of the interaction, and the time to contact from the observation of egocentric video. This ability is fundamental for wearable assistants or human robot interaction to understand the user goals, but there is still room for improvement to perform STA in a precise an…
▽ More
Short-Term object-interaction Anticipation consists of detecting the location of the next-active objects, the noun and verb categories of the interaction, and the time to contact from the observation of egocentric video. This ability is fundamental for wearable assistants or human robot interaction to understand the user goals, but there is still room for improvement to perform STA in a precise and reliable way. In this work, we improve the performance of STA predictions with two contributions: 1. We propose STAformer, a novel attention-based architecture integrating frame guided temporal pooling, dual image-video attention, and multiscale feature fusion to support STA predictions from an image-input video pair. 2. We introduce two novel modules to ground STA predictions on human behavior by modeling affordances.First, we integrate an environment affordance model which acts as a persistent memory of interactions that can take place in a given physical scene. Second, we predict interaction hotspots from the observation of hands and object trajectories, increasing confidence in STA predictions localized around the hotspot. Our results show significant relative Overall Top-5 mAP improvements of up to +45% on Ego4D and +42% on a novel set of curated EPIC-Kitchens STA labels. We will release the code, annotations, and pre extracted affordances on Ego4D and EPIC- Kitchens to encourage future research in this area.
△ Less
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models
Authors:
Garrett Crumrine,
Izzat Alsmadi,
Jesus Guerrero,
Yuvaraj Munian
Abstract:
Large language models (LLMs) have revolutionized how we interact with machines. However, this technological advancement has been paralleled by the emergence of "Mallas," malicious services operating underground that exploit LLMs for nefarious purposes. Such services create malware, phishing attacks, and deceptive websites, escalating the cyber security threats landscape. This paper delves into the…
▽ More
Large language models (LLMs) have revolutionized how we interact with machines. However, this technological advancement has been paralleled by the emergence of "Mallas," malicious services operating underground that exploit LLMs for nefarious purposes. Such services create malware, phishing attacks, and deceptive websites, escalating the cyber security threats landscape. This paper delves into the proliferation of Mallas by examining the use of various pre-trained language models and their efficiency and vulnerabilities when misused. Building on a dataset from the Common Vulnerabilities and Exposures (CVE) program, it explores fine-tuning methodologies to generate code and explanatory text related to identified vulnerabilities. This research aims to shed light on the operational strategies and exploitation techniques of Mallas, leading to the development of more secure and trustworthy AI applications. The paper concludes by emphasizing the need for further research, enhanced safeguards, and ethical guidelines to mitigate the risks associated with the malicious application of LLMs.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Scaled 360 layouts: Revisiting non-central panoramas
Authors:
Bruno Berenguel-Baeta,
Jesus Bermudez-Cameo,
Jose J. Guerrero
Abstract:
From a non-central panorama, 3D lines can be recovered by geometric reasoning. However, their sensitivity to noise and the complex geometric modeling required has led these panoramas being very little investigated. In this work we present a novel approach for 3D layout recovery of indoor environments using single non-central panoramas. We obtain the boundaries of the structural lines of the room f…
▽ More
From a non-central panorama, 3D lines can be recovered by geometric reasoning. However, their sensitivity to noise and the complex geometric modeling required has led these panoramas being very little investigated. In this work we present a novel approach for 3D layout recovery of indoor environments using single non-central panoramas. We obtain the boundaries of the structural lines of the room from a non-central panorama using deep learning and exploit the properties of non-central projection systems in a new geometrical processing to recover the scaled layout. We solve the problem for Manhattan environments, handling occlusions, and also for Atlanta environments in an unified method. The experiments performed improve the state-of-the-art methods for 3D layout recovery from a single panorama. Our approach is the first work using deep learning with non-central panoramas and recovering the scale of single panorama layouts.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Visual Gyroscope: Combination of Deep Learning Features and Direct Alignment for Panoramic Stabilization
Authors:
Bruno Berenguel-Baeta,
Antoine N. Andre,
Guillaume Caron,
Jesus Bermudez-Cameo,
Jose J. Guerrero
Abstract:
In this article we present a visual gyroscope based on equirectangular panoramas. We propose a new pipeline where we take advantage of combining three different methods to obtain a robust and accurate estimation of the attitude of the camera. We quantitatively and qualitatively validate our method on two image sequences taken with a $360^\circ$ dual-fisheye camera mounted on different aerial vehic…
▽ More
In this article we present a visual gyroscope based on equirectangular panoramas. We propose a new pipeline where we take advantage of combining three different methods to obtain a robust and accurate estimation of the attitude of the camera. We quantitatively and qualitatively validate our method on two image sequences taken with a $360^\circ$ dual-fisheye camera mounted on different aerial vehicles.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Convolution kernel adaptation to calibrated fisheye
Authors:
Bruno Berenguel-Baeta,
Maria Santos-Villafranca,
Jesus Bermudez-Cameo,
Alejandro Perez-Yus,
Jose J. Guerrero
Abstract:
Convolution kernels are the basic structural component of convolutional neural networks (CNNs). In the last years there has been a growing interest in fisheye cameras for many applications. However, the radially symmetric projection model of these cameras produces high distortions that affect the performance of CNNs, especially when the field of view is very large. In this work, we tackle this pro…
▽ More
Convolution kernels are the basic structural component of convolutional neural networks (CNNs). In the last years there has been a growing interest in fisheye cameras for many applications. However, the radially symmetric projection model of these cameras produces high distortions that affect the performance of CNNs, especially when the field of view is very large. In this work, we tackle this problem by proposing a method that leverages the calibration of cameras to deform the convolution kernel accordingly and adapt to the distortion. That way, the receptive field of the convolution is similar to standard convolutions in perspective images, allowing us to take advantage of pre-trained networks in large perspective datasets. We show how, with just a brief fine-tuning stage in a small dataset, we improve the performance of the network for the calibrated fisheye with respect to standard convolutions in depth estimation and semantic segmentation.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Non-central panorama indoor dataset
Authors:
Bruno Berenguel-Baeta,
Jesus Bermudez-Cameo,
Jose J. Guerrero
Abstract:
Omnidirectional images are one of the main sources of information for learning based scene understanding algorithms. However, annotated datasets of omnidirectional images cannot keep the pace of these learning based algorithms development. Among the different panoramas and in contrast to standard central ones, non-central panoramas provide geometrical information in the distortion of the image fro…
▽ More
Omnidirectional images are one of the main sources of information for learning based scene understanding algorithms. However, annotated datasets of omnidirectional images cannot keep the pace of these learning based algorithms development. Among the different panoramas and in contrast to standard central ones, non-central panoramas provide geometrical information in the distortion of the image from which we can retrieve 3D information of the environment [2]. However, due to the lack of commercial non-central devices, up until now there was no dataset of these kinds of panoramas. In this data paper, we present the first dataset of non-central panoramas for indoor scene understanding. The dataset is composed by {\bf 2574} RGB non-central panoramas taken in around 650 different rooms. Each panorama has associated a depth map and annotations to obtain the layout of the room from the image as a structural edge map, list of corners in the image, the 3D corners of the room and the camera pose. The images are taken from photorealistic virtual environments and pixel-wise automatically annotated.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision
Authors:
Bruno Berenguel-Baeta,
Jesus Bermudez-Cameo,
Jose J. Guerrero
Abstract:
Omnidirectional and 360° images are becoming widespread in industry and in consumer society, causing omnidirectional computer vision to gain attention. Their wide field of view allows the gathering of a great amount of information about the environment from only an image. However, the distortion of these images requires the development of specific algorithms for their treatment and interpretation.…
▽ More
Omnidirectional and 360° images are becoming widespread in industry and in consumer society, causing omnidirectional computer vision to gain attention. Their wide field of view allows the gathering of a great amount of information about the environment from only an image. However, the distortion of these images requires the development of specific algorithms for their treatment and interpretation. Moreover, a high number of images is essential for the correct training of computer vision algorithms based on learning. In this paper, we present a tool for generating datasets of omnidirectional images with semantic and depth information. These images are synthesized from a set of captures that are acquired in a realistic virtual environment for Unreal Engine 4 through an interface plugin. We gather a variety of well-known projection models such as equirectangular and cylindrical panoramas, different fish-eye lenses, catadioptric systems, and empiric models. Furthermore, we include in our tool photorealistic non-central-projection systems as non-central panoramas and non-central catadioptric systems. As far as we know, this is the first reported tool for generating photorealistic non-central images in the literature. Moreover, since the omnidirectional images are made virtually, we provide pixel-wise information about semantics and depth as well as perfect knowledge of the calibration parameters of the cameras. This allows the creation of ground-truth information with pixel precision for training learning algorithms and testing 3D vision approaches. To validate the proposed tool, different computer vision algorithms are tested as line extractions from dioptric and catadioptric central images, 3D Layout recovery and SLAM using equirectangular panoramas, and 3D reconstruction from non-central panoramas.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Atlanta Scaled layouts from non-central panoramas
Authors:
Bruno Berenguel-Baeta,
Jesus Bermudez-Cameo,
Jose J. Guerrero
Abstract:
In this work we present a novel approach for 3D layout recovery of indoor environments using a non-central acquisition system. From a non-central panorama, full and scaled 3D lines can be independently recovered by geometry reasoning without geometric nor scale assumptions. However, their sensitivity to noise and complex geometric modeling has led these panoramas being little investigated. Our new…
▽ More
In this work we present a novel approach for 3D layout recovery of indoor environments using a non-central acquisition system. From a non-central panorama, full and scaled 3D lines can be independently recovered by geometry reasoning without geometric nor scale assumptions. However, their sensitivity to noise and complex geometric modeling has led these panoramas being little investigated. Our new pipeline aims to extract the boundaries of the structural lines of an indoor environment with a neural network and exploit the properties of non-central projection systems in a new geometrical processing to recover an scaled 3D layout. The results of our experiments show that we improve state-of-the-art methods for layout reconstruction and line extraction in non-central projection systems. We completely solve the problem in Manhattan and Atlanta environments, handling occlusions and retrieving the metric scale of the room without extra measurements. As far as the authors knowledge goes, our approach is the first work using deep learning on non-central panoramas and recovering scaled layouts from single panoramas.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Floor extraction and door detection for visually impaired guidance
Authors:
Bruno Berenguel-Baeta,
Manuel Guerrero-Viu,
Alejandro de Nova,
Jesus Bermudez-Cameo,
Alejandro Perez-Yus,
Jose J. Guerrero
Abstract:
Finding obstacle-free paths in unknown environments is a big navigation issue for visually impaired people and autonomous robots. Previous works focus on obstacle avoidance, however they do not have a general view of the environment they are moving in. New devices based on computer vision systems can help impaired people to overcome the difficulties of navigating in unknown environments in safe co…
▽ More
Finding obstacle-free paths in unknown environments is a big navigation issue for visually impaired people and autonomous robots. Previous works focus on obstacle avoidance, however they do not have a general view of the environment they are moving in. New devices based on computer vision systems can help impaired people to overcome the difficulties of navigating in unknown environments in safe conditions. In this work it is proposed a combination of sensors and algorithms that can lead to the building of a navigation system for visually impaired people. Based on traditional systems that use RGB-D cameras for obstacle avoidance, it is included and combined the information of a fish-eye camera, which will give a better understanding of the user's surroundings. The combination gives robustness and reliability to the system as well as a wide field of view that allows to obtain many information from the environment. This combination of sensors is inspired by human vision where the center of the retina (fovea) provides more accurate information than the periphery, where humans have a wider field of view. The proposed system is mounted on a wearable device that provides the obstacle-free zones of the scene, allowing the planning of trajectories for people guidance.
△ Less
Submitted 30 January, 2024;
originally announced January 2024.
-
Multi-label affordance mapping from egocentric vision
Authors:
Lorenzo Mur-Labadia,
Jose J. Guerrero,
Ruben Martinez-Cantin
Abstract:
Accurate affordance detection and segmentation with pixel precision is an important piece in many complex systems based on interactions, such as robots and assitive devices. We present a new approach to affordance perception which enables accurate multi-label segmentation. Our approach can be used to automatically extract grounded affordances from first person videos of interactions using a 3D map…
▽ More
Accurate affordance detection and segmentation with pixel precision is an important piece in many complex systems based on interactions, such as robots and assitive devices. We present a new approach to affordance perception which enables accurate multi-label segmentation. Our approach can be used to automatically extract grounded affordances from first person videos of interactions using a 3D map of the environment providing pixel level precision for the affordance location. We use this method to build the largest and most complete dataset on affordances based on the EPIC-Kitchen dataset, EPIC-Aff, which provides interaction-grounded, multi-label, metric and spatial affordance annotations. Then, we propose a new approach to affordance segmentation based on multi-label detection which enables multiple affordances to co-exists in the same space, for example if they are associated with the same object. We present several strategies of multi-label detection using several segmentation architectures. The experimental results highlight the importance of the multi-label detection. Finally, we show how our metric representation can be exploited for build a map of interaction hotspots in spatial action-centric zones and use that representation to perform a task-oriented navigation.
△ Less
Submitted 5 September, 2023;
originally announced September 2023.
-
Bayesian Deep Learning for Affordance Segmentation in images
Authors:
Lorenzo Mur-Labadia,
Ruben Martinez-Cantin,
Jose J. Guerrero
Abstract:
Affordances are a fundamental concept in robotics since they relate available actions for an agent depending on its sensory-motor capabilities and the environment. We present a novel Bayesian deep network to detect affordances in images, at the same time that we quantify the distribution of the aleatoric and epistemic variance at the spatial level. We adapt the Mask-RCNN architecture to learn a pr…
▽ More
Affordances are a fundamental concept in robotics since they relate available actions for an agent depending on its sensory-motor capabilities and the environment. We present a novel Bayesian deep network to detect affordances in images, at the same time that we quantify the distribution of the aleatoric and epistemic variance at the spatial level. We adapt the Mask-RCNN architecture to learn a probabilistic representation using Monte Carlo dropout. Our results outperform the state-of-the-art of deterministic networks. We attribute this improvement to a better probabilistic feature space representation on the encoder and the Bayesian variability induced at the mask generation, which adapts better to the object contours. We also introduce the new Probability-based Mask Quality measure that reveals the semantic and spatial differences on a probabilistic instance segmentation model. We modify the existing Probabilistic Detection Quality metric by comparing the binary masks rather than the predicted bounding boxes, achieving a finer-grained evaluation of the probabilistic segmentation. We find aleatoric variance in the contours of the objects due to the camera noise, while epistemic variance appears in visual challenging pixels.
△ Less
Submitted 1 March, 2023;
originally announced March 2023.
-
Mutation-Based Adversarial Attacks on Neural Text Detectors
Authors:
Gongbo Liang,
Jesus Guerrero,
Izzat Alsmadi
Abstract:
Neural text detectors aim to decide the characteristics that distinguish neural (machine-generated) from human texts. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making the detection task more and more difficult. Inspired by the advances of mutation analysis in software development and testing, in this paper, we propose characte…
▽ More
Neural text detectors aim to decide the characteristics that distinguish neural (machine-generated) from human texts. To challenge such detectors, adversarial attacks can alter the statistical characteristics of the generated text, making the detection task more and more difficult. Inspired by the advances of mutation analysis in software development and testing, in this paper, we propose character- and word-based mutation operators for generating adversarial samples to attack state-of-the-art natural text detectors. This falls under white-box adversarial attacks. In such attacks, attackers have access to the original text and create mutation instances based on this original text. The ultimate goal is to confuse machine learning models and classifiers and decrease their prediction accuracy.
△ Less
Submitted 11 February, 2023;
originally announced February 2023.
-
A Mutation-based Text Generation for Adversarial Machine Learning Applications
Authors:
Jesus Guerrero,
Gongbo Liang,
Izzat Alsmadi
Abstract:
Many natural language related applications involve text generation, created by humans or machines. While in many of those applications machines support humans, yet in few others, (e.g. adversarial machine learning, social bots and trolls) machines try to impersonate humans. In this scope, we proposed and evaluated several mutation-based text generation approaches. Unlike machine-based generated te…
▽ More
Many natural language related applications involve text generation, created by humans or machines. While in many of those applications machines support humans, yet in few others, (e.g. adversarial machine learning, social bots and trolls) machines try to impersonate humans. In this scope, we proposed and evaluated several mutation-based text generation approaches. Unlike machine-based generated text, mutation-based generated text needs human text samples as inputs. We showed examples of mutation operators but this work can be extended in many aspects such as proposing new text-based mutation operators based on the nature of the application.
△ Less
Submitted 20 December, 2022;
originally announced December 2022.
-
Synthetic Text Detection: Systemic Literature Review
Authors:
Jesus Guerrero,
Izzat Alsmadi
Abstract:
Within the text analysis and processing fields, generated text attacks have been made easier to create than ever before. To combat these attacks open sourcing models and datasets have become a major trend to create automated detection algorithms in defense of authenticity. For this purpose, synthetic text detection has become an increasingly viable topic of research. This review is written for the…
▽ More
Within the text analysis and processing fields, generated text attacks have been made easier to create than ever before. To combat these attacks open sourcing models and datasets have become a major trend to create automated detection algorithms in defense of authenticity. For this purpose, synthetic text detection has become an increasingly viable topic of research. This review is written for the purpose of creating a snapshot of the state of current literature and easing the barrier to entry for future authors. Towards that goal, we identified few research trends and challenges in this field.
△ Less
Submitted 1 October, 2022;
originally announced October 2022.
-
FreDSNet: Joint Monocular Depth and Semantic Segmentation with Fast Fourier Convolutions
Authors:
Bruno Berenguel-Baeta,
Jesus Bermudez-Cameo,
Jose J. Guerrero
Abstract:
In this work we present FreDSNet, a deep learning solution which obtains semantic 3D understanding of indoor environments from single panoramas. Omnidirectional images reveal task-specific advantages when addressing scene understanding problems due to the 360-degree contextual information about the entire environment they provide. However, the inherent characteristics of the omnidirectional images…
▽ More
In this work we present FreDSNet, a deep learning solution which obtains semantic 3D understanding of indoor environments from single panoramas. Omnidirectional images reveal task-specific advantages when addressing scene understanding problems due to the 360-degree contextual information about the entire environment they provide. However, the inherent characteristics of the omnidirectional images add additional problems to obtain an accurate detection and segmentation of objects or a good depth estimation. To overcome these problems, we exploit convolutions in the frequential domain obtaining a wider receptive field in each convolutional layer. These convolutions allow to leverage the whole context information from omnidirectional images. FreDSNet is the first network that jointly provides monocular depth estimation and semantic segmentation from a single panoramic image exploiting fast Fourier convolutions. Our experiments show that FreDSNet has similar performance as specific state of the art methods for semantic segmentation and depth estimation. FreDSNet code is publicly available in https://github.com/Sbrunoberenguel/FreDSNet
△ Less
Submitted 5 February, 2024; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Portable Multi-Hypothesis Monte Carlo Localization for Mobile Robots
Authors:
Alberto Garcia,
Francisco Martin,
Jose Miguel Guerrero,
Francisco J. Rodriguez,
Vicente Matellan
Abstract:
Self-localization is a fundamental capability that mobile robot navigation systems integrate to move from one point to another using a map. Thus, any enhancement in localization accuracy is crucial to perform delicate dexterity tasks. This paper describes a new location that maintains several populations of particles using the Monte Carlo Localization (MCL) algorithm, always choosing the best one…
▽ More
Self-localization is a fundamental capability that mobile robot navigation systems integrate to move from one point to another using a map. Thus, any enhancement in localization accuracy is crucial to perform delicate dexterity tasks. This paper describes a new location that maintains several populations of particles using the Monte Carlo Localization (MCL) algorithm, always choosing the best one as the sytems's output. As novelties, our work includes a multi-scale match matching algorithm to create new MCL populations and a metric to determine the most reliable. It also contributes the state-of-the-art implementations, enhancing recovery times from erroneous estimates or unknown initial positions. The proposed method is evaluated in ROS2 in a module fully integrated with Nav2 and compared with the current state-of-the-art Adaptive ACML solution, obtaining good accuracy and recovery times.
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
Assessing visual acuity in visual prostheses through a virtual-reality system
Authors:
Melani Sanchez-Garcia,
Roberto Morollon-Ruiz,
Ruben Martinez-Cantin,
Jose J. Guerrero,
Eduardo Fernandez-Jover
Abstract:
Current visual implants still provide very low resolution and limited field of view, thus limiting visual acuity in implanted patients. Developments of new strategies of artificial vision simulation systems by harnessing new advancements in technologies are of upmost priorities for the development of new visual devices. In this work, we take advantage of virtual-reality software paired with a port…
▽ More
Current visual implants still provide very low resolution and limited field of view, thus limiting visual acuity in implanted patients. Developments of new strategies of artificial vision simulation systems by harnessing new advancements in technologies are of upmost priorities for the development of new visual devices. In this work, we take advantage of virtual-reality software paired with a portable head-mounted display and evaluated the performance of normally sighted participants under simulated prosthetic vision with variable field of view and number of pixels. Our simulated prosthetic vision system allows simple experimentation in order to study the design parameters of future visual prostheses. Ten normally sighted participants volunteered for a visual acuity study. Subjects were required to identify computer-generated Landolt-C gap orientation and different stimulus based on light perception, time-resolution, light location and motion perception commonly used for visual acuity examination in the sighted. Visual acuity scores were recorded across different conditions of number of electrodes and size of field of view. Our results showed that of all conditions tested, a field of view of 20° and 1000 phosphenes of resolution proved the best, with a visual acuity of 1.3 logMAR. Furthermore, performance appears to be correlated with phosphene density, but showing a diminishing return when field of view is less than 20°. The development of new artificial vision simulation systems can be useful to guide the development of new visual devices and the optimization of field of view and resolution to provide a helpful and valuable visual aid to profoundly or totally blind patients.
△ Less
Submitted 20 May, 2022;
originally announced May 2022.
-
Semantic Pose Verification for Outdoor Visual Localization with Self-supervised Contrastive Learning
Authors:
Semih Orhan,
Jose J. Guerrero,
Yalin Bastanlar
Abstract:
Any city-scale visual localization system has to overcome long-term appearance changes, such as varying illumination conditions or seasonal changes between query and database images. Since semantic content is more robust to such changes, we exploit semantic information to improve visual localization. In our scenario, the database consists of gnomonic views generated from panoramic images (e.g. Goo…
▽ More
Any city-scale visual localization system has to overcome long-term appearance changes, such as varying illumination conditions or seasonal changes between query and database images. Since semantic content is more robust to such changes, we exploit semantic information to improve visual localization. In our scenario, the database consists of gnomonic views generated from panoramic images (e.g. Google Street View) and query images are collected with a standard field-of-view camera at a different time. To improve localization, we check the semantic similarity between query and database images, which is not trivial since the position and viewpoint of the cameras do not exactly match. To learn similarity, we propose training a CNN in a self-supervised fashion with contrastive learning on a dataset of semantically segmented images. With experiments we showed that this semantic similarity estimation approach works better than measuring the similarity at pixel-level. Finally, we used the semantic similarity scores to verify the retrievals obtained by a state-of-the-art visual localization method and observed that contrastive learning-based pose verification increases top-1 recall value to 0.90 which corresponds to a 2% improvement.
△ Less
Submitted 31 March, 2022;
originally announced March 2022.
-
Augmented reality navigation system for visual prosthesis
Authors:
Melani Sanchez-Garcia,
Alejandro Perez-Yus,
Ruben Martinez-Cantin,
Jose J. Guerrero
Abstract:
The visual functions of visual prostheses such as field of view, resolution and dynamic range, seriously restrict the person's ability to navigate in unknown environments. Implanted patients still require constant assistance for navigating from one location to another. Hence, there is a need for a system that is able to assist them safely during their journey. In this work, we propose an augmented…
▽ More
The visual functions of visual prostheses such as field of view, resolution and dynamic range, seriously restrict the person's ability to navigate in unknown environments. Implanted patients still require constant assistance for navigating from one location to another. Hence, there is a need for a system that is able to assist them safely during their journey. In this work, we propose an augmented reality navigation system for visual prosthesis that incorporates a software of reactive navigation and path planning which guides the subject through convenient, obstacle-free route. It consists on four steps: locating the subject on a map, planning the subject trajectory, showing it to the subject and re-planning without obstacles. We have also designed a simulated prosthetic vision environment which allows us to systematically study navigation performance. Twelve subjects participated in the experiment. Subjects were guided by the augmented reality navigation system and their instruction was to navigate through different environments until they reached two goals, cross the door and find an object (bin), as fast and accurately as possible. Results show how our augmented navigation system help navigation performance by reducing the time and distance to reach the goals, even significantly reducing the number of obstacles collisions, compared to other baseline methods.
△ Less
Submitted 30 September, 2021;
originally announced September 2021.
-
Unsupervised Learning of Category-Specific Symmetric 3D Keypoints from Point Sets
Authors:
Clara Fernandez-Labrador,
Ajad Chhatkuli,
Danda Pani Paudel,
Jose J. Guerrero,
Cédric Demonceaux,
Luc Van Gool
Abstract:
Automatic discovery of category-specific 3D keypoints from a collection of objects of some category is a challenging problem. One reason is that not all objects in a category necessarily have the same semantic parts. The level of difficulty adds up further when objects are represented by 3D point clouds, with variations in shape and unknown coordinate frames. We define keypoints to be category-spe…
▽ More
Automatic discovery of category-specific 3D keypoints from a collection of objects of some category is a challenging problem. One reason is that not all objects in a category necessarily have the same semantic parts. The level of difficulty adds up further when objects are represented by 3D point clouds, with variations in shape and unknown coordinate frames. We define keypoints to be category-specific, if they meaningfully represent objects' shape and their correspondences can be simply established order-wise across all objects. This paper aims at learning category-specific 3D keypoints, in an unsupervised manner, using a collection of misaligned 3D point clouds of objects from an unknown category. In order to do so, we model shapes defined by the keypoints, within a category, using the symmetric linear basis shapes without assuming the plane of symmetry to be known. The usage of symmetry prior leads us to learn stable keypoints suitable for higher misalignments. To the best of our knowledge, this is the first work on learning such keypoints directly from 3D point clouds. Using categories from four benchmark datasets, we demonstrate the quality of our learned keypoints by quantitative and qualitative evaluations. Our experiments also show that the keypoints discovered by our method are geometrically and semantically consistent.
△ Less
Submitted 6 January, 2021; v1 submitted 17 March, 2020;
originally announced March 2020.
-
What's in my Room? Object Recognition on Indoor Panoramic Images
Authors:
Julia Guerrero-Viu,
Clara Fernandez-Labrador,
Cédric Demonceaux,
Jose J. Guerrero
Abstract:
In the last few years, there has been a growing interest in taking advantage of the 360 panoramic images potential, while managing the new challenges they imply. While several tasks have been improved thanks to the contextual information these images offer, object recognition in indoor scenes still remains a challenging problem that has not been deeply investigated. This paper provides an object r…
▽ More
In the last few years, there has been a growing interest in taking advantage of the 360 panoramic images potential, while managing the new challenges they imply. While several tasks have been improved thanks to the contextual information these images offer, object recognition in indoor scenes still remains a challenging problem that has not been deeply investigated. This paper provides an object recognition system that performs object detection and semantic segmentation tasks by using a deep learning model adapted to match the nature of equirectangular images. From these results, instance segmentation masks are recovered, refined and transformed into 3D bounding boxes that are placed into the 3D model of the room. Quantitative and qualitative results support that our method outperforms the state of the art by a large margin and show a complete understanding of the main objects in indoor scenes.
△ Less
Submitted 18 February, 2021; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Corners for Layout: End-to-End Layout Recovery from 360 Images
Authors:
Clara Fernandez-Labrador,
Jose M. Facil,
Alejandro Perez-Yus,
Cédric Demonceaux,
Javier Civera,
Jose J. Guerrero
Abstract:
The problem of 3D layout recovery in indoor scenes has been a core research topic for over a decade. However, there are still several major challenges that remain unsolved. Among the most relevant ones, a major part of the state-of-the-art methods make implicit or explicit assumptions on the scenes -- e.g. box-shaped or Manhattan layouts. Also, current methods are computationally expensive and not…
▽ More
The problem of 3D layout recovery in indoor scenes has been a core research topic for over a decade. However, there are still several major challenges that remain unsolved. Among the most relevant ones, a major part of the state-of-the-art methods make implicit or explicit assumptions on the scenes -- e.g. box-shaped or Manhattan layouts. Also, current methods are computationally expensive and not suitable for real-time applications like robot navigation and AR/VR. In this work we present CFL (Corners for Layout), the first end-to-end model for 3D layout recovery on 360 images. Our experimental results show that we outperform the state of the art relaxing assumptions about the scene and at a lower cost. We also show that our model generalizes better to camera position variations than conventional approaches by using EquiConvs, a type of convolution applied directly on the sphere projection and hence invariant to the equirectangular distortions.
CFL Webpage: https://cfernandezlab.github.io/CFL/
△ Less
Submitted 25 March, 2019; v1 submitted 19 March, 2019;
originally announced March 2019.
-
Semantic and structural image segmentation for prosthetic vision
Authors:
Melani Sanchez-Garcia,
Ruben Martinez-Cantin,
Jose J. Guerrero
Abstract:
Prosthetic vision is being applied to partially recover the retinal stimulation of visually impaired people. However, the phosphenic images produced by the implants have very limited information bandwidth due to the poor resolution and lack of color or contrast. The ability of object recognition and scene understanding in real environments is severely restricted for prosthetic users. Computer visi…
▽ More
Prosthetic vision is being applied to partially recover the retinal stimulation of visually impaired people. However, the phosphenic images produced by the implants have very limited information bandwidth due to the poor resolution and lack of color or contrast. The ability of object recognition and scene understanding in real environments is severely restricted for prosthetic users. Computer vision can play a key role to overcome the limitations and to optimize the visual information in the simulated prosthetic vision, improving the amount of information that is presented. We present a new approach to build a schematic representation of indoor environments for phosphene images. The proposed method combines a variety of convolutional neural networks for extracting and conveying relevant information about the scene such as structural informative edges of the environment and silhouettes of segmented objects. Experiments were conducted with normal sighted subjects with a Simulated Prosthetic Vision system. The results show good accuracy for object recognition and room identification tasks for indoor scenes using the proposed approach, compared to other image processing methods.
△ Less
Submitted 28 January, 2025; v1 submitted 25 September, 2018;
originally announced September 2018.
-
PanoRoom: From the Sphere to the 3D Layout
Authors:
Clara Fernandez-Labrador,
Jose M. Facil,
Alejandro Perez-Yus,
Cedric Demonceaux,
Jose J. Guerrero
Abstract:
We propose a novel FCN able to work with omnidirectional images that outputs accurate probability maps representing the main structure of indoor scenes, which is able to generalize on different data. Our approach handles occlusions and recovers complex shaped rooms more faithful to the actual shape of the real scenes. We outperform the state of the art not only in accuracy of the 3D models but als…
▽ More
We propose a novel FCN able to work with omnidirectional images that outputs accurate probability maps representing the main structure of indoor scenes, which is able to generalize on different data. Our approach handles occlusions and recovers complex shaped rooms more faithful to the actual shape of the real scenes. We outperform the state of the art not only in accuracy of the 3D models but also in speed.
△ Less
Submitted 29 August, 2018;
originally announced August 2018.
-
RGBiD-SLAM for Accurate Real-time Localisation and 3D Mapping
Authors:
Daniel Gutierrez-Gomez,
Jose J. Guerrero
Abstract:
In this paper we present a complete SLAM system for RGB-D cameras, namely RGB-iD SLAM. The presented approach is a dense direct SLAM method with the main characteristic of working with the depth maps in inverse depth parametrisation for the routines of dense alignment or keyframe fusion. The system consists in 2 CPU threads working in parallel, which share the use of the GPU for dense alignment an…
▽ More
In this paper we present a complete SLAM system for RGB-D cameras, namely RGB-iD SLAM. The presented approach is a dense direct SLAM method with the main characteristic of working with the depth maps in inverse depth parametrisation for the routines of dense alignment or keyframe fusion. The system consists in 2 CPU threads working in parallel, which share the use of the GPU for dense alignment and keyframe fusion routines. The first thread is a front-end operating at frame rate, which processes every incoming frame from the RGB-D sensor to compute the incremental odometry and integrate it in a keyframe which is changed periodically following a covisibility-based strategy. The second thread is a back-end which receives keyframes from the front-end. This thread is in charge of segmenting the keyframes based on their structure, describing them using Bags of Words, trying to find potential loop closures with previous keyframes, and in such case perform pose-graph optimisation for trajectory correction. In addition, our system allows is able to compute the odometry both with unregistered and registered depth maps, allowing to use customised calibrations of the RGB-D sensor. As a consequence in the paper we also propose a detailed calibration pipeline to compute customised calibrations for particular RGB-D cameras. The experiments with our approach in the TUM RGB-D benchmark datasets show results superior in accuracy to the state-of-the-art in many of the sequences. The code has been made available on-line for research purposes https://github.com/dangut/RGBiD-SLAM.
△ Less
Submitted 22 July, 2018;
originally announced July 2018.
-
Layouts from Panoramic Images with Geometry and Deep Learning
Authors:
Clara Fernandez-Labrador,
Alejandro Perez-Yus,
Gonzalo Lopez-Nicolas,
Jose J. Guerrero
Abstract:
In this paper, we propose a novel procedure for 3D layout recovery of indoor scenes from single 360 degrees panoramic images. With such images, all scene is seen at once, allowing to recover closed geometries. Our method combines strategically the accuracy provided by geometric reasoning (lines and vanishing points) with the higher level of data abstraction and pattern recognition achieved by deep…
▽ More
In this paper, we propose a novel procedure for 3D layout recovery of indoor scenes from single 360 degrees panoramic images. With such images, all scene is seen at once, allowing to recover closed geometries. Our method combines strategically the accuracy provided by geometric reasoning (lines and vanishing points) with the higher level of data abstraction and pattern recognition achieved by deep learning techniques (edge and normal maps). Thus, we extract structural corners from which we generate layout hypotheses of the room assuming Manhattan world. The best layout model is selected, achieving good performance on both simple rooms (box-type) and complex shaped rooms (with more than four walls). Experiments of the proposed approach are conducted within two public datasets, SUN360 and Stanford (2D-3D-S) demonstrating the advantages of estimating layouts by combining geometry and deep learning and the effectiveness of our proposal with respect to the state of the art.
△ Less
Submitted 21 June, 2018;
originally announced June 2018.
-
Fast and Robust Fixed-Rank Matrix Recovery
Authors:
German Ros,
Julio Guerrero
Abstract:
We address the problem of efficient sparse fixed-rank (S-FR) matrix decomposition, i.e., splitting a corrupted matrix $M$ into an uncorrupted matrix $L$ of rank $r$ and a sparse matrix of outliers $S$. Fixed-rank constraints are usually imposed by the physical restrictions of the system under study. Here we propose a method to perform accurate and very efficient S-FR decomposition that is more sui…
▽ More
We address the problem of efficient sparse fixed-rank (S-FR) matrix decomposition, i.e., splitting a corrupted matrix $M$ into an uncorrupted matrix $L$ of rank $r$ and a sparse matrix of outliers $S$. Fixed-rank constraints are usually imposed by the physical restrictions of the system under study. Here we propose a method to perform accurate and very efficient S-FR decomposition that is more suitable for large-scale problems than existing approaches. Our method is a grateful combination of geometrical and algebraical techniques, which avoids the bottleneck caused by the Truncated SVD (TSVD). Instead, a polar factorization is used to exploit the manifold structure of fixed-rank problems as the product of two Stiefel and an SPD manifold, leading to a better convergence and stability. Then, closed-form projectors help to speed up each iteration of the method. We introduce a novel and fast projector for the $\text{SPD}$ manifold and a proof of its validity. Further acceleration is achieved using a Nystrom scheme. Extensive experiments with synthetic and real data in the context of robust photometric stereo and spectral clustering show that our proposals outperform the state of the art.
△ Less
Submitted 25 March, 2015; v1 submitted 10 March, 2015;
originally announced March 2015.
-
Motion Estimation via Robust Decomposition with Constrained Rank
Authors:
German Ros,
Jose Alvarez,
Julio Guerrero
Abstract:
In this work, we address the problem of outlier detection for robust motion estimation by using modern sparse-low-rank decompositions, i.e., Robust PCA-like methods, to impose global rank constraints. Robust decompositions have shown to be good at splitting a corrupted matrix into an uncorrupted low-rank matrix and a sparse matrix, containing outliers. However, this process only works when matrice…
▽ More
In this work, we address the problem of outlier detection for robust motion estimation by using modern sparse-low-rank decompositions, i.e., Robust PCA-like methods, to impose global rank constraints. Robust decompositions have shown to be good at splitting a corrupted matrix into an uncorrupted low-rank matrix and a sparse matrix, containing outliers. However, this process only works when matrices have relatively low rank with respect to their ambient space, a property not met in motion estimation problems. As a solution, we propose to exploit the partial information present in the decomposition to decide which matches are outliers. We provide evidences showing that even when it is not possible to recover an uncorrupted low-rank matrix, the resulting information can be exploited for outlier detection. To this end we propose the Robust Decomposition with Constrained Rank (RD-CR), a proximal gradient based method that enforces the rank constraints inherent to motion estimation. We also present a general framework to perform robust estimation for stereo Visual Odometry, based on our RD-CR and a simple but effective compressed optimization method that achieves high performance. Our evaluation on synthetic data and on the KITTI dataset demonstrates the applicability of our approach in complex scenarios and it yields state-of-the-art performance.
△ Less
Submitted 22 October, 2014;
originally announced October 2014.
-
Proceedings of the 1st Workshop on Robotics Challenges and Vision (RCV2013)
Authors:
Aitor Aladren,
Sasa Bodiroza,
Hamidreza Chitsaz,
J. J. Guerrero,
Verena Hafner,
Kris Hauser,
Aleksandar Jevtic,
Moslem Kazemi,
Bruno Lara,
Gonzalo Lopez-Nicolas,
Peer Neubert,
Peter Protzel,
Laurel D. Riek,
Niko Sunderhauf,
Chee Yap
Abstract:
Proceedings of the 1st Workshop on Robotics Challenges and Vision (RCV2013)
Proceedings of the 1st Workshop on Robotics Challenges and Vision (RCV2013)
△ Less
Submitted 13 February, 2014;
originally announced February 2014.