-
Learning Normal Flow Directly From Event Neighborhoods
Authors:
Dehao Yuan,
Levi Burner,
Jiayi Wu,
Minghui Liu,
Jingxi Chen,
Yiannis Aloimonos,
Cornelia Fermüller
Abstract:
Event-based motion field estimation is an important task. However, current optical flow methods face challenges: learning-based approaches, often frame-based and relying on CNNs, lack cross-domain transferability, while model-based methods, though more robust, are less accurate. To address the limitations of optical flow estimation, recent works have focused on normal flow, which can be more relia…
▽ More
Event-based motion field estimation is an important task. However, current optical flow methods face challenges: learning-based approaches, often frame-based and relying on CNNs, lack cross-domain transferability, while model-based methods, though more robust, are less accurate. To address the limitations of optical flow estimation, recent works have focused on normal flow, which can be more reliably measured in regions with limited texture or strong edges. However, existing normal flow estimators are predominantly model-based and suffer from high errors.
In this paper, we propose a novel supervised point-based method for normal flow estimation that overcomes the limitations of existing event learning-based approaches. Using a local point cloud encoder, our method directly estimates per-event normal flow from raw events, offering multiple unique advantages: 1) It produces temporally and spatially sharp predictions. 2) It supports more diverse data augmentation, such as random rotation, to improve robustness across various domains. 3) It naturally supports uncertainty quantification via ensemble inference, which benefits downstream tasks. 4) It enables training and inference on undistorted data in normalized camera coordinates, improving transferability across cameras. Extensive experiments demonstrate our method achieves better and more consistent performance than state-of-the-art methods when transferred across different datasets. Leveraging this transferability, we train our model on the union of datasets and release it for public use. Finally, we introduce an egomotion solver based on a maximum-margin problem that uses normal flow and IMU to achieve strong performance in challenging scenarios.
△ Less
Submitted 15 December, 2024;
originally announced December 2024.
-
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation
Authors:
Jingxi Chen,
Brandon Y. Feng,
Haoming Cai,
Tianfu Wang,
Levi Burner,
Dehao Yuan,
Cornelia Fermuller,
Christopher A. Metzler,
Yiannis Aloimonos
Abstract:
Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion gui…
▽ More
Video Frame Interpolation aims to recover realistic missing frames between observed frames, generating a high-frame-rate video from a low-frame-rate video. However, without additional guidance, the large motion between frames makes this problem ill-posed. Event-based Video Frame Interpolation (EVFI) addresses this challenge by using sparse, high-temporal-resolution event measurements as motion guidance. This guidance allows EVFI methods to significantly outperform frame-only methods. However, to date, EVFI methods have relied on a limited set of paired event-frame training data, severely limiting their performance and generalization capabilities. In this work, we overcome the limited data challenge by adapting pre-trained video diffusion models trained on internet-scale datasets to EVFI. We experimentally validate our approach on real-world EVFI datasets, including a new one that we introduce. Our method outperforms existing methods and generalizes across cameras far better than existing approaches.
△ Less
Submitted 25 March, 2025; v1 submitted 10 December, 2024;
originally announced December 2024.
-
Extremum Seeking Controlled Wiggling for Tactile Insertion
Authors:
Levi Burner,
Pavan Mantripragada,
Gabriele M. Caddeo,
Lorenzo Natale,
Cornelia Fermüller,
Yiannis Aloimonos
Abstract:
When humans perform insertion tasks such as inserting a cup into a cupboard, routing a cable, or key insertion, they wiggle the object and observe the process through tactile and proprioceptive feedback. While recent advances in tactile sensors have resulted in tactile-based approaches, there has not been a generalized formulation based on wiggling similar to human behavior. Thus, we propose an ex…
▽ More
When humans perform insertion tasks such as inserting a cup into a cupboard, routing a cable, or key insertion, they wiggle the object and observe the process through tactile and proprioceptive feedback. While recent advances in tactile sensors have resulted in tactile-based approaches, there has not been a generalized formulation based on wiggling similar to human behavior. Thus, we propose an extremum-seeking control law that can insert four keys into four types of locks without control parameter tuning despite significant variation in lock type. The resulting model-free formulation wiggles the end effector pose to maximize insertion depth while minimizing strain as measured by a GelSight Mini tactile sensor that grasps a key. The algorithm achieves a 71\% success rate over 120 randomly initialized trials with uncertainty in both translation and orientation. Over 240 deterministically initialized trials, where only one translation or rotation parameter is perturbed, 84\% of trials succeeded. Given tactile feedback at 13 Hz, the mean insertion time for these groups of trials are 262 and 147 seconds respectively.
△ Less
Submitted 3 October, 2024;
originally announced October 2024.
-
Embodied Visuomotor Representation
Authors:
Levi Burner,
Cornelia Fermüller,
Yiannis Aloimonos
Abstract:
Imagine sitting at your desk, looking at various objects on it. While you do not know their exact distances from your eye in meters, you can reach out and touch them. Instead of an externally defined unit, your sense of distance is inherently tied to your action's effect on your embodiment. In contrast, conventional robotics relies on precise calibration to external units with which separate visio…
▽ More
Imagine sitting at your desk, looking at various objects on it. While you do not know their exact distances from your eye in meters, you can reach out and touch them. Instead of an externally defined unit, your sense of distance is inherently tied to your action's effect on your embodiment. In contrast, conventional robotics relies on precise calibration to external units with which separate vision and control processes communicate. This necessitates highly engineered and expensive systems that cannot be easily reconfigured.
To address this, we introduce Embodied Visuomotor Representation, a methodology through which robots infer distance in a unit implied by their actions. That is, without depending on calibrated 3D sensors or known physical models. With it, we demonstrate that a robot without prior knowledge of its size, environmental scale, or strength can quickly learn to touch and clear obstacles within seconds of operation. Likewise, in simulation, an agent without knowledge of its mass or strength can successfully jump across a gap of unknown size after a few test oscillations. These behaviors mirror natural strategies observed in bees and gerbils, which also lack calibration in an external unit, and highlight the potential for action-driven perception in robotics.
△ Less
Submitted 24 April, 2025; v1 submitted 30 September, 2024;
originally announced October 2024.
-
EVIMO2: An Event Camera Dataset for Motion Segmentation, Optical Flow, Structure from Motion, and Visual Inertial Odometry in Indoor Scenes with Monocular or Stereo Algorithms
Authors:
Levi Burner,
Anton Mitrokhin,
Cornelia Fermüller,
Yiannis Aloimonos
Abstract:
A new event camera dataset, EVIMO2, is introduced that improves on the popular EVIMO dataset by providing more data, from better cameras, in more complex scenarios. As with its predecessor, EVIMO2 provides labels in the form of per-pixel ground truth depth and segmentation as well as camera and object poses. All sequences use data from physical cameras and many sequences feature multiple independe…
▽ More
A new event camera dataset, EVIMO2, is introduced that improves on the popular EVIMO dataset by providing more data, from better cameras, in more complex scenarios. As with its predecessor, EVIMO2 provides labels in the form of per-pixel ground truth depth and segmentation as well as camera and object poses. All sequences use data from physical cameras and many sequences feature multiple independently moving objects. Typically, such labeled data is unavailable in physical event camera datasets. Thus, EVIMO2 will serve as a challenging benchmark for existing algorithms and rich training set for the development of new algorithms. In particular, EVIMO2 is suited for supporting research in motion and object segmentation, optical flow, structure from motion, and visual (inertial) odometry in both monocular or stereo configurations.
EVIMO2 consists of 41 minutes of data from three 640$\times$480 event cameras, one 2080$\times$1552 classical color camera, inertial measurements from two six axis inertial measurement units, and millimeter accurate object poses from a Vicon motion capture system. The dataset's 173 sequences are arranged into three categories. 3.75 minutes of independently moving household objects, 22.55 minutes of static scenes, and 14.85 minutes of basic motions in shallow scenes. Some sequences were recorded in low-light conditions where conventional cameras fail. Depth and segmentation are provided at 60 Hz for the event cameras and 30 Hz for the classical camera. The masks can be regenerated using open-source code up to rates as high as 200 Hz.
This technical report briefly describes EVIMO2. The full documentation is available online. Videos of individual sequences can be sampled on the download page.
△ Less
Submitted 6 May, 2022;
originally announced May 2022.
-
TTCDist: Fast Distance Estimation From an Active Monocular Camera Using Time-to-Contact
Authors:
Levi Burner,
Nitin J. Sanket,
Cornelia Fermüller,
Yiannis Aloimonos
Abstract:
Distance estimation from vision is fundamental for a myriad of robotic applications such as navigation, manipulation, and planning. Inspired by the mammal's visual system, which gazes at specific objects, we develop two novel constraints relating time-to-contact, acceleration, and distance that we call the $τ$-constraint and $Φ$-constraint. They allow an active (moving) camera to estimate depth ef…
▽ More
Distance estimation from vision is fundamental for a myriad of robotic applications such as navigation, manipulation, and planning. Inspired by the mammal's visual system, which gazes at specific objects, we develop two novel constraints relating time-to-contact, acceleration, and distance that we call the $τ$-constraint and $Φ$-constraint. They allow an active (moving) camera to estimate depth efficiently and accurately while using only a small portion of the image. The constraints are applicable to range sensing, sensor fusion, and visual servoing.
We successfully validate the proposed constraints with two experiments. The first applies both constraints in a trajectory estimation task with a monocular camera and an Inertial Measurement Unit (IMU). Our methods achieve 30-70% less average trajectory error while running 25$\times$ and 6.2$\times$ faster than the popular Visual-Inertial Odometry methods VINS-Mono and ROVIO respectively. The second experiment demonstrates that when the constraints are used for feedback with efference copies the resulting closed loop system's eigenvalues are invariant to scaling of the applied control signal. We believe these results indicate the $τ$ and $Φ$ constraint's potential as the basis of robust and efficient algorithms for a multitude of robotic applications.
△ Less
Submitted 7 March, 2023; v1 submitted 14 March, 2022;
originally announced March 2022.