-
Iterative Event-based Motion Segmentation by Variational Contrast Maximization
Authors:
Ryo Yamaki,
Shintaro Shiba,
Guillermo Gallego,
Yoshimitsu Aoki
Abstract:
Event cameras provide rich signals that are suitable for motion estimation since they respond to changes in the scene. As any visual changes in the scene produce event data, it is paramount to classify the data into different motions (i.e., motion segmentation), which is useful for various tasks such as object detection and visual servoing. We propose an iterative motion segmentation method, by classifying events into background (e.g., dominant motion hypothesis) and foreground (independent motion residuals), thus extending the Contrast Maximization framework. Experimental results demonstrate that the proposed method successfully classifies event clusters both for public and self-recorded datasets, producing sharp, motion-compensated edge-like images. The proposed method achieves state-of-the-art accuracy on moving object detection benchmarks with an improvement of over 30%, and demonstrates its possibility of applying to more complex and noisy real-world scenes. We hope this work broadens the sensitivity of Contrast Maximization with respect to both motion parameters and input events, thus contributing to theoretical advancements in event-based motion segmentation estimation. https://github.com/aoki-media-lab/event_based_segmentation_vcmax
Submitted 25 April, 2025;
originally announced April 2025.
-
Simultaneous Motion And Noise Estimation with Event Cameras
Authors:
Shintaro Shiba,
Yoshimitsu Aoki,
Guillermo Gallego
Abstract:
Event cameras are emerging vision sensors, whose noise is challenging to characterize. Existing denoising methods for event cameras consider other tasks such as motion estimation separately (i.e., sequentially after denoising). However, motion is an intrinsic part of event data, since scene edges cannot be sensed without motion. This work proposes, to the best of our knowledge, the first method that simultaneously estimates motion in its various forms (e.g., ego-motion, optical flow) and noise. The method is flexible, as it allows replacing the 1-step motion estimation of the widely-used Contrast Maximization framework with any other motion estimator, such as deep neural networks. The experiments show that the proposed method achieves state-of-the-art results on the E-MLB denoising benchmark and competitive results on the DND21 benchmark, while showing its efficacy on motion estimation and intensity reconstruction tasks. We believe that the proposed approach contributes to strengthening the theory of event-data denoising, as well as impacting practical denoising use-cases, as we release the code upon acceptance. Project page: https://github.com/tub-rip/ESMD
Submitted 4 April, 2025;
originally announced April 2025.
-
BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds
Authors:
Yuto Shibata,
Yusuke Oumi,
Go Irie,
Akisato Kimura,
Yoshimitsu Aoki,
Mariko Isogawa
Abstract:
We propose BGM2Pose, a non-invasive 3D human pose estimation method using arbitrary music (e.g., background music) as active sensing signals. Unlike existing approaches that significantly limit practicality by employing intrusive chirp signals within the audible range, our method utilizes natural music that causes minimal discomfort to humans. Estimating human poses from standard music presents significant challenges. In contrast to sound sources specifically designed for measurement, regular music varies in both volume and pitch. These dynamic changes in signals caused by music are inevitably mixed with alterations in the sound field resulting from human motion, making it hard to extract reliable cues for pose estimation. To address these challenges, BGM2Pose introduces a Contrastive Pose Extraction Module that employs contrastive learning and hard negative sampling to eliminate musical components from the recorded data, isolating the pose information. Additionally, we propose a Frequency-wise Attention Module that enables the model to focus on subtle acoustic variations attributable to human movement by dynamically computing attention across frequency bands. Experiments suggest that our method outperforms the existing methods, demonstrating substantial potential for real-world applications. Our datasets and code will be made publicly available.
Submitted 1 March, 2025;
originally announced March 2025.
-
Pre-training with Synthetic Patterns for Audio
Authors:
Yuchi Ishikawa,
Tatsuya Komatsu,
Yoshimitsu Aoki
Abstract:
In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. Our proposed framework consists of two key elements. The first one is Masked Autoencoder (MAE), a self-supervised learning framework that learns from reconstructing data from randomly masked counterparts. MAEs tend to focus on low-level information such as visual patterns and regularities within data. Therefore, it is unimportant what is portrayed in the input, whether it be images, audio mel-spectrograms, or even synthetic patterns. This leads to the second key element, which is synthetic data. Synthetic data, unlike real audio, is free from privacy and licensing infringement issues. By combining MAEs and synthetic patterns, our framework enables the model to learn generalized feature representations without real data, while addressing the issues related to real audio. To evaluate the efficacy of our framework, we conduct extensive experiments across a total of 13 audio tasks and 17 synthetic datasets. The experiments provide insights into which types of synthetic patterns are effective for audio. Our results demonstrate that our framework achieves performance comparable to models pre-trained on AudioSet-2M and partially outperforms image-based pre-training methods.
Submitted 1 October, 2024;
originally announced October 2024.
-
Event-based Background-Oriented Schlieren
Authors:
Shintaro Shiba,
Friedhelm Hamann,
Yoshimitsu Aoki,
Guillermo Gallego
Abstract:
Schlieren imaging is an optical technique to observe the flow of transparent media, such as air or water, without any particle seeding. However, conventional frame-based techniques require both high spatial and temporal resolution cameras, which impose bright illumination and expensive computation limitations. Event cameras offer potential advantages (high dynamic range, high temporal resolution, and data efficiency) to overcome such limitations due to their bio-inspired sensing principle. This paper presents a novel technique for perceiving air convection using events and frames by providing the first theoretical analysis that connects event data and schlieren. We formulate the problem as a variational optimization one combining the linearized event generation model with a physically-motivated parameterization that estimates the temporal derivative of the air density. The experiments with accurately aligned frame- and event camera data reveal that the proposed method enables event cameras to obtain on par results with existing frame-based optical flow techniques. Moreover, the proposed method works under dark conditions where frame-based schlieren fails, and also enables slow-motion analysis by leveraging the event camera's advantages. Our work pioneers and opens a new stack of event camera applications, as we publish the source code as well as the first schlieren dataset with high-quality frame and event data. https://github.com/tub-rip/event_based_bos
Submitted 1 November, 2023;
originally announced November 2023.
-
Fast Event-based Optical Flow Estimation by Triplet Matching
Authors:
Shintaro Shiba,
Yoshimitsu Aoki,
Guillermo Gallego
Abstract:
Event cameras are novel bio-inspired sensors that offer advantages over traditional cameras (low latency, high dynamic range, low power, etc.). Optical flow estimation methods that work on packets of events trade off speed for accuracy, while event-by-event (incremental) methods have strong assumptions and have not been tested on common benchmarks that quantify progress in the field. Towards applications on resource-constrained devices, it is important to develop optical flow algorithms that are fast, light-weight and accurate. This work leverages insights from neuroscience, and proposes a novel optical flow estimation scheme based on triplet matching. The experiments on publicly available benchmarks demonstrate its capability to handle complex scenes with comparable results as prior packet-based algorithms. In addition, the proposed method achieves the fastest execution time (> 10 kHz) on standard CPUs as it requires only three events in estimation. We hope that our research opens the door to real-time, incremental motion estimation methods and applications in real-world scenarios.
Submitted 23 December, 2022;
originally announced December 2022.
-
Experimental Low-speed Positioning System with VecTwin Rudder for Automatic Docking (Berthing)
Authors:
Dimas M. Rachman,
Yusuke Aoki,
Yoshiki Miyauchi,
Naoya Umeda,
Atsuo Maki
Abstract:
A VecTwin rudder system comprises twin fishtail rudders with reaction fins to increase its performance. With a constant propeller revolution number, the vessel can execute special low-speed maneuvers like hover, crabbing, reverse, and rotation. Such low-speed maneuvers are termed dynamic positioning (DP), and a DP vessel should be fully/overly actuated with several thrusters. This article introduces a novel and experimental VecTwin positioning system (VTPS) without making the ship fully/overly actuated. Unlike the usual dynamic positioning system (DPS), the VTPS is developed for low-speed operations in a calm harbor area. It is designed upon an assumption that the forces due to the interaction between the rudders, the propeller, and the hull are linear with the rudder angles within a range around the hover rudder angle. The linear relationship is obtained through linear regression of the results from several CFD simulations. The VTPS implements a PID controller that regulates the actuator forces to achieve the given low-speed positioning objective. It was tested in combined automatic docking and position-keeping experiments where disturbances from the environment exist. It shows promising potential for a practical application but with further improvements.
Submitted 19 December, 2022;
originally announced December 2022.