-
Event-based Egocentric Human Pose Estimation in Dynamic Environment
Authors:
Wataru Ikeda,
Masashi Hatano,
Ryosei Hara,
Mariko Isogawa
Abstract:
Estimating human pose using a front-facing egocentric camera is essential for applications such as sports motion analysis, VR/AR, and AI for wearable devices. However, many existing methods rely on RGB cameras and do not account for low-light environments or motion blur. Event-based cameras have the potential to address these challenges. In this work, we introduce a novel task of human pose estimation using a front-facing event-based camera mounted on the head and propose D-EventEgo, the first framework for this task. The proposed method first estimates head poses and then uses them as conditions to generate body poses. However, dynamic objects mixed with background events may reduce head pose estimation accuracy. Therefore, we introduce the Motion Segmentation Module to remove dynamic objects and extract background information. Extensive experiments on our synthetic event-based dataset derived from EgoBody demonstrate that our approach outperforms our baseline in four out of five evaluation metrics in dynamic environments.
Submitted 28 May, 2025;
originally announced May 2025.
-
EventEgoHands: Event-based Egocentric 3D Hand Mesh Reconstruction
Authors:
Ryosei Hara,
Wataru Ikeda,
Masashi Hatano,
Mariko Isogawa
Abstract:
Reconstructing 3D hand meshes is a challenging but important task for human-computer interaction and AR/VR applications. RGB and/or depth cameras have been widely used for this task. However, methods using these conventional cameras face challenges in low-light environments and under motion blur. To address these limitations, event cameras have been attracting attention in recent years for their high dynamic range and high temporal resolution. Despite their advantages, event cameras are sensitive to background noise and camera motion, which has limited existing studies to static backgrounds and fixed cameras. In this study, we propose EventEgoHands, a novel method for event-based 3D hand mesh reconstruction in an egocentric view. Our approach introduces a Hand Segmentation Module that extracts hand regions, effectively mitigating the influence of dynamic background events. We evaluated our approach and demonstrated its effectiveness on the N-HOT3D dataset, improving MPJPE by approximately 4.5 cm (43%).
Submitted 28 May, 2025; v1 submitted 25 May, 2025;
originally announced May 2025.
-
High-Quality Virtual Single-Viewpoint Surgical Video: Geometric Autocalibration of Multiple Cameras in Surgical Lights
Authors:
Yuna Kato,
Mariko Isogawa,
Shohei Mori,
Hideo Saito,
Hiroki Kajita,
Yoshifumi Takatsume
Abstract:
Occlusion-free video generation is challenging due to surgeons' obstructions in the camera field of view. Prior work has addressed this issue by installing multiple cameras on a surgical light, hoping some cameras will observe the surgical field with less occlusion. However, this special camera setup poses a new imaging challenge since camera configurations can change every time surgeons move the light, and manual image alignment is required. This paper proposes an algorithm to automate this alignment task. The proposed method detects frames where the lighting system moves, realigns them, and selects the camera with the least occlusion. This algorithm results in a stabilized video with less occlusion. Quantitative results show that our method outperforms conventional approaches. A user study involving medical doctors also confirmed the superiority of our method.
Submitted 5 March, 2025;
originally announced March 2025.
-
BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds
Authors:
Yuto Shibata,
Yusuke Oumi,
Go Irie,
Akisato Kimura,
Yoshimitsu Aoki,
Mariko Isogawa
Abstract:
We propose BGM2Pose, a non-invasive 3D human pose estimation method using arbitrary music (e.g., background music) as active sensing signals. Unlike existing approaches that significantly limit practicality by employing intrusive chirp signals within the audible range, our method utilizes natural music that causes minimal discomfort to humans. Estimating human poses from standard music presents significant challenges. In contrast to sound sources specifically designed for measurement, regular music varies in both volume and pitch. These dynamic changes in signals caused by music are inevitably mixed with alterations in the sound field resulting from human motion, making it hard to extract reliable cues for pose estimation. To address these challenges, BGM2Pose introduces a Contrastive Pose Extraction Module that employs contrastive learning and hard negative sampling to eliminate musical components from the recorded data, isolating the pose information. Additionally, we propose a Frequency-wise Attention Module that enables the model to focus on subtle acoustic variations attributable to human movement by dynamically computing attention across frequency bands. Experiments suggest that our method outperforms the existing methods, demonstrating substantial potential for real-world applications. Our datasets and code will be made publicly available.
Submitted 1 March, 2025;
originally announced March 2025.
-
Dense Depth from Event Focal Stack
Authors:
Kenta Horikawa,
Mariko Isogawa,
Hideo Saito,
Shohei Mori
Abstract:
We propose a method for dense depth estimation from an event stream generated when sweeping the focal plane of the driving lens attached to an event camera. In this method, a depth map is inferred from an "event focal stack" composed of the event stream using a convolutional neural network trained with synthesized event focal stacks. The synthesized event stream is created from a focal stack generated by Blender for any arbitrary 3D scene. This allows for training on scenes with diverse structures. Additionally, we explored methods to eliminate the domain gap between real event streams and synthetic event streams. Our method demonstrates superior performance over a depth-from-defocus method in the image domain on synthetic and real datasets.
Submitted 11 December, 2024;
originally announced December 2024.
-
Acoustic-based 3D Human Pose Estimation Robust to Human Position
Authors:
Yusuke Oumi,
Yuto Shibata,
Go Irie,
Akisato Kimura,
Yoshimitsu Aoki,
Mariko Isogawa
Abstract:
This paper explores the problem of 3D human pose estimation from only low-level acoustic signals. The existing active acoustic sensing-based approach for 3D human pose estimation implicitly assumes that the target user is positioned along a line between loudspeakers and a microphone. Because reflection and diffraction of sound by the human body cause subtle acoustic signal changes compared to sound obstruction, the accuracy of the existing model degrades significantly when subjects deviate from this line, limiting its practicality in real-world scenarios. To overcome this limitation, we propose a novel method composed of a position discriminator and a reverberation-resistant model. The former predicts the standing positions of subjects and applies adversarial learning to extract subject position-invariant features. The latter utilizes acoustic signals before the estimation target time as references to enhance robustness against the variations in sound arrival times due to diffraction and reflection. We construct an acoustic pose estimation dataset that covers diverse human locations and demonstrate through experiments that our proposed method outperforms existing approaches.
Submitted 8 November, 2024;
originally announced November 2024.
-
Scapegoat Generation for Privacy Protection from Deepfake
Authors:
Gido Kato,
Yoshihiro Fukuhara,
Mariko Isogawa,
Hideki Tsunashima,
Hirokatsu Kataoka,
Shigeo Morishima
Abstract:
To protect privacy and prevent malicious use of deepfakes, current studies propose methods that interfere with the generation process, such as detection and destruction approaches. However, these methods suffer from sub-optimal generalization performance to unseen models and add undesirable noise to the original image. To address these problems, we propose a new problem formulation for deepfake prevention: generating a "scapegoat image" by modifying the style of the original input so that the user can recognize it as an avatar, but the real face cannot be reconstructed from it. Even in the case of a malicious deepfake, the privacy of the users is still protected. To achieve this, we introduce an optimization-based editing method that utilizes GAN inversion to discourage deepfake models from generating similar scapegoats. We validate the effectiveness of our proposed method through quantitative and user studies.
Submitted 6 March, 2023;
originally announced March 2023.
-
Efficient Non-Line-of-Sight Imaging from Transient Sinograms
Authors:
Mariko Isogawa,
Dorian Chan,
Ye Yuan,
Kris Kitani,
Matthew O'Toole
Abstract:
Non-line-of-sight (NLOS) imaging techniques use light that diffusely reflects off of visible surfaces (e.g., walls) to see around corners. One approach involves using pulsed lasers and ultrafast sensors to measure the travel time of multiply scattered light. Unlike existing NLOS techniques that generally require densely raster scanning points across the entirety of a relay wall, we explore a more efficient form of NLOS scanning that reduces both acquisition times and computational requirements. We propose a circular and confocal non-line-of-sight (C2NLOS) scan that involves illuminating and imaging a common point, and scanning this point in a circular path along a wall. We observe that (1) these C2NLOS measurements consist of a superposition of sinusoids, which we refer to as a transient sinogram, (2) there exist computationally efficient reconstruction procedures that transform these sinusoidal measurements into 3D positions of hidden scatterers or NLOS images of hidden objects, and (3) despite operating on an order of magnitude fewer measurements than previous approaches, these C2NLOS scans provide sufficient information about the hidden scene to solve these different NLOS imaging tasks. We show results from both simulated and real C2NLOS scans.
Submitted 6 August, 2020;
originally announced August 2020.
-
Optical Non-Line-of-Sight Physics-based 3D Human Pose Estimation
Authors:
Mariko Isogawa,
Ye Yuan,
Matthew O'Toole,
Kris Kitani
Abstract:
We describe a method for 3D human pose estimation from transient images (i.e., a 3D spatio-temporal histogram of photons) acquired by an optical non-line-of-sight (NLOS) imaging system. Our method can perceive 3D human pose by "looking around corners" through the use of light indirectly reflected by the environment. We bring together a diverse set of technologies from NLOS imaging, human pose estimation and deep reinforcement learning to construct an end-to-end data processing pipeline that converts a raw stream of photon measurements into a full 3D human pose sequence estimate. Our contributions are the design of a data representation process which includes (1) a learnable inverse point spread function (PSF) to convert raw transient images into a deep feature vector; (2) a neural humanoid control policy conditioned on the transient image feature and learned from interactions with a physics simulator; and (3) a data synthesis and augmentation strategy based on depth data that can be transferred to a real-world NLOS imaging system. Our preliminary experiments suggest that our method is able to generalize to real-world NLOS measurements to estimate physically valid 3D human poses.
Submitted 31 March, 2020;
originally announced March 2020.
-
Ultimate confinement of phonon propagation in silicon nano-crystalline structure
Authors:
Takafumi Oyake,
Lei Feng,
Takuma Shiga,
Masayuki Isogawa,
Yoshiaki Nakamura,
Junichiro Shiomi
Abstract:
Temperature-dependent thermal conductivity of epitaxial silicon nano-crystalline (SiNC) structures composed of nanometer-sized grains separated by ultra-thin silicon-oxide (SiO2) films is measured by the time domain thermoreflectance technique in the range from 50 to 300 K. Thermal conductivity of SiNC structures with grain sizes of 3 nm and 5 nm is anomalously low over the entire temperature range, significantly below the values of bulk amorphous Si and SiO2. A phonon gas kinetics model, with intrinsic transport properties obtained by first-principles-based anharmonic lattice dynamics and phonon transmittance across ultra-thin SiO2 films obtained by atomistic Green's function, reproduces the measured thermal conductivity without any fitting parameters. The analysis reveals that mean free paths of acoustic phonons in the SiNC structures are equivalent to or even below half the phonon wavelength, i.e., the minimum thermal conductivity scenario. The result demonstrates that nanostructures with extremely small length scales and controlled interfaces can give rise to ultimate classical confinement of thermal phonon propagation.
Submitted 12 January, 2018;
originally announced January 2018.