-
Multi-View Neural Surface Reconstruction with Structured Light
Authors:
Chunyu Li,
Taisuke Hashimoto,
Eiichi Matsumoto,
Hiroharu Kato
Abstract:
Three-dimensional (3D) object reconstruction based on differentiable rendering (DR) is an active research topic in computer vision. DR-based methods minimize the difference between the rendered and target images by optimizing both shape and appearance, achieving high visual fidelity. However, most approaches perform poorly on textureless objects because of geometric ambiguity: multiple shapes can produce the same rendered result for such objects. To overcome this problem, we introduce active sensing with structured light (SL) into multi-view 3D object reconstruction based on DR to learn the unknown geometry and appearance of arbitrary scenes and camera poses. More specifically, our framework leverages the correspondences between pixels in different views calculated by structured light as an additional constraint in the DR-based optimization of implicit surface, color representations, and camera poses. Because camera poses can be optimized simultaneously, our method achieves high reconstruction accuracy in textureless regions and reduces the effort of camera pose calibration, which is required for conventional SL-based methods. Experimental results on both synthetic and real data demonstrate that our system outperforms conventional DR- and SL-based methods in high-quality surface reconstruction, particularly for challenging objects with textureless or shiny surfaces.
Submitted 21 November, 2022;
originally announced November 2022.
-
Solving the optimal stopping problem with reinforcement learning: an application in financial option exercise
Authors:
Leonardo Kanashiro Felizardo,
Elia Matsumoto,
Emilio Del-Moral-Hernandez
Abstract:
The optimal stopping problem is a category of decision problems with a specific constrained configuration. It is relevant to various real-world applications such as finance and management. To solve the optimal stopping problem, state-of-the-art algorithms in dynamic programming, such as the least-squares Monte Carlo (LSMC), are employed. This type of algorithm relies on path simulations using only the last price of the underlying asset as a state representation. Also, the LSMC was designed for option valuation, where risk-neutral probabilities can be employed to account for uncertainty. However, the goals of the general optimal stopping problem may not fit the requirements of the LSMC when prices are auto-correlated. We employ a data-driven method that uses Monte Carlo simulation to train and test artificial neural networks (ANN) to solve the optimal stopping problem. Using ANN to solve decision problems is not entirely new. We propose a different architecture that uses convolutional neural networks (CNN) to deal with the dimensionality problem that arises when we transform the whole history of prices into a Markovian state. We present experiments indicating that our proposed architecture improves results over previous implementations under specific simulated time series function sets. Lastly, we apply our proposed method to the optimal exercise of financial options and compare it with the LSMC algorithm. Our experiments show that our method captures more accurate exercise opportunities than the LSMC. The resulting exercise policies yield a substantially higher expected payoff (an improvement above 974%) across the many Monte Carlo simulations that used the real-world return database on the out-of-sample (test) data.
Submitted 21 July, 2022;
originally announced August 2022.
-
Decomposing NeRF for Editing via Feature Field Distillation
Authors:
Sosuke Kobayashi,
Eiichi Matsumoto,
Vincent Sitzmann
Abstract:
Emerging neural radiance fields (NeRF) are a promising scene representation for computer graphics, enabling high-quality 3D reconstruction and novel view synthesis from image observations. However, editing a scene represented by a NeRF is challenging, as the underlying connectionist representations such as MLPs or voxel grids are not object-centric or compositional. In particular, it has been difficult to selectively edit specific regions or objects. In this work, we tackle the problem of semantic scene decomposition of NeRFs to enable query-based local editing of the represented 3D scenes. We propose to distill the knowledge of off-the-shelf, self-supervised 2D image feature extractors such as CLIP-LSeg or DINO into a 3D feature field optimized in parallel to the radiance field. Given a user-specified query of various modalities such as text, an image patch, or a point-and-click selection, 3D feature fields semantically decompose 3D space without the need for re-training and enable us to semantically select and edit regions in the radiance field. Our experiments validate that the distilled feature fields (DFFs) can transfer recent progress in 2D vision and language foundation models to 3D scene representations, enabling convincing 3D segmentation and selective editing of emerging neural graphics representations.
Submitted 13 October, 2022; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Surface-Aligned Neural Radiance Fields for Controllable 3D Human Synthesis
Authors:
Tianhan Xu,
Yasuhiro Fujita,
Eiichi Matsumoto
Abstract:
We propose a new method for reconstructing controllable implicit 3D human models from sparse multi-view RGB videos. Our method defines the neural scene representation on the mesh surface points and signed distances from the surface of a human body mesh. We identify an indistinguishability issue that arises when a point in 3D space is mapped to its nearest surface point on a mesh for learning a surface-aligned neural scene representation. To address this issue, we propose projecting a point onto a mesh surface using barycentric interpolation with modified vertex normals. Experiments with the ZJU-MoCap and Human3.6M datasets show that our approach achieves higher quality in novel-view and novel-pose synthesis than existing methods. We also demonstrate that our method easily supports the control of body shape and clothes. Project page: https://pfnet-research.github.io/surface-aligned-nerf/.
Submitted 3 April, 2022; v1 submitted 5 January, 2022;
originally announced January 2022.
-
Ray-marching Thurston geometries
Authors:
Rémi Coulon,
Elisabetta A. Matsumoto,
Henry Segerman,
Steve J. Trettel
Abstract:
We describe algorithms that produce accurate real-time interactive in-space views of the eight Thurston geometries using ray-marching. We give a theoretical framework for our algorithms, independent of the geometry involved. In addition to scenes within a geometry $X$, we also consider scenes within quotient manifolds and orbifolds $X / Γ$. We adapt the Phong lighting model to non-Euclidean geometries. The most difficult part of this is the calculation of light intensity, which relates to the area density of geodesic spheres. We also give extensive practical details for each geometry.
Submitted 13 January, 2022; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score
Authors:
He Huang,
Shunta Saito,
Yuta Kikuchi,
Eiichi Matsumoto,
Wei Tang,
Philip S. Yu
Abstract:
Scene graph parsing aims to detect objects in an image scene and recognize their relations. Recent approaches have achieved high average scores on some popular benchmarks, but fail to detect rare relations, as the highly long-tailed distribution of the data biases learning towards frequent labels. Motivated by the fact that detecting these rare relations can be critical in real-world applications, this paper introduces a novel integrated framework of classification and ranking to resolve the class imbalance problem in scene graph parsing. Specifically, we design a new Contrasting Cross-Entropy loss, which promotes the detection of rare relations by suppressing incorrect frequent ones. Furthermore, we propose a novel scoring module, termed Scorer, which learns to rank the relations based on the image features and relation features to improve the recall of predictions. Our framework is simple and effective, and can be incorporated into current scene graph models. Experimental results show that the proposed approach improves on the current state-of-the-art methods, with a clear advantage in detecting rare relations.
Submitted 5 October, 2020; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Non-Euclidean Virtual Reality IV: Sol
Authors:
Rémi Coulon,
Elisabetta A. Matsumoto,
Henry Segerman,
Steve Trettel
Abstract:
This article presents virtual reality software designed to explore the Sol geometry. The simulation is available at 3-dimensional.space/sol.html
Submitted 2 February, 2020;
originally announced February 2020.
-
Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning
Authors:
Ayaka Kume,
Eiichi Matsumoto,
Kuniyuki Takahashi,
Wilson Ko,
Jethro Tan
Abstract:
In order for robots to perform mission-critical tasks, it is essential that they are able to quickly adapt to changes in their environment as well as to injuries and/or other bodily changes. Deep reinforcement learning has been shown to be successful in training robot control policies for operation in complex environments. However, existing methods typically employ only a single policy. This can limit adaptability, since a large environmental modification might require a completely different behavior compared to the learning environment. To solve this problem, we propose Map-based Multi-Policy Reinforcement Learning (MMPRL), which aims to search and store multiple policies that encode different behavioral features while maximizing the expected reward in advance of the environment change. Thanks to these policies, which are stored in a multi-dimensional discrete map according to their behavioral features, adaptation can be performed within reasonable time without retraining the robot. An appropriate pre-trained policy from the map can be recalled using Bayesian optimization. Our experiments show that MMPRL enables robots to quickly adapt to large changes without requiring any prior knowledge of the type of injuries that could occur. A highlight of the learned behaviors can be found here: https://youtu.be/QwInbilXNOE .
Submitted 18 October, 2017; v1 submitted 17 October, 2017;
originally announced October 2017.
-
Learning Discrete Representations via Information Maximizing Self-Augmented Training
Authors:
Weihua Hu,
Takeru Miyato,
Seiya Tokui,
Eiichi Matsumoto,
Masashi Sugiyama
Abstract:
Learning discrete representations of data is a central machine learning task because of the compactness of the representations and ease of interpretation. The task includes clustering and hash learning as special cases. Deep neural networks are promising for this task because they can model the non-linearity of data and scale to large datasets. However, their model complexity is huge, and therefore we need to carefully regularize the networks in order to learn useful representations that exhibit the intended invariance for applications of interest. To this end, we propose a method called Information Maximizing Self-Augmented Training (IMSAT). In IMSAT, we use data augmentation to impose the invariance on discrete representations. More specifically, we encourage the predicted representations of augmented data points to be close to those of the original data points in an end-to-end fashion. At the same time, we maximize the information-theoretic dependency between data and their predicted discrete representations. Extensive experiments on benchmark datasets show that IMSAT produces state-of-the-art results for both clustering and unsupervised hash learning.
Submitted 14 June, 2017; v1 submitted 28 February, 2017;
originally announced February 2017.
-
Temporal Generative Adversarial Nets with Singular Value Clipping
Authors:
Masaki Saito,
Eiichi Matsumoto,
Shunta Saito
Abstract:
In this paper, we propose a generative model, Temporal Generative Adversarial Nets (TGAN), which can learn a semantic representation of unlabeled videos and is capable of generating videos. Unlike existing Generative Adversarial Nets (GAN)-based methods that generate videos with a single generator consisting of 3D deconvolutional layers, our model exploits two different types of generators: a temporal generator and an image generator. The temporal generator takes a single latent variable as input and outputs a set of latent variables, each of which corresponds to an image frame in a video. The image generator transforms a set of such latent variables into a video. To deal with the instability of training GANs with such advanced networks, we adopt a recently proposed model, Wasserstein GAN, and propose a novel method to train it stably in an end-to-end manner. The experimental results demonstrate the effectiveness of our methods.
Submitted 17 August, 2017; v1 submitted 20 November, 2016;
originally announced November 2016.