-
A Message-Passing Perspective on Ptychographic Phase Retrieval
Authors:
Hajime Ueda,
Shun Katakami,
Masato Okada
Abstract:
We introduce a probabilistic approach to ptychographic reconstruction in computational imaging. Ptychography is an imaging method where the complex amplitude of an object is estimated from a sequence of diffraction measurements. We formulate this reconstruction as a Bayesian inverse problem and derive an inference algorithm, termed "Ptycho-EP," based on belief propagation and Vector Approximate Message Passing from information theory. Prior knowledge about the unknown object can be integrated into the probabilistic model, and the Bayesian framework inherently provides uncertainty quantification of the reconstruction. Numerical experiments demonstrate that, when the probe's illumination function is known, our algorithm accurately retrieves the object image at a sampling ratio approaching the information theoretic limit. In scenarios where the illumination function is unknown, both the object and the probe can be jointly reconstructed via an Expectation-Maximization algorithm. We evaluate the performance of our algorithm against conventional methods, highlighting its superior convergence speed.
Submitted 8 April, 2025;
originally announced April 2025.
-
Stochastic Vector Approximate Message Passing with applications to phase retrieval
Authors:
Hajime Ueda,
Shun Katakami,
Masato Okada
Abstract:
Phase retrieval refers to the problem of recovering a high-dimensional vector $\boldsymbol{x} \in \mathbb{C}^N$ from the magnitude of its linear transform $\boldsymbol{z} = A \boldsymbol{x}$, observed through a noisy channel. To mitigate the ill-posed nature of the inverse problem, it is common practice to observe the magnitude of linear measurements $\boldsymbol{z}^{(1)} = A^{(1)} \boldsymbol{x}, \ldots, \boldsymbol{z}^{(L)} = A^{(L)}\boldsymbol{x}$ using multiple sensing matrices $A^{(1)}, \ldots, A^{(L)}$, with ptychographic imaging being a notable example of such strategies. Inspired by existing algorithms for ptychographic reconstruction, we introduce stochasticity to Vector Approximate Message Passing (VAMP), a computationally efficient algorithm applicable to a wide range of Bayesian inverse problems. By testing our approach in the setup of phase retrieval, we show the superior convergence speed of the proposed algorithm.
Submitted 9 October, 2024; v1 submitted 30 August, 2024;
originally announced August 2024.
-
L0-regularized compressed sensing with Mean-field Coherent Ising Machines
Authors:
Mastiyage Don Sudeera Hasaranga Gunathilaka,
Yoshitaka Inui,
Satoshi Kako,
Kazushi Mimura,
Masato Okada,
Yoshihisa Yamamoto,
Toru Aonishi
Abstract:
A Coherent Ising Machine (CIM) is a network of optical parametric oscillators that solves combinatorial optimization problems by finding the ground state of an Ising Hamiltonian. As a practical application of CIM, Aonishi et al. proposed a quantum-classical hybrid system to solve optimization problems of L0-regularization-based compressed sensing (L0RBCS). Gunathilaka et al. have further enhanced the accuracy of the system. However, the computational expense of the CIM's stochastic differential equations (SDEs) limits the use of digital hardware implementations. As an alternative to the CIM SDEs previously used by Gunathilaka et al., we propose using the mean-field CIM (MF-CIM) model, which is a physics-inspired heuristic solver without quantum noise. MF-CIM overcomes the high computational cost owing to the simple nature of its differential equations (DEs). Furthermore, our results indicate that the proposed model performs similarly to the physically accurate SDEs on both artificial and magnetic resonance imaging data, paving the way for implementing CIM-based L0RBCS on digital hardware such as Field Programmable Gate Arrays (FPGAs).
Submitted 17 June, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Bayesian Inference for Small-Angle Scattering Data
Authors:
Yui Hayashi,
Shun Katakami,
Shigeo Kuwamoto,
Kenji Nagata,
Masaichiro Mizumaki,
Masato Okada
Abstract:
In this paper, we propose a method for estimating model parameters from Small-Angle Scattering (SAS) data based on Bayesian inference. Conventional SAS data analyses involve manual parameter adjustment by analysts or optimization using gradient methods. These analysis processes tend to involve heuristic approaches and may lead to local solutions. Furthermore, it is difficult to evaluate the reliability of the results obtained by conventional analysis methods. Our method solves these problems by estimating model parameters as probability distributions from SAS data using the framework of Bayesian inference. We evaluate the performance of our method through numerical experiments using artificial data of representative measurement target models. From the results of the numerical experiments, we show that our method provides not only high accuracy and reliability of estimation, but also perspectives on the transition point of estimability with respect to the measurement time and the lower bound of the angular domain of the measured data.
Submitted 28 July, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
-
Bayesian Inference of Absorption Spectra Based on Binomial Distribution
Authors:
Tomohiro Nabika,
Kenji Nagata,
Shun Katakami,
Masaichiro Mizumaki,
Masato Okada
Abstract:
In this paper, we propose a Bayesian spectral deconvolution method for absorption spectra. In conventional analysis, the noise mechanism of absorption spectral data is never considered appropriately. In that analysis, the least-squares method, which assumes Gaussian noise from the perspective of Bayesian statistics, is frequently used. Since Bayesian inference is possible by introducing an appropriate noise model for the data, we consider the absorption process of a single photon to be a Bernoulli trial and develop a Bayesian spectral deconvolution method based on binomial distribution. We have evaluated our method on artificial data under several conditions by numerical experiments. The results show that our method not only allows us to estimate parameters with high accuracy from absorption spectral data, but also to infer them even from absorption spectral data with large absorption rates where the spectral structure is flattened, which was previously impossible to analyze.
Submitted 20 April, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Bayesian Spectral Deconvolution of X-Ray Absorption Near Edge Structure Discriminating High- and Low-Energy Domains
Authors:
Shuhei Kashiwamura,
Shun Katakami,
Ryo Yamagami,
Kazunori Iwamitsu,
Hiroyuki Kumazoe,
Kenji Nagata,
Toshihiro Okajima,
Ichiro Akai,
Masato Okada
Abstract:
In this paper, we propose a Bayesian spectral deconvolution considering the properties of peaks in different energy domains. Bayesian spectral deconvolution regresses spectral data into the sum of multiple basis functions. Conventional methods use a model that treats all peaks equally. However, in X-ray absorption near edge structure (XANES) spectra, the properties of the peaks differ depending on the energy domain, and the specific energy domain of XANES is essential in condensed matter physics. We propose a model that discriminates between the low- and high-energy domains. We also propose a prior distribution that reflects the physical properties. We compare the conventional and proposed models in terms of computational efficiency, estimation accuracy, and model evidence. We demonstrate that our method effectively estimates the number of transition components in the important energy domain, on which the material scientists focus for mapping the electronic transition analysis by first-principles simulation.
Submitted 11 July, 2022; v1 submitted 18 March, 2022;
originally announced March 2022.
-
Statistical Mechanical Analysis of Catastrophic Forgetting in Continual Learning with Teacher and Student Networks
Authors:
Haruka Asanuma,
Shiro Takagi,
Yoshihiro Nagano,
Yuki Yoshida,
Yasuhiko Igarashi,
Masato Okada
Abstract:
When a computational system continuously learns from an ever-changing environment, it rapidly forgets its past experiences. This phenomenon is called catastrophic forgetting. While a line of studies has been proposed with respect to avoiding catastrophic forgetting, most of the methods are based on intuitive insights into the phenomenon, and their performances have been evaluated by numerical experiments using benchmark datasets. Therefore, in this study, we provide the theoretical framework for analyzing catastrophic forgetting by using teacher-student learning. Teacher-student learning is a framework in which we introduce two neural networks: one neural network is a target function in supervised learning, and the other is a learning neural network. To analyze continual learning in the teacher-student framework, we introduce the similarity of the input distribution and the input-output relationship of the target functions as the similarity of tasks. In this theoretical framework, we also provide a qualitative understanding of how a single-layer linear learning neural network forgets tasks. Based on the analysis, we find that the network can avoid catastrophic forgetting when the similarity among input distributions is small and that of the input-output relationship of the target functions is large. The analysis also suggests that a system often exhibits a characteristic phenomenon called overshoot, which means that even if the learning network has once undergone catastrophic forgetting, it is possible that the network may perform reasonably well after further learning of the current task.
Submitted 16 May, 2021;
originally announced May 2021.
-
Fast Bayesian Deconvolution using Simple Reversible Jump Moves
Authors:
Koki Okajima,
Kenji Nagata,
Masato Okada
Abstract:
We propose a Markov chain Monte Carlo-based deconvolution method designed to estimate the number of peaks in spectral data, along with the optimal parameters of each radial basis function. Assuming cases where the number of peaks is unknown, and a sweep simulation on all candidate models is computationally unrealistic, the proposed method efficiently searches over the probable candidates via trans-dimensional moves assisted by annealing effects from replica exchange Monte Carlo moves. Through simulation using synthetic data, the proposed method demonstrates its advantages over conventional sweep simulations, particularly in model selection problems. Application to a set of olivine reflectance spectral data with varying forsterite and fayalite mixture ratios reproduced results obtained from previous mineralogical research, indicating that our method is applicable to deconvolution on real data sets.
Submitted 26 November, 2020;
originally announced November 2020.
-
Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction
Authors:
Masashi Okada,
Tadahiro Taniguchi
Abstract:
In the present paper, we propose a decoder-free extension of Dreamer, a leading model-based reinforcement learning (MBRL) method from pixels. Dreamer is a sample- and cost-efficient solution to robot learning, as it is used to train latent state-space models based on a variational autoencoder and to conduct policy optimization by latent trajectory imagination. However, this autoencoding-based approach often causes object vanishing, in which the autoencoder fails to perceive key objects for solving control tasks, thus significantly limiting Dreamer's potential. This work aims to relieve this bottleneck of Dreamer and enhance its performance by removing the decoder. For this purpose, we first derive a likelihood-free and InfoMax objective of contrastive learning from the evidence lower bound of Dreamer. Second, we incorporate two components, (i) independent linear dynamics and (ii) the random crop data augmentation, into the learning scheme so as to improve the training performance. In comparison to Dreamer and other recent model-free reinforcement learning methods, our newly devised Dreamer with InfoMax and without generative decoder (Dreaming) achieves the best scores on 5 difficult simulated robotics tasks, in which Dreamer suffers from object vanishing.
Submitted 11 March, 2021; v1 submitted 28 July, 2020;
originally announced July 2020.
-
PlaNet of the Bayesians: Reconsidering and Improving Deep Planning Network by Incorporating Bayesian Inference
Authors:
Masashi Okada,
Norio Kosaka,
Tadahiro Taniguchi
Abstract:
In the present paper, we propose an extension of the Deep Planning Network (PlaNet), also referred to as PlaNet of the Bayesians (PlaNet-Bayes). There has been a growing demand for model predictive control (MPC) in partially observable environments in which complete information is unavailable because of, for example, lack of expensive sensors. PlaNet is a promising solution to realize such latent MPC, as it is used to train state-space models via model-based reinforcement learning (MBRL) and to conduct planning in the latent space. However, recent state-of-the-art strategies mentioned in MBRL literature, such as incorporating uncertainty into training and planning, have not been considered, significantly suppressing the training performance. The proposed extension is to make PlaNet uncertainty-aware on the basis of Bayesian inference, in which both model and action uncertainty are incorporated. Uncertainty in latent models is represented using a neural network ensemble to approximately infer model posteriors. The ensemble of optimal action candidates is also employed to capture multimodal uncertainty in the optimality. The concept of the action ensemble relies on a general variational inference MPC (VI-MPC) framework and its instance, probabilistic action ensemble with trajectory sampling (PaETS). In this paper, we extend VI-MPC and PaETS, which were originally introduced in previous literature, to address partially observable cases. We experimentally compare the performances on continuous control tasks, and conclude that our method can consistently improve the asymptotic performance compared with PlaNet.
Submitted 29 February, 2020;
originally announced March 2020.
-
Domain-Adversarial and Conditional State Space Model for Imitation Learning
Authors:
Ryo Okumura,
Masashi Okada,
Tadahiro Taniguchi
Abstract:
State representation learning (SRL) in partially observable Markov decision processes has been studied to learn abstract features of data useful for robot control tasks. For SRL, acquiring domain-agnostic states is essential for achieving efficient imitation learning. Without these states, imitation learning is hampered by domain-dependent information useless for control. However, existing methods fail to remove such disturbances from the states when the data from experts and agents show large domain shifts. To overcome this issue, we propose a domain-adversarial and conditional state space model (DAC-SSM) that enables control systems to obtain domain-agnostic and task- and dynamics-aware states. DAC-SSM jointly optimizes the state inference, observation reconstruction, forward dynamics, and reward models. To remove domain-dependent information from the states, the model is trained with domain discriminators in an adversarial manner, and the reconstruction is conditioned on domain labels. We experimentally evaluated the model predictive control performance via imitation learning for continuous control of sparse reward tasks in simulators and compared it with the performance of the existing SRL method. The agents from DAC-SSM achieved performance comparable to experts and more than twice that of the baselines. We conclude that domain-agnostic states are essential for imitation learning with large domain shifts and can be obtained using DAC-SSM.
Submitted 4 June, 2021; v1 submitted 30 January, 2020;
originally announced January 2020.
-
Data-Dependence of Plateau Phenomenon in Learning with Neural Network --- Statistical Mechanical Analysis
Authors:
Yuki Yoshida,
Masato Okada
Abstract:
The plateau phenomenon, wherein the loss value stops decreasing during the process of learning, has been reported by various researchers. The phenomenon was actively investigated in the 1990s and found to be due to the fundamental hierarchical structure of neural network models. It has since been thought to be inevitable. However, the phenomenon seldom occurs in the context of recent deep learning. There is a gap between theory and reality. In this paper, using a statistical mechanical formulation, we clarify the relationship between the plateau phenomenon and the statistical properties of the data learned. It is shown that data whose covariance has small and dispersed eigenvalues tend to make the plateau phenomenon inconspicuous.
Submitted 10 January, 2020;
originally announced January 2020.
-
Variational Inference MPC for Bayesian Model-based Reinforcement Learning
Authors:
Masashi Okada,
Tadahiro Taniguchi
Abstract:
In recent studies on model-based reinforcement learning (MBRL), incorporating uncertainty in forward dynamics is a state-of-the-art strategy to enhance learning performance, making MBRL competitive with cutting-edge model-free methods, especially in simulated robotics tasks. Probabilistic ensembles with trajectory sampling (PETS) is a leading type of MBRL, which applies Bayesian inference to dynamics modeling and model predictive control (MPC) with stochastic optimization via the cross entropy method (CEM). In this paper, we propose a novel extension to the uncertainty-aware MBRL. Our main contributions are twofold: First, we introduce a variational inference MPC, which reformulates various stochastic methods, including CEM, in a Bayesian fashion. Second, we propose a novel instance of the framework, called probabilistic action ensembles with trajectory sampling (PaETS). As a result, our Bayesian MBRL can involve multimodal uncertainties both in dynamics and optimal trajectories. In comparison to PETS, our method consistently improves asymptotic performance on several challenging locomotion tasks.
Submitted 6 October, 2019; v1 submitted 7 July, 2019;
originally announced July 2019.
-
Long-tailed distributions of inter-event times as mixtures of exponential distributions
Authors:
Makoto Okada,
Kenji Yamanishi,
Naoki Masuda
Abstract:
Inter-event times of various human behaviors are apparently non-Poissonian and obey long-tailed distributions as opposed to exponential distributions, which correspond to Poisson processes. It has been suggested that human individuals may switch between different states, in each of which they are regarded as generating events obeying a Poisson process. If this is the case, inter-event times should approximately obey a mixture of exponential distributions with different parameter values. In the present study, we introduce the minimum description length principle to compare mixtures of exponential distributions with different numbers of components (i.e., constituent exponential distributions). Because these distributions violate the identifiability property, one is mathematically not allowed to apply the Akaike or Bayes information criteria to their maximum likelihood estimator to carry out model selection. We overcome this theoretical barrier by applying a minimum description length principle to joint likelihoods of the data and latent variables. We show that mixtures of exponential distributions with a few components are selected as opposed to more complex mixtures in various data sets and that the fitting accuracy is comparable to that of state-of-the-art algorithms to fit power-law distributions to data. Our results lend support to Poissonian explanations of apparently non-Poissonian human behavior.
Submitted 26 February, 2020; v1 submitted 30 April, 2019;
originally announced May 2019.
-
Bayesian Spectral Deconvolution Based on Poisson Distribution: Bayesian Measurement and Virtual Measurement Analytics (VMA)
Authors:
Kenji Nagata,
Yoh-ichi Mototake,
Rei Muraoka,
Takehiko Sasaki,
Masato Okada
Abstract:
In this paper, we propose a new method of Bayesian measurement for spectral deconvolution, which regresses spectral data into the sum of unimodal basis functions such as Gaussian or Lorentzian functions. Bayesian measurement is a framework for considering not only the target physical model but also the measurement model as a probabilistic model, and enables us to estimate the parameters of a physical model with their confidence intervals through a Bayesian posterior distribution given a measurement data set. Measurement with Poisson noise is one of the most effective systems to which our proposed method can be applied. Since the measurement time is strongly related to the signal-to-noise ratio for the Poisson noise model, Bayesian measurement with the Poisson noise model enables us to clarify the relationship between the measurement time and the limit of estimation. In this study, we establish the probabilistic model with Poisson noise for spectral deconvolution. Bayesian measurement enables us to perform virtual and computer simulation for a certain measurement through the established probabilistic model. This property is called "Virtual Measurement Analytics (VMA)" in this paper. We also show that the relationship between the measurement time and the limit of estimation can be extracted by using the proposed method in a simulation of synthetic data and real data for XPS measurement of MoS$_2$.
Submitted 11 December, 2018;
originally announced December 2018.
-
Statistical mechanical analysis of sparse linear regression as a variable selection problem
Authors:
Tomoyuki Obuchi,
Yoshinori Nakanishi-Ohno,
Masato Okada,
Yoshiyuki Kabashima
Abstract:
An algorithmic limit of compressed sensing or related variable-selection problems is analytically evaluated when a design matrix is given by an overcomplete random matrix. The replica method from statistical mechanics is employed to derive the result. The analysis is conducted through evaluation of the entropy, an exponential rate of the number of combinations of variables giving a specific value of fit error to given data which is assumed to be generated from a linear process using the design matrix. This yields the typical achievable limit of the fit error when solving a representative $\ell_0$ problem and includes the presence of unfavourable phase transitions preventing local search algorithms from reaching the minimum-error configuration. The associated phase diagrams are presented. A noteworthy outcome of the phase diagrams is that there exists a wide parameter region where any phase transition is absent from the high temperature to the lowest temperature at which the minimum-error configuration or the ground state is reached. This implies that certain local search algorithms can find the ground state with moderate computational costs in that region. Another noteworthy result is the presence of the random first-order transition in the strong noise case. The theoretical evaluation of the entropy is confirmed by extensive numerical methods using the exchange Monte Carlo and the multi-histogram methods. Another numerical test based on a metaheuristic optimisation algorithm called simulated annealing is conducted, which well supports the theoretical predictions on the local search algorithms. In the successful region with no phase transition, the computational cost of the simulated annealing to reach the ground state is estimated as the third order polynomial of the model dimensionality.
Submitted 10 September, 2018; v1 submitted 29 May, 2018;
originally announced May 2018.
-
Concept Formation and Dynamics of Repeated Inference in Deep Generative Models
Authors:
Yoshihiro Nagano,
Ryo Karakida,
Masato Okada
Abstract:
Deep generative models are reported to be useful in broad applications including image generation. Repeated inference between data space and latent space in these models can denoise cluttered images and improve the quality of inferred results. However, previous studies only qualitatively evaluated image outputs in data space, and the mechanism behind the inference has not been investigated. The purpose of the current study is to numerically analyze changes in activity patterns of neurons in the latent space of a deep generative model called a "variational auto-encoder" (VAE). We identify what kinds of inference dynamics the VAE demonstrates when noise is added to the input data. The VAE embeds a dataset with clear cluster structures in the latent space, and the center of each cluster of multiple correlated data points (memories) is referred to as the concept. Our study demonstrated that transient dynamics of inference first approaches a concept, and then moves close to a memory. Moreover, the VAE revealed that the inference dynamics approaches a more abstract concept to the extent that the uncertainty of input data increases due to noise. It was demonstrated that by increasing the number of the latent variables, the trend of the inference dynamics to approach a concept can be enhanced, and the generalization ability of the VAE can be improved.
Submitted 12 December, 2017;
originally announced December 2017.
-
Exhaustive search for sparse variable selection in linear regression
Authors:
Yasuhiko Igarashi,
Hikaru Takenaka,
Yoshinori Nakanishi-Ohno,
Makoto Uemura,
Shiro Ikeda,
Masato Okada
Abstract:
We propose a K-sparse exhaustive search (ES-K) method and a K-sparse approximate exhaustive search (AES-K) method for selecting variables in linear regression. With these methods, K-sparse combinations of variables are tested exhaustively, assuming that the optimal combination of explanatory variables is K-sparse. By collecting the results of exhaustively computing ES-K, various approximate methods for selecting sparse variables can be summarized as a density of states. With this density of states, we can compare different methods for selecting sparse variables, such as relaxation and sampling. For large problems where the combinatorial explosion of explanatory variables is crucial, the AES-K method enables the density of states to be effectively reconstructed by using the replica-exchange Monte Carlo method and the multiple histogram method. Applying the ES-K and AES-K methods to type Ia supernova data, we confirmed the conventional understanding in astronomy when an appropriate K is given beforehand. However, we found it difficult to determine K from the data. Using virtual measurement and analysis, we argue that this is caused by data shortage.
Submitted 7 July, 2017;
originally announced July 2017.
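The exhaustive step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `exhaustive_search_k`, the synthetic data, and the use of the residual sum of squares as the score are all assumptions made for illustration (the abstract does not state which criterion is used). The sketch fits ordinary least squares on every K-subset of columns and ranks the subsets by score.

```python
from itertools import combinations

import numpy as np


def exhaustive_search_k(X, y, k):
    """Fit least squares on every k-subset of columns of X.

    Returns a list of (rss, subset) pairs sorted by residual sum of squares.
    The scoring criterion is an assumption chosen for this illustration.
    """
    results = []
    for subset in combinations(range(X.shape[1]), k):
        cols = list(subset)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        results.append((rss, subset))
    results.sort()
    return results


# Synthetic example: only columns 0 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + 0.01 * rng.normal(size=50)

ranking = exhaustive_search_k(X, y, k=2)
best_rss, best_subset = ranking[0]
print(best_subset, len(ranking))
```

The full list of scores in `ranking` is the raw material for a histogram over all K-sparse combinations, which is the role the density of states plays in the abstract. The number of subsets grows combinatorially, which is why the abstract turns to an approximate, sampling-based variant for large problems.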
-
Statistical Mechanics of Node-perturbation Learning with Noisy Baseline
Authors:
Kazuyuki Hara,
Kentaro Katahira,
Masato Okada
Abstract:
Node-perturbation learning is a type of statistical gradient descent algorithm that can be applied to problems where the objective function is not explicitly formulated, including reinforcement learning. It estimates the gradient of an objective function by using the change in the objective function in response to the perturbation. The value of the objective function for an unperturbed output is called a baseline. Cho et al. proposed node-perturbation learning with a noisy baseline. In this paper, we report on building the statistical mechanics of Cho's model and on deriving coupled differential equations of order parameters that depict learning dynamics. We also show how to derive the generalization error by solving the differential equations of order parameters. On the basis of the results, we show that Cho's results also apply in general cases and show some general performance properties of Cho's model.
Submitted 20 June, 2017;
originally announced June 2017.
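The gradient estimate described in this abstract can be sketched for a single linear unit. This is an illustrative toy under stated assumptions, not Cho's model or the authors' analysis: the teacher-student setup, the squared-error objective, the noiseless baseline, and all names and parameter values (`eta`, `sigma`, `n_steps`) are hypothetical choices. The output is perturbed by Gaussian noise `xi`, and the change in the objective relative to the baseline, multiplied by `xi / sigma**2`, serves as a stochastic estimate of the derivative of the objective with respect to the output.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, eta, sigma, n_steps = 5, 0.01, 0.05, 3000

w_teacher = rng.normal(size=n_inputs)  # hypothetical target weights
w = np.zeros(n_inputs)                 # student weights


def squared_error(target, output):
    return (target - output) ** 2


initial_gap = float(np.sum((w - w_teacher) ** 2))

for _ in range(n_steps):
    x = rng.normal(size=n_inputs)
    target = w_teacher @ x
    output = w @ x

    # Baseline: objective for the unperturbed output (noiseless here).
    baseline = squared_error(target, output)

    # Perturb the output node and measure the change in the objective.
    xi = sigma * rng.normal()
    perturbed = squared_error(target, output + xi)

    # Stochastic estimate of d(objective)/d(output); its mean is -2 * (target - output).
    grad_estimate = (perturbed - baseline) * xi / sigma**2

    # Chain rule through the linear unit: d(output)/dw = x.
    w -= eta * grad_estimate * x

final_gap = float(np.sum((w - w_teacher) ** 2))
print(initial_gap, final_gap)
```

The update never differentiates the objective explicitly; it only evaluates it twice per step, which is what makes the method applicable when the objective is not given in closed form.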
-
Simultaneous Estimation of Noise Variance and Number of Peaks in Bayesian Spectral Deconvolution
Authors:
Satoru Tokuda,
Kenji Nagata,
Masato Okada
Abstract:
The heuristic identification of peaks from noisy complex spectra often leads to misunderstanding of the physical and chemical properties of matter. In this paper, we propose a framework based on Bayesian inference, which enables us to separate multipeak spectra into single peaks statistically and consists of two steps. The first step is estimating both the noise variance and the number of peaks as hyperparameters based on the Bayes free energy, which is generally not analytically tractable. The second step is fitting the parameters of each peak function to the given spectrum by calculating the posterior density, which suffers from local minima and saddles since multipeak models are nonlinear and hierarchical. Our framework enables escape from local minima or saddles by using the exchange Monte Carlo method and calculates the Bayes free energy via the multiple histogram method. We discuss a simulation demonstrating how efficient our framework is and show that estimating both the noise variance and the number of peaks prevents overfitting, overpenalizing, and misunderstanding the precision of parameter estimation.
Submitted 15 December, 2016; v1 submitted 26 July, 2016;
originally announced July 2016.