Search | arXiv e-print repository

Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model

Authors: Chuang Ma, Tomoyuki Obuchi, Toshiyuki Tanaka

Abstract: A phenomenon known as "Neural Collapse (NC)" in deep classification tasks, in which the penultimate-layer features and the final classifiers exhibit an extremely simple geometric structure, has recently attracted considerable attention, with the expectation that it can deepen our understanding of how deep neural networks behave. The Unconstrained Feature Model (UFM) has been proposed to explain NC theoretically, and a growing body of work has emerged that extends NC to tasks other than classification and leverages it for practical applications. In this study, we investigate whether a similar phenomenon arises in deep Ordinal Regression (OR) tasks by combining the cumulative link model for OR with the UFM. We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by the following three properties: (ONC1) all optimal features in the same class collapse to their within-class mean when regularization is applied; (ONC2) these class means align with the classifier, meaning that they collapse onto a one-dimensional subspace; (ONC3) the optimal latent variables (corresponding to logits or preactivations in classification tasks) are aligned according to the class order, and in particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values. We prove these properties analytically within the UFM framework with fixed threshold values and corroborate them empirically across a variety of datasets.
We also discuss how these insights can be leveraged in OR, highlighting the use of fixed thresholds.

Submitted 6 June, 2025; originally announced June 2025.
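For readers unfamiliar with the cumulative link model, the following is a minimal sketch of how it turns a scalar latent variable z and ordered thresholds into class probabilities, P(y <= k | z) = F(theta_k - z). It assumes a logistic link F; the function names and the threshold values in the usage example are illustrative choices, not taken from the paper, and the sketch does not reproduce the UFM analysis itself.

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    """Numerically stable logistic CDF, used here as the (assumed) link function F."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def clm_probs(z: float, thresholds: Sequence[float]) -> List[float]:
    """Class probabilities of a cumulative link model with len(thresholds) + 1 classes.

    P(y <= k | z) = F(theta_k - z); each class probability is the difference
    of two consecutive cumulative probabilities.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cum = [0.0] + [sigmoid(t - z) for t in thresholds] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


def clm_nll(z: float, y: int, thresholds: Sequence[float]) -> float:
    """Negative log-likelihood of the (0-indexed) class y given latent variable z."""
    return -math.log(clm_probs(z, thresholds)[y])


def predict(z: float, thresholds: Sequence[float]) -> int:
    """Most probable class for latent variable z."""
    probs = clm_probs(z, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)
```

With thresholds [-4.0, 0.0, 4.0], latent values -6, -2, 2 and 6 are assigned to classes 0, 1, 2 and 3, which illustrates the ordering of latent variables by class that ONC3 refers to.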

arXiv:2411.19553

Analysis of High-dimensional Gaussian Labeled-unlabeled Mixture Model via Message-passing Algorithm

Authors: Xiaosi Gu, Tomoyuki Obuchi

Abstract: Semi-supervised learning (SSL) is a machine learning methodology that leverages unlabeled data in conjunction with a limited amount of labeled data. Although SSL has been used in various applications and its effectiveness has been empirically demonstrated, it is still not fully understood when and why SSL performs well. Some existing theoretical studies have attempted to address this issue by modeling classification problems using the so-called Gaussian Mixture Model (GMM). These studies provide notable and insightful interpretations. However, their analyses are focused on specific purposes, and a thorough investigation of the properties of GMM in the context of SSL has been lacking. In this paper, we conduct such a detailed analysis of the properties of the high-dimensional GMM for binary classification in the SSL setting. To this end, we employ the approximate message passing and state evolution methods, which are widely used in high-dimensional settings and originate from statistical mechanics. We deal with two estimation approaches: the Bayesian one and the $\ell_2$-regularized maximum likelihood estimation (RMLE). We conduct a comprehensive comparison between these two approaches, examining aspects such as the global phase diagram, estimation error for the parameters, and prediction error for the labels. A specific comparison is made between the Bayes-optimal (BO) estimator and RMLE, as the BO setting provides optimal estimation performance and is ideal as a benchmark.
Our analysis shows that with appropriate regularization, RMLE can achieve near-optimal performance in terms of both the estimation error and prediction error, especially when there is a large amount of unlabeled data. These results demonstrate that the $\ell_2$ regularization term plays an effective role in estimation and prediction in SSL approaches.

Submitted 12 March, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

Comments: 48 pages, 16 figures

arXiv:2409.17704

Transfer Learning in $\ell_1$ Regularized Regression: Hyperparameter Selection Strategy based on Sharp Asymptotic Analysis

Authors: Koki Okajima, Tomoyuki Obuchi

Abstract: Transfer learning techniques aim to leverage information from multiple related datasets to enhance prediction quality on a target dataset. Such methods have been adopted in the context of high-dimensional sparse regression, and several Lasso-based algorithms have been proposed, such as Trans-Lasso and Pretraining Lasso. These algorithms require the statistician to select hyperparameters that control the extent and type of information transfer from related datasets. However, selection strategies for these hyperparameters, as well as the impact of these choices on the algorithm's performance, have been largely unexplored. To address this, we conduct a thorough, precise study of the algorithm in a high-dimensional setting via an asymptotic analysis using the replica method. Our approach reveals a surprisingly simple behavior of the algorithm: ignoring one of the two types of information transferred to the fine-tuning stage has little effect on generalization performance, implying that efforts for hyperparameter selection can be significantly reduced. Our theoretical findings are also empirically supported by applications on real-world and semi-artificial datasets using the IMDb and MNIST datasets, respectively.

Submitted 30 January, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

Comments: 23 pages, 9 figures

Journal ref: Transactions on Machine Learning Research (2025). Available at: https://openreview.net/forum?id=ccu0M3nmlF

arXiv:2409.05598

When resampling/reweighting improves feature learning in imbalanced classification?: A toy-model study

Authors: Tomoyuki Obuchi, Toshiyuki Tanaka

Abstract: A toy model of binary classification is studied with the aim of clarifying the effect of class-wise resampling/reweighting on feature learning performance in the presence of class imbalance. In the analysis, a high-dimensional limit of the input space is taken while keeping the ratio of the dataset size to the input dimension finite, and the non-rigorous replica method from statistical mechanics is employed. The result shows that there exists a case in which no resampling/reweighting gives the best feature learning performance irrespective of the choice of losses or classifiers, supporting recent findings in Cao et al. (2019); Kang et al. (2019). It is also revealed that the key to this result is the symmetry of the loss and the problem setting. Inspired by this, we propose a further simplified model exhibiting the same property in the multiclass setting. These results clarify when class-wise resampling/reweighting becomes effective in imbalanced classification.

Submitted 22 April, 2025; v1 submitted 9 September, 2024; originally announced September 2024.

Comments: 33 pages, 14 figures

Journal ref: Transactions on Machine Learning Research, 2025. Available at: https://openreview.net/forum?id=spqbyeGyLR
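To make "class-wise reweighting" concrete, here is a minimal sketch of one common variant: inverse-frequency class weights plugged into a logistic loss. The weighting rule w_c = N / (C * N_c), the logistic loss, and all function names are illustrative assumptions; they are not necessarily the weighting scheme or losses analyzed in the paper.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def inverse_frequency_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Class weights w_c = N / (C * N_c), so that every class carries equal total weight."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * n_c) for cls, n_c in counts.items()}


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def weighted_logistic_loss(scores: Sequence[float], labels: Sequence[int],
                           weights: Dict[int, float]) -> float:
    """Class-reweighted logistic loss for labels in {0, 1}, normalized by the total weight."""
    total, norm = 0.0, 0.0
    for s, y in zip(scores, labels):
        margin = s if y == 1 else -s  # signed margin (2y - 1) * s
        w = weights[y]
        total += w * softplus(-margin)  # log(1 + exp(-margin))
        norm += w
    return total / norm
```

For labels [0, 0, 0, 1], the minority class receives weight 2.0 and the majority class 2/3, so both classes contribute equally; setting all weights to 1.0 recovers the unweighted ("no reweighting") loss.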

arXiv:2111.04931

doi 10.1103/PhysRevE.105.034403

Assessing transfer entropy from biochemical data

Authors: Takuya Imaizumi, Nobuhisa Umeki, Ryo Yoshizawa, Tomoyuki Obuchi, Yasushi Sako, Yoshiyuki Kabashima

Abstract: We address the problem of evaluating the transfer entropy (TE) produced by biochemical reactions from experimentally measured data. Although these reactions are generally non-linear and non-stationary processes, which makes accurate modeling challenging, a Gaussian approximation makes it possible to assess TE simply by estimating covariance matrices using multiple data obtained from simultaneously measured time series representing the activation levels of biomolecules such as proteins. Nevertheless, the non-stationary nature of biochemical signals makes it difficult to theoretically assess the sampling distributions of TE, which are necessary for evaluating the statistical confidence and significance of the data-driven estimates. We resolve this difficulty by computationally assessing the sampling distributions using techniques from computational statistics. The computational methods are tested by applying them to data generated from a theoretically tractable time-varying signal model, which leads to the development of a method to screen only statistically significant estimates. The usefulness of the developed method is examined by applying it to real biological data experimentally measured from the ERBB-RAS-MAPK system, which governs diverse cell fate decisions. A comparison between cells containing wild-type and mutant proteins exhibits a distinct difference in the time evolution of TE, whereas hardly any difference is apparent in the average profiles of the raw signals.
Such comparison may help in unveiling important pathways of biochemical reactions.

Submitted 8 March, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: 30 pages, 11 figures

Journal ref: Physical Review E 105, 034403 (2022)
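The Gaussian approximation mentioned in the abstract reduces TE to conditional variances read off a covariance matrix. Below is a minimal sketch, assuming scalar signals, a history length of one time step, and covariances estimated across an ensemble of cells at a single time point (one way of handling non-stationarity); the function names are hypothetical. It computes TE_{Y->X} = 0.5 ln[Var(x_{t+1} | x_t) / Var(x_{t+1} | x_t, y_t)] in nats and does not implement the paper's assessment of sampling distributions.

```python
import math
from typing import Sequence


def _cov(u: Sequence[float], v: Sequence[float]) -> float:
    """Unbiased sample covariance of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def gaussian_te(x_next: Sequence[float], x_now: Sequence[float],
                y_now: Sequence[float]) -> float:
    """Gaussian transfer entropy TE_{Y->X} in nats from ensemble samples at one time step.

    Conditional variances come from the sample covariance matrix via the
    Schur complement.
    """
    s_aa = _cov(x_next, x_next)
    s_ax, s_ay = _cov(x_next, x_now), _cov(x_next, y_now)
    s_xx, s_yy, s_xy = _cov(x_now, x_now), _cov(y_now, y_now), _cov(x_now, y_now)
    # Condition on the target's own past only.
    var_given_x = s_aa - s_ax ** 2 / s_xx
    # Condition on both pasts: Sigma_ab Sigma_bb^{-1} Sigma_ba with an explicit 2x2 inverse.
    det = s_xx * s_yy - s_xy ** 2
    quad = (s_ax ** 2 * s_yy - 2.0 * s_ax * s_ay * s_xy + s_ay ** 2 * s_xx) / det
    var_given_xy = s_aa - quad
    return 0.5 * math.log(var_given_x / var_given_xy)
```

If x_next does not depend on y_now beyond what x_now already explains, the two conditional variances coincide and the estimate is zero; in practice a finite ensemble gives a small positive value, which is why assessing the sampling distribution matters.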

arXiv:2110.08500

On Model Selection Consistency of Lasso for High-Dimensional Ising Models

Authors: Xiangming Meng, Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: We theoretically analyze the model selection consistency of the least absolute shrinkage and selection operator (Lasso), both with and without post-thresholding, for high-dimensional Ising models. For random regular (RR) graphs of size $p$ with regular node degree $d$ and uniform couplings $\theta_0$, it is rigorously proved that Lasso without post-thresholding is model selection consistent in the whole paramagnetic phase with the same order of sample complexity $n=\Omega(d^3\log p)$ as that of $\ell_1$-regularized logistic regression ($\ell_1$-LogR). This result is consistent with the conjecture in Meng, Obuchi, and Kabashima (2021), obtained using the non-rigorous replica method from statistical physics, and thus complements it with a rigorous proof. For general tree-like graphs, it is demonstrated that the same result as for RR graphs can be obtained under mild assumptions on the dependency condition and incoherence condition. Moreover, we provide a rigorous proof of the model selection consistency of Lasso with post-thresholding for general tree-like graphs in the paramagnetic phase without further assumptions on the dependency and incoherence conditions. Experimental results agree well with our theoretical analysis.

Submitted 17 February, 2023; v1 submitted 16 October, 2021; originally announced October 2021.

Comments: AISTATS2023, camera-ready version

arXiv:2102.03988

doi 10.1088/1742-5468/ac9831

Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression: A Statistical Mechanics Analysis

Authors: Xiangming Meng, Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: We theoretically analyze the typical learning performance of $\ell_{1}$-regularized linear regression ($\ell_1$-LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of $\ell_1$-LinR is obtained. Remarkably, despite the model misspecification, $\ell_1$-LinR is model selection consistent with the same order of sample complexity as $\ell_{1}$-regularized logistic regression ($\ell_1$-LogR), i.e., $M=\mathcal{O}\left(\log N\right)$, where $N$ is the number of variables of the Ising model. Moreover, we provide an efficient method to accurately predict the non-asymptotic behavior of $\ell_1$-LinR, such as precision and recall, for moderate $M, N$. Simulations show fairly good agreement between theoretical predictions and experimental results, even for graphs with many loops, which supports our findings. Although this paper mainly focuses on $\ell_1$-LinR, our method is readily applicable to precisely characterizing the typical learning performance of a wide class of $\ell_{1}$-regularized $M$-estimators, including $\ell_1$-LogR and interaction screening.

Submitted 1 November, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

Comments: Accepted to NeurIPS 2021. Camera-ready version with supplementary materials

arXiv:2008.08342

doi 10.1088/1742-5468/abfa10

Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized Linear Estimator

Authors: Xiangming Meng, Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: The inference performance of the pseudolikelihood method is discussed in the framework of the inverse Ising problem when $\ell_2$-regularized (ridge) linear regression is adopted. This setup is introduced to theoretically investigate the situation where the data generation model is different from the inference one, namely the model mismatch situation. In the teacher-student scenario under the assumption that the teacher couplings are sparse, the analysis is conducted using the replica and cavity methods, with a special focus on whether the presence/absence of teacher couplings is correctly inferred. The result indicates that despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization when the number of spins $N$ is smaller than the dataset size $M$, in the thermodynamic limit $N\to \infty$. Further, to access the underdetermined region $M < N$, we examine the effect of the $\ell_2$ regularization, and find that biases appear in all the coupling estimates, preventing the perfect identification of the network structure. We find, however, that the biases decay exponentially fast as the distance from the center spin chosen in the pseudolikelihood method grows.
Based on this finding, we propose a two-stage estimator: in the first stage, ridge regression is used and the estimates are pruned by a relatively small threshold; in the second stage, naive linear regression is conducted only on the remaining couplings, and the resultant estimates are again pruned by another, relatively large threshold. This estimator, with an appropriate regularization coefficient and thresholds, is shown to achieve perfect identification of the network structure even in $0<M/N<1$. Results of extensive numerical experiments support these findings.

Submitted 23 November, 2020; v1 submitted 19 August, 2020; originally announced August 2020.

Comments: 35 pages, 8 figures
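The two-stage estimator described in the abstract can be sketched as follows. This is a minimal illustration, assuming a generic linear regression of a response y on a design X (in the pseudolikelihood setting, y would be the center spin and the columns of X the other spins); the regularization coefficient and the thresholds tau1 < tau2 are free parameters here, and all names are hypothetical. A small dense solver is included so the block is self-contained.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(X: Matrix, y: Sequence[float], cols: Sequence[int], lam: float) -> List[float]:
    """Ridge regression (lam > 0) or plain least squares (lam = 0) on the columns `cols`."""
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in cols]
            for i in cols]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in cols]
    return solve(gram, rhs)


def two_stage(X: Matrix, y: Sequence[float], lam: float, tau1: float, tau2: float) -> List[float]:
    """Ridge + small-threshold pruning, then an unregularized refit + large-threshold pruning."""
    p = len(X[0])
    # Stage 1: ridge on all candidate couplings; prune with the small threshold tau1.
    beta1 = least_squares(X, y, list(range(p)), lam)
    kept = [j for j in range(p) if abs(beta1[j]) > tau1]
    # Stage 2: naive linear regression on the survivors; prune with the large threshold tau2.
    est = [0.0] * p
    if kept:
        for j, b in zip(kept, least_squares(X, y, kept, 0.0)):
            est[j] = b if abs(b) > tau2 else 0.0
    return est
```

Stage 1 works even when there are more candidate couplings than samples because lam > 0 makes the Gram matrix invertible; stage 2 is only well posed when fewer couplings survive than there are samples, which is the role of the first pruning step.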

arXiv:2008.03175

doi 10.7566/JPSJ.89.124802

Reconstructing Sparse Signals via Greedy Monte-Carlo Search

Authors: Kao Hayashi, Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: We propose a Monte-Carlo-based method for reconstructing sparse signals in the formulation of sparse linear regression in a high-dimensional setting. The basic idea of this algorithm is to explicitly select variables or covariates to represent a given data vector or responses and to accept randomly generated updates of that selection if and only if the energy or cost function decreases. This algorithm is called the greedy Monte-Carlo (GMC) search algorithm. Its performance is examined via numerical experiments, which suggest that in the noiseless case, GMC can achieve perfect reconstruction in undersampling situations of a reasonable level: it can outperform the $\ell_1$ relaxation but does not reach the algorithmic limit of MC-based methods theoretically clarified by an earlier analysis. The necessary computational time is also examined and compared with that of an algorithm using simulated annealing. Additionally, experiments on the noisy case are conducted on synthetic datasets and on a real-world dataset, supporting the practicality of GMC.

Submitted 29 January, 2021; v1 submitted 7 August, 2020; originally announced August 2020.

Comments: 15 pages, 4 figures
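The greedy acceptance rule in the abstract can be sketched as below. This is a minimal sketch under several assumptions not stated in the listing: the number of selected variables K is fixed, a proposal swaps one selected variable for one unselected variable, and the energy of a selection is half the residual sum of squares of least squares restricted to the selected columns. Function names are hypothetical, and the selected columns are assumed to be linearly independent.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def energy(X: Matrix, y: Sequence[float], support: Sequence[int]) -> float:
    """Half the residual sum of squares after least squares on the columns in `support`."""
    gram = [[sum(r[i] * r[j] for r in X) for j in support] for i in support]
    rhs = [sum(r[i] * yi for r, yi in zip(X, y)) for i in support]
    beta = _solve(gram, rhs)
    res = [yi - sum(r[j] * b for j, b in zip(support, beta)) for r, yi in zip(X, y)]
    return 0.5 * sum(e * e for e in res)


def gmc(X: Matrix, y: Sequence[float], k: int, n_steps: int,
        seed: int = 0) -> Tuple[List[int], List[float]]:
    """Greedy Monte-Carlo search over supports of size k (requires 0 < k < number of columns)."""
    p = len(X[0])
    if not 0 < k < p:
        raise ValueError("need 0 < k < p")
    rng = random.Random(seed)
    support = rng.sample(range(p), k)
    e = energy(X, y, support)
    trace = [e]
    for _ in range(n_steps):
        # Propose swapping one selected variable for one unselected variable.
        outside = [j for j in range(p) if j not in support]
        proposal = list(support)
        proposal[rng.randrange(k)] = rng.choice(outside)
        e_new = energy(X, y, proposal)
        if e_new < e:  # greedy rule: accept if and only if the energy decreases
            support, e = proposal, e_new
        trace.append(e)
    return sorted(support), trace
```

Because only energy-decreasing moves are accepted, the energy trace is non-increasing; unlike simulated annealing, this rule can get trapped in local minima, which is consistent with the abstract's observation that GMC does not reach the algorithmic limit of MC-based methods.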

arXiv:1912.11591

doi 10.1088/1742-5468/ab8c3a

Learning performance in inverse Ising problems with sparse teacher couplings

Authors: Alia Abbara, Yoshiyuki Kabashima, Tomoyuki Obuchi, Yingying Xu

Abstract: We investigate the learning performance of the pseudolikelihood maximization method for inverse Ising problems. In the teacher-student scenario under the assumption that the teacher's couplings are sparse and the student does not know the graphical structure, the learning curve and order parameters are assessed in the typical case using the replica and cavity methods from statistical mechanics. Our formulation is also applicable to a certain class of cost functions having locality; the standard likelihood does not belong to that class. The derived analytical formulas indicate that perfect inference of the presence/absence of the teacher's couplings is possible in the thermodynamic limit, in which the number of spins $N$ tends to infinity while the dataset size $M$ is kept proportional to $N$, as long as $\alpha=M/N > 2$. Meanwhile, the formulas also show that the estimated coupling values corresponding to the truly existing ones in the teacher tend to be overestimated in absolute value, manifesting the presence of estimation bias. These results are considered to be exact in the thermodynamic limit on locally tree-like networks, such as the regular random or Erdős–Rényi graphs. Numerical simulation results fully support the theoretical predictions. Additional biases in the estimators on loopy graphs are also discussed.

Submitted 1 May, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

Comments: 29 pages, 8 figures

arXiv:1906.06002

doi 10.1088/1751-8121/ab57a7

Empirical Bayes Method for Boltzmann Machines

Authors: Muneki Yasuda, Tomoyuki Obuchi

Abstract: In this study, we consider an empirical Bayes method for Boltzmann machines and propose an algorithm for it. The empirical Bayes method allows estimation of the values of the hyperparameters of the Boltzmann machine by maximizing a specific likelihood function, referred to in this study as the empirical Bayes likelihood function. However, the maximization is computationally hard because the empirical Bayes likelihood function involves intractable integrations of the partition function. The proposed algorithm avoids this computational problem by using the replica method and the Plefka expansion. Our method does not require any iterative procedures and is quite simple and fast, though it introduces a bias to the estimate, which exhibits an unnatural behavior with respect to the size of the dataset. This peculiar behavior is presumably due to the approximate treatment by the Plefka expansion. A possible extension to overcome this behavior is also discussed.

Submitted 7 September, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

Journal ref: Journal of Physics A: Mathematical and Theoretical, vol.53, 014004, 2019

arXiv:1902.10375

doi 10.1088/1751-8121/ab3e89

Cross validation in sparse linear regression with piecewise continuous nonconvex penalties and its acceleration

Authors: Tomoyuki Obuchi, Ayaka Sakata

Abstract: We investigate the signal reconstruction performance of sparse linear regression in the presence of noise when piecewise continuous nonconvex penalties are used. Among such penalties, we focus on the SCAD penalty. The contributions of this study are three-fold. First, we present a theoretical analysis of the typical reconstruction performance, using the replica method, under the assumption that each component of the design matrix is given as an independent and identically distributed (i.i.d.) Gaussian variable. This clarifies the superiority of the SCAD estimator compared with $\ell_1$ in a wide parameter range, although the nonconvex nature of the penalty tends to lead to solution multiplicity in certain regions. This multiplicity is shown to be connected to replica symmetry breaking in spin-glass theory. We also show that the global minimum of the mean square error between the estimator and the true signal is located in the replica symmetric phase. Second, we develop an approximate formula that efficiently computes the cross-validation error without actually conducting cross-validation, which is also applicable to non-i.i.d. design matrices. It is shown that this formula is only applicable to the unique solution region and tends to be unstable in the multiple solution region. We implement instability detection procedures, which allow the approximate formula to stand alone and consequently enable us to draw phase diagrams for any specific dataset.
Third, we propose an annealing procedure, called nonconvexity annealing, to obtain the solution path efficiently. Numerical simulations are conducted on simulated datasets to verify the consistency of the theoretical results and the efficiency of the approximate formula. Another numerical experiment is conducted on a real-world dataset; its results are consistent with those of earlier studies using the $\ell_0$ formulation.

Submitted 25 December, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

Comments: 33 pages, 18 figures. MATLAB codes implementing the proposed method are distributed in https://github.com/T-Obuchi/SLRpackage_AcceleratedCV_matlab
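For reference, the SCAD penalty and its scalar thresholding rule can be written in a few lines. This sketch uses the standard parameterization of Fan and Li (2001) with lam > 0 and a > 2, which may differ from the parameterization used in the paper; it covers only the univariate problem min_theta 0.5 (z - theta)^2 + J(theta), not the replica analysis or the approximate cross-validation formula. It is independent of the MATLAB package linked above.

```python
import math


def scad_penalty(theta: float, lam: float, a: float) -> float:
    """SCAD penalty J(theta) in the Fan-Li parameterization (lam > 0, a > 2)."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # L1-like near the origin
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))  # concave part
    return (a + 1.0) * lam * lam / 2.0  # constant: large coefficients are not penalized further


def scad_threshold(z: float, lam: float, a: float) -> float:
    """Minimizer of 0.5 * (z - theta)**2 + scad_penalty(theta, lam, a)."""
    t, sign = abs(z), math.copysign(1.0, z)
    if t <= 2.0 * lam:
        return sign * max(t - lam, 0.0)  # soft thresholding near the origin
    if t <= a * lam:
        return ((a - 1.0) * z - sign * a * lam) / (a - 2.0)  # interpolating region
    return z  # large inputs are left unshrunk
```

The third branch is what reduces the bias of SCAD relative to $\ell_1$: inputs larger than a*lam pass through unchanged, whereas soft thresholding would shrink them by lam.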

arXiv:1902.07436

Perfect reconstruction of sparse signals with piecewise continuous nonconvex penalties and nonconvexity control

Authors: Ayaka Sakata, Tomoyuki Obuchi

Abstract: We consider compressed sensing formulated as a minimization problem with nonconvex sparse penalties, the Smoothly Clipped Absolute Deviation (SCAD) and the Minimax Concave Penalty (MCP). The nonconvexity of these penalties is controlled by nonconvexity parameters, and the L1 penalty is contained as a limit with respect to these parameters. The analytically derived reconstruction limit surpasses that of L1 and the algorithmic limit in the Bayes-optimal setting when the nonconvexity parameters take suitable values. However, for small nonconvexity parameters, where the reconstruction of relatively dense signals is theoretically guaranteed, the corresponding approximate message passing (AMP) cannot achieve perfect reconstruction. We identify the shrinkage of the basin of attraction to perfect reconstruction as the cause of the discrepancy between AMP and the corresponding theory using state evolution. Part of the discrepancy is resolved by introducing control of the nonconvexity parameters to guide the AMP trajectory into the basin of attraction.

Submitted 5 June, 2021; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: 25 pages, 17 figures
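The statement that the L1 penalty is contained as a limit can be checked directly on the MCP. The sketch below uses a common parameterization with lam > 0 and nonconvexity parameter gamma > 1 (the paper's parameterization may differ) and shows the scalar thresholding rule ("firm thresholding"), which approaches L1 soft thresholding as gamma grows. It illustrates only the penalty and scalar denoiser, not the AMP algorithm or its nonconvexity control; the function names are hypothetical.

```python
import math


def mcp_penalty(theta: float, lam: float, gamma: float) -> float:
    """Minimax concave penalty (lam > 0, gamma > 1); gamma controls the nonconvexity."""
    t = abs(theta)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0


def mcp_threshold(z: float, lam: float, gamma: float) -> float:
    """Minimizer of 0.5 * (z - theta)**2 + mcp_penalty(theta, lam, gamma), for gamma > 1."""
    t = abs(z)
    if t > gamma * lam:
        return z  # large inputs are left unshrunk
    return math.copysign(max(t - lam, 0.0), z) * gamma / (gamma - 1.0)


def soft_threshold(z: float, lam: float) -> float:
    """L1 (soft) thresholding, the gamma -> infinity limit of mcp_threshold."""
    return math.copysign(max(abs(z) - lam, 0.0), z)
```

Small gamma makes the penalty strongly nonconvex and the thresholding nearly hard, while large gamma makes both the penalty and the thresholding rule approach their L1 counterparts.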

arXiv:1810.11908

doi 10.1088/1742-5468/ab3456

Mean-field theory of graph neural networks in graph partitioning

Authors: Tatsuro Kawamoto, Masashi Tsubaki, Tomoyuki Obuchi

Abstract: A theoretical performance analysis of the graph neural network (GNN) is presented. For classification tasks, the neural network approach has the advantage in terms of flexibility that it can be employed in a data-driven manner, whereas Bayesian inference requires the assumption of a specific model. A fundamental question is then whether GNN has a high accuracy in addition to this flexibility. Moreover, whether the achieved performance is predominantly a result of the backpropagation or the architecture itself is a matter of considerable interest. To gain better insight into these questions, a mean-field theory of a minimal GNN architecture is developed for the graph partitioning problem. The theory shows good agreement with numerical experiments.

Submitted 28 October, 2018; originally announced October 2018.

Comments: 16 pages, 6 figures, Thirty-second Conference on Neural Information Processing Systems (NIPS2018)

arXiv:1805.11259

doi 10.1088/1742-5468/aae02c

Statistical mechanical analysis of sparse linear regression as a variable selection problem

Authors: Tomoyuki Obuchi, Yoshinori Nakanishi-Ohno, Masato Okada, Yoshiyuki Kabashima

Abstract: An algorithmic limit of compressed sensing or related variable-selection problems is analytically evaluated when a design matrix is given by an overcomplete random matrix. The replica method from statistical mechanics is employed to derive the result. The analysis is conducted through evaluation of the entropy, an exponential rate of the number of combinations of variables giving a specific value of fit error to given data, which are assumed to be generated from a linear process using the design matrix. This yields the typical achievable limit of the fit error when solving a representative $\ell_0$ problem and also captures the presence of unfavourable phase transitions preventing local search algorithms from reaching the minimum-error configuration. The associated phase diagrams are presented. A noteworthy outcome of the phase diagrams is that there exists a wide parameter region where no phase transition occurs from the high temperature down to the lowest temperature at which the minimum-error configuration or the ground state is reached. This implies that certain local search algorithms can find the ground state with moderate computational costs in that region. Another noteworthy result is the presence of the random first-order transition in the strong noise case. The theoretical evaluation of the entropy is confirmed by extensive numerical methods using the exchange Monte Carlo and the multi-histogram methods.
Another numerical test based on a metaheuristic optimisation algorithm called simulated annealing is conducted, which well supports the theoretical predictions on the local search algorithms. In the successful region with no phase transition, the computational cost of simulated annealing to reach the ground state is estimated to scale as a third-order polynomial in the model dimensionality.

Submitted 10 September, 2018; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: 39 pages, 14 figures
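The entropy-based analysis above concerns local search on the $\ell_0$ problem. A minimal Python/NumPy sketch of one such local search, simulated annealing over supports of a fixed size $K$, is given below. The swap move, the least-squares refit on each candidate support, the geometric cooling schedule, the helper names and all parameter values are assumptions made for illustration; the paper's own Monte Carlo implementation may differ.

```python
import numpy as np


def fit_error(A, y, support):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    r = y - A[:, support] @ coef
    return float(r @ r)


def anneal_l0(A, y, K, n_sweeps=150, T0=1.0, T_end=1e-3, seed=0):
    """Simulated annealing over supports of fixed size K (hypothetical schedule)."""
    rng = np.random.default_rng(seed)
    N = A.shape[1]
    support = [int(k) for k in rng.choice(N, size=K, replace=False)]
    energy = fit_error(A, y, support)
    best_support, best_energy = sorted(support), energy
    # Geometric cooling from T0 down to T_end; one sweep = N proposed moves
    temps = T0 * (T_end / T0) ** (np.arange(n_sweeps) / max(n_sweeps - 1, 1))
    for T in temps:
        for _ in range(N):
            # Swap move: replace one active column by a currently inactive one
            inactive = np.setdiff1d(np.arange(N), support)
            proposal = support.copy()
            proposal[rng.integers(K)] = int(rng.choice(inactive))
            e_new = fit_error(A, y, proposal)
            # Metropolis rule: always accept downhill, uphill with prob exp(-dE/T)
            if e_new <= energy or rng.random() < np.exp(-(e_new - energy) / T):
                support, energy = proposal, e_new
                if energy < best_energy:
                    best_support, best_energy = sorted(support), energy
    return best_support, best_energy


# Usage on a planted, noiseless toy instance (sizes chosen for illustration)
rng = np.random.default_rng(1)
M, N, K = 30, 40, 3
A = rng.normal(size=(M, N)) / np.sqrt(M)
true_support = sorted(rng.choice(N, size=K, replace=False).tolist())
x0 = np.zeros(N)
x0[true_support] = rng.choice([-1.0, 1.0], size=K)
best_support, best_energy = anneal_l0(A, A @ x0, K)
```

The best configuration seen is tracked separately from the current one, so a late uphill move cannot lose the ground state once it has been visited.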

arXiv:1805.07061

doi 10.1088/1742-5468/ab3219

Objective and efficient inference for couplings in neuronal networks

Authors: Yu Terada, Tomoyuki Obuchi, Takuya Isomura, Yoshiyuki Kabashima

Abstract: Inferring directional couplings from the spike data of networks is desired in various scientific fields such as neuroscience. Here, we apply a recently proposed objective procedure to spike data obtained from Hodgkin–Huxley type models and from in vitro neuronal networks cultured in a circular structure. As a result, we succeed in accurately reconstructing synaptic connections from evoked as well as spontaneous activity. To obtain these results, we devise an analytic formula that approximately implements a method of screening relevant couplings. This significantly reduces the computational cost of the screening method employed in the proposed objective procedure, making it possible to treat large systems such as those in this study.

Submitted 18 May, 2018; originally announced May 2018.

arXiv:1803.04738

Inferring neuronal couplings from spiking data using a systematic procedure with a statistical criterion

Authors: Yu Terada, Tomoyuki Obuchi, Takuya Isomura, Yoshiyuki Kabashima

Abstract: Recent remarkable advances in experimental techniques have provided a basis for inferring neuronal couplings from point-process data that include a large number of neurons. Here, we propose a systematic procedure for pre- and post-processing generic point-process data in an objective manner, so that the data can be handled within the framework of a simple binary statistical model, the Ising or generalized McCulloch–Pitts model. The procedure involves two steps: (1) determining the time-bin size for transforming the point-process data into discrete-time binary data and (2) screening relevant couplings from the estimated couplings. For the first step, we decide the optimal time-bin size by introducing the null hypothesis that all neurons fire independently, and then choosing the time-bin size so that the null hypothesis is rejected under the strictest criterion. The likelihood associated with the null hypothesis is evaluated analytically and used for the rejection process. For the second, post-processing step, after an estimate of the couplings is obtained from the pre-processed dataset, it is compared with many other estimates derived from datasets obtained by randomizing the original dataset in the time direction. We accept an original estimate as relevant only if its absolute value is sufficiently larger than those obtained from the randomized datasets. These manipulations suppress false-positive couplings induced by statistical noise. We apply this inference procedure to spiking data from synthetic and in vitro neuronal networks.
The results show that the proposed procedure identifies the presence or absence of synaptic couplings, including their signs, fairly well for both the synthetic and the experimental data. In particular, the results support the conclusion that the physical connections of the underlying systems can be inferred in favorable situations, even when using the simple statistical model.

Submitted 7 July, 2020; v1 submitted 13 March, 2018; originally announced March 2018.
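The second (screening) step can be illustrated with the sketch below. It is hypothetical in two respects: the coupling estimate is a simple lagged correlation standing in for the Ising or McCulloch–Pitts estimator used in the paper, and the randomization is an independent circular time shift per neuron with a fixed quantile threshold (`level`). The time-bin selection step is represented only by a hypothetical `bin_spikes` helper with a user-chosen `dt`.

```python
import numpy as np


def bin_spikes(spike_times, n_neurons, t_max, dt):
    """Turn per-neuron spike-time lists into a (n_bins, n_neurons) binary matrix."""
    n_bins = int(np.ceil(t_max / dt))
    s = np.zeros((n_bins, n_neurons))
    for i, times in enumerate(spike_times):
        idx = np.minimum((np.asarray(times, dtype=float) / dt).astype(int), n_bins - 1)
        s[idx, i] = 1.0
    return s


def lagged_corr(s):
    """Proxy coupling j -> i: correlation between s_j(t) and s_i(t + 1)."""
    x = s[:-1] - s[:-1].mean(axis=0)
    y = s[1:] - s[1:].mean(axis=0)
    cov = y.T @ x / len(x)                     # cov[i, j] = Cov(s_i(t+1), s_j(t))
    scale = np.sqrt(np.outer(y.var(axis=0), x.var(axis=0))) + 1e-12
    return cov / scale


def screen_couplings(s, n_surrogates=200, level=0.99, seed=0):
    """Keep estimates whose magnitude exceeds the `level` quantile of a time-shuffled null."""
    rng = np.random.default_rng(seed)
    est = lagged_corr(s)
    null = []
    for _ in range(n_surrogates):
        # Randomize in the time direction: independent circular shift per neuron
        shifts = rng.integers(1, len(s), size=s.shape[1])
        surrogate = np.column_stack([np.roll(s[:, i], k) for i, k in enumerate(shifts)])
        null.append(np.abs(lagged_corr(surrogate)))
    threshold = np.quantile(np.stack(null), level, axis=0)
    return est, np.abs(est) > threshold


# Usage: neuron 1 copies neuron 0 with a one-bin delay; neuron 2 is independent
rng = np.random.default_rng(0)
T = 2000
s = np.zeros((T, 3))
s[:, 0] = rng.random(T) < 0.2
s[1:, 1] = (s[:-1, 0] > 0) | (rng.random(T - 1) < 0.05)
s[:, 2] = rng.random(T) < 0.2
est, accepted = screen_couplings(s)
binned = bin_spikes([[0.1, 0.6], [0.3]], n_neurons=2, t_max=1.0, dt=0.25)
```

Circular shifts keep each neuron's firing rate and autocorrelation while destroying the timing relations between neurons, which is one simple way of randomizing in the time direction.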

arXiv:1802.10254

Semi-Analytic Resampling in Lasso

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: An approximate method for conducting resampling in Lasso, the $\ell_1$-penalized linear regression, in a semi-analytic manner is developed: the average over the resampled datasets is computed directly without repeated numerical sampling, which enables inference free of the statistical fluctuations due to finite sampling, as well as a significant reduction in computational time. The proposed method is based on a message-passing-type algorithm, and its fast convergence is guaranteed by state evolution analysis when the covariates are zero-mean independent and identically distributed Gaussian random variables. It is employed to implement bootstrapped Lasso (Bolasso) and stability selection, both of which are variable selection methods that combine resampling with Lasso, and it resolves their disadvantage regarding computational cost. To examine the approximation accuracy and efficiency, numerical experiments were carried out using simulated datasets. Moreover, an application to a real-world dataset, the wine quality dataset, is presented. To process such real-world datasets, an objective criterion for determining the relevance of selected variables is also introduced, based on the addition of noise variables and resampling.

Submitted 10 December, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

Comments: 33 pages, 10 figures, MATLAB code implementing the proposed method is available at https://github.com/T-Obuchi/AMPR_lasso_matlab
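For comparison, the brute-force procedure that the semi-analytic method is designed to replace looks roughly like the sketch below: refit Lasso on many bootstrap resamples and record how often each variable is selected. This is not the paper's message-passing method; the coordinate-descent solver, the penalty scaling `0.5 * ||y - Xw||^2 + lam * ||w||_1`, the function names and the parameter values are assumptions.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for 0.5 * ||y - X w||^2 + lam * ||w||_1."""
    p = X.shape[1]
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()                 # residual y - X w (w starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * w[j]                # remove coordinate j's contribution
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft threshold
            r -= X[:, j] * w[j]                # add it back with the updated value
    return w


def bolasso_frequencies(X, y, lam, n_boot=50, seed=0):
    """Selection frequency of each variable over bootstrap resamples (brute force)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows with replacement
        counts += lasso_cd(X[idx], y[idx], lam) != 0
    return counts / n_boot


# Usage: two relevant variables out of ten
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:2] = [3.0, -2.0]
y = X @ w_true + 0.1 * rng.normal(size=100)
freq = bolasso_frequencies(X, y, lam=10.0)
```

Bolasso then keeps the variables selected in all, or nearly all, resamples. The cost grows linearly with `n_boot`, and the frequencies fluctuate with the finite number of resamples; these are the two drawbacks the abstract says the semi-analytic approach removes.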

arXiv:1711.05420

Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: We develop an approximate formula for evaluating a cross-validation estimator of predictive likelihood for multinomial logistic regression regularized by an $\ell_1$-norm. This allows us to avoid the repeated optimizations required to literally conduct cross-validation; hence, the computational time can be significantly reduced. The formula is derived through a perturbative approach exploiting the largeness of the data size and the model dimensionality. An extension to elastic net regularization is also addressed. The usefulness of the approximate formula is demonstrated on simulated data and on the ISOLET dataset from the UCI machine learning repository.

Submitted 18 September, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

Comments: 30 pages, 9 figures. MATLAB and Python code implementing the formula derived in the manuscript is available at https://github.com/T-Obuchi/AcceleratedCVonMLR_matlab and https://github.com/T-Obuchi/AcceleratedCVonMLR_python

arXiv:1702.05396

doi 10.1103/PhysRevB.95.174305

Complex semiclassical analysis of the Loschmidt amplitude and dynamical quantum phase transitions

Authors: Tomoyuki Obuchi, Sei Suzuki, Kazutaka Takahashi

Abstract: We propose a new computational method for the Loschmidt amplitude in a generic spin system, based on complex semiclassical analysis of the spin-coherent-state path integral. We demonstrate how dynamical transitions emerge in the time evolution of the Loschmidt amplitude for the infinite-range transverse Ising model with a longitudinal field, subjected to a quantum quench of the transverse field $\Gamma$ from $\infty$ or $0$. For both initial conditions, we obtain dynamical phase diagrams showing the presence or absence of the dynamical transition in the plane spanned by the post-quench transverse field and the longitudinal field. The results of the semiclassical analysis are verified by numerical experiments. Experimental observation of our findings on the dynamical transition is also discussed.

Submitted 31 May, 2017; v1 submitted 17 February, 2017; originally announced February 2017.

Comments: 12 pages, 11 figures

Journal ref: Phys. Rev. B 95, 174305 (2017)
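A small exact-diagonalization cross-check of the kind used to verify such semiclassical results can be sketched in the fully symmetric spin sector, which has dimension $N+1$. The Hamiltonian normalisation $H = -(1/N)(\sum_i\sigma_i^z)^2 - h\sum_i\sigma_i^z - \Gamma\sum_i\sigma_i^x$ (coupling set to 1), the choice of the $\Gamma\to\infty$ initial state (all spins along $+x$) and the function names are assumptions for illustration, not necessarily the conventions of the paper.

```python
import numpy as np


def collective_spin(N):
    """S^z and S^x in the fully symmetric sector (total spin S = N/2), basis m = S, ..., -S."""
    S = N / 2
    m = S - np.arange(N + 1)
    Sz = np.diag(m)
    # <m+1| S^+ |m> = sqrt(S(S+1) - m(m+1)); entry (k-1, k) links m[k] to m[k-1] = m[k] + 1
    s_plus = np.diag(np.sqrt(S * (S + 1) - m[1:] * (m[1:] + 1)), k=1)
    Sx = 0.5 * (s_plus + s_plus.T)
    return Sz, Sx


def loschmidt(N, gamma, h, times):
    """Exact Loschmidt amplitude G(t) and rate -ln|G|^2 / N after a quench from Gamma = infinity."""
    Sz, Sx = collective_spin(N)
    Mz, Mx = 2 * Sz, 2 * Sx                     # sums of Pauli operators
    # Assumed post-quench Hamiltonian: H = -(1/N) Mz^2 - h Mz - gamma Mx
    H = -(Mz @ Mz) / N - h * Mz - gamma * Mx
    # Gamma -> infinity ground state: all spins along +x (top eigenvector of Sx)
    psi0 = np.linalg.eigh(Sx)[1][:, -1]
    E, V = np.linalg.eigh(H)
    weights = (V.T @ psi0) ** 2                # |<k|psi0>|^2; all quantities are real here
    G = np.exp(-1j * np.outer(times, E)) @ weights
    return G, -np.log(np.abs(G) ** 2) / N


# Usage: quench to gamma = 0.5 with no longitudinal field
times = np.linspace(0.0, 5.0, 51)
G, rate = loschmidt(N=20, gamma=0.5, h=0.0, times=times)
```

Nonanalytic kinks in the rate $-\ln|G(t)|^2/N$ that sharpen as $N$ grows are the usual numerical signature of a dynamical transition; at fixed small $N$ they appear only as smooth dips.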

arXiv:1612.02807

doi 10.1103/PhysRevE.95.042321

Random versus maximum entropy models of neural population activity

Authors: Ulisse Ferrari, Tomoyuki Obuchi, Thierry Mora

Abstract: The principle of maximum entropy provides a useful method for inferring statistical mechanics models from observations in correlated systems, and is widely used in a variety of fields where accurate data are available. While the assumptions underlying maximum entropy are intuitive and appealing, its adequacy for describing complex empirical data has been little studied in comparison with alternative approaches. Here, data from the collective spiking activity of retinal neurons are reanalysed. The accuracy of the maximum entropy distribution constrained by mean firing rates and pairwise correlations is compared with that of a random ensemble of distributions constrained by the same observables. In general, maximum entropy approximates the true distribution better than the typical or mean distribution from that ensemble. This advantage grows with population size, with groups as small as 8 being almost always better described by maximum entropy. Failure of maximum entropy to outperform random models is found to be associated with strong correlations in the population.

Submitted 8 December, 2016; originally announced December 2016.

Journal ref: Phys. Rev. E 95, 042321 (2017)
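The maximum entropy model constrained by mean firing rates and pairwise correlations is a pairwise Ising model. For a handful of neurons it can be fitted exactly by enumerating all $2^N$ states, as in the sketch below. The $\pm1$ spin encoding, the plain gradient-ascent fitter, the function names and the Dirichlet stand-in for empirical data are assumptions; the random-ensemble comparison of the paper is not reproduced.

```python
import itertools

import numpy as np


def pairwise_features(states):
    """Columns: sigma_i for each i, then sigma_i * sigma_j for each pair i < j."""
    N = states.shape[1]
    pairs = [states[:, i] * states[:, j] for i, j in itertools.combinations(range(N), 2)]
    return np.column_stack([states] + pairs)


def fit_pairwise_maxent(p_data, N, lr=0.1, n_steps=20000):
    """Fit the pairwise (Ising) maximum entropy model by gradient ascent on the log-likelihood.

    All 2**N states are enumerated, so this is only practical for small N.
    """
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
    F = pairwise_features(states)
    target = p_data @ F                      # observed rates and pairwise correlations

    def model(theta):
        logits = F @ theta
        q = np.exp(logits - logits.max())
        return q / q.sum()

    theta = np.zeros(F.shape[1])
    for _ in range(n_steps):
        theta += lr * (target - model(theta) @ F)   # gradient = data moments - model moments
    return model(theta), states, F


# Usage: a random stand-in distribution over 3 binary neurons
rng = np.random.default_rng(0)
N = 3
p_data = rng.dirichlet(5.0 * np.ones(2 ** N))
q, states, F = fit_pairwise_maxent(p_data, N)
```

At convergence the fitted model reproduces the constrained moments, and among all distributions with those moments it has the largest entropy; any remaining mismatch with `p_data` comes from higher-order structure the constraints do not capture.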

arXiv:1611.07197

doi 10.1371/journal.pone.0188012

Accelerating cross-validation with total variation and its application to super-resolution imaging

Authors: Tomoyuki Obuchi, Shiro Ikeda, Kazunori Akiyama, Yoshiyuki Kabashima

Abstract: We develop an approximation formula for the cross-validation error (CVE) of a sparse linear regression penalized by $\ell_1$-norm and total variation terms, based on a perturbative expansion that exploits the largeness of both the data dimensionality and the model. The developed formula allows us to significantly reduce the computational cost of evaluating the CVE. The practicality of the formula is tested through application to simulated black-hole image reconstruction on the event-horizon scale with super resolution. The results demonstrate that our approximation reproduces the CVE values obtained via literally conducted cross-validation with reasonably good precision.

Submitted 20 November, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

Comments: 14 pages, 4 figures. A Matlab package implementing the approximation formula is available from https://github.com/T-Obuchi/AcceleratedCVon2DTVLR

Journal ref: PLoS ONE 12(12): e0188012 (2017)

arXiv:1610.07733

Approximate cross-validation formula for Bayesian linear regression

Authors: Yoshiyuki Kabashima, Tomoyuki Obuchi, Makoto Uemura

Abstract: Cross-validation (CV) is a technique for evaluating the predictive ability of statistical models/learning systems based on a given data set. Despite its wide applicability, its rather heavy computational cost can prevent its use as the system size grows. To resolve this difficulty in the case of Bayesian linear regression, we develop a formula for approximately evaluating the leave-one-out CV error without actually performing CV. The usefulness of the developed formula is tested by statistical mechanical analysis of a synthetic model, and is further confirmed by application to a real-world supernova data set.

Submitted 25 October, 2016; originally announced October 2016.

Comments: 5 pages, 2 figures, invited paper for Allerton2016 conference
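For intuition: with a Gaussian prior $w\sim\mathcal{N}(0,\tau^2 I)$ and Gaussian noise of variance $\sigma^2$, the posterior mean of Bayesian linear regression is the ridge estimate with $\lambda=\sigma^2/\tau^2$, and its leave-one-out squared error has a well-known closed form, $e_i/(1-H_{ii})$, in terms of the ordinary residual $e_i$ and the hat matrix $H = X(X^\top X+\lambda I)^{-1}X^\top$. The sketch below checks this against literal refitting. It illustrates CV without refitting; it is not claimed to be the formula derived in the paper, which may target a different CV measure, and the function names are hypothetical.

```python
import numpy as np


def loo_error_shortcut(X, y, lam):
    """LOO mean squared error of the ridge / posterior-mean fit, without refitting."""
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # hat matrix: y -> fitted values
    resid = y - H @ y
    return float(np.mean((resid / (1.0 - np.diag(H))) ** 2))


def loo_error_bruteforce(X, y, lam):
    """Literal leave-one-out: refit once per held-out sample."""
    n, p = X.shape
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        Xk, yk = X[keep], y[keep]
        w = np.linalg.solve(Xk.T @ Xk + lam * np.eye(p), Xk.T @ yk)
        errs.append((y[i] - X[i] @ w) ** 2)
    return float(np.mean(errs))


# Usage on synthetic data (lam plays the role of sigma^2 / tau^2)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ rng.normal(size=5) + 0.3 * rng.normal(size=40)
fast = loo_error_shortcut(X, y, lam=0.5)
slow = loo_error_bruteforce(X, y, lam=0.5)
```

For this quadratic case the shortcut is exact (it follows from the Sherman–Morrison identity), so the two numbers agree to rounding error while the shortcut needs a single fit instead of $n$.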

arXiv:1605.09490

doi 10.1088/1742-5468/2016/11/113502

Relative species abundance of replicator dynamics with sparse interactions

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima, Kei Tokita

Abstract: A theory of relative species abundance on sparsely connected networks is presented by investigating replicator dynamics with symmetric interactions. The sparseness of the network makes it difficult to analyze the fixed points of the equation, and we avoid this problem by treating a large self-interaction $u$, which allows us to construct a perturbative expansion. Based on this perturbation, we find that the nature of the interactions is directly connected to the abundance distribution, and some characteristic behaviors, such as multiple peaks in the abundance distribution and the coexistence of all species at moderate values of $u$, are found for a wide class of interaction distributions. The coexistence of all species collapses at a critical value $u_c$ of $u$, and this collapse is regarded as a phase transition. To obtain more quantitative information, we also construct a non-perturbative theory on random graphs based on techniques of statistical mechanics. The result shows that these characteristic behaviors persist well even when $u$ is not large. For even smaller values of $u$, extinct species start to appear and the abundance distribution becomes rounded and closer to a standard functional form. Another interesting finding is the non-monotonic behavior of diversity, which quantifies the number of coexisting species, as the ratio $\Delta$ of mutualistic relations is varied.
These results are examined by numerical simulations, and the multiple peaks in the abundance distribution are confirmed to be robust against a certain level of modification of the problem. The numerical results also show that our theory is exact for the case without extinct species, but becomes less and less precise as the proportion of extinct species grows.

Submitted 31 May, 2016; originally announced May 2016.

Comments: 27 pages, 14 figures
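A direct simulation of this kind of model can be sketched as follows, assuming the common form $\dot x_i = x_i\,(f_i - \bar f)$ with fitness $f_i = \sum_j J_{ij}x_j - u\,x_i$, mean fitness $\bar f = N^{-1}\sum_i x_i f_i$, normalisation $\sum_i x_i = N$, and sparse $\pm1$ symmetric couplings on an Erdős–Rényi graph of mean degree $c$. These modelling choices, the explicit Euler integrator, the function names and all parameter values are assumptions for illustration.

```python
import numpy as np


def sparse_symmetric_couplings(N, c, seed=0):
    """Symmetric +-1 couplings on an Erdos-Renyi graph with mean degree about c."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((N, N)) < c / N, k=1)
    signs = rng.choice([-1.0, 1.0], size=(N, N))
    J = np.where(upper, signs, 0.0)
    return J + J.T                              # symmetric with zero diagonal


def replicator(J, u, t_max=50.0, dt=0.01, seed=0):
    """Explicit-Euler integration of dx_i/dt = x_i (f_i - fbar) with sum_i x_i = N."""
    N = J.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.5, 1.5, size=N)
    x *= N / x.sum()
    for _ in range(int(round(t_max / dt))):
        f = J @ x - u * x                       # fitness including self-interaction u
        fbar = x @ f / N                        # abundance-weighted mean fitness
        x = np.clip(x + dt * x * (f - fbar), 0.0, None)
        x *= N / x.sum()                        # remove drift introduced by the Euler step
    return x


# Usage: sparse network with a strong self-interaction
J = sparse_symmetric_couplings(100, c=2.0, seed=0)
x = replicator(J, u=10.0)
```

A histogram of `x` is the abundance distribution discussed above. With a large self-interaction such as `u = 10`, every species in this toy run keeps a positive abundance; lowering `u` is where extinctions would be expected to appear.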

arXiv:1605.09106

doi 10.1103/PhysRevE.94.022312

Multiple peaks of species abundance distributions induced by sparse interactions

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima, Kei Tokita

Abstract: We investigate replicator dynamics with "sparse" symmetric interactions, which represent specialist-specialist interactions in ecological communities. By considering a large self-interaction $u$, we conduct a perturbative expansion which shows that the nature of the interactions has a direct impact on the species abundance distribution. The central results are the coexistence of all species in a realistic range of the model parameters, and the finding that a certain discrete nature of the interactions induces multiple peaks in the species abundance distribution, offering a possible theoretical explanation of the multiple peaks observed in various field studies. To obtain more quantitative information, we also construct a non-perturbative theory which becomes exact on tree-like networks if all the species coexist, providing exact critical values of $u$ below which extinct species emerge. Numerical simulations in various situations clarify the robustness of the presented mechanism of all-species coexistence and multiple peaks in the species abundance distributions.

Submitted 30 May, 2016; originally announced May 2016.

Comments: 6 pages, 5 figures

arXiv:1603.01399

Sampling approach to sparse approximation problem: determining degrees of freedom by simulated annealing

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: The approximation of a high-dimensional vector by a small combination of column vectors selected from a fixed matrix has been actively studied in several different disciplines. In this paper, a sampling approach based on the Monte Carlo method is presented as an efficient solver for such problems. In particular, we focus on and test the use of simulated annealing (SA), a metaheuristic optimization algorithm, for determining the degrees of freedom (the number of used columns) by cross-validation (CV). Tests on a synthetic model indicate that our SA-based approach can find a nearly optimal solution to the approximation problem and, when combined with the CV framework, can optimize the generalization ability. Its utility is also confirmed by application to a real-world supernova data set.

Submitted 4 October, 2016; v1 submitted 4 March, 2016; originally announced March 2016.

Comments: 5 pages, 3 figures, Proceedings of Eusipco 2016

arXiv:1601.01074

doi 10.1088/1742-6596/699/1/012017

Sparse approximation problem: how rapid simulated annealing succeeds and fails

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: Information processing techniques based on sparseness have been actively studied in several disciplines. Among them, the mathematical framework for approximately expressing a given dataset as a combination of a small number of basis vectors from an overcomplete basis is termed sparse approximation. In this paper, we apply simulated annealing, a metaheuristic algorithm for general optimization problems, to sparse approximation in the situation where the given data have a planted sparse representation and noise is present. The result in the noiseless case shows that our simulated annealing works well in a reasonable parameter region: the planted solution is found fairly rapidly. This holds even where a common relaxation of the sparse approximation problem, the $\ell_1$-relaxation, is ineffective. On the other hand, when the dimensionality of the data is close to the number of non-zero components, another metastable state emerges, and our algorithm fails to find the planted solution. This phenomenon is associated with a first-order phase transition. In the case of very strong noise, it is no longer meaningful to search for the planted solution. In this situation, our algorithm finds a solution with close-to-minimum distortion fairly quickly.

Submitted 4 March, 2016; v1 submitted 5 January, 2016; originally announced January 2016.

Comments: 12 pages, 7 figures, proceedings of HD^3-2015

arXiv:1601.00881

doi 10.1088/1742-5468/2016/05/053304

Cross validation in LASSO and its acceleration

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: We investigate leave-one-out cross-validation (CV) as a means of determining the weight of the penalty term in the least absolute shrinkage and selection operator (LASSO). First, on the basis of the message passing algorithm and a perturbative argument assuming that the number of observations is sufficiently large, we provide simple formulas for approximately assessing two types of CV errors, which enable us to significantly reduce the computational cost. These formulas also provide a simple connection between the CV errors and the residual sum of squares between the reconstructed and the given measurements. Second, on the basis of this finding, we use the replica method to evaluate the CV errors analytically when the design matrix is a simple random matrix in the large-size limit. Finally, these results are compared with numerical simulations on finite-size systems and are confirmed to be correct. We also apply the simple formulas for the first type of CV error to an actual supernova dataset.

Submitted 4 March, 2016; v1 submitted 28 December, 2015; originally announced January 2016.

Comments: 32 pages, 7 figures
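The flavour of such formulas can be illustrated with a commonly used active-set approximation: assuming the active set and signs of the LASSO solution do not change when one observation is removed, the LOO residual is $r_i/(1-h_i)$, with $h_i$ the diagonal of $X_A(X_A^\top X_A)^{-1}X_A^\top$ over the active columns $A$. Whether this coincides in every detail with the paper's two formulas is not claimed here. The sketch compares it with literal LOO refitting; the coordinate-descent solver, its penalty scaling `0.5 * ||y - Xw||^2 + lam * ||w||_1` and the function names are assumptions.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for 0.5 * ||y - X w||^2 + lam * ||w||_1."""
    w = np.zeros(X.shape[1])
    col_sq = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()                     # residual y - X w with w = 0
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r += X[:, j] * w[j]                    # drop coordinate j from the fit
            rho = X[:, j] @ r
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]                    # put it back with the new value
    return w


def loo_lasso_approx(X, y, lam):
    """Active-set LOO approximation: r_i / (1 - h_i), h from the active columns only."""
    w = lasso_cd(X, y, lam)
    r = y - X @ w
    active = np.flatnonzero(w)
    if active.size == 0:
        return float(np.mean(r ** 2))
    XA = X[:, active]
    h = np.einsum("ij,jk,ik->i", XA, np.linalg.inv(XA.T @ XA), XA)
    return float(np.mean((r / (1.0 - h)) ** 2))


def loo_lasso_bruteforce(X, y, lam):
    """Literal LOO: one LASSO refit per held-out observation."""
    n = X.shape[0]
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        w = lasso_cd(X[keep], y[keep], lam)
        errs.append((y[i] - X[i] @ w) ** 2)
    return float(np.mean(errs))


# Usage on a sparse synthetic problem
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
w_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.3 * rng.normal(size=60)
approx = loo_lasso_approx(X, y, lam=30.0)
exact = loo_lasso_bruteforce(X, y, lam=30.0)
```

When the active set is stable across folds, as in this well-separated example, the approximation matches literal LOO up to solver tolerance; it degrades when removing one observation changes which variables are active.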

arXiv:1510.02189

doi 10.1088/1742-5468/2016/06/063302

Sparse approximation based on a random overcomplete basis

Authors: Yoshinori Nakanishi-Ohno, Tomoyuki Obuchi, Masato Okada, Yoshiyuki Kabashima

Abstract: We discuss a strategy of sparse approximation based on the use of an overcomplete basis, and evaluate its performance when a random matrix is used as this basis. A small combination of basis vectors is chosen from a given overcomplete basis, according to a given compression rate, such that they compactly represent the target data with as small a distortion as possible. As selection methods, we study the $\ell_0$- and $\ell_1$-based methods, which employ exhaustive search and $\ell_1$-norm regularization, respectively. The performance is assessed in terms of the trade-off between representation distortion and compression rate. First, using methods of statistical mechanics, we evaluate the performance analytically in the case where the methods are carried out ideally. Our result shows that the $\ell_0$-based method greatly outperforms the $\ell_1$-based one. Second, we examine the practical performance of two well-known algorithms, orthogonal matching pursuit and approximate message passing, when they are used to execute the $\ell_0$- and $\ell_1$-based methods, respectively. Our examination shows that orthogonal matching pursuit achieves much better performance than the exact execution of the $\ell_1$-based method, as well as than approximate message passing. However, for the $\ell_0$-based method, there is still room to design greedy algorithms more effective than orthogonal matching pursuit. Finally, we evaluate the performance of the algorithms when they are applied to image data compression.

Submitted 2 March, 2016; v1 submitted 7 October, 2015; originally announced October 2015.

Comments: 35 pages, 11 figures
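Orthogonal matching pursuit, one of the two algorithms examined, is short enough to sketch in full. The version below is the textbook greedy form (pick the column most correlated with the residual, then refit by least squares on all chosen columns); column normalisation, stopping rules and other details of the paper's experiments may differ, and the toy sizes are assumptions.

```python
import numpy as np


def omp(A, y, K):
    """Greedy orthogonal matching pursuit selecting K columns of A."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(K):
        scores = np.abs(A.T @ residual)
        if support:
            scores[support] = -np.inf             # never pick a column twice
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients jointly, then update the residual
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x, support


# Usage on a planted, noiseless instance
rng = np.random.default_rng(0)
M, N, K = 60, 80, 3
A = rng.normal(size=(M, N)) / np.sqrt(M)
true_support = sorted(rng.choice(N, size=K, replace=False).tolist())
x0 = np.zeros(N)
x0[true_support] = rng.choice([-1.0, 1.0], size=K)
x_hat, support = omp(A, A @ x0, K)
```

Because the residual is always orthogonal to the columns already chosen, each step adds a genuinely new direction; this joint refit is what distinguishes OMP from plain matching pursuit.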

arXiv:1508.05225

Role of the Finite Replica Analysis in the Mean-Field Theory of Spin Glasses

Authors: Tomoyuki Obuchi

Abstract: In this thesis, we review and examine the replica method from several viewpoints. The replica method is a mathematical technique for calculating general moments of stochastic variables. This method provides a systematic way to evaluate physical quantities and has become one of the most important tools in the theory of spin glasses and in related disciplines, including information processing tasks. In spite of the effectiveness of the replica method, several problems are known to exist in the procedures of the method itself. Replica symmetry breaking is the central topic among those problems and is the main issue of this thesis. To elucidate this point, we review in detail recent progress on replica symmetry breaking, including its physical and mathematical descriptions. Based on those descriptions, several spin-glass models and the Ising perceptron are investigated in depth.

Submitted 21 August, 2015; originally announced August 2015.

Comments: A Thesis submitted in fulfillment of the requirements of Ph.D. in Tokyo Tech. in 2010. 148 pages. Note that evaluation of $y_{AT}$ in Section 4 contains some mistakes (the conclusion is unchanged)
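The central identity behind the replica method, in its standard textbook form (stated here for orientation, not quoted from the thesis), is

$$
\mathbb{E}[\log Z] \;=\; \lim_{n\to 0}\frac{\mathbb{E}[Z^{n}]-1}{n} \;=\; \lim_{n\to 0}\frac{\partial}{\partial n}\log \mathbb{E}[Z^{n}],
$$

which follows from $Z^{n} = e^{n\log Z} = 1 + n\log Z + O(n^{2})$. In practice $\mathbb{E}[Z^{n}]$ is computed for positive integers $n$ and then continued analytically to $n\to 0$; the structure assumed for the overlap matrix between the $n$ replicas during this continuation is where replica symmetry and its breaking enter.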

arXiv:1503.02802

doi 10.1007/s10955-015-1341-7

Learning probabilities from random observables in high dimensions: the maximum entropy distribution and others

Authors: Tomoyuki Obuchi, Simona Cocco, Rémi Monasson

Abstract: We consider the problem of learning a target probability distribution over a set of $N$ binary variables from the knowledge of the expectation values (under this target distribution) of $M$ observables, drawn uniformly at random. The space of all probability distributions compatible with these $M$ expectation values within some fixed accuracy, called the version space, is studied. We introduce a biased measure over the version space, which gives a boost increasing exponentially with the entropy of the distributions and with an arbitrary inverse 'temperature' $\Gamma$. The choice of $\Gamma$ allows us to interpolate smoothly between the unbiased measure over all distributions in the version space ($\Gamma=0$) and the pointwise measure concentrated at the maximum entropy distribution ($\Gamma\to\infty$). Using the replica method, we compute the volume of the version space and other quantities of interest, such as the distance $R$ between the target distribution and the center-of-mass distribution over the version space, as functions of $\alpha=(\log M)/N$ and $\Gamma$ for large $N$. Phase transitions at critical values of $\alpha$ are found, corresponding to qualitative improvements in the learning of the target distribution and to a decrease in the distance $R$. However, for fixed $\alpha$, the distance $R$ does not vary with $\Gamma$, which means that the maximum entropy distribution is not closer to the target distribution than any other distribution compatible with the observable values. Our results are confirmed by Monte Carlo sampling of the version space for small system sizes ($N\le 10$).

Submitted 21 July, 2015; v1 submitted 10 March, 2015; originally announced March 2015.

Comments: 30 pages, 13 figures
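One way to write the biased measure described above, with notation assumed here ($O_\mu$ for the observables, $\hat{o}_\mu$ for their target values, $\epsilon$ for the fixed accuracy; the paper's exact normalisation, including any factor of $N$ multiplying $\Gamma$, may differ), is

$$
P_\Gamma(p) \;\propto\; \exp\!\big(\Gamma\, S[p]\big)\,\prod_{\mu=1}^{M}\mathbb{1}\!\left[\,\Big|\sum_{\boldsymbol{\sigma}} p(\boldsymbol{\sigma})\,O_\mu(\boldsymbol{\sigma}) - \hat{o}_\mu\Big| \le \epsilon\,\right],
\qquad S[p] = -\sum_{\boldsymbol{\sigma}} p(\boldsymbol{\sigma})\log p(\boldsymbol{\sigma}),
$$

where $\boldsymbol{\sigma}$ runs over the $2^N$ configurations. Setting $\Gamma=0$ gives the uniform measure over the version space, while $\Gamma\to\infty$ concentrates the measure on the maximum entropy distribution, matching the two limits stated in the abstract.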

arXiv:1412.7012

doi 10.7566/JPSJ.85.114803

Boltzmann-Machine Learning of Prior Distributions of Binarized Natural Images

Authors: Tomoyuki Obuchi, Hirokazu Koma, Muneki Yasuda

Abstract: Prior distributions of binarized natural images are learned using a Boltzmann machine. According to the results of this study, a structure with two sublattices emerges in the interactions, and the nearest-neighbor and next-nearest-neighbor interactions correspondingly take two distinct values, which reflect the individual characteristics of the three sets of pictures that we process. Meanwhile, on a longer spatial scale, a longer-range, although still rapidly decaying, ferromagnetic interaction commonly appears in all cases. The characteristic length scale of the interactions is universally up to approximately four lattice spacings, $\xi\approx 4$. These results are derived using the mean-field method, which effectively reduces the computational time required in a Boltzmann machine. An improved mean-field method called the Bethe approximation gives the same results, as does the Monte Carlo method for small images. These agreements reinforce the validity of our analysis and findings. Relations to criticality, frustration, and simple-cell receptive fields are also discussed.

Submitted 23 October, 2016; v1 submitted 15 December, 2014; originally announced December 2014.

Comments: 32 pages, 33 figures

Journal ref: J. Phys. Soc. Jpn. 85 (2016) 114803
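The abstract above refers to a mean-field method for Boltzmann-machine learning. A minimal sketch of the simplest such scheme, naive mean-field inversion $J_{ij} \approx -(C^{-1})_{ij}$ for $i \ne j$, is given here under stated assumptions: a hypothetical 5-spin Ising model with weak random couplings and zero fields, whose exact correlation matrix $C$ is obtained by enumeration. The paper's actual learning procedure, its Bethe-approximation refinement and its image data are not reproduced; the function names are illustrative.

```python
import itertools

import numpy as np


def exact_moments(J, h):
    """Magnetizations m and connected correlations C of a small Ising model
    P(s) ∝ exp(sum_{i<j} J_ij s_i s_j + sum_i h_i s_i), by full enumeration."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    # 0.5 * s^T J s counts each pair once because J is symmetric with zero diagonal
    log_w = 0.5 * np.einsum("ki,ij,kj->k", states, J, states) + states @ h
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    m = p @ states
    second_moment = states.T @ (states * p[:, None])
    return m, second_moment - np.outer(m, m)


def naive_mean_field_couplings(C):
    """Naive mean-field inverse Ising estimate: J_ij ≈ -(C^{-1})_ij for i != j."""
    J_est = -np.linalg.inv(C)
    np.fill_diagonal(J_est, 0.0)
    return J_est


# Hypothetical test model: 5 spins, weak random couplings, zero fields
rng = np.random.default_rng(0)
N = 5
upper = np.triu(rng.uniform(-0.1, 0.1, size=(N, N)), k=1)
J_true = upper + upper.T
h = np.zeros(N)

m, C = exact_moments(J_true, h)
J_est = naive_mean_field_couplings(C)
print("max |J_est - J_true| =", np.abs(J_est - J_true).max())
```

Naive mean-field inversion is exact only to leading order in the couplings; for stronger couplings, corrections such as TAP or Bethe-type schemes are usually needed.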

arXiv:1309.0076

doi 10.1088/1742-6596/473/1/012023

Zeros of the partition function and dynamical singularities in spin-glass systems

Authors: Kazutaka Takahashi, Tomoyuki Obuchi

Abstract: We study spin-glass systems characterized by the continuous occurrence of singularities. The theory of Lee-Yang zeros is used to find the singularities. By using the replica method in mean-field systems, we show that two-dimensional distributions of zeros of the partition function in a complex parameter plane are a characteristic feature of random systems. The results for several models indicate that the concept of chaos in the spin-glass state is different from that of replica symmetry breaking. We argue that a chaotic phase at imaginary temperature is different from the spin-glass phase and is accessible by quantum dynamics in a quenching protocol.

Submitted 31 August, 2013; originally announced September 2013.

Comments: 11 pages, 6 figures, proceedings of the ICSG2013

Journal ref: J. Phys.: Conf. Ser. 473, 012023 (2013)
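Partition-function zeros of the kind discussed above can be illustrated at toy scale. The sketch below, assuming a hypothetical $\pm J$ ring of 6 spins small enough to enumerate, computes complex-temperature (Fisher) zeros, i.e. zeros of $Z$ as a polynomial in $u=e^{-2β}$. It does not reproduce the replica analysis of mean-field systems in the paper, and the ring geometry and function names are illustrative.

```python
import itertools

import numpy as np


def energy_histogram(n_sites, edges, couplings):
    """Count states by energy E = -sum_{(i,j)} J_ij s_i s_j for integer J_ij = ±1.

    Returns g with g[k] = number of states at energy E = -K + 2k, K = len(edges).
    """
    K = len(edges)
    g = np.zeros(K + 1)
    for s in itertools.product([-1, 1], repeat=n_sites):
        energy = -sum(J * s[i] * s[j] for (i, j), J in zip(edges, couplings))
        g[(energy + K) // 2] += 1
    return g


def fisher_zeros(g):
    """Zeros of Z in u = exp(-2*beta), using Z = exp(beta*K) * sum_k g[k] u^k."""
    return np.roots(g[::-1])  # np.roots expects the highest-degree coefficient first


n = 6
ring = [(i, (i + 1) % n) for i in range(n)]

# Ferromagnetic ring: Z ∝ (1 + u)^n + (1 - u)^n, so all zeros lie on the imaginary u axis
g_ferro = energy_histogram(n, ring, [1] * n)
zeros_ferro = fisher_zeros(g_ferro)

# A hypothetical ±J realization. If the ring is frustrated (odd number of negative
# bonds), the polynomial has a factor u, so u = 0 (beta -> infinity) is among the roots.
rng = np.random.default_rng(1)
couplings = rng.choice([-1, 1], size=n).tolist()
zeros_pm_j = fisher_zeros(energy_histogram(n, ring, couplings))
print(np.round(zeros_ferro, 4))
print(np.round(zeros_pm_j, 4))
```

In this one-dimensional toy the zeros never reach the positive real $u$ axis at finite size; the point of the sketch is only to show how zeros are located once $Z$ is written as a polynomial in a complex parameter.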

arXiv:1212.3804

doi 10.1103/PhysRevB.87.174438

Monte Carlo simulations of the three-dimensional XY spin glass focusing on the chiral and the spin order

Authors: Tomoyuki Obuchi, Hikaru Kawamura

Abstract: The ordering of the three-dimensional isotropic {\it XY} spin glass with nearest-neighbor random Gaussian couplings is studied by extensive Monte Carlo simulations. To investigate the ordering of the spin and the chirality, we compute several independent physical quantities, including the glass order parameter, the Binder parameter, the correlation-length ratio, the overlap distribution and the non-self-averageness parameter, {\it etc}, for both the spin-glass (SG) and the chiral-glass (CG) degrees of freedom. Evidence of spin-chirality decoupling, {\it i.e.}, the CG and the SG order occurring at two separate temperatures, $0<T_{SG}<T_{CG}$, is obtained from the glass order parameter and is fully corroborated by the Binder parameter. By contrast, the CG correlation-length ratio yields a rather pathological and inconsistent result in the range of sizes we studied, which may originate from a finite-size effect associated with a significant short-length drop-off of the spatial CG correlations. Finite-size-scaling analysis yields the CG exponents $ν_{CG}=1.36^{+0.15}_{-0.37}$ and $η_{CG}=0.26^{+0.29}_{-0.26}$, and the SG exponents $ν_{SG}=1.22^{+0.26}_{-0.06}$ and $η_{SG}=-0.54^{+0.24}_{-0.52}$. The obtained exponents are close to those of the Heisenberg SG, but differ considerably from those of the Ising SG. The chiral overlap distribution and the chiral Binder parameter exhibit the features of a continuous one-step replica-symmetry breaking (1RSB), consistent with previous reports. Such a 1RSB feature is again shared with the Heisenberg SG but not with the Ising SG, which may explain why the CG critical properties differ from the Ising SG ones despite the common $Z_2$ symmetry.

Submitted 31 May, 2013; v1 submitted 16 December, 2012; originally announced December 2012.

Comments: 15 pages, 31 figures

Journal ref: Phys. Rev. B 87, 174438 (2013)

arXiv:1208.2800

doi 10.1103/PhysRevE.86.051125

Dynamical Singularities of Glassy Systems in a Quantum Quench

Authors: Tomoyuki Obuchi, Kazutaka Takahashi

Abstract: We present a prototype of the behavior of glassy systems driven by quantum dynamics in a quenching protocol by analyzing the random energy model in a transverse field. We calculate several types of dynamical quantum amplitudes and find a freezing transition at a critical time. The behavior is understood through the partition-function zeros in the complex temperature plane. We discuss the properties of the freezing phase as a dynamical chaotic phase and contrast them with those of the spin-glass phase in the static system.

Submitted 13 December, 2012; v1 submitted 14 August, 2012; originally announced August 2012.

Comments: 6 pages, 5 figures

Journal ref: Phys. Rev. E 86, 051125 (2012)

arXiv:1202.1042

doi 10.1143/JPSJ.81.054003

Spin and chiral orderings of the antiferromagnetic XY model on the triangular lattice and their critical properties

Authors: Tomoyuki Obuchi, Hikaru Kawamura

Abstract: We study the antiferromagnetic {\it XY} model on a triangular lattice by extensive Monte Carlo simulations, focusing on its ordering and critical properties. Our results clearly show that two separate transitions occur at two distinct temperatures: the higher-temperature transition is associated with a $Z_2$-symmetry breaking driven by the chirality, and the lower-temperature one with the onset of the quasi-long-range order of the {\it XY} spin. We carefully examine the critical properties of each transition and find that the criticality of the chiral transition is consistent with the standard two-dimensional Ising universality class, whereas that of the spin transition might differ from the conventional Kosterlitz-Thouless (KT) one. The observed non-KT nature of the spin criticality is consistent with the most recent simulation result on the fully-frustrated {\it XY} model on a square lattice.

Submitted 23 February, 2012; v1 submitted 5 February, 2012; originally announced February 2012.

Comments: 10 pages, 20 figures, replaced because of a format error

arXiv:1110.0942

doi 10.1088/1751-8113/45/12/125003

Partition-function zeros of spherical spin glasses and their relevance to chaos

Authors: Tomoyuki Obuchi, Kazutaka Takahashi

Abstract: We investigate partition-function zeros of the many-body interacting spherical spin glass, the so-called $p$-spin spherical model, with respect to complex temperature in the thermodynamic limit. We use the replica method and extend the replica symmetry breaking ansatz so that it applies to the complex-parameter case. We derive the phase diagrams in the complex-temperature plane and calculate the density of zeros in each phase. Near the imaginary axis away from the origin, there is a replica symmetric phase with a large density of zeros. On the other hand, we observe zero density in the spin-glass phases, irrespective of replica symmetry breaking. We speculate that this suggests the absence of temperature chaos. To confirm this, we investigate the case with multiple many-body interactions, which is known to exhibit the chaos effect. The result shows that the density of zeros actually takes finite values in the spin-glass phase, even on the real axis. These observations indicate that the density of zeros is more closely connected to the chaos effect than to replica symmetry breaking.

Submitted 15 February, 2012; v1 submitted 5 October, 2011; originally announced October 2011.

Comments: 22 pages, 8 figures

Journal ref: J. Phys. A: Math. Theor. 45 (2012) 125003

arXiv:1011.3722

doi 10.1088/1751-8113/44/8/085002

Statistical mechanical analysis of a hierarchical random code ensemble in signal processing

Authors: Tomoyuki Obuchi, Kazutaka Takahashi, Koujin Takeda

Abstract: We study a random code ensemble with a hierarchical structure, which is closely related to the generalized random energy model with discrete energy values. Based on this correspondence, we analyze the hierarchical random code ensemble by using the replica method in two situations: lossy data compression and channel coding. In both situations, the large-deviation exponents characterizing the performance of the ensemble, namely the distortion rate of lossy data compression and the error exponent of channel coding in Gallager's formalism, are accessible through a generating function of the generalized random energy model. We argue that the transitions of those exponents observed in previous work can be interpreted as phase transitions with respect to the replica number. We also show that replica symmetry breaking plays an essential role in these transitions.

Submitted 2 February, 2011; v1 submitted 16 November, 2010; originally announced November 2010.

Comments: 24 pages, 4 figures

Journal ref: J. Phys. A: Math. Theor. 44 (2011) 085002

arXiv:1007.1531

doi 10.1088/1751-8113/43/48/485004

Replica symmetry breaking, complexity and spin representation in the generalized random energy model

Authors: Tomoyuki Obuchi, Kazutaka Takahashi, Koujin Takeda

Abstract: We study the random energy model with a hierarchical structure known as the generalized random energy model (GREM). In contrast to the original analysis in the microcanonical ensemble formalism, we investigate the GREM in the canonical ensemble formalism in conjunction with the replica method. In this analysis, spin-glass order parameters are defined for each hierarchy level, and all possible patterns of replica symmetry breaking (RSB) are taken into account. As a result, we find that the higher-step RSB ansatz is useful for describing spin-glass phases in this system. To investigate the nature of the higher-step RSB, we generalize the notion of complexity developed for the one-step RSB to higher steps and demonstrate how the GREM is characterized by the generalized complexity. In addition, we propose a novel mean-field spin-glass model with a hierarchical structure, which is equivalent to the GREM in a certain limit. We also show that the same hierarchical structure can be implemented in mean-field spin models other than the GREM. Such hierarchical models commonly exhibit multiple-step phase transitions.

Submitted 26 October, 2010; v1 submitted 9 July, 2010; originally announced July 2010.

Comments: 30 pages, 11 figures; minor changes

Journal ref: J. Phys. A: Math. Theor. 43 (2010) 485004

arXiv:1004.3118

doi 10.1016/j.physe.2010.07.052

Zero-Temperature Complex Replica Zeros of the $\pm J$ Ising Spin Glass on Mean-Field Systems and Beyond

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima, Hidetoshi Nishimori, Masayuki Ohzeki

Abstract: Zeros of the moment of the partition function $[Z^n]_{\bm{J}}$ with respect to complex $n$ are investigated in the zero-temperature limit $β\to \infty$, $n\to 0$ keeping $y=βn \approx O(1)$. We numerically investigate the zeros of the $\pm J$ Ising spin glass models on several Cayley trees and hierarchical lattices and compare the results. In both lattices, the calculations are carried out at feasible computational cost by using recursion relations arising from the structures of those lattices. The results for Cayley trees show that a sequence of the zeros approaches the real axis of $y$, implying that a certain type of analyticity breaking actually occurs, although it is irrelevant to any known replica symmetry breaking. The results for hierarchical lattices also show the presence of analyticity breaking, even in the two-dimensional case in which there is no finite-temperature spin-glass transition, which implies the existence of a zero-temperature phase transition in the system. A notable tendency of hierarchical lattices is that the zeros spread over a wide region of the complex $y$ plane in comparison with the case of Cayley trees, which may reflect the difference between mean-field and finite-dimensional systems.

Submitted 19 April, 2010; originally announced April 2010.

Comments: 4 pages, 4 figures

arXiv:1001.4873

doi 10.1088/1751-8113/43/28/285002

Distribution of partition function zeros of the $\pm J$ model on the Bethe lattice

Authors: Yoshiki Matsuda, Markus Mueller, Hidetoshi Nishimori, Tomoyuki Obuchi, Antonello Scardicchio

Abstract: The distribution of partition function zeros is studied for the $\pm J$ model of spin glasses on the Bethe lattice. We find a relation between the distribution of complex cavity fields and the density of zeros, which enables us to obtain the density of zeros for infinite system size by using the cavity method. The phase boundaries thus derived from the location of the zeros are consistent with the results of direct analytical calculations. This is the first example in which the spin glass transition is related to the distribution of zeros directly in the thermodynamic limit. We clarify how the spin glass transition is characterized by the zeros of the partition function. It is also shown that in the spin glass phase a continuous distribution of singularities touches the axes of real field and temperature.

Submitted 15 June, 2010; v1 submitted 27 January, 2010; originally announced January 2010.

Comments: 23 pages, 12 figures

Journal ref: J. Phys. A: Math. Theor. 43 (2010) 285002

arXiv:0910.2281

doi 10.1088/1742-5468/2009/12/P12014

Weight space structure and analysis using a finite replica number in the Ising perceptron

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima

Abstract: The weight space of the Ising perceptron in which a set of random patterns is stored is examined using the generating function of the partition function $φ(n)=(1/N)\log [Z^n]$ as the dimension of the weight vector $N$ tends to infinity, where $Z$ is the partition function and $[ ... ]$ represents the configurational average. We utilize $φ(n)$ for two purposes, depending on the value of the ratio $α=M/N$, where $M$ is the number of random patterns. For $α< α_{\rm s}=0.833 ...$, we employ $φ(n)$, in conjunction with Parisi's one-step replica symmetry breaking scheme in the limit of $n \to 0$, to evaluate the complexity that characterizes the number of disjoint clusters of weights that are compatible with a given set of random patterns; this indicates that, in typical cases, the weight space is equally dominated by a single large cluster of exponentially many weights and exponentially many small clusters of a single weight. For $α> α_{\rm s}$, on the other hand, $φ(n)$ is used to assess the rate function of the small probability that a given set of random patterns is atypically separable by the Ising perceptron. We show that the analyticity of the rate function changes at $α= α_{\rm GD}=1.245 ... $, which implies that the dominant configuration of the atypically separable patterns exhibits a phase transition at this critical ratio. Extensive numerical experiments are conducted to support the theoretical predictions.

Submitted 20 November, 2009; v1 submitted 13 October, 2009; originally announced October 2009.

Comments: 21 pages, 11 figures, Added references, some comments, and corrections to minor errors

Journal ref: J. Stat. Mech. (2009) P12014
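The generating function $φ(n)=(1/N)\log [Z^n]$ defined above can be estimated by brute force at very small $N$, which may help make the definition concrete. The sketch below assumes random $\pm 1$ patterns with random $\pm 1$ labels, counts the Ising weights that separate them to get $Z$, and replaces the configurational average by a sample mean. The sizes ($N=11$, $M=5$), the sample count and the function name are hypothetical choices, and such a small $N$ cannot resolve the large-$N$ thresholds $α_{\rm s}$ or $α_{\rm GD}$ or the cluster structure studied with the replica method.

```python
import itertools

import numpy as np


def perceptron_phi(N, M, n_values, n_samples=200, seed=0):
    """Finite-N Monte Carlo estimate of phi(n) = (1/N) log [Z^n] for the Ising perceptron.

    Z counts Ising weights w in {-1, +1}^N with y_mu * (w . xi_mu) > 0 for all M random
    patterns xi_mu in {-1, +1}^N with random labels y_mu; [.] is the sample average.
    N is taken odd so that w . xi_mu is never zero.
    """
    rng = np.random.default_rng(seed)
    W = np.array(list(itertools.product([-1, 1], repeat=N)))  # all 2^N weight vectors
    Z = np.empty(n_samples)
    for t in range(n_samples):
        xi = rng.choice([-1, 1], size=(M, N))  # random patterns
        y = rng.choice([-1, 1], size=M)  # random labels
        stabilities = (W @ xi.T) * y  # shape (2^N, M)
        Z[t] = np.all(stabilities > 0, axis=1).sum()
    return {n: float(np.log(np.mean(Z ** n)) / N) for n in n_values}


phi = perceptron_phi(N=11, M=5, n_values=[0.0, 0.5, 1.0, 2.0])
print(phi)
```

By construction $φ(0)=0$, and because $Z$ is either $0$ or at least $1$, $φ(n)$ is non-decreasing for $n>0$; the $n\to 0$ behavior needed for the complexity calculation is where the replica analysis goes beyond what such enumeration can show.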

arXiv:0809.2635

doi 10.1088/1751-8113/42/7/075004

Complex Replica Zeros of $\pm J$ Ising Spin Glass at Zero Temperature

Authors: Tomoyuki Obuchi, Yoshiyuki Kabashima, Hidetoshi Nishimori

Abstract: Zeros of the $n$th moment of the partition function $[Z^n]$ are investigated in a vanishing-temperature limit $β\to \infty$, $n \to 0$ keeping $y=βn \sim O(1)$. In this limit, the moment parameterized by $y$ characterizes the distribution of the ground-state energy. We numerically investigate the zeros for $\pm J$ Ising spin glass models on several ladder and tree systems, which can be done at feasible computational cost by a symbolic operation based on the Bethe--Peierls method. For several tree systems we find that the zeros tend to approach the real axis of $y$ in the thermodynamic limit, implying that the moment cannot be described by a single analytic function of $y$ as the system size tends to infinity, which may be associated with breaking of the replica symmetry. However, examination of the analytical properties of the moment function and assessment of the spin-glass susceptibility indicate that the breaking of analyticity is relevant to neither one-step nor full replica symmetry breaking.

Submitted 14 November, 2008; v1 submitted 16 September, 2008; originally announced September 2008.

Comments: 27 pages, 13 figures. Added references, some comments, and corrections to minor errors

Journal ref: J. Phys. A: Math. Theor. 42 (2009) 075004
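In the limit described above, $\log Z \approx -βE_{\rm gs} + \log g_0$ with $g_0$ the ground-state degeneracy, so $Z^n \to e^{-yE_{\rm gs}}$ as $β\to\infty$, $n\to 0$ at fixed $y=βn$, and $[Z^n]$ becomes the generating function $[e^{-yE_{\rm gs}}]$ of the ground-state energy. The sketch below assumes a hypothetical $2\times 4$ $\pm J$ ladder small enough to enumerate every coupling configuration exactly; it computes this function and its zeros in $v=e^{-2y}$. The paper instead uses a symbolic Bethe-Peierls scheme on larger ladders and trees, and the names here are illustrative.

```python
import itertools

import numpy as np


def ladder_edges(L):
    """Bonds of a 2 x L ladder: site x on the lower leg, site x + L on the upper leg."""
    edges = []
    for x in range(L):
        edges.append((x, x + L))  # rung
        if x + 1 < L:
            edges.append((x, x + 1))  # lower leg
            edges.append((x + L, x + 1 + L))  # upper leg
    return edges


def ground_energy_distribution(edges, n_sites):
    """Exact distribution of E_gs = -K + 2k over all 2^K equally likely ±J couplings.

    Returns c with c[k] = Prob(E_gs = -K + 2k), K = len(edges).
    """
    K = len(edges)
    spins = np.array(list(itertools.product([-1, 1], repeat=n_sites)))
    # bond_products[state, b] = s_i * s_j for bond b = (i, j)
    bond_products = np.stack([spins[:, i] * spins[:, j] for i, j in edges], axis=1)
    c = np.zeros(K + 1)
    for J in itertools.product([-1, 1], repeat=K):
        e_gs = -int((bond_products @ np.array(J)).max())  # E = -sum J_ij s_i s_j
        c[(e_gs + K) // 2] += 1
    return c / 2**K


L = 4
edges = ladder_edges(L)
c = ground_energy_distribution(edges, 2 * L)
# [exp(-y E_gs)] = exp(y K) * sum_k c[k] v^k with v = exp(-2y); zeros in v:
zeros_v = np.roots(c[::-1])
zeros_y = -np.log(zeros_v.astype(complex)) / 2  # principal branch; y is defined modulo i*pi
print(c[:4], zeros_v, zeros_y)
```

For this ladder the ground-state energy depends only on which of the three plaquettes are frustrated, so the moment is a quadratic in $v$ with two real negative zeros; tracking how such zeros move toward the real $y$ axis as the system grows is the finite-size question the abstract addresses.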

arXiv:cond-mat/0611168

doi 10.1143/JPSJ.76.054002

Phase diagram of the $p$-spin-interacting spin glass with ferromagnetic bias and a transverse field in the infinite-$p$ limit

Authors: Tomoyuki Obuchi, Hidetoshi Nishimori, David Sherrington

Abstract: The phase diagram of the $p$-spin-interacting spin glass model in a transverse field is investigated in the limit $p \to \infty$ in the presence of ferromagnetic bias. Using the replica method and the static approximation, we show that the phase diagram consists of four phases: quantum paramagnetic, classical paramagnetic, ferromagnetic, and spin-glass phases. We also show that the static approximation is valid in the ferromagnetic phase in the limit $p \to \infty$ by using the large-$p$ expansion. Since the same approximation is already known to be valid in other phases, we conclude that the obtained phase diagram is exact.

Submitted 22 March, 2007; v1 submitted 6 November, 2006; originally announced November 2006.

Comments: 16 pages, 4 figures. another additional author, some amendments

Journal ref: J. Phys. Soc. Jpn. 76 (2007) 054002

Showing 1–44 of 44 results for author: Obuchi, T