-
Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model
Authors:
Chuang Ma,
Tomoyuki Obuchi,
Toshiyuki Tanaka
Abstract:
A phenomenon known as "Neural Collapse (NC)" in deep classification tasks, in which the penultimate-layer features and the final classifiers exhibit an extremely simple geometric structure, has recently attracted considerable attention, with the expectation that it can deepen our understanding of how deep neural networks behave. The Unconstrained Feature Model (UFM) has been proposed to explain NC theoretically, and a growing body of work extends NC to tasks other than classification and leverages it for practical applications. In this study, we investigate whether a similar phenomenon arises in deep Ordinal Regression (OR) tasks by combining the cumulative link model for OR with the UFM. We show that a phenomenon we call Ordinal Neural Collapse (ONC) indeed emerges and is characterized by the following three properties: (ONC1) all optimal features in the same class collapse to their within-class mean when regularization is applied; (ONC2) these class means align with the classifier, meaning that they collapse onto a one-dimensional subspace; (ONC3) the optimal latent variables (corresponding to logits or preactivations in classification tasks) are aligned according to the class order, and in particular, in the zero-regularization limit, a highly local and simple geometric relationship emerges between the latent variables and the threshold values. We prove these properties analytically within the UFM framework with fixed threshold values and corroborate them empirically across a variety of datasets. We also discuss how these insights can be leveraged in OR, highlighting the use of fixed thresholds.
Submitted 6 June, 2025;
originally announced June 2025.
-
Analysis of High-dimensional Gaussian Labeled-unlabeled Mixture Model via Message-passing Algorithm
Authors:
Xiaosi Gu,
Tomoyuki Obuchi
Abstract:
Semi-supervised learning (SSL) is a machine learning methodology that leverages unlabeled data in conjunction with a limited amount of labeled data. Although SSL has been applied in various applications and its effectiveness has been empirically demonstrated, it is still not fully understood when and why SSL performs well. Some existing theoretical studies have attempted to address this issue by modeling classification problems using the so-called Gaussian Mixture Model (GMM). These studies provide notable and insightful interpretations. However, their analyses are focused on specific purposes, and a thorough investigation of the properties of GMM in the context of SSL has been lacking. In this paper, we conduct such a detailed analysis of the properties of the high-dimensional GMM for binary classification in the SSL setting. To this end, we employ the approximate message passing and state evolution methods, which are widely used in high-dimensional settings and originate from statistical mechanics. We deal with two estimation approaches: the Bayesian one and the $\ell_2$-regularized maximum likelihood estimation (RMLE). We conduct a comprehensive comparison between these two approaches, examining aspects such as the global phase diagram, estimation error for the parameters, and prediction error for the labels. A specific comparison is made between the Bayes-optimal (BO) estimator and RMLE, as the BO setting provides optimal estimation performance and is ideal as a benchmark. Our analysis shows that with appropriate regularizations, RMLE can achieve near-optimal performance in terms of both the estimation error and prediction error, especially when there is a large amount of unlabeled data. These results demonstrate that the $\ell_2$ regularization term plays an effective role in estimation and prediction in SSL approaches.
Submitted 12 March, 2025; v1 submitted 29 November, 2024;
originally announced November 2024.
-
Transfer Learning in $\ell_1$ Regularized Regression: Hyperparameter Selection Strategy based on Sharp Asymptotic Analysis
Authors:
Koki Okajima,
Tomoyuki Obuchi
Abstract:
Transfer learning techniques aim to leverage information from multiple related datasets to enhance prediction quality against a target dataset. Such methods have been adopted in the context of high-dimensional sparse regression, and some Lasso-based algorithms have been invented: Trans-Lasso and Pretraining Lasso are such examples. These algorithms require the statistician to select hyperparameters that control the extent and type of information transfer from related datasets. However, selection strategies for these hyperparameters, as well as the impact of these choices on the algorithm's performance, have been largely unexplored. To address this, we conduct a thorough, precise study of the algorithm in a high-dimensional setting via an asymptotic analysis using the replica method. Our approach reveals a surprisingly simple behavior of the algorithm: Ignoring one of the two types of information transferred to the fine-tuning stage has little effect on generalization performance, implying that efforts for hyperparameter selection can be significantly reduced. Our theoretical findings are also empirically supported by applications on real-world and semi-artificial datasets using the IMDb and MNIST datasets, respectively.
Submitted 30 January, 2025; v1 submitted 26 September, 2024;
originally announced September 2024.
-
When resampling/reweighting improves feature learning in imbalanced classification?: A toy-model study
Authors:
Tomoyuki Obuchi,
Toshiyuki Tanaka
Abstract:
A toy model of binary classification is studied with the aim of clarifying the class-wise resampling/reweighting effect on the feature learning performance in the presence of class imbalance. In the analysis, a high-dimensional limit of the input space is taken while keeping the ratio of the dataset size to the input dimension finite, and the non-rigorous replica method from statistical mechanics is employed. The result shows that there exists a case in which the no resampling/reweighting situation gives the best feature learning performance irrespective of the choice of losses or classifiers, supporting recent findings in Cao et al. (2019) and Kang et al. (2019). It is also revealed that the key to the result is the symmetry of the loss and the problem setting. Inspired by this, we propose a further simplified model exhibiting the same property in the multiclass setting. These results clarify when class-wise resampling/reweighting becomes effective in imbalanced classification.
Submitted 22 April, 2025; v1 submitted 9 September, 2024;
originally announced September 2024.
-
On Model Selection Consistency of Lasso for High-Dimensional Ising Models
Authors:
Xiangming Meng,
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
We theoretically analyze the model selection consistency of least absolute shrinkage and selection operator (Lasso), both with and without post-thresholding, for high-dimensional Ising models. For random regular (RR) graphs of size $p$ with regular node degree $d$ and uniform couplings $\theta_0$, it is rigorously proved that Lasso without post-thresholding is model selection consistent in the whole paramagnetic phase with the same order of sample complexity $n=\Omega(d^3\log p)$ as that of $\ell_1$-regularized logistic regression ($\ell_1$-LogR). This result is consistent with the conjecture in Meng, Obuchi, and Kabashima 2021 using the non-rigorous replica method from statistical physics and thus complements it with a rigorous proof. For general tree-like graphs, it is demonstrated that the same result as RR graphs can be obtained under mild assumptions of the dependency condition and incoherence condition. Moreover, we provide a rigorous proof of the model selection consistency of Lasso with post-thresholding for general tree-like graphs in the paramagnetic phase without further assumptions on the dependency and incoherence conditions. Experimental results agree well with our theoretical analysis.
Submitted 17 February, 2023; v1 submitted 16 October, 2021;
originally announced October 2021.
-
Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression: A Statistical Mechanics Analysis
Authors:
Xiangming Meng,
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
We theoretically analyze the typical learning performance of $\ell_{1}$-regularized linear regression ($\ell_1$-LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of $\ell_1$-LinR is obtained. Remarkably, despite the model misspecification, $\ell_1$-LinR is model selection consistent with the same order of sample complexity as $\ell_{1}$-regularized logistic regression ($\ell_1$-LogR), i.e., $M=\mathcal{O}\left(\log N\right)$, where $N$ is the number of variables of the Ising model. Moreover, we provide an efficient method to accurately predict the non-asymptotic behavior of $\ell_1$-LinR for moderate $M, N$, such as precision and recall. Simulations show a fairly good agreement between theoretical predictions and experimental results, even for graphs with many loops, which supports our findings. Although this paper mainly focuses on $\ell_1$-LinR, our method is readily applicable for precisely characterizing the typical learning performances of a wide class of $\ell_{1}$-regularized $M$-estimators including $\ell_1$-LogR and interaction screening.
Submitted 1 November, 2021; v1 submitted 7 February, 2021;
originally announced February 2021.
-
Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized Linear Estimator
Authors:
Xiangming Meng,
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
The inference performance of the pseudolikelihood method is discussed in the framework of the inverse Ising problem when the $\ell_2$-regularized (ridge) linear regression is adopted. This setup is introduced for theoretically investigating the situation where the data generation model is different from the inference one, namely the model mismatch situation. In the teacher-student scenario under the assumption that the teacher couplings are sparse, the analysis is conducted using the replica and cavity methods, with a special focus on whether the presence/absence of teacher couplings is correctly inferred or not. The result indicates that despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization when the number of spins $N$ is smaller than the dataset size $M$, in the thermodynamic limit $N\to \infty$. Further, to access the underdetermined region $M < N$, we examine the effect of the $\ell_2$ regularization, and find that biases appear in all the coupling estimates, preventing the perfect identification of the network structure. We find, however, that the biases decay exponentially fast as the distance from the center spin chosen in the pseudolikelihood method grows. Based on this finding, we propose a two-stage estimator: In the first stage, the ridge regression is used and the estimates are pruned by a relatively small threshold; in the second stage, the naive linear regression is conducted only on the remaining couplings, and the resultant estimates are again pruned by another relatively large threshold. This estimator with the appropriate regularization coefficient and thresholds is shown to achieve the perfect identification of the network structure even for $0<M/N<1$. Results of extensive numerical experiments support these findings.
Submitted 23 November, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.
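The two-stage estimator described in the abstract above (ridge regression, pruning by a small threshold, plain linear regression on the surviving couplings, pruning by a larger threshold) can be sketched as follows. This is only an illustration under stated assumptions: the function name `two_stage_estimate`, the generic design matrix `X` and response `y` (standing in for the other spins and the center spin), and the default values of `ridge`, `small_thr`, and `large_thr` are hypothetical and are not taken from the paper.

```python
import numpy as np

def two_stage_estimate(X, y, ridge=1.0, small_thr=0.05, large_thr=0.2):
    """Hypothetical sketch of a ridge-then-OLS estimator with two pruning steps."""
    n_features = X.shape[1]

    # Stage 1: ridge regression, then keep only estimates above the small threshold.
    gram = X.T @ X + ridge * np.eye(n_features)
    w_ridge = np.linalg.solve(gram, X.T @ y)
    kept = np.flatnonzero(np.abs(w_ridge) > small_thr)

    # Stage 2: unregularized least squares restricted to the kept columns,
    # then prune again with the larger threshold.
    w = np.zeros(n_features)
    if kept.size > 0:
        w_ols, *_ = np.linalg.lstsq(X[:, kept], y, rcond=None)
        w_ols[np.abs(w_ols) <= large_thr] = 0.0
        w[kept] = w_ols
    return w, kept
```

The non-zero entries of the returned `w` give the inferred structure. How well this works depends entirely on the regularization coefficient and the two thresholds, which the abstract says must be chosen appropriately.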
-
Reconstructing Sparse Signals via Greedy Monte-Carlo Search
Authors:
Kao Hayashi,
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
We propose a Monte-Carlo-based method for reconstructing sparse signals in the formulation of sparse linear regression in a high-dimensional setting. The basic idea of this algorithm is to explicitly select variables or covariates to represent a given data vector or responses and accept randomly generated updates of that selection if and only if the energy or cost function decreases. This algorithm is called the greedy Monte-Carlo (GMC) search algorithm. Its performance is examined via numerical experiments, which suggest that in the noiseless case, GMC can achieve perfect reconstruction in undersampling situations of a reasonable level: it can outperform the $\ell_1$ relaxation but does not reach the algorithmic limit of MC-based methods theoretically clarified by an earlier analysis. The necessary computational time is also examined and compared with that of an algorithm using simulated annealing. Additionally, experiments on the noisy case are conducted on synthetic datasets and on a real-world dataset, supporting the practicality of GMC.
Submitted 29 January, 2021; v1 submitted 7 August, 2020;
originally announced August 2020.
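The acceptance rule described in the abstract above (propose a random change to the selected variables and accept it if and only if the cost decreases) can be sketched as follows. The details are assumptions made for illustration, not the authors' implementation: the support size `k` is held fixed, a proposal swaps one selected column for one unselected column, the energy is the least-squares residual on the selected columns, and the names `support_energy` and `greedy_mc_search` are hypothetical.

```python
import numpy as np

def support_energy(A, y, support):
    """Residual sum of squares of the least-squares fit using only `support` columns."""
    coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    resid = y - A[:, support] @ coef
    return float(resid @ resid), coef

def greedy_mc_search(A, y, k, n_steps=2000, seed=0):
    """Greedy Monte-Carlo sketch: accept a random swap only if the energy decreases."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    support = [int(j) for j in rng.choice(n, size=k, replace=False)]
    energy, coef = support_energy(A, y, support)

    for _ in range(n_steps):
        # Propose: replace one selected index by one currently unselected index.
        outside = [j for j in range(n) if j not in support]
        proposal = support.copy()
        proposal[rng.integers(k)] = outside[rng.integers(len(outside))]

        # Accept if and only if the energy strictly decreases (zero-temperature rule).
        new_energy, new_coef = support_energy(A, y, proposal)
        if new_energy < energy:
            support, energy, coef = proposal, new_energy, new_coef

    x = np.zeros(n)
    x[support] = coef
    return x, sorted(support), energy
```

Because moves that raise the energy are never accepted, the search can stall in a local minimum. Simulated annealing, the method the abstract compares against, differs in that it also accepts some energy-increasing moves.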
-
Learning performance in inverse Ising problems with sparse teacher couplings
Authors:
Alia Abbara,
Yoshiyuki Kabashima,
Tomoyuki Obuchi,
Yingying Xu
Abstract:
We investigate the learning performance of the pseudolikelihood maximization method for inverse Ising problems. In the teacher-student scenario under the assumption that the teacher's couplings are sparse and the student does not know the graphical structure, the learning curve and order parameters are assessed in the typical case using the replica and cavity methods from statistical mechanics. Our formulation is also applicable to a certain class of cost functions having locality; the standard likelihood does not belong to that class. The derived analytical formulas indicate that the perfect inference of the presence/absence of the teacher's couplings is possible in the thermodynamic limit, taking the number of spins $N$ to infinity while keeping the dataset size $M$ proportional to $N$, as long as $\alpha=M/N > 2$. Meanwhile, the formulas also show that the estimated coupling values corresponding to the truly existing ones in the teacher tend to be overestimated in the absolute value, manifesting the presence of estimation bias. These results are considered to be exact in the thermodynamic limit on locally tree-like networks, such as the regular random or Erdős-Rényi graphs. Numerical simulation results fully support the theoretical predictions. Additional biases in the estimators on loopy graphs are also discussed.
Submitted 1 May, 2020; v1 submitted 24 December, 2019;
originally announced December 2019.
-
Empirical Bayes Method for Boltzmann Machines
Authors:
Muneki Yasuda,
Tomoyuki Obuchi
Abstract:
In this study, we consider an empirical Bayes method for Boltzmann machines and propose an algorithm for it. The empirical Bayes method allows estimation of the values of the hyperparameters of the Boltzmann machine by maximizing a specific likelihood function referred to as the empirical Bayes likelihood function in this study. However, the maximization is computationally hard because the empirical Bayes likelihood function involves intractable integrations of the partition function. The proposed algorithm avoids this computational problem by using the replica method and the Plefka expansion. Our method does not require any iterative procedures and is quite simple and fast, though it introduces a bias to the estimate, which exhibits an unnatural behavior with respect to the size of the dataset. This peculiar behavior is supposed to be due to the approximate treatment by the Plefka expansion. A possible extension to overcome this behavior is also discussed.
Submitted 7 September, 2019; v1 submitted 13 June, 2019;
originally announced June 2019.
-
Cross validation in sparse linear regression with piecewise continuous nonconvex penalties and its acceleration
Authors:
Tomoyuki Obuchi,
Ayaka Sakata
Abstract:
We investigate the signal reconstruction performance of sparse linear regression in the presence of noise when piecewise continuous nonconvex penalties are used. Among such penalties, we focus on the SCAD penalty. The contributions of this study are three-fold: We first present a theoretical analysis of a typical reconstruction performance, using the replica method, under the assumption that each component of the design matrix is given as an independent and identically distributed (i.i.d.) Gaussian variable. This clarifies the superiority of the SCAD estimator compared with $\ell_1$ in a wide parameter range, although the nonconvex nature of the penalty tends to lead to solution multiplicity in certain regions. This multiplicity is shown to be connected to replica symmetry breaking in the spin-glass theory. We also show that the global minimum of the mean square error between the estimator and the true signal is located in the replica symmetric phase. Second, we develop an approximate formula that efficiently computes the cross-validation error without actually conducting the cross-validation, and which is also applicable to non-i.i.d. design matrices. It is shown that this formula is only applicable to the unique solution region and tends to be unstable in the multiple solution region. We implement instability detection procedures, which allow the approximate formula to stand alone and consequently enable us to draw phase diagrams for any specific dataset. Third, we propose an annealing procedure, called nonconvexity annealing, to obtain the solution path efficiently. Numerical simulations on simulated datasets are conducted to verify the consistency of the theoretical results and the efficiency of the approximate formula. Another numerical experiment on a real-world dataset is conducted; its results are consistent with those of earlier studies using the $\ell_0$ formulation.
Submitted 25 December, 2019; v1 submitted 27 February, 2019;
originally announced February 2019.
-
Perfect reconstruction of sparse signals with piecewise continuous nonconvex penalties and nonconvexity control
Authors:
Ayaka Sakata,
Tomoyuki Obuchi
Abstract:
We consider compressed sensing formulated as a minimization problem of nonconvex sparse penalties, Smoothly Clipped Absolute Deviation (SCAD) and Minimax Concave Penalty (MCP). The nonconvexity of these penalties is controlled by nonconvexity parameters, and the L1 penalty is contained as a limit with respect to these parameters. The analytically derived reconstruction limit overcomes that of L1 and the algorithmic limit in the Bayes-optimal setting, when the nonconvexity parameters have suitable values. However, for small nonconvexity parameters, where the reconstruction of the relatively dense signals is theoretically guaranteed, the corresponding approximate message passing (AMP) cannot achieve perfect reconstruction. We identify that the shrinkage of the basin of attraction to the perfect reconstruction causes the discrepancy between the AMP and the corresponding theory using state evolution. A part of the discrepancy is resolved by introducing the control of the nonconvexity parameters to guide the AMP trajectory to the basin of attraction.
Submitted 5 June, 2021; v1 submitted 20 February, 2019;
originally announced February 2019.
-
Mean-field theory of graph neural networks in graph partitioning
Authors:
Tatsuro Kawamoto,
Masashi Tsubaki,
Tomoyuki Obuchi
Abstract:
A theoretical performance analysis of the graph neural network (GNN) is presented. For classification tasks, the neural network approach has the advantage in terms of flexibility that it can be employed in a data-driven manner, whereas Bayesian inference requires the assumption of a specific model. A fundamental question is then whether GNN has a high accuracy in addition to this flexibility. Moreover, whether the achieved performance is predominately a result of the backpropagation or the architecture itself is a matter of considerable interest. To gain a better insight into these questions, a mean-field theory of a minimal GNN architecture is developed for the graph partitioning problem. This demonstrates a good agreement with numerical experiments.
Submitted 28 October, 2018;
originally announced October 2018.
-
Statistical mechanical analysis of sparse linear regression as a variable selection problem
Authors:
Tomoyuki Obuchi,
Yoshinori Nakanishi-Ohno,
Masato Okada,
Yoshiyuki Kabashima
Abstract:
An algorithmic limit of compressed sensing or related variable-selection problems is analytically evaluated when a design matrix is given by an overcomplete random matrix. The replica method from statistical mechanics is employed to derive the result. The analysis is conducted through evaluation of the entropy, an exponential rate of the number of combinations of variables giving a specific value of fit error to given data which is assumed to be generated from a linear process using the design matrix. This yields the typical achievable limit of the fit error when solving a representative $\ell_0$ problem and includes the presence of unfavourable phase transitions preventing local search algorithms from reaching the minimum-error configuration. The associated phase diagrams are presented. A noteworthy outcome of the phase diagrams is that there exists a wide parameter region where any phase transition is absent from the high temperature to the lowest temperature at which the minimum-error configuration or the ground state is reached. This implies that certain local search algorithms can find the ground state with moderate computational costs in that region. Another noteworthy result is the presence of the random first-order transition in the strong noise case. The theoretical evaluation of the entropy is confirmed by extensive numerical methods using the exchange Monte Carlo and the multi-histogram methods. Another numerical test based on a metaheuristic optimisation algorithm called simulated annealing is conducted, which well supports the theoretical predictions on the local search algorithms. In the successful region with no phase transition, the computational cost of the simulated annealing to reach the ground state is estimated as the third order polynomial of the model dimensionality.
Submitted 10 September, 2018; v1 submitted 29 May, 2018;
originally announced May 2018.
-
Semi-Analytic Resampling in Lasso
Authors:
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
An approximate method for conducting resampling in Lasso, the $\ell_1$ penalized linear regression, in a semi-analytic manner is developed, whereby the average over the resampled datasets is directly computed without repeated numerical sampling, thus enabling an inference free of the statistical fluctuations due to sampling finiteness, as well as a significant reduction of computational time. The proposed method is based on a message passing type algorithm, and its fast convergence is guaranteed by the state evolution analysis, when covariates are provided as zero-mean independently and identically distributed Gaussian random variables. It is employed to implement bootstrapped Lasso (Bolasso) and stability selection, both of which are variable selection methods using resampling in conjunction with Lasso, and resolves their disadvantage regarding computational cost. To examine approximation accuracy and efficiency, numerical experiments were carried out using simulated datasets. Moreover, an application to a real-world dataset, the wine quality dataset, is presented. To process such real-world datasets, an objective criterion for determining the relevance of selected variables is also introduced by the addition of noise variables and resampling.
Submitted 10 December, 2018; v1 submitted 27 February, 2018;
originally announced February 2018.
-
Accelerating Cross-Validation in Multinomial Logistic Regression with $\ell_1$-Regularization
Authors:
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
We develop an approximate formula for evaluating a cross-validation estimator of predictive likelihood for multinomial logistic regression regularized by an $\ell_1$-norm. This allows us to avoid repeated optimizations required for literally conducting cross-validation; hence, the computational time can be significantly reduced. The formula is derived through a perturbative approach employing the largeness of the data size and the model dimensionality. An extension to the elastic net regularization is also addressed. The usefulness of the approximate formula is demonstrated on simulated data and the ISOLET dataset from the UCI machine learning repository.
Submitted 18 September, 2018; v1 submitted 15 November, 2017;
originally announced November 2017.
-
Accelerating cross-validation with total variation and its application to super-resolution imaging
Authors:
Tomoyuki Obuchi,
Shiro Ikeda,
Kazunori Akiyama,
Yoshiyuki Kabashima
Abstract:
We develop an approximation formula for the cross-validation error (CVE) of a sparse linear regression penalized by $\ell_1$-norm and total variation terms, which is based on a perturbative expansion utilizing the largeness of both the data dimensionality and the model. The developed formula allows us to reduce the necessary computational cost of the CVE evaluation significantly. The practicality of the formula is tested through application to simulated black-hole image reconstruction on the event-horizon scale with super resolution. The results demonstrate that our approximation reproduces the CVE values obtained via literally conducted cross-validation with reasonably good precision.
Submitted 20 November, 2017; v1 submitted 22 November, 2016;
originally announced November 2016.
-
Approximate cross-validation formula for Bayesian linear regression
Authors:
Yoshiyuki Kabashima,
Tomoyuki Obuchi,
Makoto Uemura
Abstract:
Cross-validation (CV) is a technique for evaluating the ability of statistical models/learning systems based on a given data set. Despite its wide applicability, the rather heavy computational cost can prevent its use as the system size grows. To resolve this difficulty in the case of Bayesian linear regression, we develop a formula for evaluating the leave-one-out CV error approximately without actually performing CV. The usefulness of the developed formula is tested by statistical mechanical analysis for a synthetic model. This is confirmed by application to a real-world supernova data set as well.
Submitted 25 October, 2016;
originally announced October 2016.
-
Sampling approach to sparse approximation problem: determining degrees of freedom by simulated annealing
Authors:
Tomoyuki Obuchi,
Yoshiyuki Kabashima
Abstract:
The approximation of a high-dimensional vector by a small combination of column vectors selected from a fixed matrix has been actively debated in several different disciplines. In this paper, a sampling approach based on the Monte Carlo method is presented as an efficient solver for such problems. In particular, we focus on and test the use of simulated annealing (SA), a metaheuristic optimization algorithm, for determining the degrees of freedom (the number of columns used) by cross-validation (CV). Tests on a synthetic model indicate that our SA-based approach can find a nearly optimal solution for the approximation problem and, when combined with the CV framework, can optimize the generalization ability. Its utility is also confirmed by application to a real-world supernova data set.
Submitted 4 October, 2016; v1 submitted 4 March, 2016;
originally announced March 2016.
-
Learning probabilities from random observables in high dimensions: the maximum entropy distribution and others
Authors:
Tomoyuki Obuchi,
Simona Cocco,
Rémi Monasson
Abstract:
We consider the problem of learning a target probability distribution over a set of $N$ binary variables from the knowledge of the expectation values (with this target distribution) of $M$ observables, drawn uniformly at random. The space of all probability distributions compatible with these $M$ expectation values within some fixed accuracy, called version space, is studied. We introduce a biased measure over the version space, which gives a boost increasing exponentially with the entropy of the distributions and with an arbitrary inverse `temperature' $Γ$. The choice of $Γ$ allows us to interpolate smoothly between the unbiased measure over all distributions in the version space ($Γ=0$) and the pointwise measure concentrated at the maximum entropy distribution ($Γ\to \infty$). Using the replica method we compute the volume of the version space and other quantities of interest, such as the distance $R$ between the target distribution and the center-of-mass distribution over the version space, as functions of $α=(\log M)/N$ and $Γ$ for large $N$. Phase transitions at critical values of $α$ are found, corresponding to qualitative improvements in the learning of the target distribution and to the decrease of the distance $R$. However, for fixed $α$, the distance $R$ does not vary with $Γ$, which means that the maximum entropy distribution is not closer to the target distribution than any other distribution compatible with the observable values. Our results are confirmed by Monte Carlo sampling of the version space for small system sizes ($N\le 10$).
Submitted 21 July, 2015; v1 submitted 10 March, 2015;
originally announced March 2015.
-
Boltzmann-Machine Learning of Prior Distributions of Binarized Natural Images
Authors:
Tomoyuki Obuchi,
Hirokazu Koma,
Muneki Yasuda
Abstract:
Prior distributions of binarized natural images are learned by using a Boltzmann machine. According to the results of this study, there emerges a structure with two sublattices in the interactions, and the nearest-neighbor and next-nearest-neighbor interactions correspondingly take two discriminative values, which reflects the individual characteristics of the three sets of pictures that we process. Meanwhile, on a longer spatial scale, a longer-range, although still rapidly decaying, ferromagnetic interaction commonly appears in all cases. The characteristic length scale of the interactions is universally up to approximately four lattice spacings, $ξ\approx 4$. These results are derived by using the mean-field method, which effectively reduces the computational time required in a Boltzmann machine. An improved mean-field method called the Bethe approximation gives the same results, as does the Monte Carlo method for small images. These agreements reinforce the validity of our analysis and findings. Relations to criticality, frustration, and simple-cell receptive fields are also discussed.
Submitted 23 October, 2016; v1 submitted 15 December, 2014;
originally announced December 2014.