Knots in random neural networks

Authors: Kevin K. Chen, Anthony C. Gamst, Alden K. Walker

Abstract: The weights of a neural network are typically initialized at random, and one can think of the functions produced by such a network as having been generated by a prior over some function space. Studying random networks, then, is useful for a Bayesian understanding of the network evolution in early stages of training. In particular, one can investigate why neural networks with huge numbers of parameters do not immediately overfit. We analyze the properties of random scalar-input feed-forward rectified linear unit architectures, which are random linear splines. With weights and biases sampled from certain common distributions, empirical tests show that the number of knots in the spline produced by the network is equal to the number of neurons, to very close approximation. We describe our progress towards a completely analytic explanation of this phenomenon. In particular, we show that random single-layer neural networks are equivalent to integrated random walks with variable step sizes. That each neuron produces one knot on average is equivalent to the associated integrated random walk having one zero crossing on average. We explore how properties of the integrated random walk, including the step sizes and initial conditions, affect the number of crossings. The number of knots in random neural networks can be related to the behavior of extreme learning machines, but it also establishes a prior preventing optimizers from immediately overfitting to noisy training data.

Submitted 27 November, 2018; originally announced November 2018.

Comments: Presented at the Workshop on Bayesian Deep Learning, NIPS 2016, Barcelona, Spain
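
The "random linear spline" view in the abstract above can be illustrated with a minimal sketch. It assumes a single hidden layer, a scalar input, and standard Gaussian weights and biases; these choices, and the helper names `random_relu_net`, `evaluate`, and `knot_locations`, are illustrative assumptions rather than the authors' experimental setup. In this simple case each hidden neuron `relu(w*x + b)` bends the output at `x = -b/w`, so the knots can be read off directly.

```python
import numpy as np


def random_relu_net(n_neurons, rng):
    """Draw input weights, biases, and output weights (assumed Gaussian)."""
    w = rng.standard_normal(n_neurons)
    b = rng.standard_normal(n_neurons)
    v = rng.standard_normal(n_neurons)
    return w, b, v


def evaluate(x, w, b, v):
    """Evaluate f(x) = sum_i v_i * relu(w_i * x + b_i) at scalar inputs x."""
    x = np.asarray(x, dtype=float)
    hidden = np.maximum(0.0, np.outer(x, w) + b)  # shape (len(x), n_neurons)
    return hidden @ v


def knot_locations(w, b, v):
    """Breakpoints of the spline: neuron i switches on or off at -b_i / w_i.

    Neurons with w_i == 0 or v_i == 0 contribute no bend. Exactly coinciding
    breakpoints are merged (a probability-zero event for continuous draws).
    """
    active = (w != 0) & (v != 0)
    return np.unique(-b[active] / w[active])  # np.unique also sorts


rng = np.random.default_rng(0)
w, b, v = random_relu_net(50, rng)
knots = knot_locations(w, b, v)
print(f"{len(w)} neurons, {len(knots)} knots")
```

Between consecutive knots the function is exactly linear, which is what makes the knot count a natural complexity measure. Deeper networks, where a layer applies a ReLU to the spline produced by the previous layer, are the harder case that the abstract discusses and are not covered by this sketch.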

arXiv:1710.08952 [pdf, other]

Estimating the Operating Characteristics of Ensemble Methods

Authors: Anthony Gamst, Jay-Calvin Reyes, Alden Walker

Abstract: In this paper we present a technique for using the bootstrap to estimate the operating characteristics and their variability for certain types of ensemble methods. Bootstrapping a model can require a huge amount of work if the training data set is large. Fortunately in many cases the technique lets us determine the effect of infinite resampling without actually refitting a single model. We apply the technique to the study of meta-parameter selection for random forests. We demonstrate that alternatives to bootstrap aggregation and to considering \sqrt{d} features to split each node, where d is the number of features, can produce improvements in predictive accuracy.

Submitted 24 October, 2017; originally announced October 2017.

Comments: 17 pages, 8 figures

arXiv:1706.07101 [pdf, other]

The energy landscape of a simple neural network

Authors: Anthony Collins Gamst, Alden Walker

Abstract: We explore the energy landscape of a simple neural network. In particular, we expand upon previous work demonstrating that the empirical complexity of fitted neural networks is vastly less than a naive parameter count would suggest and that this implicit regularization is actually beneficial for generalization from fitted models.

Submitted 21 June, 2017; originally announced June 2017.

Comments: 17 pages, 15 figures

arXiv:1611.09444 [pdf, other]

The empirical size of trained neural networks

Authors: Kevin K. Chen, Anthony Gamst, Alden Walker

Abstract: ReLU neural networks define piecewise linear functions of their inputs. However, initializing and training a neural network is very different from fitting a linear spline. In this paper, we expand empirically upon previous theoretical work to demonstrate features of trained neural networks. Standard network initialization and training produce networks vastly simpler than a naive parameter count would suggest and can impart odd features to the trained network. However, we also show the forced simplicity is beneficial and, indeed, critical for the wide success of these networks.

Submitted 28 November, 2016; originally announced November 2016.

Comments: 6 pages, 5 figures

arXiv:1602.07559 [pdf, other]

Regression of ranked responses when raw responses are censored

Authors: Michael C. Donohue, Anthony C. Gamst, Robert A. Rissman, Ian Abramson

Abstract: We discuss semiparametric regression when only the ranks of responses are observed. The model is $Y_i = F(\mathbf{x}_i'\boldsymbol{\beta}_0 + \varepsilon_i)$, where $Y_i$ is the unobserved response, $F$ is a monotone increasing function, $\mathbf{x}_i$ is a known $p$-vector of covariates, $\boldsymbol{\beta}_0$ is an unknown $p$-vector of interest, and $\varepsilon_i$ is an error term independent of $\mathbf{x}_i$. We observe $\{(\mathbf{x}_i, R_n(Y_i)) : i = 1, \ldots, n\}$, where $R_n$ is the ordinal rank function. We explore a novel estimator under Gaussian assumptions. We discuss the literature, apply the method to an Alzheimer's disease biomarker, conduct simulation studies, and prove consistency and asymptotic normality.

Submitted 24 February, 2016; originally announced February 2016.

Comments: 33 pages, 6 figures
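
The observation scheme in the abstract above can be simulated with a short sketch. It only generates data of the form $(\mathbf{x}_i, R_n(Y_i))$ and does not implement the paper's estimator. The link `F = exp`, the Gaussian covariates and errors, the values in `beta0`, and the helper `ordinal_ranks` are all illustrative assumptions.

```python
import numpy as np


def ordinal_ranks(values):
    """Return ranks 1..n, where rank 1 is the smallest value."""
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=int)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


rng = np.random.default_rng(1)
n, p = 200, 3
beta0 = np.array([1.0, -0.5, 0.25])  # hypothetical "true" coefficients
X = rng.standard_normal((n, p))      # covariates x_i (assumed Gaussian)
eps = rng.standard_normal(n)         # errors, drawn independently of X

latent = X @ beta0 + eps             # linear index x_i' beta_0 + eps_i
Y = np.exp(latent)                   # unobserved response; F = exp is an assumption
R = ordinal_ranks(Y)                 # the analyst sees only (X, R)

# Because F is monotone increasing, the ranks of Y match those of the index.
print(np.array_equal(R, ordinal_ranks(latent)))
```

The final comparison shows why the ranks carry no information about $F$ itself: any monotone increasing link yields the same observed data, so only the ordering induced by the linear index is available to an estimator.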

Showing 1–5 of 5 results for author: Gamst, A