-
Knots in random neural networks
Authors:
Kevin K. Chen,
Anthony C. Gamst,
Alden K. Walker
Abstract:
The weights of a neural network are typically initialized at random, and one can think of the functions produced by such a network as having been generated by a prior over some function space. Studying random networks, then, is useful for a Bayesian understanding of the network evolution in early stages of training. In particular, one can investigate why neural networks with huge numbers of parameters do not immediately overfit. We analyze the properties of random scalar-input feed-forward rectified linear unit architectures, which are random linear splines. With weights and biases sampled from certain common distributions, empirical tests show that the number of knots in the spline produced by the network is equal to the number of neurons, to very close approximation. We describe our progress towards a completely analytic explanation of this phenomenon. In particular, we show that random single-layer neural networks are equivalent to integrated random walks with variable step sizes. That each neuron produces one knot on average is equivalent to the associated integrated random walk having one zero crossing on average. We explore how properties of the integrated random walk, including the step sizes and initial conditions, affect the number of crossings. The number of knots in random neural networks can be related to the behavior of extreme learning machines, but it also establishes a prior preventing optimizers from immediately overfitting to noisy training data.
Submitted 27 November, 2018;
originally announced November 2018.
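The knot counting described in this abstract can be illustrated with a minimal sketch. It is not the authors' code or experimental setup. It assumes a scalar-input hidden layer with standard Gaussian weights and biases (the abstract only says "certain common distributions"), and the names `hidden_knots`, `preactivation`, and `count_zero_crossings` are hypothetical. Each unit relu(w_i x + b_i) has a kink at x = -b_i / w_i, and a downstream ReLU unit adds a kink wherever its piecewise linear pre-activation crosses zero, so the sketch counts those crossings segment by segment.

```python
import numpy as np

def hidden_knots(w, b):
    """Sorted kink locations x = -b_i / w_i of the units relu(w_i * x + b_i)."""
    w = np.asarray(w, dtype=float)
    b = np.asarray(b, dtype=float)
    active = w != 0  # a unit with zero weight is constant and has no kink
    return np.sort(-b[active] / w[active])

def preactivation(x, w, b, v, c):
    """g(x) = sum_i v_i * relu(w_i * x + b_i) + c, evaluated at each x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    hidden = np.maximum(np.outer(x, w) + b, 0.0)
    return hidden @ v + c

def count_zero_crossings(w, b, v, c):
    """Count strict sign changes of the piecewise linear function g."""
    w, b, v = (np.asarray(a, dtype=float) for a in (w, b, v))
    knots = hidden_knots(w, b)
    if knots.size == 0:
        return 0
    g = preactivation(knots, w, b, v, c)
    # Interior segments: g is linear between consecutive knots.
    crossings = int(np.sum(g[:-1] * g[1:] < 0))
    # Unbounded end segments: far to the left only units with w_i < 0 are
    # active, far to the right only units with w_i > 0.
    left_slope = np.sum(v[w < 0] * w[w < 0])
    right_slope = np.sum(v[w > 0] * w[w > 0])
    if left_slope != 0 and g[0] / left_slope > 0:
        crossings += 1
    if right_slope != 0 and g[-1] / right_slope < 0:
        crossings += 1
    return crossings

# Assumed setup: 50 hidden units, standard Gaussian parameters.
rng = np.random.default_rng(0)
n = 50
w, b, v = rng.normal(size=(3, n))
c = rng.normal()
print("hidden-layer knots:", len(hidden_knots(w, b)))
print("zero crossings of one downstream pre-activation:",
      count_zero_crossings(w, b, v, c))
```

Averaging `count_zero_crossings` over many random draws would give an empirical estimate of the crossings per downstream unit under these assumed distributions; the sketch ignores the measure-zero case where g vanishes exactly at a knot.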
-
Estimating the Operating Characteristics of Ensemble Methods
Authors:
Anthony Gamst,
Jay-Calvin Reyes,
Alden Walker
Abstract:
In this paper we present a technique for using the bootstrap to estimate the operating characteristics and their variability for certain types of ensemble methods. Bootstrapping a model can require a huge amount of work if the training data set is large. Fortunately, in many cases the technique lets us determine the effect of infinite resampling without actually refitting a single model. We apply the technique to the study of meta-parameter selection for random forests. We demonstrate that alternatives to bootstrap aggregation and to considering $\sqrt{d}$ features to split each node, where $d$ is the number of features, can produce improvements in predictive accuracy.
Submitted 24 October, 2017;
originally announced October 2017.
-
The energy landscape of a simple neural network
Authors:
Anthony Collins Gamst,
Alden Walker
Abstract:
We explore the energy landscape of a simple neural network. In particular, we expand upon previous work demonstrating that the empirical complexity of fitted neural networks is vastly less than a naive parameter count would suggest and that this implicit regularization is actually beneficial for generalization from fitted models.
Submitted 21 June, 2017;
originally announced June 2017.
-
The empirical size of trained neural networks
Authors:
Kevin K. Chen,
Anthony Gamst,
Alden Walker
Abstract:
ReLU neural networks define piecewise linear functions of their inputs. However, initializing and training a neural network is very different from fitting a linear spline. In this paper, we expand empirically upon previous theoretical work to demonstrate features of trained neural networks. Standard network initialization and training produce networks vastly simpler than a naive parameter count would suggest and can impart odd features to the trained network. However, we also show the forced simplicity is beneficial and, indeed, critical for the wide success of these networks.
Submitted 28 November, 2016;
originally announced November 2016.
-
Regression of ranked responses when raw responses are censored
Authors:
Michael C. Donohue,
Anthony C. Gamst,
Robert A. Rissman,
Ian Abramson
Abstract:
We discuss semiparametric regression when only the ranks of responses are observed. The model is $Y_i = F(\mathbf{x}_i'\boldsymbol{\beta}_0 + \varepsilon_i)$, where $Y_i$ is the unobserved response, $F$ is a monotone increasing function, $\mathbf{x}_i$ is a known $p$-vector of covariates, $\boldsymbol{\beta}_0$ is an unknown $p$-vector of interest, and $\varepsilon_i$ is an error term independent of $\mathbf{x}_i$. We observe $\{(\mathbf{x}_i, R_n(Y_i)) : i = 1, \ldots, n\}$, where $R_n$ is the ordinal rank function. We explore a novel estimator under Gaussian assumptions. We discuss the literature, apply the method to an Alzheimer's disease biomarker, conduct simulation studies, and prove consistency and asymptotic normality.
Submitted 24 February, 2016;
originally announced February 2016.
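The observation scheme in this abstract can be simulated in a few lines. This is only a sketch of the data-generating model, not the authors' estimator. The choices $F = \exp$, the value of `beta0`, Gaussian covariates, and the helper name `ordinal_ranks` are all assumptions made for illustration. It shows one consequence of the model: when $F$ is strictly increasing, the observed ranks of $Y_i$ coincide with the ranks of the latent index $\mathbf{x}_i'\boldsymbol{\beta}_0 + \varepsilon_i$, so the ranks carry no information about $F$ itself.

```python
import numpy as np

def ordinal_ranks(y):
    """Rank 1 for the smallest value, n for the largest (assumes no ties)."""
    order = np.argsort(y)
    ranks = np.empty(len(y), dtype=int)
    ranks[order] = np.arange(1, len(y) + 1)
    return ranks

rng = np.random.default_rng(0)
n, p = 200, 3
beta0 = np.array([1.0, -0.5, 2.0])   # hypothetical "true" coefficient vector
X = rng.normal(size=(n, p))          # covariates x_i (assumed Gaussian here)
eps = rng.normal(size=n)             # Gaussian errors, independent of X

latent = X @ beta0 + eps             # x_i' beta_0 + eps_i
Y = np.exp(latent)                   # unobserved response, with F = exp assumed
R = ordinal_ranks(Y)                 # what is actually observed: R_n(Y_i)

# The ranks are unchanged by the strictly increasing transformation F.
print(np.array_equal(R, ordinal_ranks(latent)))
```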