Search | arXiv e-print repository

Knots in random neural networks

Authors: Kevin K. Chen, Anthony C. Gamst, Alden K. Walker

Abstract: The weights of a neural network are typically initialized at random, and one can think of the functions produced by such a network as having been generated by a prior over some function space. Studying random networks, then, is useful for a Bayesian understanding of the network evolution in early stages of training. In particular, one can investigate why neural networks with huge numbers of parame… ▽ More The weights of a neural network are typically initialized at random, and one can think of the functions produced by such a network as having been generated by a prior over some function space. Studying random networks, then, is useful for a Bayesian understanding of the network evolution in early stages of training. In particular, one can investigate why neural networks with huge numbers of parameters do not immediately overfit. We analyze the properties of random scalar-input feed-forward rectified linear unit architectures, which are random linear splines. With weights and biases sampled from certain common distributions, empirical tests show that the number of knots in the spline produced by the network is equal to the number of neurons, to very close approximation. We describe our progress towards a completely analytic explanation of this phenomenon. In particular, we show that random single-layer neural networks are equivalent to integrated random walks with variable step sizes. That each neuron produces one knot on average is equivalent to the associated integrated random walk having one zero crossing on average. We explore how properties of the integrated random walk, including the step sizes and initial conditions, affect the number of crossings. The number of knots in random neural networks can be related to the behavior of extreme learning machines, but it also establishes a prior preventing optimizers from immediately overfitting to noisy training data. △ Less

Submitted 27 November, 2018; originally announced November 2018.

Comments: Presented at the Workshop on Bayesian Deep Learning, NIPS 2016, Barcelona, Spain

arXiv:1611.09448 [pdf, other]

The Upper Bound on Knots in Neural Networks

Authors: Kevin K. Chen

Abstract: Neural networks with rectified linear unit activations are essentially multivariate linear splines. As such, one of many ways to measure the "complexity" or "expressivity" of a neural network is to count the number of knots in the spline model. We study the number of knots in fully-connected feedforward neural networks with rectified linear unit activation functions. We intentionally keep the neur… ▽ More Neural networks with rectified linear unit activations are essentially multivariate linear splines. As such, one of many ways to measure the "complexity" or "expressivity" of a neural network is to count the number of knots in the spline model. We study the number of knots in fully-connected feedforward neural networks with rectified linear unit activation functions. We intentionally keep the neural networks very simple, so as to make theoretical analyses more approachable. An induction on the number of layers $l$ reveals a tight upper bound on the number of knots in $\mathbb{R} \to \mathbb{R}^p$ deep neural networks. With $n_i \gg 1$ neurons in layer $i = 1, \dots, l$, the upper bound is approximately $n_1 \dots n_l$. We then show that the exact upper bound is tight, and we demonstrate the upper bound with an example. The purpose of these analyses is to pave a path for understanding the behavior of general $\mathbb{R}^q \to \mathbb{R}^p$ neural networks. △ Less

Submitted 29 November, 2016; v1 submitted 28 November, 2016; originally announced November 2016.

Comments: 19 pages, 8 figures

arXiv:1611.09444 [pdf, other]

The empirical size of trained neural networks

Authors: Kevin K. Chen, Anthony Gamst, Alden Walker

Abstract: ReLU neural networks define piecewise linear functions of their inputs. However, initializing and training a neural network is very different from fitting a linear spline. In this paper, we expand empirically upon previous theoretical work to demonstrate features of trained neural networks. Standard network initialization and training produce networks vastly simpler than a naive parameter count wo… ▽ More ReLU neural networks define piecewise linear functions of their inputs. However, initializing and training a neural network is very different from fitting a linear spline. In this paper, we expand empirically upon previous theoretical work to demonstrate features of trained neural networks. Standard network initialization and training produce networks vastly simpler than a naive parameter count would suggest and can impart odd features to the trained network. However, we also show the forced simplicity is beneficial and, indeed, critical for the wide success of these networks. △ Less

Submitted 28 November, 2016; originally announced November 2016.

Comments: 6 pages, 5 figures

Showing 1–3 of 3 results for author: Chen, K K