Why bigger is not always better: on finite and infinite neural networks

Aitchison, Laurence

Statistics > Machine Learning

arXiv:1910.08013 (stat)

[Submitted on 17 Oct 2019 (v1), last revised 24 Jun 2020 (this version, v3)]

Abstract: Recent work has argued that neural networks can be understood theoretically by taking the number of channels to infinity, at which point the outputs become Gaussian process (GP) distributed. However, we note that infinite Bayesian neural networks lack a key facet of the behaviour of real neural networks: the fixed kernel, determined only by network hyperparameters, implies that they cannot do any form of representation learning. The lack of representation learning (equivalently, kernel learning) leads to less flexibility and hence worse performance, giving a potential explanation for the inferior performance of infinite networks observed in the literature (e.g. Novak et al. 2019). We give analytic results characterising the prior over representations and representation learning in finite deep linear networks. We show empirically that the representations in state-of-the-art (SOTA) architectures such as ResNets trained with SGD are much closer to those suggested by our deep linear results than to those of the corresponding infinite network. This motivates the introduction of a new class of network: infinite networks with bottlenecks, which inherit the theoretical tractability of infinite networks while also allowing representation learning.
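To make the fixed-kernel point concrete, here is a minimal Monte Carlo sketch. It is not the paper's code or its analytic derivation. It assumes a bias-free deep linear network with equal hidden widths and i.i.d. Gaussian weights of variance 1/fan_in, and the function names `sample_grams`, `infinite_width_kernel` and `relative_spread` are hypothetical. At finite width, the top-layer Gram matrix G = H H^T / width varies from one weight draw to the next around the infinite-width kernel K = X X^T / N0. That kernel depends only on the inputs and the hyperparameters. As width grows, the spread shrinks and the representation collapses onto K.

```python
import numpy as np


def sample_grams(X, width, depth, n_samples, rng):
    """Draw top-layer Gram matrices G = H H^T / width from random deep linear nets.

    Assumptions (illustrative only): no biases, every hidden layer has the same
    width, and weights are i.i.d. N(0, 1/fan_in).
    """
    grams = []
    for _ in range(n_samples):
        H = X
        for _ in range(depth):
            fan_in = H.shape[1]
            # Variance 1/fan_in keeps E[G_l | G_{l-1}] = G_{l-1} at every layer.
            W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, width))
            H = H @ W
        grams.append(H @ H.T / width)
    return np.stack(grams)


def infinite_width_kernel(X):
    # The width -> infinity limit: fixed by the inputs and the weight-variance
    # choice alone, so no representation is learned.
    return X @ X.T / X.shape[1]


def relative_spread(X, width, depth, n_samples, rng):
    # Mean Frobenius distance between sampled Grams and K, relative to ||K||_F.
    K = infinite_width_kernel(X)
    G = sample_grams(X, width, depth, n_samples, rng)
    return np.linalg.norm(G - K, axis=(1, 2)).mean() / np.linalg.norm(K)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))  # toy data: 5 inputs with 20 features each
spreads = {w: relative_spread(X, w, depth=3, n_samples=50, rng=rng) for w in (10, 100, 1000)}
for w, s in spreads.items():
    print(f"width={w:5d}  relative spread={s:.3f}")
```

Under these assumptions, the relative spread should fall roughly like 1/sqrt(width). This shrinking variability is one way to see why an infinitely wide network's representation is pinned down by its hyperparameters. A finite network still has room to move its Gram matrix away from K.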

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1910.08013 [stat.ML]
	(or arXiv:1910.08013v3 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1910.08013
Journal reference:	ICML 2020

Submission history

From: Laurence Aitchison
[v1] Thu, 17 Oct 2019 16:33:34 UTC (535 KB)
[v2] Sun, 10 Nov 2019 11:51:36 UTC (534 KB)
[v3] Wed, 24 Jun 2020 08:53:07 UTC (545 KB)