Sub-Optimal Local Minima Exist for Neural Networks with Almost All Non-Linear Activations

Ding, Tian; Li, Dawei; Sun, Ruoyu

arXiv:1911.01413 (cs)

[Submitted on 4 Nov 2019 (v1), last revised 14 Nov 2020 (this version, v3)]

Abstract: Does over-parameterization eliminate sub-optimal local minima for neural networks? An affirmative answer was given by a classical result in [59] for 1-hidden-layer wide neural networks. A few recent works have extended the setting to multi-layer neural networks, but none of them has proved that every local minimum is global. Why has this result never been extended to deep networks?
In this paper, we show that the task is impossible because the original result for 1-hidden-layer networks in [59] cannot hold. More specifically, we prove that for any multi-layer network with generic input data and non-linear activation functions, sub-optimal local minima can exist, no matter how wide the network is (as long as the last hidden layer has at least two neurons). While the result of [59] assumes sigmoid activation, our counter-example covers a large set of activation functions (dense in the set of continuous functions), indicating that the limitation is not due to the specific activation. Our result indicates that "no bad local-min" may be unable to explain the benefit of over-parameterization for training neural nets.
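
For readers unfamiliar with the term, the notion of a sub-optimal local minimum used in the abstract can be written out as follows. This is only a sketch of the standard definition. The symbols $\theta$, $f_\theta$, $\ell$, $L$, and $(x_i, y_i)$ are notation assumed here for illustration and are not taken from the paper, which may use a different loss or parameterization.

```latex
% Assumed notation: training data (x_i, y_i), i = 1, ..., n;
% a network f_\theta with parameters \theta; a per-sample loss \ell.
\[
  L(\theta) \;=\; \sum_{i=1}^{n} \ell\bigl(f_\theta(x_i),\, y_i\bigr).
\]
% \theta^* is a local minimum of L if it is optimal within some neighborhood:
\[
  \exists\, \varepsilon > 0 \ \text{such that}\quad
  L(\theta) \;\ge\; L(\theta^*)
  \quad \text{for all } \theta \text{ with } \|\theta - \theta^*\| < \varepsilon .
\]
% It is sub-optimal if its loss is strictly above the global infimum:
\[
  L(\theta^*) \;>\; \inf_{\theta} L(\theta).
\]
```

In this notation, "no bad local-min" is the statement that no $\theta^*$ satisfies both the second and third conditions; the abstract claims that such points can exist regardless of network width.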

Comments:	58 pages. The main theorem is strengthened. An early version was submitted to Optimization Online on October 4, 2019
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1911.01413 [cs.LG]
	(or arXiv:1911.01413v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1911.01413

Submission history

From: Tian Ding
[v1] Mon, 4 Nov 2019 18:56:58 UTC (31 KB)
[v2] Wed, 26 Feb 2020 18:33:55 UTC (31 KB)
[v3] Sat, 14 Nov 2020 17:05:52 UTC (58 KB)
