Empirical study towards understanding line search approximations for training neural networks

Chae, Younghwan; Wilke, Daniel N.

Abstract: Choosing appropriate step sizes is critical for reducing the computational cost of training large-scale neural network models. Mini-batch sub-sampling (MBSS) is often employed for computational tractability. However, MBSS introduces a sampling error that can manifest as bias or variance in a line search. How the error manifests depends on whether MBSS is performed statically, where the mini-batch is updated only when the search direction changes, or dynamically, where the mini-batch is updated every time the function is evaluated. Static MBSS results in a smooth loss function along a search direction, reflecting low variance but large bias in the estimated "true" (or full-batch) minimum. Conversely, dynamic MBSS results in a point-wise discontinuous function along a search direction, with gradients that remain computable using backpropagation, reflecting high variance but lower bias in the estimated "true" (or full-batch) minimum. In this study, quadratic line search approximations are used to assess the quality of the function and derivative information available for constructing approximations of dynamic MBSS loss functions. An empirical study is conducted in which function and derivative information are enforced in various ways for the quadratic approximations. The results for various neural network problems show that being selective about which information is enforced helps to reduce the variance of predicted step sizes.
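
The distinction between static and dynamic MBSS, and the idea of enforcing different information in a quadratic approximation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy one-parameter least-squares problem, the helper names (make_data, loss_and_grad, static_line, dynamic_line, step_from_values, step_from_derivatives), and the two particular ways of building the quadratic, which are only two of many possible choices. The quadratic model along the search direction is q(a) = f0 + g0*a + k*a^2, where f0 and g0 are the loss and directional derivative at a = 0, and k is fixed either by a second function value or by a second directional derivative.

```python
import random


def make_data(n=200, seed=0):
    """Hypothetical toy data set: y = 3x + noise (not from the paper)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def loss_and_grad(w, xs, ys, idx):
    """Mean of 0.5*(w*x - y)^2 over the samples in idx, and its derivative in w."""
    m = len(idx)
    f = sum(0.5 * (w * xs[i] - ys[i]) ** 2 for i in idx) / m
    g = sum((w * xs[i] - ys[i]) * xs[i] for i in idx) / m
    return f, g


def static_line(w, d, xs, ys, batch_size, rng):
    """Static MBSS: one mini-batch is drawn and reused for every step size."""
    idx = rng.sample(range(len(xs)), batch_size)

    def phi(alpha):
        f, g = loss_and_grad(w + alpha * d, xs, ys, idx)
        return f, g * d  # loss and directional derivative along d

    return phi


def dynamic_line(w, d, xs, ys, batch_size, rng):
    """Dynamic MBSS: a new mini-batch is drawn at every function evaluation."""

    def phi(alpha):
        idx = rng.sample(range(len(xs)), batch_size)
        f, g = loss_and_grad(w + alpha * d, xs, ys, idx)
        return f, g * d

    return phi


def step_from_values(f0, g0, a1, f1):
    """Minimizer of the quadratic enforcing f(0), f'(0) and f(a1)."""
    k = (f1 - f0 - g0 * a1) / a1 ** 2
    return -g0 / (2.0 * k) if k > 0 else None  # None: no interior minimum


def step_from_derivatives(g0, a1, g1):
    """Minimizer of the quadratic enforcing only f'(0) and f'(a1)."""
    k = (g1 - g0) / (2.0 * a1)
    return -g0 / (2.0 * k) if k > 0 else None  # None: no interior minimum


if __name__ == "__main__":
    xs, ys = make_data()
    rng = random.Random(1)
    for name, make in (("static", static_line), ("dynamic", dynamic_line)):
        phi = make(0.0, 1.0, xs, ys, 10, rng)
        f0, g0 = phi(0.0)
        f1, g1 = phi(1.0)
        print(name, step_from_values(f0, g0, 1.0, f1),
              step_from_derivatives(g0, 1.0, g1))
```

In this sketch the static loss is an exact quadratic in the step size, so both constructions return the same step for a given mini-batch, whereas under dynamic sampling the two constructions generally disagree and vary from call to call because each evaluation sees different data.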

Comments:	30 pages, 20 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
MSC classes:	90C15, 90C59, 90C30, 90C26, 90C56
Cite as:	arXiv:1909.06893 [stat.ML]
	(or arXiv:1909.06893v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1909.06893
