
Showing 1–6 of 6 results for author: Golikov, E

  1. arXiv:2407.06765  [pdf, other]

    cs.LG cs.AI stat.ML

    A Generalization Bound for Nearly-Linear Networks

    Authors: Eugene Golikov

    Abstract: We consider nonlinear networks as perturbations of linear ones. Based on this approach, we present novel generalization bounds that become non-vacuous for networks that are close to being linear. The main advantage over the previous works which propose non-vacuous generalization bounds is that our bounds are a-priori: performing the actual training is not required for evaluating the bounds. To the…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 22 pages, 9 figures

  2. arXiv:2205.15809  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity

    Authors: Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

    Abstract: We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representations $Z_{\ell}$ are optimal w.r.t. to an attraction/repulsion problem and interpolate between the…

    Submitted 13 October, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

  3. arXiv:2006.06574  [pdf, other]

    cs.LG stat.ML

    Dynamically Stable Infinite-Width Limits of Neural Classifiers

    Authors: Eugene A. Golikov

    Abstract: Recent research has been focused on two different approaches to studying neural networks training in the limit of infinite width (1) a mean-field (MF) and (2) a constant neural tangent kernel (NTK) approximations. These two approaches have different scaling of hyperparameters with the width of a network layer and as a result, different infinite-width limit models. We propose a general framework to…

    Submitted 22 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: 26 pages, 7 figures

  4. arXiv:2003.05884  [pdf, other]

    stat.ML cs.LG

    Towards a General Theory of Infinite-Width Limits of Neural Classifiers

    Authors: Eugene A. Golikov

    Abstract: Obtaining theoretical guarantees for neural networks training appears to be a hard problem in a general case. Recent research has been focused on studying this problem in the limit of infinite width and two different theories have been developed: a mean-field (MF) and a constant kernel (NTK) limit theories. We propose a general framework that provides a link between these seemingly distinct theori…

    Submitted 23 October, 2020; v1 submitted 12 March, 2020; originally announced March 2020.

    Comments: 27 pages, 7 figures, accepted to ICML'2020

  5. arXiv:1905.07187  [pdf, other]

    cs.LG stat.ML

    An Essay on Optimization Mystery of Deep Learning

    Authors: Eugene Golikov

    Abstract: Despite the huge empirical success of deep learning, theoretical understanding of neural networks learning process is still lacking. This is the reason, why some of its features seem "mysterious". We emphasize two mysteries of deep learning: generalization mystery, and optimization mystery. In this essay we review and draw connections between several selected works concerning the latter.

    Submitted 17 May, 2019; originally announced May 2019.

  6. arXiv:1812.02769  [pdf, other]

    cs.LG stat.ML

    Embedding-reparameterization procedure for manifold-valued latent variables in generative models

    Authors: Eugene Golikov, Maksim Kretov

    Abstract: Conventional prior for Variational Auto-Encoder (VAE) is a Gaussian distribution. Recent works demonstrated that choice of prior distribution affects learning capacity of VAE models. We propose a general technique (embedding-reparameterization procedure, or ER) for introducing arbitrary manifold-valued variables in VAE model. We compare our technique with a conventional VAE on a toy benchmark prob…

    Submitted 6 December, 2018; originally announced December 2018.

    Comments: Presented at Bayesian Deep Learning workshop (NeurIPS 2018)