
Showing 1–7 of 7 results for author: Hongler, C

  1. arXiv:2205.15809

    stat.ML cs.AI cs.LG cs.NE

    Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity

    Authors: Arthur Jacot, Eugene Golikov, Clément Hongler, Franck Gabriel

    Abstract: We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation $Z_{\ell}$ is optimal w.r.t. an attraction/repulsion problem and interpolates between the…

    Submitted 13 October, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

  2. arXiv:2106.15933

    stat.ML cs.LG

    Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity

    Authors: Arthur Jacot, François Ged, Berfin Şimşek, Clément Hongler, Franck Gabriel

    Abstract: The dynamics of Deep Linear Networks (DLNs) is dramatically affected by the variance $σ^2$ of the parameters at initialization $θ_0$. For DLNs of width $w$, we show a phase transition w.r.t. the scaling $γ$ of the variance $σ^2=w^{-γ}$ as $w\to\infty$: for large variance ($γ<1$), $θ_0$ is very close to a global minimum but far from any saddle point, and for small variance ($γ>1$), $θ_0$ is close t…

    Submitted 31 January, 2022; v1 submitted 30 June, 2021; originally announced June 2021.

  3. arXiv:2006.09796

    stat.ML cs.LG math.PR

    Kernel Alignment Risk Estimator: Risk Prediction from Training Data

    Authors: Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

    Abstract: We study the risk (i.e. generalization error) of Kernel Ridge Regression (KRR) for a kernel $K$ with ridge $λ>0$ and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT $\vartheta_{K,λ}$ is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor…

    Submitted 17 June, 2020; originally announced June 2020.

  4. arXiv:2002.08404

    stat.ML cs.LG

    Implicit Regularization of Random Feature Models

    Authors: Arthur Jacot, Berfin Şimşek, Francesco Spadaro, Clément Hongler, Franck Gabriel

    Abstract: Random Feature (RF) models are used as efficient parametric approximations of kernel methods. We investigate, by means of random matrix theory, the connection between Gaussian RF models and Kernel Ridge Regression (KRR). For a Gaussian RF model with $P$ features, $N$ data points, and a ridge $λ$, we show that the average (i.e. expected) RF predictor is close to a KRR predictor with an effective ri…

    Submitted 23 September, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Journal ref: Proceedings of the International Conference on Machine Learning, 2020, pp. 7397-7406

  5. arXiv:1910.02875

    cs.LG cs.NE stat.ML

    The asymptotic spectrum of the Hessian of DNN throughout training

    Authors: Arthur Jacot, Franck Gabriel, Clément Hongler

    Abstract: The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs. When the NTK is fixed during training, we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training. In the so-called mean-…

    Submitted 10 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

  6. arXiv:1907.05715

    cs.LG stat.ML

    Order and Chaos: NTK views on DNN Normalization, Checkerboard and Boundary Artifacts

    Authors: Arthur Jacot, Franck Gabriel, François Ged, Clément Hongler

    Abstract: We analyze architectural features of Deep Neural Networks (DNNs) using the so-called Neural Tangent Kernel (NTK), which describes the training and generalization of DNNs in the infinite-width setting. In this setting, we show that for fully-connected DNNs, as the depth grows, two regimes appear: "order", where the (scaled) NTK converges to a constant, and "chaos", where it converges to a Kronecker…

    Submitted 22 June, 2020; v1 submitted 11 July, 2019; originally announced July 2019.

  7. arXiv:1806.07572

    cs.LG cs.NE math.PR stat.ML

    Neural Tangent Kernel: Convergence and Generalization in Neural Networks

    Authors: Arthur Jacot, Franck Gabriel, Clément Hongler

    Abstract: At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function $f_θ$ (which maps input vectors to output vectors) follows the kernel gradient…

    Submitted 10 February, 2020; v1 submitted 20 June, 2018; originally announced June 2018.

    Journal ref: Advances in Neural Information Processing Systems, 2018, pp. 8571-8580