Showing 1–4 of 4 results for author: Massei, S

  1. arXiv:2312.15295

    stat.ML cs.LG math.OC

    AdamL: A fast adaptive gradient method incorporating loss function

    Authors: Lu Xia, Stefano Massei

    Abstract: Adaptive first-order optimizers are fundamental tools in deep learning, although they may suffer from poor generalization due to the nonuniform gradient scaling. In this work, we propose AdamL, a novel variant of the Adam optimizer, that takes into account the loss function information to attain better generalization results. We provide sufficient conditions that together with the Polyak-Lojasiewi…

    Submitted 23 December, 2023; originally announced December 2023.

  2. arXiv:2301.09511

    stat.ML cs.LG math.NA math.OC

    On the Convergence of the Gradient Descent Method with Stochastic Fixed-point Rounding Errors under the Polyak-Lojasiewicz Inequality

    Authors: Lu Xia, Michiel E. Hochstenbach, Stefano Massei

    Abstract: When training neural networks with low-precision computation, rounding errors often cause stagnation or are detrimental to the convergence of the optimizers; in this paper we study the influence of rounding errors on the convergence of the gradient descent method for problems satisfying the Polyak-Łojasiewicz inequality. Within this context, we show that, in contrast, biased stochastic rounding e…

    Submitted 18 January, 2025; v1 submitted 23 January, 2023; originally announced January 2023.

  3. arXiv:2202.12276

    cs.LG math.NA stat.ML

    On the influence of stochastic roundoff errors and their bias on the convergence of the gradient descent method with low-precision floating-point computation

    Authors: Lu Xia, Stefano Massei, Michiel E. Hochstenbach, Barry Koren

    Abstract: When implementing the gradient descent method in low precision, the employment of stochastic rounding schemes helps to prevent stagnation of convergence caused by the vanishing gradient effect. Unbiased stochastic rounding yields zero bias by preserving small updates with probabilities proportional to their relative magnitudes. This study provides a theoretical explanation for the stagnation of th…

    Submitted 25 February, 2023; v1 submitted 24 February, 2022; originally announced February 2022.

  4. arXiv:2001.09187

    math.NA cs.LG stat.CO

    Certified and fast computations with shallow covariance kernels

    Authors: Daniel Kressner, Jonas Latz, Stefano Massei, Elisabeth Ullmann

    Abstract: Many techniques for data science and uncertainty quantification demand efficient tools to handle Gaussian random fields, which are defined in terms of their mean functions and covariance operators. Recently, parameterized Gaussian random fields have gained increased attention, due to their higher degree of flexibility. However, especially if the random field is parameterized through its covariance…

    Submitted 12 November, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

    MSC Class: 62M40; 65R20; 65C20; 65D15; 65G20

    Journal ref: Foundations of Data Science 2(4): 487-512, 2020