Showing 1–13 of 13 results for author: Salzo, S

  1. arXiv:2403.11687  [pdf, other]

    stat.ML cs.LG math.OC

    Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates

    Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

    Abstract: We study the problem of efficiently computing the derivative of the fixed-point of a parametric nondifferentiable contraction map. This problem has wide applications in machine learning, including hyperparameter optimization, meta-learning and data poisoning attacks. We analyze two popular approaches: iterative differentiation (ITD) and approximate implicit differentiation (AID). A key challenge b…

    Submitted 4 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: ICML 2024. Code at github.com/prolearner/nonsmooth_implicit_diff

  2. arXiv:2208.08567  [pdf, other]

    math.OC stat.ML

    High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise

    Authors: Daniela A. Parletta, Andrea Paudice, Massimiliano Pontil, Saverio Salzo

    Abstract: In this work we study high probability bounds for stochastic subgradient methods under heavy tailed noise. In this setting the noise is only assumed to have finite variance, as opposed to a sub-Gaussian distribution, for which it is known that standard subgradient methods enjoy high probability bounds. We analyzed a clipped version of the projected stochastic subgradient method, where subgradient e…

    Submitted 14 April, 2024; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: 39 pages

    MSC Class: 90C25; 62L20

  3. arXiv:2202.03397  [pdf, other]

    stat.ML cs.LG math.OC

    Bilevel Optimization with a Lower-level Contraction: Optimal Sample Complexity without Warm-start

    Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

    Abstract: We analyse a general class of bilevel problems, in which the upper-level problem consists in the minimization of a smooth objective function and the lower-level problem is to find the fixed point of a smooth contraction map. This type of problem includes instances of meta-learning, equilibrium models, hyperparameter optimization and data poisoning adversarial attacks. Several recent works have pro…

    Submitted 16 November, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Corrected Remark 18 + other small edits. Code at https://github.com/CSML-IIT-UCL/bioptexps

    Journal ref: Journal of Machine Learning Research, volume 24, number 167, pages 1-37, year 2023

  4. arXiv:2112.00838  [pdf, ps, other]

    stat.ML cs.LG math.OC

    Convergence of Batch Greenkhorn for Regularized Multimarginal Optimal Transport

    Authors: Vladimir Kostic, Saverio Salzo, Massimiliano Pontil

    Abstract: In this work we propose a batch version of the Greenkhorn algorithm for multimarginal regularized optimal transport problems. Our framework is general enough to cover, as particular cases, some existing algorithms like the Sinkhorn and Greenkhorn algorithms for the bi-marginal setting, and (greedy) MultiSinkhorn for multimarginal optimal transport. We provide a complete convergence analysis, which is b…

    Submitted 3 December, 2021; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: 30 pages

  5. arXiv:2011.07122  [pdf, other]

    stat.ML cs.LG

    Convergence Properties of Stochastic Hypergradients

    Authors: Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

    Abstract: Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important wh…

    Submitted 17 May, 2025; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: fixed a small mistake in the proof of Theorem 5.1

    Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021), PMLR 130:3826-3834

  6. arXiv:2006.16218  [pdf, other]

    stat.ML cs.LG

    On the Iteration Complexity of Hypergradient Computation

    Authors: Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

    Abstract: We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or…

    Submitted 10 July, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: accepted at ICML 2020; 19 pages, 4 figures; code at https://github.com/prolearner/hypertorch (corrected typos and one reference)

  7. arXiv:2003.10482  [pdf, other]

    cs.LG cs.PF stat.ML

    Efficient Tensor Kernel methods for sparse regression

    Authors: Feliks Hibraj, Marcello Pelillo, Saverio Salzo, Massimiliano Pontil

    Abstract: Recently, classical kernel methods have been extended by the introduction of suitable tensor kernels so as to promote sparsity in the solution of the underlying regression problem. Indeed, they solve an lp-norm regularization problem, with p=m/(m-1) and m an even integer, which happens to be close to a lasso problem. However, a major drawback of the method is that storing tensors requires a considerable…

    Submitted 23 March, 2020; originally announced March 2020.

    Comments: M.Sc. Thesis introducing a novel layout to efficiently store symmetric tensor data

  8. arXiv:1905.13194  [pdf, other]

    stat.ML cs.LG math.ST

    Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

    Authors: Giulia Luise, Saverio Salzo, Massimiliano Pontil, Carlo Ciliberto

    Abstract: We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: 46 pages, 8 figures

  9. arXiv:1806.04941  [pdf, other]

    cs.MS cs.LG stat.ML

    Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning

    Authors: Luca Franceschi, Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo, Paolo Frasconi

    Abstract: In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded on bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem where the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize l…

    Submitted 13 June, 2018; originally announced June 2018.

    Comments: This submission is a reduced version of (Franceschi et al., arXiv:1806.04910) which has been accepted at the main ICML 2018 conference. In this paper we illustrate the software framework, material that could not be included in the conference paper

  10. arXiv:1806.04910  [pdf, other]

    stat.ML cs.LG

    Bilevel Programming for Hyperparameter Optimization and Meta-Learning

    Authors: Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, Massimiliano Pontil

    Abstract: We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised l…

    Submitted 3 July, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: ICML 2018; code for replicating experiments at https://github.com/prolearner/hyper-representation, main package (Far-HO) at https://github.com/lucfra/FAR-HO

  11. Latent Variable Time-varying Network Inference

    Authors: Federico Tomasi, Veronica Tozzo, Saverio Salzo, Alessandro Verri

    Abstract: In many applications of finance, biology and sociology, complex systems involve entities interacting with each other. These processes have the peculiarity of evolving over time and of comprising latent factors, which influence the system without being explicitly measured. In this work we present latent variable time-varying graphical lasso (LTGL), a method for multivariate time-series graphical mo…

    Submitted 2 August, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: 9 pages, 5 figures, 1 table

    Journal ref: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2018). ACM, New York, NY, USA, 2338-2346

  12. arXiv:1707.05609  [pdf, other]

    stat.ML math.OC

    Solving $\ell^p\!$-norm regularization with tensor kernels

    Authors: Saverio Salzo, Johan A. K. Suykens, Lorenzo Rosasco

    Abstract: In this paper, we discuss how a suitable family of tensor kernels can be used to efficiently solve nonparametric extensions of $\ell^p$ regularized learning methods. Our main contribution is proposing a fast dual algorithm, and showing that it allows the problem to be solved efficiently. Our results contrast recent findings suggesting kernel methods cannot be extended beyond the Hilbert setting. Numerical…

    Submitted 18 October, 2017; v1 submitted 18 July, 2017; originally announced July 2017.

  13. arXiv:1603.05876  [pdf, ps, other]

    math.OC math.FA stat.ML

    Generalized support vector regression: duality and tensor-kernel representation

    Authors: Saverio Salzo, Johan A. K. Suykens

    Abstract: In this paper we study the variational problem associated to support vector regression in Banach function spaces. Using the Fenchel-Rockafellar duality theory, we give an explicit formulation of the dual problem as well as of the related optimality conditions. Moreover, we provide a new computational framework for solving the problem which relies on a tensor-kernel representation. This analysis overc…

    Submitted 5 May, 2017; v1 submitted 18 March, 2016; originally announced March 2016.

    MSC Class: 46E22; 46E15; 62G08; 65K10