Showing 1–50 of 57 results for author: Rudi, A

Searching in archive stat.
  1. arXiv:2507.05478  [pdf, ps, other]

    cs.LG stat.ML

    Dynamic Regret Reduces to Kernelized Static Regret

    Authors: Andrew Jacobsen, Alessandro Rudi, Francesco Orabona, Nicolo Cesa-Bianchi

    Abstract: We study dynamic regret in online convex optimization, where the objective is to achieve low cumulative loss relative to an arbitrary benchmark sequence. By observing that competing with an arbitrary sequence of comparators $u_{1},\ldots,u_{T}$ in $\mathcal{W}\subseteq\mathbb{R}^{d}$ is equivalent to competing with a fixed comparator function $u:[1,T]\to \mathcal{W}$, we frame dynamic regret minim…

    Submitted 7 July, 2025; originally announced July 2025.

    Comments: 38 pages, 2 figures

  2. arXiv:2507.05306  [pdf, ps, other]

    stat.ML cs.AI cs.LG math.ST

    Enjoying Non-linearity in Multinomial Logistic Bandits

    Authors: Pierre Boudart, Pierre Gaillard, Alessandro Rudi

    Abstract: We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment by selecting actions to maximize expected rewards based on probabilistic feedback from multiple possible outcomes. In the binary setting, recent work has focused on understanding the impact of the non-linearity of the logistic model (Faury et al., 2020; Abeille…

    Submitted 7 July, 2025; originally announced July 2025.

  3. arXiv:2506.02754  [pdf, ps, other]

    stat.ML cs.LG

    Safely Learning Controlled Stochastic Dynamics

    Authors: Luc Brogat-Motte, Alessandro Rudi, Riccardo Bonalli

    Abstract: We address the problem of safely learning controlled stochastic dynamics from discrete-time trajectory observations, ensuring system trajectories remain within predefined safe regions during both training and deployment. Safety-critical constraints of this kind are crucial in applications such as autonomous robotics, finance, and biomedicine. We introduce a method that ensures safe exploration and…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Under review at NeurIPS 2025

  4. arXiv:2411.01982  [pdf, other]

    stat.ML cs.LG

    Learning Controlled Stochastic Differential Equations

    Authors: Luc Brogat-Motte, Riccardo Bonalli, Alessandro Rudi

    Abstract: Identification of nonlinear dynamical systems is crucial across various fields, facilitating tasks such as control, prediction, optimization, and fault detection. Many applications require methods capable of handling complex systems while providing strong learning guarantees for safe and reliable performance. However, existing approaches often focus on simplified scenarios, such as deterministic m…

    Submitted 4 November, 2024; originally announced November 2024.

    MSC Class: 68Q32; 60H10; 93B30

  5. arXiv:2407.17200  [pdf, ps, other]

    stat.ML cs.LG math.OC stat.ME

    Generalization Bounds of Surrogate Policies for Combinatorial Optimization Problems

    Authors: Pierre-Cyril Aubin-Frankowski, Yohann De Castro, Axel Parmentier, Alessandro Rudi

    Abstract: A recent stream of structured learning approaches has improved the practical state of the art for a range of combinatorial optimization problems with complex objectives encountered in operations research. Such approaches train policies that chain a statistical model with a surrogate combinatorial optimization oracle to map any instance of the problem to a feasible solution. The key idea is to expl…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 10 pages main document, 3 pages supplement

  6. arXiv:2406.12366  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Structured Prediction in Online Learning

    Authors: Pierre Boudart, Alessandro Rudi, Pierre Gaillard

    Abstract: We study a theoretical and algorithmic framework for structured prediction in the online learning setting. The problem of structured prediction, i.e., estimating a function whose output space lacks a vectorial structure, is well studied in the literature of supervised statistical learning. We show that our algorithm is a generalisation of optimal algorithms from the supervised learning setting, a…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages

  7. arXiv:2402.09796  [pdf, ps, other]

    stat.ML cs.LG cs.RO

    Closed-form Filtering for Non-linear Systems

    Authors: Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi

    Abstract: Sequential Bayesian Filtering aims to estimate the current state distribution of a Hidden Markov Model, given the past observations. The problem is well-known to be intractable for most application domains, except in notable cases such as the tabular setting or for linear dynamical systems with Gaussian noise. In this work, we propose a new class of filters based on Gaussian PSD Models, which offe…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 38 pages

  8. arXiv:2303.17109  [pdf, ps, other]

    stat.ML cs.LG

    Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models

    Authors: Anant Raj, Umut Şimşekli, Alessandro Rudi

    Abstract: This paper deals with the problem of efficient sampling from a stochastic differential equation, given the drift function and the diffusion matrix. The proposed approach leverages a recent model for probabilities \cite{rudi2021psd} (the positive semi-definite -- PSD model) from which it is possible to obtain independent and identically distributed (i.i.d.) samples at precision $\varepsilon$ with a…

    Submitted 24 May, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

  9. arXiv:2211.08958  [pdf, other]

    stat.ML cs.LG

    Vector-Valued Least-Squares Regression under Output Regularity Assumptions

    Authors: Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc

    Abstract: We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite dimensional output. We derive learning bounds for our method, and study under which settings statistical performance is improved in comparison to the full-rank method. Our analysis extends the interest of reduced-rank regression beyond the standard low-rank setting to more general output regularity…

    Submitted 16 November, 2022; originally announced November 2022.

  10. arXiv:2205.13255  [pdf, other]

    cs.LG cs.AI cs.IR stat.ML

    Active Labeling: Streaming Stochastic Gradients

    Authors: Vivien Cabannes, Francis Bach, Vianney Perchet, Alessandro Rudi

    Abstract: The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to consider iteratively input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning…

    Submitted 7 December, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: 38 pages (9 main pages), 9 figures

    MSC Class: 68T37 ACM Class: G.3

  11. arXiv:2202.13733  [pdf, other]

    stat.ML cs.LG math.OC

    On the Benefits of Large Learning Rates for Kernel Methods

    Authors: Gaspard Beugnot, Julien Mairal, Alessandro Rudi

    Abstract: This paper studies an intriguing phenomenon related to the good generalization performance of estimators obtained by using large learning rates within gradient descent algorithms. The phenomenon was first observed in the deep learning literature; we show that it can be precisely characterized in the context of kernel methods, even though the resulting optimization problem is convex. Specifically, we consid…

    Submitted 3 June, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Accepted paper at Conference COLT 2022. To be published to Proceedings of Machine Learning Research (PMLR)

  12. arXiv:2202.05614  [pdf, other]

    stat.ML cs.LG

    Measuring dissimilarity with diffeomorphism invariance

    Authors: Théophile Cantelobre, Carlo Ciliberto, Benjamin Guedj, Alessandro Rudi

    Abstract: Measures of similarity (or dissimilarity) are a key ingredient to many machine learning algorithms. We introduce DID, a pairwise dissimilarity measure applicable to a wide range of data spaces, which leverages the data's internal structure to be invariant to diffeomorphisms. We prove that DID enjoys properties which make it relevant for theoretical study and practical use. By representing each dat…

    Submitted 7 March, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: A pre-print

  13. arXiv:2201.13055  [pdf]

    stat.ML cs.LG

    Nyström Kernel Mean Embeddings

    Authors: Antoine Chatalic, Nicolas Schreuder, Alessandro Rudi, Lorenzo Rosasco

    Abstract: Kernel mean embeddings are a powerful tool to represent probability distributions over arbitrary spaces as single points in a Hilbert space. Yet, the cost of computing and storing such embeddings prohibits their direct use in large-scale settings. We propose an efficient approximation procedure based on the Nyström method, which exploits a small random subset of the dataset. Our main result is an…

    Submitted 15 June, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: 8 pages

    Journal ref: ICML 2022

  14. arXiv:2112.01907  [pdf, other]

    stat.ML cs.LG math.ST

    Near-optimal estimation of smooth transport maps with kernel sums-of-squares

    Authors: Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

    Abstract: It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to b…

    Submitted 29 December, 2021; v1 submitted 3 December, 2021; originally announced December 2021.

  15. arXiv:2111.11306  [pdf, other]

    stat.ML cs.LG

    Learning PSD-valued functions using kernel sums-of-squares

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics. Yet, very few function models exist that enforce PSD-ness or convexity with good empirical performance and theoretical guarantees. In this paper, we introduce a kern…

    Submitted 24 January, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

  16. arXiv:2110.03960  [pdf, other]

    cs.LG math.ST stat.ML

    Mixability made efficient: Fast online multiclass logistic regression

    Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

    Abstract: Mixability has been shown to be a powerful tool to obtain algorithms with optimal regret. However, the resulting methods often suffer from high computational complexity which has reduced their practical applicability. For example, in the case of multiclass logistic regression, the aggregating forecaster (Foster et al. (2018)) achieves a regret of $O(\log(Bn))$ whereas Online Newton Step achieves…

    Submitted 8 October, 2021; originally announced October 2021.

  17. arXiv:2106.16116  [pdf, ps, other]

    cs.LG math.ST stat.ML

    PSD Representations for Effective Probability Models

    Authors: Alessandro Rudi, Carlo Ciliberto

    Abstract: Finding a good way to model probability densities is key to probabilistic inference. An ideal model should be able to concisely approximate any probability while also being compatible with two main operations: multiplication of two models (product rule) and marginalization with respect to a subset of the random variables (sum rule). In this work, we show that a recently proposed class of positive…

    Submitted 24 November, 2021; v1 submitted 30 June, 2021; originally announced June 2021.

    Comments: 50 pages, 1 table

  18. arXiv:2106.09994  [pdf, other]

    cs.LG stat.ML

    A Note on Optimizing Distributions using Kernel Mean Embeddings

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low…

    Submitted 27 June, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  19. arXiv:2106.08855  [pdf, other]

    cs.LG stat.ML

    Beyond Tikhonov: Faster Learning with Self-Concordant Losses via Iterative Regularization

    Authors: Gaspard Beugnot, Julien Mairal, Alessandro Rudi

    Abstract: The theory of spectral filtering is a remarkable tool to understand the statistical properties of learning with kernels. For least squares, it allows one to derive various regularization schemes that yield faster convergence rates of the excess risk than with Tikhonov regularization. This is typically achieved by leveraging classical assumptions called source and capacity conditions, which characteriz…

    Submitted 10 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: To be published in NeurIPS 2021

  20. arXiv:2105.15069  [pdf, other]

    cs.LG stat.ML

    On the Consistency of Max-Margin Losses

    Authors: Alex Nowak-Vila, Alessandro Rudi, Francis Bach

    Abstract: The foundational concept of Max-Margin in machine learning is ill-posed for output spaces with more than two labels such as in structured prediction. In this paper, we show that the Max-Margin loss can only be consistent to the classification task under highly restrictive assumptions on the discrete loss measuring the error between outputs. These conditions are satisfied by distances defined in tr…

    Submitted 21 March, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

  21. arXiv:2102.03594  [pdf, other]

    math.ST cs.LG stat.ML

    Online nonparametric regression with Sobolev kernels

    Authors: Oleksandr Zadorozhnyi, Pierre Gaillard, Sebastien Gerschinovitz, Alessandro Rudi

    Abstract: In this work we investigate the variation of the online kernelized ridge regression algorithm in the setting of $d$-dimensional adversarial nonparametric regression. We derive the regret upper bounds on the classes of Sobolev spaces $W_{p}^{\beta}(\mathcal{X})$, $p\geq 2, \beta>\frac{d}{p}$. The upper bounds are supported by the minimax regret analysis, which reveals that in the cases $\beta> \frac{d}{2}$ or…

    Submitted 13 July, 2021; v1 submitted 6 February, 2021; originally announced February 2021.

    Comments: 40 pages, 5 figures, 3 tables (version 2)

  22. arXiv:2102.02789  [pdf, other]

    cs.LG cs.AI stat.ML

    Disambiguation of weak supervision with exponential convergence rates

    Authors: Vivien Cabannes, Francis Bach, Alessandro Rudi

    Abstract: Machine learning approached through supervised learning requires expensive annotation of data. This motivates weakly supervised learning, where data are annotated with incomplete yet discriminative information. In this paper, we focus on partial labelling, an instance of weak supervision where, from a given input, we are given a set of potential targets. We review a disambiguation principle to rec…

    Submitted 15 July, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

    Comments: 22 pages; 6 figures

    MSC Class: 68Q32 ACM Class: I.2.6; G.3; F.2.2

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  23. arXiv:2102.00760  [pdf, ps, other]

    stat.ML cs.AI cs.LG math.ST

    Fast rates in structured prediction

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Discrete supervised learning problems such as classification are often tackled by introducing a continuous surrogate problem akin to regression. Bounding the original error, between estimate and solution, by the surrogate error endows discrete problems with convergence rates already shown for continuous instances. Yet, current approaches do not leverage the fact that discrete problems are essentia…

    Submitted 15 July, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: 14 main pages, 3 main figures, 43 pages, 4 figures (with appendix)

    MSC Class: 68T05 ACM Class: I.2.6; F.2.2; G.3

    Journal ref: Conference on Learning Theory, PMLR 134, 2021

  24. arXiv:2012.11978  [pdf, ps, other]

    math.OC cs.LG stat.ML

    Finding Global Minima via Kernel Approximations

    Authors: Alessandro Rudi, Ulysse Marteau-Ferey, Francis Bach

    Abstract: We consider the global minimization of smooth functions based solely on function evaluations. Algorithms that achieve the optimal number of function evaluations for a given precision level typically rely on explicitly constructing an approximation of the function which is then minimized with algorithms that have exponential running-time complexity. In this paper, we consider an approach that joint…

    Submitted 22 December, 2020; originally announced December 2020.

  25. arXiv:2009.04324  [pdf, other]

    stat.ML cs.LG

    Overcoming the curse of dimensionality with Laplacian regularization in semi-supervised learning

    Authors: Vivien Cabannes, Loucas Pillaud-Vivien, Francis Bach, Alessandro Rudi

    Abstract: As annotations of data can be scarce in large-scale practical problems, leveraging unlabelled examples is one of the most important aspects of machine learning. This is the aim of semi-supervised learning. To benefit from the access to unlabelled data, it is natural to smoothly diffuse knowledge from labelled data to unlabelled data. This leads to the use of Laplacian regularization. Yet, current i…

    Submitted 29 November, 2021; v1 submitted 9 September, 2020; originally announced September 2020.

    Comments: 38 pages, 6 figures

    Journal ref: NeurIPS 2021

  26. arXiv:2007.14703  [pdf, other]

    stat.ML cs.LG

    Learning Output Embeddings in Structured Prediction

    Authors: Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc

    Abstract: A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then, solving a regression problem in this output space. A prediction in the original space is computed by solving a pre-image problem. In such an approach, the embedding, linked to the target loss…

    Submitted 2 November, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

  27. arXiv:2007.01012  [pdf, other]

    cs.LG stat.ML

    Consistent Structured Prediction with Max-Min Margin Markov Networks

    Authors: Alex Nowak-Vila, Francis Bach, Alessandro Rudi

    Abstract: Max-margin methods for binary classification such as the support vector machine (SVM) have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$), or more generally structural SVMs. Unfortunately, these methods are statistically inconsistent when the relationship between inputs and labels is far from deterministic. We overcome such limitations by d…

    Submitted 27 July, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

  28. arXiv:2006.10350  [pdf, other]

    cs.LG stat.ML

    Kernel methods through the roof: handling billions of points efficiently

    Authors: Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi

    Abstract: Kernel methods provide an elegant and principled approach to nonparametric learning, but so far could hardly be used in large scale problems, since naïve implementations scale poorly with data size. Recent advances have shown the benefits of a number of algorithmic ideas, for example combining optimization, numerical linear algebra and random projections. Here, we push these efforts further to dev…

    Submitted 26 November, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: 33 pages, 7 figures, NeurIPS 2020

  29. arXiv:2006.09984   

    stat.ML cs.LG math.NA

    Interpolation and Learning with Scale Dependent Kernels

    Authors: Nicolò Pagliana, Alessandro Rudi, Ernesto De Vito, Lorenzo Rosasco

    Abstract: We study the learning properties of nonparametric ridge-less least squares. In particular, we consider the common case of estimators defined by scale dependent kernels, and focus on the role of the scale. These estimators interpolate the data and the scale can be shown to control their stability through the condition number. Our analysis shows that there are different regimes depending on the interplay…

    Submitted 10 November, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

    Comments: The paper is not completed and contains parts which need to be modified

  30. arXiv:2006.09261  [pdf, other]

    cs.LG cs.CV stat.ML

    Structured and Localized Image Restoration

    Authors: Thomas Eboli, Alex Nowak-Vila, Jian Sun, Francis Bach, Jean Ponce, Alessandro Rudi

    Abstract: We present a novel approach to image restoration that leverages ideas from localized structured prediction and non-linear multi-task learning. We optimize a penalized energy function regularized by a sum of terms measuring the distance between patches to be restored and clean patches from an external database gathered beforehand. The resulting estimator comes with strong statistical guarantees lev…

    Submitted 16 June, 2020; originally announced June 2020.

  31. arXiv:2003.08109  [pdf, other]

    cs.LG math.ST stat.ML

    Efficient improper learning for online logistic regression

    Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

    Abstract: We consider the setting of online logistic regression and study the regret with respect to the 2-ball of radius B. It is known (see [Hazan et al., 2014]) that any proper algorithm which has logarithmic regret in the number of samples (denoted n) necessarily suffers an exponential multiplicative constant in B. In this work, we design an efficient improper algorithm that avoids this exponential c…

    Submitted 3 November, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

    Journal ref: Conference on Learning Theory 2020, Jul 2020, Graz, Austria

  32. arXiv:2003.00920  [pdf, other]

    cs.LG cs.AI stat.ML

    Structured Prediction with Partial Labelling through the Infimum Loss

    Authors: Vivien Cabannes, Alessandro Rudi, Francis Bach

    Abstract: Annotating datasets is one of the main costs in today's supervised learning. The goal of weak supervision is to enable models to learn using only forms of labelling which are cheaper to collect, such as partial labelling. This is a type of incomplete annotation where, for each datapoint, supervision is cast as a set of labels containing the real one. The problem of supervised learning with partial lab…

    Submitted 9 September, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: 8 pages for main paper, 27 with main paper, 13 figures, 3 tables

    MSC Class: 68Q32 ACM Class: I.2.6; G.3

    Journal ref: Proceedings of the 37th International Conference on Machine Learning, PMLR 119:1230-1239, 2020

  33. arXiv:2002.05424  [pdf, ps, other]

    stat.ML cs.LG math.ST

    A General Framework for Consistent Structured Prediction with Implicit Loss Embeddings

    Authors: Carlo Ciliberto, Lorenzo Rosasco, Alessandro Rudi

    Abstract: We propose and analyze a novel theoretical and algorithmic framework for structured prediction. While so far the term has referred to discrete output spaces, here we consider more general settings, such as manifolds or spaces of probability measures. We define structured prediction as a problem where the output space lacks a vectorial structure. We identify and study a large class of loss function…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: 53 pages

  34. arXiv:2001.10477  [pdf, ps, other]

    quant-ph cs.LG stat.ML

    Statistical Limits of Supervised Quantum Learning

    Authors: Carlo Ciliberto, Andrea Rocchetto, Alessandro Rudi, Leonard Wossnig

    Abstract: Within the framework of statistical learning theory it is possible to bound the minimum number of samples required by a learner to reach a target accuracy. We show that if the bound on the accuracy is taken into account, quantum machine learning algorithms for supervised learning---for which statistical guarantees are available---cannot achieve polylogarithmic runtimes in the input dimension. We c…

    Submitted 29 October, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: v3: 6 pages, journal version, title changed (previous title "The Statistical Limits of Supervised Quantum Learning"), other minor improvements; v2: 6 pages, title changed (previous title "Fast quantum learning with statistical guarantees"), format changed to two-columns, typos corrected, remarks that better clarify the limitations of our analysis added

    Journal ref: Phys. Rev. A 102, 042414 (2020)

  35. arXiv:1907.05226  [pdf, other]

    stat.ML cs.LG math.ST

    Gain with no Pain: Efficient Kernel-PCA by Nyström Sampling

    Authors: Nicholas Sterge, Bharath Sriperumbudur, Lorenzo Rosasco, Alessandro Rudi

    Abstract: In this paper, we propose and study a Nyström based approach to efficient large scale kernel principal component analysis (PCA). The latter is a natural nonlinear extension of classical PCA based on considering a nonlinear feature map or the corresponding kernel. Like other kernel approaches, kernel PCA enjoys good mathematical and statistical properties but, numerically, it scales poorly with the…

    Submitted 11 July, 2019; originally announced July 2019.

    Comments: 19 pages, 2 figures

    MSC Class: 62H25; 62H12; 46E22

  36. arXiv:1907.01771  [pdf, other]

    math.OC cs.AI cs.LG stat.ML

    Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

    Authors: Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi

    Abstract: In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme based on a sequence of problems with decreasing regularization parameters is provably globally convergent, that this convergence is linear with a c…

    Submitted 21 November, 2019; v1 submitted 3 July, 2019; originally announced July 2019.

    Journal ref: NeurIPS 2019 - Conference on Neural Information Processing Systems, Dec 2019, Vancouver, Canada

  37. arXiv:1902.09917  [pdf, other]

    stat.ML cs.LG math.ST

    Efficient online learning with kernels for adversarial large scale problems

    Authors: Rémi Jézéquel, Pierre Gaillard, Alessandro Rudi

    Abstract: We are interested in a framework of online learning with kernels for low-dimensional but large-scale and potentially adversarial datasets. We study the computational and theoretical performance of online variations of kernel Ridge regression. Despite its simplicity, the algorithm we study is the first to achieve the optimal regret for a wide range of kernels with a per-round complexity of order…

    Submitted 29 May, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

  38. arXiv:1902.03086  [pdf, ps, other]

    math.ST math.PR stat.ML

    Affine Invariant Covariance Estimation for Heavy-Tailed Distributions

    Authors: Dmitrii Ostrovskii, Alessandro Rudi

    Abstract: In this work we provide an estimator for the covariance matrix of a heavy-tailed multivariate distribution. We prove that the proposed estimator $\widehat{\mathbf{S}}$ admits an \textit{affine-invariant} bound of the form \[(1-\varepsilon) \mathbf{S} \preccurlyeq \widehat{\mathbf{S}} \preccurlyeq (1+\varepsilon) \mathbf{S}\] in high probability, where $\mathbf{S}$ is the unknown covariance matrix, an…

    Submitted 24 September, 2019; v1 submitted 8 February, 2019; originally announced February 2019.

    Journal ref: 32nd Annual Conference on Learning Theory (COLT), 2019, Jun 2019, Phoenix, United States

  39. arXiv:1902.01958  [pdf, other]

    cs.LG cs.AI stat.ML

    A General Theory for Structured Prediction with Smooth Convex Surrogates

    Authors: Alex Nowak-Vila, Francis Bach, Alessandro Rudi

    Abstract: In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g. logistic regression). The theory relies on a natural characterization of structural properties of the task loss and allows one to derive statistical…

    Submitted 13 February, 2019; v1 submitted 5 February, 2019; originally announced February 2019.

  40. arXiv:1812.05189  [pdf, other]

    stat.ML cs.DS cs.LG math.OC

    Massively scalable Sinkhorn distances via the Nyström method

    Authors: Jason Altschuler, Francis Bach, Alessandro Rudi, Jonathan Niles-Weed

    Abstract: The Sinkhorn "distance", a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference. However, the time and memory requirements of standard algorithms for computing this distance grow quadratically with the size of the data, making them prohibitively expensive on massive data sets. In this work, we show that this…

    Submitted 26 October, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

    Comments: to appear in NeurIPS 2019

    Journal ref: Advances in Neural Information Processing Systems 32 (NeurIPS 2019)

  41. arXiv:1810.13258  [pdf, other]

    stat.ML cs.DS cs.LG

    On Fast Leverage Score Sampling and Optimal Learning

    Authors: Alessandro Rudi, Daniele Calandriello, Luigi Carratino, Lorenzo Rosasco

    Abstract: Leverage score sampling provides an appealing way to perform approximate computations for large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage score sampling is a challenge in its own right requiring further approximations. In this paper, we study the problem of leverage score sampling for positive definite…

    Submitted 24 January, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

  42. arXiv:1810.06839  [pdf, ps, other]

    cs.LG cs.AI cs.CC math.ST stat.ML

    Sharp Analysis of Learning with Discrete Losses

    Authors: Alex Nowak-Vila, Francis Bach, Alessandro Rudi

    Abstract: The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses ad-hoc for each loss. In this paper we study a least-squares framework to systematically design learning algorithms for discrete losses, with quantitative characterizations in terms of statistical and computational complexity. In particular we…

    Submitted 16 October, 2018; originally announced October 2018.

  43. arXiv:1807.06343  [pdf, other]

    stat.ML cs.LG

    Learning with SGD and Random Features

    Authors: Luigi Carratino, Alessandro Rudi, Lorenzo Rosasco

    Abstract: Sketching and stochastic gradient methods are arguably the most common techniques to derive efficient large scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning. More precisely, we study the estimator defined by stochastic gradient with mini batches and random features. The latter can be seen as form of nonlinear sketching…

    Submitted 24 January, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

  44. arXiv:1806.09908  [pdf, other]

    stat.ML cs.LG

    Manifold Structured Prediction

    Authors: Alessandro Rudi, Carlo Ciliberto, Gian Maria Marconi, Lorenzo Rosasco

    Abstract: Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure. While classical approaches consider finite, albeit potentially huge, output spaces, in this paper we discuss how structured prediction can be extended to a continuous scenario. Specifically, we study a structured prediction approach to manifold valued regression.…

    Submitted 26 June, 2018; originally announced June 2018.

  45. arXiv:1806.02402  [pdf, other]

    stat.ML cs.LG

    Localized Structured Prediction

    Authors: Carlo Ciliberto, Francis Bach, Alessandro Rudi

    Abstract: Key to structured prediction is exploiting the problem structure to simplify the learning process. A major challenge arises when data exhibit a local structure (e.g., are made by "parts") that can be leveraged to better approximate the relation between (parts of) the input and (parts of) the output. Recent literature on signal processing, and in particular computer vision, has shown that capturing…

    Submitted 30 May, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

    Comments: 53 pages, 7 figures, 1 algorithm

  46. arXiv:1805.11897  [pdf, other]

    stat.ML cs.LG

    Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

    Authors: Giulia Luise, Alessandro Rudi, Massimiliano Pontil, Carlo Ciliberto

    Abstract: Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization. However, in most situations the Sinkhorn approximation of the Wasserstein distance is replaced by a regularized version that is less accurate but easy to differentiate. In this work we characterize the differential properties of the original Sinkhorn distan…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: 26 pages, 4 figures

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS), Dec 2018, Montréal, Canada

  47. arXiv:1805.10074  [pdf, other]

    cs.LG math.OC math.ST stat.ML

    Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes

    Authors: Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

    Abstract: We consider stochastic gradient descent (SGD) for least-squares regression with potentially several passes over the data. While several passes have been widely reported to perform practically better in terms of predictive performance on unseen data, the existing theoretical analysis of SGD suggests that a single pass is statistically optimal. While this is true for low-dimensional easy problems, w…

    Submitted 23 November, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

    Journal ref: Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada. 2018

  48. arXiv:1804.02484  [pdf, other]

    quant-ph cs.DS cs.LG stat.ML

    Approximating Hamiltonian dynamics with the Nyström method

    Authors: Alessandro Rudi, Leonard Wossnig, Carlo Ciliberto, Andrea Rocchetto, Massimiliano Pontil, Simone Severini

    Abstract: Simulating the time-evolution of quantum mechanical systems is BQP-hard and expected to be one of the foremost applications of quantum computers. We consider classical algorithms for the approximation of Hamiltonian dynamics using subsampling methods from randomized numerical linear algebra. We derive a simulation technique whose runtime scales polynomially in the number of qubits and the Frobeniu…

    Submitted 17 February, 2020; v1 submitted 6 April, 2018; originally announced April 2018.

    Comments: v2: 22 pages, fixed typos in Eq.27 and 28 + other minor changes to the presentation of the results; v3 final version accepted to Quantum; v4 DOIs added in order to comply with Quantum requirements

    Journal ref: Quantum 4, 234 (2020)

  49. arXiv:1801.06720  [pdf, ps, other]

    stat.ML cs.LG math.FA

    Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces

    Authors: Junhong Lin, Alessandro Rudi, Lorenzo Rosasco, Volkan Cevher

    Abstract: In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms f…

    Submitted 15 July, 2022; v1 submitted 20 January, 2018; originally announced January 2018.

    Comments: Updating acknowledgments; Journal version

    Journal ref: Applied and Computational Harmonic Analysis 48 (2020) 868-890

  50. arXiv:1712.04755  [pdf, other]

    cs.LG stat.ML

    Exponential convergence of testing error for stochastic gradient methods

    Authors: Loucas Pillaud-Vivien, Alessandro Rudi, Francis Bach

    Abstract: We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods. We show that while the excess testing loss (squared loss) converges slowly to zero as the number of observations (and thus iterations) goes to infinity, the testing error (classification error) converges exponentially fast if low-noise condition…

    Submitted 20 November, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

    Journal ref: Conference on Learning Theory (COLT), Jul 2018, Stockholm, Sweden