
Regularized least squares learning with heavy-tailed noise is minimax optimal

Authors: Mattes Mollenhauer, Nicole Mücke, Dimitri Meunier, Arthur Gretton

Abstract: This paper examines the performance of ridge regression in reproducing kernel Hilbert spaces in the presence of noise that exhibits a finite number of higher moments. We establish excess risk bounds consisting of subgaussian and polynomial terms based on the well known integral operator framework. The dominant subgaussian component allows to achieve convergence rates that have previously only been derived under subexponential noise - a prevalent assumption in related work from the last two decades. These rates are optimal under standard eigenvalue decay conditions, demonstrating the asymptotic robustness of regularized least squares against heavy-tailed noise. Our derivations are based on a Fuk-Nagaev inequality for Hilbert-space valued random variables.

Submitted 22 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

Comments: 32 pages, 1 figure

MSC Class: 62G08 (Primary) 62G35; 62J07 (Secondary)

arXiv:2306.11404 [pdf, ps, other]

On the concentration of subgaussian vectors and positive quadratic forms in Hilbert spaces

Authors: Mattes Mollenhauer, Claudia Schillings

Abstract: In these notes, we investigate the tail behaviour of the norm of subgaussian vectors in a Hilbert space. The subgaussian variance proxy is given as a trace class operator, allowing for a precise control of the moments along each dimension of the space. This leads to useful extensions and analogues of known Hoeffding-type inequalities and deviation bounds for positive random quadratic forms. We give a straightforward application in terms of a variance bound for the regularisation of statistical inverse problems.

Submitted 3 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

MSC Class: 60E15; 66G15; 60G50; 46N30

arXiv:2211.08875 [pdf, ps, other]

Learning linear operators: Infinite-dimensional regression as a well-behaved non-compact inverse problem

Authors: Mattes Mollenhauer, Nicole Mücke, T. J. Sullivan

Abstract: We consider the problem of learning a linear operator $θ$ between two Hilbert spaces from empirical observations, which we interpret as least squares regression in infinite dimensions. We show that this goal can be reformulated as an inverse problem for $θ$ with the feature that its forward operator is generally non-compact (even if $θ$ is assumed to be compact or of $p$-Schatten class). However, we prove that, in terms of spectral properties and regularisation theory, this inverse problem is equivalent to the known compact inverse problem associated with scalar response regression. Our framework allows for the elegant derivation of dimension-free rates for generic learning algorithms under Hölder-type source conditions. The proofs rely on the combination of techniques from kernel regression with recent results on concentration of measure for sub-exponential Hilbertian random variables. The obtained rates hold for a variety of practically-relevant scenarios in functional regression as well as nonlinear regression with operator-valued kernels and match those of classical kernel regression with scalar response.

Submitted 10 July, 2024; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: 40 pages, 1 figure

MSC Class: 62J05; 65J22; 47A52; 47A68

arXiv:2107.10158 [pdf, other]

Optimal Reaction Coordinates: Variational Characterization and Sparse Computation

Authors: Andreas Bittracher, Mattes Mollenhauer, Péter Koltai, Christof Schütte

Abstract: Reaction Coordinates (RCs) are indicators of hidden, low-dimensional mechanisms that govern the long-term behavior of high-dimensional stochastic processes. We present a novel and general variational characterization of optimal RCs and provide conditions for their existence. Optimal RCs are minimizers of a certain loss function and reduced models based on them guarantee very good approximation of the long-term dynamics of the original high-dimensional process. We show that, for slow-fast systems, metastable systems, and other systems with known good RCs, the novel theory reproduces previous insight. Remarkably, the numerical effort required to evaluate the loss function scales only with the complexity of the underlying, low-dimensional mechanism, and not with that of the full system. The theory provided lays the foundation for an efficient and data-sparse computation of RCs via modern machine learning techniques.

Submitted 22 September, 2021; v1 submitted 21 July, 2021; originally announced July 2021.

arXiv:2012.12917 [pdf, ps, other]

Nonparametric approximation of conditional expectation operators

Authors: Mattes Mollenhauer, Péter Koltai

Abstract: Given the joint distribution of two random variables $X,Y$ on some second countable locally compact Hausdorff space, we investigate the statistical approximation of the $L^2$-operator defined by $[Pf](x) := \mathbb{E}[ f(Y) \mid X = x ]$ under minimal assumptions. By modifying its domain, we prove that $P$ can be arbitrarily well approximated in operator norm by Hilbert-Schmidt operators acting on a reproducing kernel Hilbert space. This fact allows to estimate $P$ uniformly by finite-rank operators over a dense subspace even when $P$ is not compact. In terms of modes of convergence, we thereby obtain the superiority of kernel-based techniques over classically used parametric projection approaches such as Galerkin methods. This also provides a novel perspective on which limiting object the nonparametric estimate of $P$ converges to. As an application, we show that these results are particularly important for a large family of spectral analysis techniques for Markov transition operators. Our investigation also gives a new asymptotic perspective on the so-called kernel conditional mean embedding, which is the theoretical foundation of a wide variety of techniques in kernel-based nonparametric inference.

Submitted 5 August, 2023; v1 submitted 23 December, 2020; originally announced December 2020.

MSC Class: 46E22; 47A58; 46B28; 62J02; 62G05

arXiv:2004.00891 [pdf, ps, other]

Kernel Autocovariance Operators of Stationary Processes: Estimation and Convergence

Authors: Mattes Mollenhauer, Stefan Klus, Christof Schütte, Péter Koltai

Abstract: We consider autocovariance operators of a stationary stochastic process on a Polish space that is embedded into a reproducing kernel Hilbert space. We investigate how empirical estimates of these operators converge along realizations of the process under various conditions. In particular, we examine ergodic and strongly mixing processes and obtain several asymptotic results as well as finite sample error bounds. We provide applications of our theory in terms of consistency results for kernel PCA with dependent data and the conditional mean embedding of transition probabilities. Finally, we use our approach to examine the nonparametric estimation of Markov transition operators and highlight how our theory can give a consistency analysis for a large family of spectral analysis methods including kernel-based dynamic mode decomposition.

Submitted 29 November, 2022; v1 submitted 2 April, 2020; originally announced April 2020.

Journal ref: Journal of Machine Learning Research 23(327) 1-34, 2022

arXiv:1905.11255 [pdf, other]

Kernel Conditional Density Operators

Authors: Ingmar Schuster, Mattes Mollenhauer, Stefan Klus, Krikamol Muandet

Abstract: We introduce a novel conditional density estimation model termed the conditional density operator (CDO). It naturally captures multivariate, multimodal output densities and shows performance that is competitive with recent neural conditional density models and Gaussian processes. The proposed model is based on a novel approach to the reconstruction of probability densities from their kernel mean embeddings by drawing connections to estimation of Radon-Nikodym derivatives in the reproducing kernel Hilbert space (RKHS). We prove finite sample bounds for the estimation error in a standard density reconstruction scenario, independent of problem dimensionality. Interestingly, when a kernel is used that is also a probability density, the CDO allows us to both evaluate and sample the output density efficiently. We demonstrate the versatility and performance of the proposed model on both synthetic and real-world data.

Submitted 29 October, 2019; v1 submitted 27 May, 2019; originally announced May 2019.

arXiv:1904.07752 [pdf, other]

doi 10.1063/1.5100267

Kernel methods for detecting coherent structures in dynamical data

Authors: Stefan Klus, Brooke E. Husic, Mattes Mollenhauer, Frank Noé

Abstract: We illustrate relationships between classical kernel-based dimensionality reduction techniques and eigendecompositions of empirical estimates of reproducing kernel Hilbert space (RKHS) operators associated with dynamical systems. In particular, we show that kernel canonical correlation analysis (CCA) can be interpreted in terms of kernel transfer operators and that it can be obtained by optimizing the variational approach for Markov processes (VAMP) score. As a result, we show that coherent sets of particle trajectories can be computed by kernel CCA. We demonstrate the efficiency of this approach with several examples, namely the well-known Bickley jet, ocean drifter data, and a molecular dynamics problem with a time-dependent potential. Finally, we propose a straightforward generalization of dynamic mode decomposition (DMD) called coherent mode decomposition (CMD). Our results provide a generic machine learning approach to the computation of coherent sets with an objective score that can be used for cross-validation and the comparison of different methods.

Submitted 7 October, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

arXiv:1807.09331 [pdf, other]

doi 10.1007/978-3-030-51264-4_5

Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces

Authors: Mattes Mollenhauer, Ingmar Schuster, Stefan Klus, Christof Schütte

Abstract: Reproducing kernel Hilbert spaces (RKHSs) play an important role in many statistics and machine learning applications ranging from support vector machines to Gaussian processes and kernel embeddings of distributions. Operators acting on such spaces are, for instance, required to embed conditional probability distributions in order to implement the kernel Bayes rule and build sequential data models. It was recently shown that transfer operators such as the Perron-Frobenius or Koopman operator can also be approximated in a similar fashion using covariance and cross-covariance operators and that eigenfunctions of these operators can be obtained by solving associated matrix eigenvalue problems. The goal of this paper is to provide a solid functional analytic foundation for the eigenvalue decomposition of RKHS operators and to extend the approach to the singular value decomposition. The results are illustrated with simple guiding examples.

Submitted 16 March, 2020; v1 submitted 24 July, 2018; originally announced July 2018.

Showing 1–9 of 9 results for author: Mollenhauer, M